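Academic service:
Academic memberships
1. IEEE Senior Member, CCF Senior Member, CSIG Senior Member;
2. Deputy Secretary-General, CSIG Technical Committee on Multimedia;
3. Executive Deputy Director, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machines;
4. Council Member, Anhui Association for Artificial Intelligence; Director, Computer Vision Technical Committee of the Anhui Association for Artificial Intelligence;
International conference AC/SPC/PC; academic journal AE/Reviewer
1. CCF-A conferences: CVPR/NeurIPS/ICLR/AAAI/ACM MM (AC)/IJCAI (SPC), etc.;
2. CCF-A journals: IJCV/IEEE TIP/IEEE TKDE/ACM TOIS/SCIS, etc.;
3. CCF-A Chinese journals: Chinese Journal of Computers (《计算机学报》)/Journal of Software (《软件学报》)/Acta Automatica Sinica (《自动化学报》), etc.;
4. Other ACM/IEEE Transactions journals: IEEE TMM (AE)/ACM TOMM (AE)/IEEE TCSVT/IEEE TNNLS/IEEE TAC/IEEE TSMC/IEEE TCYB/IEEE TBD/IEEE TCSS, etc.;
5. Other conferences and journals: ECCV/CIKM/ICASSP/ICME/MMM/ACCV/ICDM/BMVC/PRCV/ICPR, etc.; JBHI/PR (AE)/CVIU (Guest AE)/MTA/KBS/ESWA, etc.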
CALL FOR PAPERS
1. ACM TOMM, Executive Guest Editor
Submissions are welcome: ACM Transactions on Multimedia Computing, Communications, and Applications
Special Issue on "Deep Learning for Robust Human Body Language Understanding", March 15, 2024
For details, see https://dl.acm.org/journal/tomm/special-issues, or the attachment CFP-SI-Deep-Learning-Robust-Human-Body-Language-Understanding.pdf
2. CVIU, Executive Guest Editor
Submissions are welcome: Computer Vision and Image Understanding
Special Issue on "Trustworthy Cross-Modal Reasoning for Video-Language Understanding", December 15, 2023
For details, see https://www.sciencedirect.com/special-issue/10MW5G96J8H, or the attachment [SI-Proposal] .pdf
3. IJCNN, Organizer
Submissions are welcome: International Joint Conference on Neural Networks (IJCNN)
Workshop on Saliency Prediction in Action: Industrial Applications for Intelligent Systems (SPAN)
For details, see the IJCNN 2025 Workshops page (SPAN 25). Deadline: March 20, 2025
CALL FOR CHALLENGES:
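1. MAC 2024: ACM Multimedia 2024 Micro-Action Analysis Challenge, Challenge Initiator
The first edition was held successfully at ACM MM 2024 and attracted more than 40 teams from around the world.
MAC (Micro-Action Analysis Challenge) aims to stimulate research on human behavior understanding from whole-body micro-actions, and to advance in-depth psychological assessment and the analysis of human emotional states. MAC includes the Micro-Action Recognition (MAR) track and the Multi-label Micro-Action Detection (MMAD) track. For more information, see https://sites.google.com/view/micro-action, or the attachment ACM Multimedia 2024 Micro-Action Analysis Challenge.jpg
2. You are welcome to take part in the second edition, MAC 2025: ACM Multimedia 2025 Micro-Action Analysis Challenge!
For more information, see https://sites.google.com/view/micro-action
Research group homepage: https://vut-hfut.github.io/ Anyone who shares our research interests is welcome to join!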
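Main research directions
The main research directions are machine vision, machine learning, deep learning, and pattern recognition, including:
1. Audio-Visual Event Understanding and Parsing;
2. Cross-modal Understanding and Reasoning;
3. Visual Emotion Captioning and Explanation;
4. Temporal Action Recognition and Detection;
5. Vision-based Sign Language Recognition and Translation;
6. Vision-based Physiological Measurement.
Featured research:
1. Visual affective computing: contactless physiological signal measurement, micro-action recognition, emotion understanding
2. Vision-based sign language machine translation: sign language recognition, translation, and production
3. Audio-visual semantic parsing and localization: audio-visual event classification, parsing, and localization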
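Research projects:
1. National Natural Science Foundation of China (NSFC), General Program, 2023-2026, 62272144, Principal Investigator.
2. National Key R&D Program of China, sub-project, 2022-2025, 2022YFB4500601, Principal Investigator.
3. NSFC Key Program, sub-project, 2021-2024, U20A20183, Principal Investigator.
4. National Key R&D Program of China, sub-project, 2018-2021, 2018YFC0830103, Principal Investigator.
5. NSFC General Program, 2018-2022, 61876058, Principal Investigator.
6. NSFC Young Scientists Fund, 2013-2016, 61305062, Principal Investigator.
7. Hefei University of Technology Academic Newcomer Promotion Program (Type B), 2020-2021, Principal Investigator.
8. Anhui Provincial Natural Science Foundation, General Program, 2013-2015, Principal Investigator.
9. Postdoctoral Science Foundation, General Program, 2013.05-2013.12, Principal Investigator.
Patents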
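[1] Dan Guo; Qi Li; Xiao Sun; Jie Huang; Meng Wang; An end-to-end remote heart rate detection method based on a channel-enhanced spatio-temporal attention network (invention patent), 2024-4-26 (granted), China, ZL 202210507744.7.
[2] Fei Wang; Dan Guo; Kun Li; Meng Wang; A video motion magnification method based on a Transformer network (invention patent), 2023-04-27 (under substantive examination), China, 202310481761.2.
[3] Shengeng Tang; Tonghuan Xiao; Dan Guo; Jihao Gu; Chenxi Cao; Wanqiang Song; Bin Huang; A collision warning method based on image object detection and visual depth estimation, 2023-2-27 (under substantive examination), China, CN202310188292.5.
[4] Shengeng Tang; Wanqiang Song; Dan Guo; Bin Huang; Jihao Gu; Tonghuan Xiao; Chenxi Cao; A route planning method for visually impaired people based on a weighted undirected graph, 2023-3-6 (under substantive examination), China, CN202310228006.3.
[5] Peipei Song; Dan Guo; Xinyi Long; Meng Wang; A method for generating a visual-emotion-driven video emotion captioning model, and its application (invention patent), 2022-11-21 (under substantive examination), China, 202210982424.7.
[6] Tianyi Lu; Dan Guo; An action-guided video captioning method (invention patent), 2022-06-29 (under substantive examination), China.
[7] Dan Guo; Ziyi He; Youwei Ni; Kun Li; Zixin Xu; Jiaqi Ma; Kuang Luo; A dishwashing device based on object detection (utility model), 2023-5-12 (granted), China, ZL202220873705.4.
[8] Dan Guo; Shengeng Tang; Xianglong Liu; Richang Hong; Meng Wang; A multimodal fusion sign language recognition system and method based on graph convolution (invention patent), 2023-3-14 (granted), China, ZL202010049714.7.
[9] Dan Guo; Shengeng Tang; Xianglong Liu; Meng Wang; A sign language translation system and method based on multi-level semantic parsing (invention patent), 2023-3-28 (granted), China, ZL202010103960.6.
[10] Ye Zhao; Xiaobin Hu; Zhenzhen Hu; Xueliang Liu; Dan Guo; Yanrong Guo; Le Wu; A method and device for generating video summary descriptions based on an attention model, 2022-12-9 (granted), China, ZL202110565400.7.
[11] Dan Guo; Peipei Song; Xianglong Liu; Meng Wang; A method for generating an unsupervised image captioning model based on a recurrent memory network (invention patent), 2022-3-15 (granted), China, ZL202010049142.2.
[12] Dan Guo; Peipei Song; Xianglong Liu; Meng Wang; A data-self-driven sign language translation method with dynamic fusion of multi-order features (invention patent), 2022-3-15 (granted), China, ZL202010096391.7.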
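[13] Dan Guo; Hui Wang; Meng Wang; A visual dialog generation method based on a context-aware graph neural network (invention patent), 2021-6-8 (granted), China, ZL201910881298.4.
[14] Dan Guo; Kun Li; Meng Wang; A crowd density estimation method based on a multi-scale attention mechanism (invention patent), 2021-3-9 (granted), China, ZL201910531606.0.
[15] Dan Guo; Peipei Song; Ye Zhao; Meng Wang; A multi-feature fusion sign language recognition method based on adaptive hidden Markov models (invention patent), 2020-07-10 (granted), China, ZL201811131806.9.
[16] Dan Guo; Meng Wang; Wengang Zhou; Houqiang Li; Chuanqing Li; Anyang Li; An automatic translation method for continuous sign language video based on asymmetric multi-layer LSTM (invention patent), 2020-2-11 (granted), China, ZL201810027551.5.
[17] Dan Guo; Shuo Wang; Meng Wang; A sign language video translation method based on the fusion of temporal convolutional networks and recurrent neural networks (invention patent), 2019-10-18 (granted), China, ZL201811070290.1.
[18] Meng Wang; Luming Zhang; Dan Guo; A fast aerial image recognition system and method based on multi-task topology learning, 2018-2-6 (granted), China, ZL201510080478.4.
[19] Meng Wang; Luming Zhang; Dan Guo; Xuting Tian; A viewpoint tracking method based on geometric reconstruction and semantic fusion, 2017-10-3 (granted), China, ZL201410733763.7.
[20] Dan Guo; Xuegang Hu; Wu Ni; Xindong Wu; A road network evacuation planning method based on maximum-flow-rate path priority (invention patent), 2017-6-6 (granted), China, ZL201510451828.3.
[21] Meng Wang; Xun Yang; Richang Hong; Dan Guo; Yiqun Liu; Maosong Sun; An image retrieval method based on semantic mapping space construction, 2017-5-17 (granted), China, ZL201410393094.3.
[22] Meng Wang; Richang Hong; Bingnan Li; Yiqun Liu; Dan Guo; Xueliang Liu; Xindong Wu; Xun Yang; A retrieval re-ranking method based on subspace learning with continuous numeric labels, 2017-2-22 (granted), China, ZL201410196946.X.
[23] Meng Wang; Luming Zhang; Dan Guo; Yiqun Liu; Maosong Sun; Zhihong Lu; A 3D scene reconstruction method based on videos with GPS information, 2017-2-22 (granted), China, ZL201410752454.4.
[24] Dan Guo; Jihao Gu; Shengeng Tang; Tonghuan Xiao; Chenxi Cao; Wanqiang Song; An outdoor assistance method for visually impaired people based on deep intelligent interaction (invention patent), 2024-02-20 (granted), China, 202210371804.7.
[25] Dan Guo; Chenxi Cao; Tonghuan Xiao; Shengeng Tang; Jihao Gu; Bin Huang; A best-choice direction deviation warning system and method based on semantic segmentation (invention patent), 2024-02-27 (granted), China, 202210374860.6.
[26] Dan Guo; Feiyang Liu; Kun Li; Meng Wang; A gaze target estimation method based on a progressive field-of-view cone (invention patent), 2024-01-23 (under substantive examination), China, 202410100320.8.
[27] Dan Guo; Zekuan Liu; Yichen Guo; Shengeng Tang; Zilong Wu; Zehan Wen; Yingnan Chen; A WiFi sign language translation system and method based on deep learning (invention patent), 2022-10-20 (under substantive examination), China, 202210805408.0.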
Publications: (mainly CCF-A conferences/journals and IEEE/ACM Transactions journals)
Highlights:
1. Dan Guo, Hui Wang, and Meng Wang*, "Context-Aware Graph Inference with Knowledge Distillation for Visual Dialog", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, Transactions, CCF-A journal, IF 24.314), 2021.
2. Jinxing Zhou, Dan Guo* and Meng Wang*. "Contrastive Positive Sample Propagation along the Audio-Visual Event Line", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, Transactions, CCF-A journal, IF 24.314), 2022.
3. Chunxiao Fan, Dan Guo*, Ziqi Wang, Meng Wang. “Multi-Objective Convex Quantization for Efficient Model Compression”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, Transactions, CCF-A journal, IF 24.314), 2024.
4. Jinxing Zhou, Dan Guo*, Yiran Zhong, Meng Wang*. "Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling", International Journal of Computer Vision (IJCV, CCF-A journal), 2024.
5. Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang*, Yiran Zhong*. “Audio-Visual Segmentation with Semantics”, International Journal of Computer Vision (IJCV, CCF-A journal), 2024.
6. Dan Guo, Kun Li*, Bin Hu, Yan Zhang, Meng Wang*. "Benchmarking Micro-action Recognition: Dataset, Methods, and Applications", IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT, Transactions), 2024.
7. Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Xun Yang, Meng Wang*. “PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation”, IEEE Transactions on Computational Social Systems (IEEE TCSS, Transactions), 2024.
8. Dan Guo, Wengang Zhou, Houqiang Li, and Meng Wang, "Hierarchical LSTM for Sign Language Translation", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference, oral paper, Top 5%), 2018.
9. Dan Guo, Hui Wang*, Hanwang Zhang, Zhengjun Zha, and Meng Wang*, "Iterative Context-Aware Graph Inference for Visual Dialog", Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference, oral paper, Top 5%), 2020.
10. Fei Wang, Dan Guo*, Kun Li, Zhun Zhong, Meng Wang*. "Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture", Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference), 2024.
2025
1. Kun Li, Dan Guo*, Guoliang Chen*, Chunxiao Fan, Jingyuan Xu, Zhiliang Wu, Hehe Fan, Meng Wang*. “Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2025.
2. Shengeng Tang, Jiayi He, Dan Guo, Yanyan Wei, Feng Li, Richang Hong. “Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2025.
3. Pengcheng Zhao, Jinxing Zhou, Dan Guo*, Yang Zhao, Yanxiang Chen*. “Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2025.
4. Ziheng Zhou, Jinxing Zhou, Wei Qian, Shengeng Tang, Xiaojun Chang, Dan Guo*. “Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2025.
5. Wei Qian, Gaoji Su, Dan Guo*, Jinxing Zhou, Xiaobai Li, Bin Hu, Shengeng Tang, Meng Wang*. “PhysDiff: Physiology-based Dynamicity Disentangled Diffusion Model for Remote Physiological Measurement”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference, oral, top 4.6%), 2025.
6. Jingjing Hu, Dan Guo*, Zhan Si, Deguang Liu, Yunfeng Diao, Jing Zhang, Jinxing Zhou, Meng Wang*. “MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2025.
7. Zhangbin Li, Jinxing Zhou, Jing Zhang, Shengeng Tang, Kun Li, Dan Guo*. “Patch-level Sounding Object Tracking for Audio-Visual Question Answering”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2025.
8. Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, Xun Yang. “AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring”, AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2025.
9. Kun Li, Xinge Peng, Dan Guo*, Xun Yang, Meng Wang*. “Repetitive Action Counting with Hybrid Temporal Relation Modeling”, IEEE Transactions on Multimedia (IEEE TMM, Transactions), 2025.
10. Zhenqiang Zhang, Kun Li, Shengeng Tang, Yanyan Wei, Fei Wang, Jinxing Zhou, Dan Guo*. “Temporal Boundary Awareness Network for Repetitive Action Counting”, ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP, Transactions), 2025.
11. Zhao Xie, Longsheng Lu, Kewei Wu, Zhehan Kan, Xingming Yang, Dan Guo*. “Instructive Probabilistic Transformer for Complex Action Recognition”, IEEE Transactions on Multimedia (IEEE TMM, Transactions), 2025.
12. Xinke Wang, Jingyuan Xu, Xiao Sun*, Mingzheng Li, Bin Hu, Wei Qian, Dan Guo*, Meng Wang*. “Facial Depression Estimation via Multi-Cue Contrastive Learning”, IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT, Transactions), 2025.
13. Peipei Song, Long Zhang, Long Lan, Weidong Chen, Dan Guo, Xun Yang*, and Meng Wang. “Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering”, IEEE Transactions on Multimedia (IEEE TMM, Transactions), 2025.
14. Xu Liu, Na Xia*, Jinxing Zhou, Zhangbin Li, Dan Guo*. “Towards Energy-efficient Audio-Visual Classification via Multimodal Interactive Spiking Network”, ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP, Transactions), 2025.
15. Jingjing Hu, Dan Guo*, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang. “Unified Static and Dynamic Network: Efficient Temporal Filtering Video Grounding”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, Transactions, CCF-A journal, IF 24.314), 2025.
16. Yunfeng Diao, Kaichao Jiang, Dan Guo*, Zhenyu Liang*, Zenglin Shi, Zhenxing Qian, Meng Wang. “Post-train Black-box Defense via Energy-based Bayesian Adversarial Training”, SCIENTIA SINICA Informationis, 2025.
17. Sheng Zhou, Junbin Xiao, Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-Seng Chua, Angela Yao. “EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering”, Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference), 2025.
18. Shengeng Tang, Jiayi He, Lechao Cheng*, Jingjing Wu, Dan Guo, Richang Hong*. “Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations”, Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference), 2025.
19. Zhenxing Zhang, Yaxiong Wang, Lechao Cheng, Zhun Zhong, Dan Guo, Meng Wang. “ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding”, Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference), 2025.
20. Jinxing Zhou, Dan Guo*, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang*. “Towards Open-Vocabulary Audio-Visual Event Localization”, Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference), 2025.
2024
21. Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang*, Yiran Zhong*. “Audio-Visual Segmentation with Semantics”, International Journal of Computer Vision (IJCV, CCF-A journal), 2024.
22. Jinxing Zhou, Dan Guo*, Yiran Zhong, Meng Wang*. "Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling", International Journal of Computer Vision (IJCV, CCF-A journal), 2024.
23. Shuaiyang Li, Feng Xue, Kang Liu, Dan Guo, Richang Hong. "Multimodal Graph Causal Embedding for Multimedia-based Recommendation", IEEE Transactions on Knowledge and Data Engineering (TKDE, Transactions, CCF-A journal), 2024.
24. Chunxiao Fan, Dan Guo*, Ziqi Wang, Meng Wang*. “Multi-Objective Convex Quantization for Efficient Model Compression”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, Transactions, CCF-A journal, IF 24.314), 2024.
25. Wei Qian, Kun Li, Dan Guo*, Bin Hu, Meng Wang*. "Cluster-Phys: Facial Clues Clustering Towards Efficient Remote Physiological Measurement", ACM Multimedia (ACM MM, CCF-A conference, Oral paper, top 3.97%), 2024.
26. Jingjing Hu, Dan Guo*, Kun Li, Zhan Si, Xun Yang*, Meng Wang*. "Maskable Retentive Network for Video Moment Retrieval", ACM Multimedia (ACM MM, CCF-A conference), 2024.
27. Xun Yang*, Jianming Zeng, Dan Guo, Shanshan Wang, Jianfeng Dong, Meng Wang. "Robust video question answering via contrastive cross-modality representation learning", Science China Information Sciences (SCIS, CCF-A journal), 2024.
28. Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Xun Yang, Meng Wang*. “PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation”, IEEE Transactions on Computational Social Systems (IEEE TCSS, Transactions), 2024.
29. Jinxing Zhou, Dan Guo*, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang. "Label-anticipated Event Disentanglement for Audio-Visual Video Parsing", European Conference on Computer Vision (ECCV), 2024.
30. Jing Zhang, Liang Zheng*, Meng Wang, Dan Guo*. "Training A Small Emotional Vision Language Model for Visual Art Comprehension", European Conference on Computer Vision (ECCV), 2024.
31. Fei Wang, Dan Guo*, Kun Li, Zhun Zhong, Meng Wang*. "Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture", Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference), 2024.
32. Chunxiao Fan, Ziqi Wang, Dan Guo*, Meng Wang. "Data-Free Quantization via Pseudo-label Filtering", Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference), 2024.
33. Fei Wang, Dan Guo*, Kun Li, Meng Wang*. "EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2024.
34. Zhangbin Li, Dan Guo*, Jinxing Zhou*, Jing Zhang, Meng Wang. "Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2024.
35. Zhao Xie, Yadong Shi, Kewei Wu, Yaru Cheng, Dan Guo*. "Towards Understanding Future: Consistency Guided Probabilistic Modeling for Action Anticipation", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2024.
36. Liu Liu, Anran Huang, Qi Wu, Dan Guo*, Xun Yang, Meng Wang. "KPA-Tracker: Towards Robust and Real-Time Category-Level Articulated Object 6D Pose Tracking", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2024.
37. Xinyi Wu, Wentao Ma, Dan Guo, Tongqing Zhou, Shan Zhao, Zhiping Cai. "Text-based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2024.
38. Peipei Song, Dan Guo*, Xun Yang, Shengeng Tang, and Meng Wang. "Emotional Video Captioning with Vision-based Emotion Interpretation Network", IEEE Transactions on Image Processing (IEEE TIP, Transactions, CCF-A journal), 2024.
39. Zhao Xie, Chang Jiao, Kewei Wu*, Dan Guo* and Richang Hong. "Active Factor Graph Network for Group Activity Recognition", IEEE Transactions on Image Processing (IEEE TIP, Transactions, CCF-A journal), 2024.
40. Dan Guo, Kun Li*, Bin Hu, Yan Zhang, Meng Wang*. "Benchmarking Micro-action Recognition: Dataset, Methods, and Applications", IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT, Transactions), 2024.
41. Feiyang Liu, Kun Li, Zhun Zhong, Wei Jia, Bin Hu, Xun Yang*, Meng Wang*, Dan Guo*. “Depth Matters: Spatial Proximity-based Gaze Cone Generation for Gaze Following in Wild”, ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP, Transactions), 2024.
42. Xin Liu, Biao Qian, Haipeng Liu*, Dan Guo, Yang Wang, Meng Wang*. "Seeking False Hard Negatives for Graph Contrastive Learning", IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT, Transactions), 2024.
43. Kewei Wu, Wenjie Luo, Zhao Xie, Dan Guo, Zhao Zhang, and Richang Hong. "Ensemble Prototype Network For Weakly-Supervised Temporal Action Localization", IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS, Transactions), 2024.
44. Wei Qian, Dan Guo*, Kun Li, Xiaowei Zhang, Xilan Tian, Xun Yang, Meng Wang*, "Dual-path TokenLearner for Remote Photoplethysmography-based Physiological Measurement with Facial Videos", IEEE Transactions on Computational Social Systems (IEEE TCSS, Transactions), 2024.
2023
45. Peipei Song, Dan Guo*, Xun Yang, Shengeng Tang, Erkun Yang, and Meng Wang*. "Emotion-Prior Awareness Network for Emotional Video Captioning", ACM International Conference on Multimedia (ACM MM, CCF-A conference, Oral paper, top 5.4%), 2023.
46. Sheng Zhou, Dan Guo*, Jia Li, Xun Yang*, and Meng Wang. "Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA", IEEE Transactions on Image Processing (TIP, Transactions, CCF-A journal), 2023.
47. Kun Li, Dan Guo*, and Meng Wang*. "ViGT: Proposal-free Video Grounding with Learnable Token in Transformer", Science China Information Sciences (SCIS, CCF-A journal), 2023.
48. Xinge Peng, Kun Li*, Jiaxiu Li, Guoliang Chen, and Dan Guo*. "Multi-modality Fusion for Emotion Recognition in Videos", IJCAI (CCF-A conference) Challenge paper, 2023.
49. Kun Li, Dan Guo*, Guoliang Chen, Xinge Peng, and Meng Wang. "Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification", IJCAI (CCF-A conference) Challenge paper, 2023.
50. Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo*, and Meng Wang*. "Exploiting Diverse Feature for Multimodal Sentiment Analysis", ACM MM (CCF-A conference) Challenge paper, 2023.
51. Kun Li, Dan Guo*, Guoliang Chen, Feiyang Liu and Meng Wang. "Data Augmentation for Human Behavior Analysis in Multi-Person Conversations", ACM MM (CCF-A conference) Challenge paper, 2023.
52. Kun Li, Jiaxiu Li, Dan Guo*, Xun Yang*, and Meng Wang. "Transformer-based Visual Grounding with Cross-modality Interaction", ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP, Transactions), 2023.
53. Qi Li, Dan Guo*, Wei Qian, Xilan Tian, Xiao Sun, Haifeng Zhao, and Meng Wang*. "Channel-wise Interactive Learning for Remote Heart Rate Estimation from Facial Video", IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT, Transactions), 2023.
54. Jing Zhang, Dan Guo*, Xun Yang*, Peipei Song, and Meng Wang*. "Visual-Linguistic-Stylistic Triple Reward for Cross-Lingual Image Captioning", ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP, Transactions), 2023.
55. Sheng Zhou, Dan Guo*, Xun Yang*, Jianfeng Dong, and Meng Wang*. "Graph Pooling Inference Network for Text-Based VQA", ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP, Transactions), 2023.
56. Shuaiyang Li, Dan Guo, Kang Liu, Richang Hong, and Feng Xue. "Multimodal Counterfactual Learning Network for Multimedia-based Recommendation", Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, CCF-A conference), 2023.
57. Kang Liu, Feng Xue*, Dan Guo, Peijie Sun, Shengsheng Qian, and Richang Hong. "Multimodal Graph Contrastive Learning for Multimedia-based Recommendation", IEEE Transactions on Multimedia (IEEE TMM, Transactions), 2023.
58. Wentao Ma, Xinyi Wu, Shan Zhao*, Tongqing Zhou*, Dan Guo, Lichuan Gu, Zhiping Cai, and Meng Wang. "FedSH: Towards Privacy-preserving Text-based Person Re-Identification", IEEE Transactions on Multimedia (IEEE TMM, Transactions), 2023.
59. Kang Liu, Feng Xue*, Dan Guo, Le Wu, Shujie Li, and Richang Hong. "MEGCF: Multimodal Entity Graph Collaborative Filtering for Personalized Recommendation", ACM Transactions on Information Systems (ACM TOIS, Transactions, CCF-A journal), 2023.
60. Feng Xue*, Tian Yang, Kang Liu, Zikun Hong, Mingwei Cao, Dan Guo, and Richang Hong. "LCSNet: End-to-end Lipreading with Channel-aware Feature Selection", ACM Transactions on Multimedia Computing, Communications, and Applications (ACM TOMM, Transactions), 2023.
61. Dan Guo, Shentao Yao, Hui Wang, Meng Wang. A video question answering Transformer model with embedded local clustering descriptors [J]. Chinese Journal of Computers (《计算机学报》, CCF-A Chinese journal), 2023.
2022
62. Jinxing Zhou, Dan Guo* and Meng Wang*. "Contrastive Positive Sample Propagation along the Audio-Visual Event Line", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, Transactions, CCF-A journal, IF 24.314), 2022.
63. Shengeng Tang, Richang Hong*, Dan Guo*, and Meng Wang, "Gloss Semantic-Enhanced Network with Online Back-Translation for Sign Language Production", ACM International Conference on Multimedia (ACM MM, CCF-A conference), 2022.
64. Peipei Song, Dan Guo*, Jun Cheng, and Meng Wang*, "Contextual Attention Network for Emotional Video Captioning", IEEE Transactions on Multimedia (TMM, Transactions), 2022.
65. Peipei Song, Dan Guo*, Jinxing Zhou, Mingliang Xu, and Meng Wang*, "Memorial GAN with Joint Semantic Optimization for Unpaired Image Captioning", IEEE Transactions on Cybernetics (TCYB, Transactions), 2022.
66. Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Meng Wang*, and Yiran Zhong*, "Audio-Visual Segmentation", European Conference on Computer Vision (ECCV), 2022.
67. Tianyuan Xu, Xueliang Liu*, Zhen Huang*, Dan Guo, Richang Hong, and Meng Wang. "Early-Learning regularized Contrastive Learning for Cross-Modal Retrieval with Noisy Labels", ACM International Conference on Multimedia (ACM MM, CCF-A conference), 2022.
68. Zhao Xie, Jiansong Chen, Kewei Wu*, Dan Guo, and Richang Hong. "Global Temporal Difference Network for Action Recognition", IEEE Transactions on Multimedia (IEEE TMM, Transactions), 2022.
69. Kang Liu, Feng Xue*, Xiangnan He, Dan Guo, and Richang Hong. "Joint Multi-Grained Popularity-Aware Graph Convolution Collaborative Filtering for Recommendation", IEEE Transactions on Computational Social Systems (IEEE TCSS, Transactions), 2022.
2021
70. Dan Guo, Hui Wang, and Meng Wang*, "Context-Aware Graph Inference with Knowledge Distillation for Visual Dialog", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, Transactions, CCF-A journal, IF 24.314), 2021.
71. Hui Wang, Dan Guo*, Xiansheng Hua, and Meng Wang*, "Pairwise VLAD Interaction Network for Video Question Answering", ACM International Conference on Multimedia (ACM MM, CCF-A conference), 2021.
72. Kun Li, Dan Guo*, and Meng Wang*, "Proposal-Free Video Grounding with Contextual Pyramid Network", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference), 2021.
73. Shengeng Tang, Dan Guo*, Richang Hong*, and Meng Wang, "Graph-Based Multimodal Sequential Embedding for Sign Language Translation", IEEE Transactions on Multimedia (TMM, Transactions), 2021.
2020
74. Dan Guo, Hui Wang, Shuhui Wang, and Meng Wang*, "Textual-Visual Reference-Aware Attention Network for Visual Dialog", IEEE Transactions on Image Processing (TIP, Transactions, CCF-A journal), 2020.
75. Dan Guo, Wengang Zhou*, Anyang Li, Houqiang Li, and Meng Wang*, "Hierarchical Recurrent Deep Fusion Using Adaptive Clip Summarization for Sign Language Translation", IEEE Transactions on Image Processing (TIP, Transactions, CCF-A journal), 2020.
76. Dan Guo, Hui Wang*, Hanwang Zhang, Zhengjun Zha, and Meng Wang*, "Iterative Context-Aware Graph Inference for Visual Dialog", Conference on Computer Vision and Pattern Recognition (CVPR, CCF-A conference, oral paper, Top 5%), 2020.
77. Dan Guo, Yang Wang*, Peipei Song*, and Meng Wang, "Recurrent Relational Memory Network for Unsupervised Image Captioning", International Joint Conference on Artificial Intelligence (IJCAI, CCF-A conference, acceptance rate 12.6%), 2020.
2019
78. Dan Guo, Kun Li*, and Meng Wang, "DADNet: Dilated-Attention-Deformable ConvNet for Crowd Counting", ACM International Conference on Multimedia (ACM MM, CCF-A conference, oral paper, Top 9.8%), 2019.
79. Dan Guo, Shengeng Tang, and Meng Wang, "Connectionist Temporal Modeling of Video and Language: A Joint Model for Translation and Sign Labeling", International Joint Conference on Artificial Intelligence (IJCAI, CCF-A conference), 2019.
80. Dan Guo, Shuo Wang, Qi Tian, and Meng Wang, "Dense Temporal Convolution Network for Sign Language Translation", International Joint Conference on Artificial Intelligence (IJCAI, CCF-A conference), 2019.
81. Dan Guo, Hui Wang, and Meng Wang, "Dual Visual Attention Network for Visual Dialog", International Joint Conference on Artificial Intelligence (IJCAI, CCF-A conference), 2019.
82. Shuo Wang, Dan Guo*, Xin Xu, Li Zhuo, and Meng Wang, "Cross-Modality Retrieval by Joint Correlation Learning", ACM Transactions on Multimedia Computing Communications and Applications (ACM TOMCCAP, Transactions), 2019.
2018 & Before
83. Shuo Wang, Dan Guo*, Wengang Zhou, Zhengjun Zha, and Meng Wang, "Connectionist Temporal Fusion for Sign Language Translation", ACM International Conference on Multimedia (ACM MM, CCF-A conference), 2018.
84. Dan Guo, Wengang Zhou, Houqiang Li, and Meng Wang, "Hierarchical LSTM for Sign Language Translation", AAAI Conference on Artificial Intelligence (AAAI, CCF-A conference, oral paper, Top 5%), 2018.
85. Dan Guo, Wengang Zhou*, Houqiang Li*, and Meng Wang*, "Online Early-Late Fusion Based on Adaptive HMM for Sign Language Recognition", ACM Transactions on Multimedia Computing Communications and Applications (ACM TOMCCAP, Transactions), 2018.
86. Zhihong Lu, Dan Guo*, Meng Wang. A motion-compensated interpolation algorithm based on weighted motion estimation and vector segmentation [J]. Acta Automatica Sinica (《自动化学报》, CCF-A Chinese journal), 2015.
Conference papers:
1.Jiazhen, Zhang; Kun, Li; Yanyan, Wei; Fei, Wang; Wei, Qian; Jinxing, Zhou; Dan, Guo.Repetitive Action Counting with Feature Interaction Enhancement and Adaptive Gate Fusion.MMAsia '24: ACM Multimedia Asia.
2.Fan, Chunxiao; Wang, Ziqi; Guo, Dan*; Wang, Meng.Data-Free Quantization via Pseudo-label Filtering.IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024-06-16 To 2024-06-22.
3.Wang, Fei; Guo, Dan*; Li, Kun; Zhong, Zhun; Wang, Meng*.Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture.IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024-06-16 To 2024-06-22.
4.Li, Zhangbin; Guo, Dan*; Zhou, Jinxing; Zhang, Jing; Wang, Meng.Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering.38th AAAI Conference on Artificial Intelligence (AAAI) / 36th Conference on Innovative Applications of Artificial Intelligence / 14th Symposium on Educational Advances in Artificial Intelligence, 2024-02-20 To 2024-02-27.
5.Sun, Jiahui; Song, Peipei*; Zhang, Jing; Guo, Dan*.Syntax-Controllable Video Captioning with Tree-Structural Syntax Augmentation.2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition (CVIPPR), 2024-04-26 To 2024-04-28.
6.Li, Shuaiyang; Guo, Dan; Liu, Kang; Hong, Richang; Xue, Feng*.Multimodal Counterfactual Learning Network for Multimedia-based Recommendation.46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, 2023-07-23 To 2023-07-27.
7.Tang, Shengeng; Hong, Richang; Guo, Dan; Wang, Meng.Gloss Semantic-Enhanced Network with Online Back-Translation for Sign Language Production.30th ACM International Conference on Multimedia, MM 2022, 2022-10-10 To 2022-10-14.
8.Xu, Tianyuan; Liu, Xueliang; Huang, Zhen; Guo, Dan; Hong, Richang; Wang, Meng.Early-Learning regularized Contrastive Learning for Cross-Modal Retrieval with Noisy Labels.30th ACM International Conference on Multimedia, MM 2022, 2022-10-10 To 2022-10-14.
9.Yao S.; Li K.*; Xing K.; Wu K.; Xie Z.; Guo D..Differentiated Attention with Multi-modal Reasoning for Video Question Answering.2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms, EEBDA 2022, 2022-02-25 To 2022-02-27.
10.Zhou, Jinxing; Wang, Jianyuan; Zhang, Jiayi; Sun, Weixuan; Zhang, Jing; Birchfield, Stan; Guo, Dan; Kong, Lingpeng; Wang, Meng*; Zhong, Yiran*.Audio-Visual Segmentation.17th European Conference on Computer Vision (ECCV), 2022-10-23 To 2022-10-27.
11.Hui Wang; Dan Guo*; Xian-Sheng Hua; Meng Wang*.Pairwise VLAD Interaction Network for Video Question Answering.29th ACM International Conference on Multimedia (ACM MM 2021), Virtual, Online, China, 2021-10-20 To 2021-10-24.
12.Li, Kun; Guo, Dan*; Wang, Meng*.Proposal-Free Video Grounding with Contextual Pyramid Network.35th AAAI Conference on Artificial Intelligence / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence, 2021-02-02 To 2021-02-09.
13.Dan, Guo; Yang, Wang; Peipei, Song; Meng, Wang.Recurrent Relational Memory Network for Unsupervised Image Captioning.Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}, 2020-07-11 To 2020-07-17.
14.Peng, Fan; Li, Kun*; Liu, Xueliang; Guo, Dan.AOPNet: Anchor Offset Prediction Network for Temporal Action Proposal Generation.2020 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2020, 2020-08-21 To 2020-08-23.
15.Guo, Dan; Wang, Hui*; Zhang, Hanwang; Zha, Zheng-Jun; Wang, Meng.Iterative Context-Aware Graph Inference for Visual Dialog.2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, 2020-06-14 To 2020-06-19.
16.Guo, Dan; Li, Kun*; Zha, Zheng-Jun; Wang, Meng.DADNet: Dilated-Attention-Deformable ConvNet for Crowd Counting.27th ACM International Conference on Multimedia (MM), 2019-10-21 To 2019-10-25.
17.Gui, Yuling; Guo, Dan; Zhao, Ye.Semantic enhanced encoder-decoder network (SEN) for video captioning.2nd Workshop on Multimedia for Accessible Human Computer Interfaces, MAHCI 2019, in conjunction with ACM Multimedia 2019, 2019-10-25 To 2019-10-25.
18.Guo, Dan; Wang, Hui; Wang, Meng.Dual visual attention network for visual dialog.28th International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019-08-10 To 2019-08-16.
19.Guo, Dan; Wang, Shuo; Tian, Qi; Wang, Meng.Dense temporal convolution network for sign language translation.28th International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019-08-10 To 2019-08-16.
20.Guo, Dan; Tang, Shengeng; Wang, Meng.Connectionist temporal modeling of video and language: A joint model for translation and sign labeling.28th International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019-08-10 To 2019-08-16.
21.Song, Peipei; Guo, Dan*; Xin, Haoran; Wang, Meng.Parallel Temporal Encoder for Sign Language Translation.26th IEEE International Conference on Image Processing, ICIP 2019, 2019-09-22 To 2019-09-25.
22.Pei, Xiankun; Guo, Dan; Zhao, Ye.Continuous sign language recognition based on pseudo-supervised learning.2nd Workshop on Multimedia for Accessible Human Computer Interfaces, MAHCI 2019, in conjunction with ACM Multimedia 2019, 2019-10-25.
23.Wang Shuo; Guo Dan*; Zhou Wen Gang; Zha Zheng Jun; Wang Meng.Connectionist temporal fusion for sign language translation.26th ACM Multimedia conference, MM 2018, 2018-10-22 to 2018-10-26.
24.Guo Dan; Zhou Wengang; Li Houqiang; Wang Meng.Hierarchical LSTM for sign language translation.32nd AAAI Conference on Artificial Intelligence, AAAI 2018, United States, 2018-02-02 to 2018-02-07.
25.Dan Guo*; Wengang Zhou; Houqiang Li; Meng Wang.Hierarchical LSTM for Sign Language Translation.AAAI Conference on Artificial Intelligence, United States, 2018-02-02 to 2018-02-07.
26.Wu Ni; Dan Guo; Hailei Wang; Wenbo Li.Contraflow-Constrained Evacuation Route Planning.13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, China, 2017-07-29 to 2017-07-31.
27.Dan Guo*; Chen Gao; Wu Ni; Xuegang Hu.Max-Flow Rate Priority Algorithm for Evacuation Route Planning.IEEE International Conference on Data Science in Cyberspace 2016, China, 2016-06-13 to 2016-06-16.
28.Dan Guo*; Ermao Yuan; Xuegang Hu.Frequent Pattern Mining based on Approximate Edit Distance Matrix.IEEE International Conference on Data Science in Cyberspace 2016, China, 2016-06-13 to 2016-06-16.
29.Dan Guo*; Wengang Zhou; Meng Wang; Houqiang Li.Sign language recognition based on adaptive HMMs with data augmentation.2016 IEEE International Conference on Image Processing (ICIP), United States, 2016-09-25 to 2016-09-28.
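30.Ma, Xiaowen; Hu, Xuegang; Xie, Fei; Guo, Dan.Multi-sequence pattern mining with wildcards (in Chinese)
31.Huang, Guolin; Guo, Dan; Hu, Xuegang.An approximate pattern matching algorithm based on wildcards and length constraints (in Chinese)
32.Huang, Guolin; Guo, Dan; Hu, Xuegang.A heuristic algorithm for approximate pattern matching (in Chinese)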
33.Tian, Weidong*; Jiang, Haiqiu; Zhou, Hongjuan; Guo, Dan; Li, Wenbo; Lu, Yang.A decision-making model for distribution center location during earthquake responses.2012 International Conference on Material Sciences and Manufacturing Technology, ICMSMT 2012, China, 2012-10-05 to 2012-10-06.
34.Guo, Dan*; Xiang, Taining; Hu, Xuegang; Wu, Xindong.Flexible pattern matching with gap-length and one-off conditions.25th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2013, United States, 2013-11-04 to 2013-11-06.
35.Qiang, Jipeng*; Tian, Weidong; Guo, Dan; Wu, Xindong.Online Pattern Matching with Wildcards.IEEE International Conference on Granular Computing (GrC), China, 2012-08-11 to 2012-08-13.
36.Xie, Fei*; Wu, Xindong; Hu, Xuegang; Gao, Jun; Guo, Dan; Fei, Yulian; Hua, Ertian.Sequential Pattern Mining with Wildcards.22nd International Conference on Tools with Artificial Intelligence, France, 2010-10-27 to 2010-10-29.
1. IJCAI Challenge on Micro-gesture Analysis for Hidden Emotion Understanding, 1st Place in Micro-gesture Classification Track. (May 2023)
2. IJCAI Challenge on Micro-gesture Analysis for Hidden Emotion Understanding, 2nd Place in Micro-gesture Online Recognition Track. (May 2023)
3. ACM MM Multi-modal Group Behaviour Analysis for Artificial Mediation, 1st Place in Bodily Behaviour Recognition Track. (July 2023)
4. ACM MM Multi-modal Group Behaviour Analysis for Artificial Mediation, 1st Place in Eye Contact Detection Track. (July 2023)
5. ACM MM Multi-modal Group Behaviour Analysis for Artificial Mediation, 3rd Place in Next Speaker Prediction Track. (July 2023)
6. ACM MM Multi-modal Sentiment Analysis Challenge, 3rd Place in MuSe-Personalisation Track. (July 2023)
7. IEEE International Conference on Multimedia and Expo (IEEE ICME, a flagship international multimedia conference), Outstanding Reviewer Award
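People First, Technology for Good
A profile of Guo Dan, professor at the School of Computer Science and Information Engineering, Hefei University of Technology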
2025-08-28

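This is an era in which artificial intelligence (AI) is flourishing and technology is changing how people live and work at unprecedented speed. AI is redefining how humans interact with the world. How to make this powerful force truly benefit humanity is a question that many researchers are working on. Guo Dan, professor at the School of Computer Science and Information Engineering of Hefei University of Technology and head of the Visual Understanding Team (VUT), is one of them. In her view, technology should pursue not only what is new but also what is good. Over the years she has led her team in responding to major national needs and working intensively on audio-visual content analysis and visual affective computing. "People first" is the road she has traveled and the direction she intends to keep following.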
Making Communication Barrier-Free
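"We don't do technology R&D just to write algorithms and compete on performance numbers; the work has to be deployed. The ultimate purpose of technology is to make people's lives better. If we can take 'technology for good' as our guide and use technology to better serve people with special needs, that is very meaningful work." In 2018, Guo Dan came across sign language recognition by chance and concluded that she could extend it from her own research perspective.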
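Guo set about preparing. With support from the National Natural Science Foundation of China, the General Program project she leads, "Automatic Translation of Dynamic Long-Duration Sign Language Video," was launched in 2019. In 2021, the Key Project of the Joint Fund for Regional Innovation and Development in which she participates, "Key Technologies for Sign Language Video Analysis and Understanding," was also approved.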
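Deaf and hard-of-hearing people typically sign in units of sentences, pausing only after a complete sentence, whereas earlier algorithms that recognized one word at a time limited the flexibility and fluency of translation systems. The "data-self-driven multi-order dynamic feature fusion method for sign language recognition" proposed by Guo's team solved the problem of continuous sign language translation. In terms of accuracy, their results reach 99.1% test accuracy on "seen sentences" on what is currently China's largest dataset of everyday Chinese sign language (released by iFLYTEK). By verifying generated sign sequences through online back-translation, they also addressed the weak textual semantics in sign language video generation research, effectively improving generation quality. The team's related papers have been cited more than a thousand times in total. The work has been published in leading international journals and conferences, including IEEE Transactions on Image Processing and the AAAI Conference on Artificial Intelligence (oral presentation), has drawn wide attention from the international academic community, and has been recognized by well-known companies such as Xiaomi AI Lab, Tencent YouTu, and NVIDIA, which assessed that this line of work resolved a technical bottleneck in sign language recognition.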
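"Put simply, when a deaf or hard-of-hearing person signs in front of a screen, the system captures the signing through a camera, recognizes and translates it, and displays the result as text on the screen. Likewise, it can use a digital human to turn text into the corresponding, accurate sign language for the person to see," Guo explained. In recent years, the team has collaborated with the University of Science and Technology of China, the Institute of Artificial Intelligence of the Hefei Comprehensive National Science Center, and other institutions on a series of sign language recognition and generation technologies. The results have been applied in video sign language systems on platforms such as the Heilongjiang public legal services hotline and the Hefei municipal government services hotline, bringing practical convenience to deaf and hard-of-hearing people.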
Expanding the Application Boundaries of Affective Computing
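"When I first came to Hefei University of Technology, I didn't know how much I was capable of or what I could accomplish. I just wanted to do my own work well and then advance step by step," Guo said. From initially supervising three to five master's students on her own to now leading a young team of more than 20 master's and doctoral students, she has led the Visual Understanding Team to a series of breakthrough results. For audio-visual event localization in video, the team's positive sample propagation network maintains excellent performance while cutting the number of network parameters by 90%. For audio-visual object segmentation, the team's newly designed segmentation model can accurately identify sounding objects in video. Their visual question answering method based on relational graph reasoning was praised by peer experts as "a new paradigm for visual understanding and reasoning"; this work was published in IEEE Transactions on Pattern Analysis and Machine Intelligence. Meanwhile, the audio-visual segmentation benchmark AVSBench, a new audio-visual scene reasoning dataset they built in collaboration with SenseTime, has been licensed for use to companies, universities, and research institutes including Microsoft, NVIDIA, the University of Hong Kong, and Northwestern Polytechnical University, with positive feedback.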
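"In the early stage we worked on general-purpose tasks. Once we had sufficient visual and auditory processing capabilities, in addition to sign language recognition research for disadvantaged groups, in recent years we began to focus on the mental health track within smart healthcare," Guo said. Their research concentrates on two areas: first, providing rapid psychological screening for the general population; second, working with professional institutions on in-depth research into conditions that have reached the clinical level, such as depression, ADHD, and cognitive impairment. "Medical care has a threshold. Many ordinary people may experience emotional fluctuations that are far from clinical. For them, a simple app that runs on a phone and assists with rapid screening, psychological support, and psychological diagnosis may be more suitable." She added: "When mental health problems reach the clinical level, the application scenarios may involve different professional institutions such as hospitals, counseling organizations, schools, and the military."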
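"Early psychological research focused on what groups have in common, whereas advances in AI make personalized services possible." After years of work in the area, Guo has come to appreciate that in such an interdisciplinary field it is not easy to use AI methods to produce assessments that are more accurate and that people find comfortable rather than off-putting. And for diversified guidance, intervention, and treatment, there is not yet mature technology to assist.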
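"Today, information processing capabilities in areas such as face recognition are already very mature, but emotion analysis at the psychological level is still a blue ocean." For this reason, Guo's team has undertaken human-centered behavioral cognition research in the form of micro-action analysis, which aims to understand involuntary human behavior. The team built Micro-Action-52, currently the world's largest micro-action dataset with the richest set of categories. The dataset contains 22,422 micro-action samples covering 7 major categories and 52 subcategories of body movements, spanning micro-actions of every part of the body. Its fine-grained labeling scheme models micro-actions effectively and provides a solid data foundation for algorithm research and application development. As lead initiator, she also organized the Micro-Action Analysis Challenge at ACM Multimedia 2024. This is the first academic challenge worldwide to address both whole-body micro-action recognition and multi-label micro-action detection, and it attracted more than 40 teams from around the world. Guo hopes it will advance the use of whole-body micro-actions for human behavior understanding, in-depth psychological assessment, and analysis of human emotional states. Most recently, she has been busy preparing the second micro-action challenge and launching special issues on mental health applications in Association for Computing Machinery (ACM) and IEEE Transactions journals to further promote academic exchange.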
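The combination of AI and mental health is an important future direction, with broad application prospects in areas such as interpersonal communication and emotional state analysis. To date, the "AI Physical and Mental State Assessment and Intervention Product Series" that Guo's team developed for Hefei Zhongjuyuan Intelligent Technology Co., Ltd. (合肥中聚源智能科技有限公司) has, since going into production, been used across sectors including the defense industry, education, the judiciary, and disciplinary inspection and supervision. It has served 208,000 users, flagged psychological abnormalities in more than 3,000 screenings, and assisted in solving multiple cases. The "Edge-Computing-Based Intelligent Video Analysis and Understanding System" developed for Hefei Zhongke Jiadian Intelligent Technology Co., Ltd. (合肥中科加点智能科技有限公司) has passed debugging and testing in real environments, reducing computational energy consumption by 60%. The "Object-Detection-Based Human-Following System for Mobile Robots" developed for Hefei Hagong Tunan Intelligent Control Robot Co., Ltd. (合肥哈工图南智控机器人有限公司) has also been deployed, directly raising economic returns by more than 35%. Meanwhile, drawing on multi-agent human-computer interaction practice, the team has carried out joint research with iFLYTEK and others. The results have been applied in interactive training and customer service systems developed by companies and have gone live at more than 20 universities, including East China Jiaotong University, Yangtze University, and South China University of Technology, and at more than 7,000 hotels, including those of Huazhu, BTG Homeinns, and InterContinental, yielding significant economic and social benefits.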
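Professor Guo said that if her work and that of the Visual Understanding Team had to be distilled into one core idea, it would be "human-centered affective computing." Although interdisciplinary research is full of challenges, the whole team will draw on innovation to open a new path at the intersection of AI and humanistic care. "The essence of technology is to do good. Every step of our exploration is aimed at doing research with warmth."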
About the Expert
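Guo Dan is a professor at the School of Computer Science and Information Engineering, Hefei University of Technology. In June 2010 she graduated from Huazhong University of Science and Technology with a doctor of science degree in Systems Analysis and Integration, and she joined Hefei University of Technology the same year. She is a senior member of the Institute of Electrical and Electronics Engineers (IEEE), the China Computer Federation (CCF), and the China Society of Image and Graphics (CSIG); deputy secretary-general of the CSIG Multimedia Technical Committee (CSIG-MM); and executive deputy director of the Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machines, among other roles. She has led projects including a National Key R&D Program topic, a National Natural Science Foundation of China General Program project, an Anhui Province Outstanding Youth project, and a Young Top Talent project under the Anhui Province High-End Talent Recruitment and Cultivation Initiative. She serves as an associate editor for international journals including IEEE Transactions on Multimedia; ACM Transactions on Multimedia Computing, Communications, and Applications; Pattern Recognition; IET Image Processing; Engineering Applications of Artificial Intelligence; and Image and Vision Computing. She has published more than 100 papers in high-level domestic and international journals and conferences, more than 80 of them in IEEE/ACM Transactions journals and CCF-A international conferences. She has placed first or second 13 times in challenges at CCF-A international conferences (challenge tracks at ACM Multimedia, IJCAI, and others) and holds 21 granted invention patents.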

