Sixun Dong (Ironieser)
Multimodal Learning, VLM, LLM Agent
Independent Researcher, AZ, USA
Currently, I am an independent researcher. I completed my Master's at ShanghaiTech University under Professor Shenghua Gao.
My research focuses on multimodal AI systems that bridge computer vision, natural language processing, and machine learning across a range of applications.
Recent News
Selected Publications
ICASSP'26 Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER
Experience
GenAI Research Intern
Zoom Inc., GenAI Research Group
May 2025 - Aug 2025
Worked on VLMs and LLM agents. Published one first-author paper on efficient VLM inference and two collaborative papers on LLM evaluation.
Research Intern (Team Leader)
DGene, Digital Human Algorithm Department
Aug 2023 - Jan 2024
Led digital human projects: co-speech gesture generation and 3D human body reconstruction with <7% measurement error.
Research Intern (Team Leader)
Transsion Holdings, Audio-Video Generation Department
Apr 2023 - Aug 2023
Led audio-driven talking-head video generation research, achieving SoTA performance on commercial and academic benchmarks.
Academic Service
Reviewer
Conferences: CVPR (2023–2026), ICCV (2023, 2025), ECCV (2024, 2026), NeurIPS (2025), ICML (2025, 2026), ICLR (2026), ACM MM (2023–2025), ACCV (2024), KDD (2025)
Journals: IEEE Transactions on Multimedia, Neural Networks (Elsevier), ACM Transactions on Knowledge Discovery from Data
Education
M.S. in Computer Science
ShanghaiTech University, China
SVIP-Lab, Advisor: Prof. Shenghua Gao
B.E. in Computer Science (Dual Degree)
Dalian University of Technology, China
B.E. in Process Equipment and Control Engineering
Dalian University of Technology, China