Sixun Dong (Ironieser)

Multimodal Learning, VLM, LLM Agent

Independent Researcher, AZ, USA

Currently, I am an independent researcher. I completed my Master's degree at ShanghaiTech University, advised by Prof. Shenghua Gao.

My research focuses on multimodal AI systems that bridge computer vision, natural language processing, and machine learning across a range of applications.

Email Scholar GitHub Twitter LinkedIn 知乎

Recent News

Feb 2026 📄 Paper on To Think or Not To Think — Large Reasoning Models in Theory of Mind Tasks, released on arXiv

Jan 2026 🎉 Paper on MMTok accepted to ICLR 2026! Thanks to my mentor Qi Qian and all coauthors.

Jan 2026 🎉 Robust Dysarthric Speech Recognition accepted to ICASSP 2026. Congratulations to Xiuwen Zheng!

Sep 2025 🎉 Sculpting Features from Noise accepted to NeurIPS 2025. Congratulations to Nanxu Gong and Zijun Li!

Aug 2025 💼 Completed GenAI Research Internship at Zoom Inc., focusing on efficient vision–language modeling. Grateful to my mentor Qi Qian and the Zoom team.

Aug 2025 📄 Paper on LiveMCP-101, a new benchmark for testing AI agents' real-world tool use, released on arXiv

Aug 2025 📄 Paper on LogicIF: Complex Logical Instruction Generation, released on arXiv

Aug 2025 📄 Paper on TimesCLIP, a new multimodal approach to time series forecasting with CLIP, released on arXiv

May 2025 💼 Started GenAI Research Internship at Zoom Inc., focusing on efficient vision–language modeling

Jan 2024 💼 Completed Team Leader internship at DGene (Digital Human Algorithm Dept.), leading co-speech gesture generation and 3D human body reconstruction projects.

Feb 2024 🎉 Paper on MLLM-Tool accepted to WACV 2024

Aug 2023 💼 Completed Team Leader internship at Transsion Holdings (Audio-Video Generation Dept.), leading audio-driven talking-head video generation research with SoTA performance.

Mar 2023 🎉 Paper on WeakSVR accepted to CVPR 2023

Mar 2022 🎉 Paper on TransRAC accepted to CVPR 2022 as an 🏆 oral presentation

Selected Publications

ICLR'26 MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs

Sixun Dong, Juhua Hu, Mian Zhang, Ming Yin, Yanjie Fu, Qi Qian

Paper / Code / Homepage / Blog / 知乎

arXiv'2506 Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives

Sixun Dong, Wei Fan, Teresa Wu, Yanjie Fu

Paper / Code / Homepage / Blog / 知乎

NeurIPS'25 Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation

Nanxu Gong*, Zijun Li*, Sixun Dong, Haoyue Bai, Wangyang Ying, Xinyuan Wang, Yanjie Fu

Paper / Code / 知乎

IJCAI'25 Unsupervised Feature Transformation via In-Context Generation, Generator-Critic LLM Agents, and Duet-Play Teaming

Nanxu Gong, Xinyuan Wang, Wangyang Ying, Haoyue Bai, Sixun Dong, Haifeng Chen, Yanjie Fu

Paper / Code

WACV'25 MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

Chenyu Wang, Weixin Luo, Sixun Dong, Xiaohua Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

Paper / Code

3DV'24 RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation

Yiqun Zhao, Zibo Zhao, Jing Li, Sixun Dong, Shenghua Gao

Paper / Code

CVPR'23 Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

Sixun Dong*, Huazhang Hu*, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

Paper / Code / YouTube / Bilibili / 知乎

View All Publications

Experience

GenAI Research Intern

Zoom Inc., GenAI Research Group

May 2025 - Aug 2025

Worked on VLMs and LLM agents. Published one first-author paper on efficient VLM inference and two collaborative papers on LLM evaluation.

Research Intern (Team Leader)

DGene, Digital Human Algorithm Department

Aug 2023 - Jan 2024

Led digital human projects on co-speech gesture generation and 3D human body reconstruction (<7% measurement error).

Research Intern (Team Leader)

Transsion Holdings, Audio-Video Generation Department

Apr 2023 - Aug 2023

Led audio-driven talking-head video generation research, achieving SoTA performance on commercial and academic benchmarks.

Academic Service

Reviewer

Conferences: CVPR (2023–2026), ICCV (2023, 2025), ECCV (2024, 2026), NeurIPS (2025), ICML (2025, 2026), ICLR (2026), ACM MM (2023–2025), ACCV (2024), KDD (2025)

Journals: IEEE Transactions on Multimedia, Neural Networks (Elsevier), ACM Transactions on Knowledge Discovery from Data

Education

2021 - 2024

M.S. in Computer Science

ShanghaiTech University, China

SVIP-Lab, Advisor: Prof. Shenghua Gao

2016 - 2020

B.E. in Computer Science (Dual Degree)

Dalian University of Technology, China

2016 - 2020

B.E. in Process Equipment and Control Engineering

Dalian University of Technology, China