Sixun Dong

Sixun Dong (Ironieser)

PhD Student in Computer Science

Arizona State University

I am a PhD student in Computer Science at Arizona State University. I completed my Master's at ShanghaiTech University under Professor Shenghua Gao.

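My research focuses on multimodal AI systems that bridge computer vision, natural language processing, and machine learning, with applications including efficient vision-language modeling, tool-using agents, video understanding, and time series forecasting.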

Recent News

Aug 2025 Completed GenAI Research Internship at Zoom Inc., focusing on efficient vision-language modeling. Grateful to my mentor Qi Qian and the Zoom team for their invaluable support and collaboration.
Aug 2025 Paper on MMTok (Multimodal Coverage Maximization for Efficient VLM Inference) released on arXiv, and project homepage launched
Aug 2025 Paper on LiveMCP-101, a new benchmark testing AI agents' real-world tool use, released on arXiv
Aug 2025 Paper on LogicIF (Complex Logical Instruction Generation) released on arXiv
Aug 2025 Published a comprehensive blog post about TimesCLIP, our multimodal approach to time series forecasting with CLIP
May 2025 Started GenAI Research Internship at Zoom Inc., focusing on efficient vision-language modeling
Aug 2024 Started PhD program at Arizona State University
Feb 2024 Paper on MLLM-Tool accepted to WACV 2024
Mar 2023 Paper on WeakSVR accepted to CVPR 2023
Mar 2022 Paper on TransRAC accepted to CVPR 2022 as an 🏆 oral presentation

Selected Publications

arXiv'2508 MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs

Sixun Dong, Juhua Hu, Mian Zhang, Ming Yin, Yanjie Fu, Qi Qian

arXiv'2508 LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Ming Yin, Dinghan Shen, Silei Xu, Jianbing Han, Sixun Dong, Mian Zhang, Yebowen Hu, Shujian Liu, Simin Ma, Song Wang, Sathish Reddy Indurthi, Xun Wang, Yiran Chen, Kaiqiang Song

arXiv'2508 Complex Logical Instruction Generation

Mian Zhang, Shujian Liu, Sixun Dong, Ming Yin, Yebowen Hu, Xun Wang, Steven Ma, Song Wang, Sathish Reddy Indurthi, Haoyun Deng, Zhiyu Zoey Chen, Kaiqiang Song

arXiv'2506 Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives

Sixun Dong, Wei Fan, Teresa Wu, Yanjie Fu

WACV MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

Chenyu Wang, Weixin Luo, Sixun Dong, Xiaohua Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

CVPR Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

Sixun Dong*, Huazhang Hu*, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

CVPR 🏆 Oral TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Huazhang Hu*, Sixun Dong*, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao

Experience

GenAI Research Intern

Zoom Inc., GenAI Research Group

May 2025 - Aug 2025

Worked on vision-language models (VLMs) and LLM agents.

Research Intern (Team Leader)

DGene, Digital Human Algorithm Department

Aug 2023 - Jan 2024

Led digital human projects: co-speech gesture generation and 3D human body reconstruction with <7% measurement error.

Research Intern (Team Leader)

Transsion Holdings, Audio-Video Generation Department

Apr 2023 - Aug 2023

Led research on audio-driven talking head video generation, achieving state-of-the-art performance on commercial and academic benchmarks.

Academic Service

Reviewer

Conferences: CVPR 2023-2025, ICCV 2023-2025, NeurIPS 2025, ECCV 2024, ACCV 2024, ACM MM 2023-2025, KDD 2024

Journals: TMM, Neural Networks, TKDD

Education

2024 - Present

Ph.D. in Computer Science

Arizona State University, USA

Focus: Multimodal Learning, Computer Vision, LLM Agents

2021 - 2024

M.S. in Computer Science

ShanghaiTech University, China

SVIP-Lab, Advisor: Prof. Shenghua Gao

2016 - 2020

B.E. in Computer Science (Dual Degree)

Dalian University of Technology, China

2016 - 2020

B.E. in Process Equipment and Control Engineering

Dalian University of Technology, China