Sixun Dong

Sixun Dong (Ironieser)

Multimodal Learning, VLM, LLM Agent

Independent Researcher, AZ, USA

Currently, I am an independent researcher. I completed my Master's at ShanghaiTech University under Professor Shenghua Gao.

My research focuses on multimodal AI systems that bridge computer vision, natural language processing, and machine learning across a range of applications.

Recent News

Sep 2025 Paper on Feature Transformation by Semi-AR and reward-guided diffusion accepted to NeurIPS 2025. Congratulations to Nanxu Gong and Zijun Li!
Aug 2025 Completed GenAI Research Internship at Zoom Inc., focusing on efficient vision–language modeling. Grateful to my mentor Qi Qian and the Zoom team for their invaluable support and collaboration.
Aug 2025 Paper on MMTok - Multimodal Coverage Maximization for Efficient VLM Inference, project homepage launched
Aug 2025 Paper on LiveMCP-101 - a new benchmark testing AI agents' real-world tool use, released on arXiv
Aug 2025 Paper on LogicIF - Complex Logical Instruction Generation released on arXiv
Aug 2025 Published a comprehensive blog post about TimesCLIP, our multimodal approach to time series forecasting with CLIP
May 2025 Started GenAI Research Internship at Zoom Inc., focusing on efficient vision-language modeling
Feb 2024 Paper on MLLM-Tool accepted to WACV 2025
Mar 2023 Paper on WeakSVR accepted to CVPR 2023
Mar 2022 Paper on TransRAC accepted to CVPR 2022 as an 🏆 oral presentation

Selected Publications

arXiv'2508 MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs

Sixun Dong, Juhua Hu, Mian Zhang, Ming Yin, Yanjie Fu, Qi Qian

arXiv'2508 LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Ming Yin, Dinghan Shen, Silei Xu, Jianbing Han, Sixun Dong, Mian Zhang, Yebowen Hu, Shujian Liu, Simin Ma, Song Wang, Sathish Reddy Indurthi, Xun Wang, Yiran Chen, Kaiqiang Song

arXiv'2508 Complex Logical Instruction Generation

Mian Zhang, Shujian Liu, Sixun Dong, Ming Yin, Yebowen Hu, Xun Wang, Steven Ma, Song Wang, Sathish Reddy Indurthi, Haoyun Deng, Zhiyu Zoey Chen, Kaiqiang Song

arXiv'2506 Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives

Sixun Dong, Wei Fan, Teresa Wu, Yanjie Fu

NeurIPS'25 Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation

Nanxu Gong*, Zijun Li*, Sixun Dong, Haoyue Bai, Wangyang Ying, Xinyuan Wang, Yanjie Fu

WACV'25 MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

Chenyu Wang, Weixin Luo, Sixun Dong, Xiaohua Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

3DV'24 RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation

Yiqun Zhao, Zibo Zhao, Jing Li, Sixun Dong, Shenghua Gao

CVPR'23 Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

Sixun Dong*, Huazhang Hu*, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

Experience

GenAI Research Intern

Zoom Inc., GenAI Research Group

May 2025 - Aug 2025

Worked on VLMs and LLM agents. Published one first-author paper on efficient VLM inference and two collaborative papers on LLM evaluation.

Research Intern (Team Leader)

DGene, Digital Human Algorithm Department

Aug 2023 - Jan 2024

Led digital human projects: co-speech gesture generation and 3D human body reconstruction with <7% measurement error.

Research Intern (Team Leader)

Transsion Holdings, Audio-Video Generation Department

Apr 2023 - Aug 2023

Led research on audio-driven talking-head video generation, achieving SoTA performance on commercial and academic benchmarks.

Academic Service

Reviewer

Conferences: CVPR 2023+, ICCV 2023+, ECCV 2024+, NeurIPS 2025+, ICLR 2026+, ACCV 2024, ACM MM 2023-2025, KDD 2024

Journals: TMM, Neural Networks, TKDD

Education

2021 - 2024

M.S. in Computer Science

ShanghaiTech University, China

SVIP-Lab, Advisor: Prof. Shenghua Gao

2016 - 2020

B.E. in Computer Science (Dual Degree)

Dalian University of Technology, China

2016 - 2020

B.E. in Process Equipment and Control Engineering

Dalian University of Technology, China