Duo Zheng

Duo Zheng (郑铎)

I am a third-year Ph.D. student at LaVi Lab, The Chinese University of Hong Kong, where I am fortunate to be advised by Prof. Liwei Wang. Before that, I received my B.Eng. and M.S. degrees from Beijing University of Posts and Telecommunications (2016-2023), advised by Prof. Xiaojie Wang.

I am interested in Vision-and-Language, Embodied AI and LLMs.

Email / Scholar / GitHub

Publications

	Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
	Duo Zheng, Shijia Huang, Yanyang Li, Liwei Wang
	NeurIPS, 2025. Code / Paper
	We present a novel framework to enhance MLLMs’ 3D spatial understanding capability, which incorporates a 3D visual geometry encoder to provide latent 3D geometric information given only video inputs.

	Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
	Duo Zheng, Shijia Huang, Liwei Wang
	CVPR, 2025. Code / Paper
	This paper proposes a video-based LLM for 3D scene understanding, which is built upon a Video LLM and incorporates 3D coordinates into video representations.

	Towards Learning a Generalist Model for Embodied Navigation
	Duo Zheng, Shijia Huang, Lin Zhao, Yiwu Zhong, Liwei Wang
	CVPR, 2024. (Poster highlight, Top 2.8%) Code / Paper
	This paper proposes the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction.

	CLEVA: Chinese Language Models EVAluation Platform
	Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, Liwei Wang
	EMNLP, 2023, System Demonstrations. Project / Paper
	CLEVA provides a comprehensive benchmark to holistically evaluate Chinese LLMs.

	Towards Unifying Reference Expression Generation and Comprehension
	Duo Zheng, Tao Kong, Ya Jing, Jiaan Wang, Xiaojie Wang
	EMNLP, 2022, Long Paper. Code / Paper
	This paper proposes a unified model for reference expression generation and comprehension.

	Visual Dialog for Spotting the Differences between Pairs of Similar Images
	Duo Zheng, Fandong Meng, Qingyi Si, Hairun Fan, Zipeng Xu, Jie Zhou, Fangxiang Feng, Xiaojie Wang
	ACM MM, 2022. Code / Paper
	We propose a cooperative object-referring game, Dial-the-Diff, where the goal is to locate the object that differs between two similar images through a conversation between a questioner and an answerer.

	Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser
	Duo Zheng, Zipeng Xu, Fandong Meng, Xiaojie Wang, Jiaan Wang, Jie Zhou
	Findings of EMNLP, 2021, Long Paper. Code / Paper
	We propose the Related entity enhanced Questioner (ReeQ) and the Augmented Guesser (AugG) to enhance the Visual Dialog Questioner in both SL and RL.

Internships

ByteDance Research, Beijing, China. Jan 2022 - June 2022.
Research Internship, focusing on Visual Grounding.
Mentor: Tao Kong
WeChat AI, Tencent Inc., Beijing, China. Sept 2020 - Jan 2021.
Research Internship, focusing on Visual Dialog.
Mentor: Fandong Meng

Selected Honors

Postgraduate Scholarship, The Chinese University of Hong Kong. 2023 - Present
China National Scholarship. Ministry of Education of P.R. China. 2022.
CCF Elite Collegiate Student Award. China Computer Federation. 2020.
Gold Medal, The 5th China Collegiate Programming Contest (Qinhuangdao Site). 2019.
First Prize, Chinese High School Mathematics League. 2015.

Service

Reviewer: CVPR, ICCV, ECCV, ICML, NeurIPS, ACL, ACM MM, IJCV
Teaching: AIST 1000, CSCI 3320, AIST 3120

Design and source code from Jon Barron's website