Wanhua Li

I am an incoming Nanyang Assistant Professor at Nanyang Technological University (NTU Singapore), starting in Spring 2026. Currently, I am a postdoctoral fellow at Harvard University supervised by Prof. Hanspeter Pfister. Prior to that, I received my Ph.D. from the Department of Automation at Tsinghua University in 2022, advised by Prof. Jiwen Lu, Prof. Jianjiang Feng, and Prof. Jie Zhou. In 2017, I received my B.S. degree in computer science from Sun Yat-sen University, Guangzhou, China.

My research interests mainly include Spatial Intelligence, Neural Rendering, and Physical Intelligence. My long-term goal is to build visually intelligent systems that can perceive, reason about, and interact with the real world.

I will be recruiting PhD students in the December 2025 application cycle! I will also be looking for motivated postdocs, visiting students, and research interns to join my group! See Join Us for more details.

Email: wanhua016 [AT] gmail [DOT] com

CV / Google Scholar / Twitter / LinkedIn

News

2025-09: One paper on LangSplatV2 is accepted by NeurIPS 2025.

2025-07: One paper on AI for biodiversity research is accepted by Methods in Ecology and Evolution.

2025-06: Two papers on 3D reconstruction and compositional 3D generation are accepted by ICCV 2025.

2025-05: One paper on blood vessel segmentation is accepted by JBHI.

2025-03: I gave an invited talk on Vision Foundation Models at Princeton University.

2025-02: One paper on 4D LangSplat is accepted by CVPR 2025.

2025-02: One paper on video inpainting is accepted by Pattern Recognition (PR).

2025-02: Our SD-LoRA paper is selected as an ICLR oral paper.

2025-01: Two papers on Vision Language Models Prompting and Tuning are accepted by ICLR 2025.

2024-09: One paper on Vision Language Models Prompting is accepted by NeurIPS 2024.

2024-07: One paper on Video Temporal Grounding is accepted by ECCV 2024.

2024-06: One paper on IVF is accepted by MICCAI 2024.

2024-05: Congratulations to Karly Hou! Her undergraduate thesis, which I supervised, won Harvard's Hoopes Prize.

2024-05: One paper on multimodal learning is early accepted (top 11%) by MICCAI 2024.

2024-05: One paper on connectomics is accepted by TMI.

2024-04: One paper on Deepfake detection is accepted by Pattern Recognition (PR).

2024-04: Our LangSplat paper is selected as a CVPR Highlight paper.

2024-02: Two papers on 3D Gaussian splatting and multi-task learning are accepted by CVPR 2024.

Recent Selected Publications [ Full List ]

(*Equal Contribution, #Corresponding Author)

	LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, Hanspeter Pfister Conference on Neural Information Processing Systems (NeurIPS), 2025 [Website] [arxiv] [Video] [Code] We present LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images.
	Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction Jixuan Fan, Wanhua Li, Yifei Han, Tianru Dai, Yansong Tang International Conference on Computer Vision (ICCV), 2025 [Website] [arxiv] [Video] [Code] We propose Momentum-GS, a momentum-based self-distillation framework that significantly improves 3D Gaussian Splatting for large-scale scene reconstruction.
	4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 [Website] [arxiv] [Video] [Code] We present 4D LangSplat, an approach to constructing a dynamic 4D language field in evolving scenes, leveraging Multimodal Large Language Models.
	SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister Conference on Neural Information Processing Systems (NeurIPS), 2024 [Website] [arxiv] [Video] [Code] We present SocialGPT, a modular framework with greedy segment prompt optimization for social relation reasoning, which attains competitive results while also providing interpretable explanations.
	LangSplat: 3D Language Gaussian Splatting Minghan Qin, Wanhua Li#*, Jiawei Zhou, Haoqian Wang#, Hanspeter Pfister IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight) [Website] [arxiv] [Video] [Code] We ground CLIP features into a set of 3D language Gaussians, which attains precise 3D language fields while being 199× faster than LERF.
	CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, and Donglai Wei IEEE International Conference on Computer Vision (ICCV), 2023 [Website] [arxiv] [Code] [Video] To facilitate using pre-trained models in MMT, we propose CLIPTrans, which transfers the multimodal representations of M-CLIP into a multilingual mBART.
	CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, and Jiwen Lu IEEE International Conference on Computer Vision (ICCV), 2023 [Website] [arxiv] [Code] [Video] We propose an attribute hallucination framework named CLIP-Cluster to narrow the intraclass variance caused by different face attributes for face clustering.
	OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression Wanhua Li, Xiaoke Huang, Zheng Zhu, Yansong Tang, Xiu Li, Jie Zhou, and Jiwen Lu Conference on Neural Information Processing Systems (NeurIPS), 2022 [Website] [arxiv] [Code] [中文解读] We propose a language-powered paradigm for ordinal regression, which learns the rank concepts from the rich semantic CLIP latent space.
	Label2Label: A Language Modeling Framework for Multi-Attribute Learning Wanhua Li, Zhexuan Cao, Jianjiang Feng, Jie Zhou, and Jiwen Lu European Conference on Computer Vision (ECCV), 2022 [Website] [arxiv] [Video] [Code] We propose a language modeling framework named Label2Label to model the complex instance-wise attribute relations, which regards each attribute label as a “word” and recovers the label “sentence” based on the masked one.
	Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, and Jiwen Lu European Conference on Computer Vision (ECCV), 2022 [Website] [arxiv] [Video] [Code] We propose dynamic facial radiance fields conditioned on 3D-aware reference image features. The facial field can rapidly generalize to novel identities with only a 15-second clip.
	Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, and Jie Zhou IEEE International Conference on Computer Vision (ICCV), 2021 [Paper] [bibtex] We propose a Frequency-Aware Spatiotemporal Transformer for video inpainting detection, which simultaneously mines the traces of video inpainting from spatial, temporal, and frequency domains.
	Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression Wanhua Li, Xiaoke Huang, Jiwen Lu, Jianjiang Feng, and Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [Website] [arxiv] [Video] [Code] We propose probabilistic ordinal embeddings to empower the present-day regression methods with the ability of uncertainty estimation.
	Meta-Mining Discriminative Samples for Kinship Verification Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, and Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [Website] [arxiv] [Video] [bibtex] A Discriminative Sample Meta-Mining strategy is proposed to mine discriminative information from limited positive pairs and sufficient negative samples for kinship verification.
	Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, and Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [Website] [arxiv] [Code] [Video] It is the first face clustering method to train on a very large-scale graph with 20M nodes, and it achieves superior inference results on 12M testing data.
	Graph-Based Social Relation Reasoning Wanhua Li, Yueqi Duan, Jiwen Lu, Jianjiang Feng, and Jie Zhou European Conference on Computer Vision (ECCV), 2020 [Website] [arxiv] [Video] [Code] A simpler, faster, and more accurate method for social relation recognition.
	BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019 [arXiv] [PDF] [bibtex] We propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively.

Honors and Awards

NeurIPS Scholar Award, 2022.

ICCV Doctoral Consortium Travel Award, 2021.

Weihai Talent Scholarship, Tsinghua, 2021.

3rd Place in 2021 VIPriors Instance Segmentation Challenge @ICCV 2021.

Outstanding Oral Presentation at Beijing University Academic Forum on Artificial Intelligence, 2021.

2nd Place in ChaLearn LAP Large-scale Isolated Gesture Recognition Challenge @ICCV 2017.

Outstanding Undergraduate Thesis, SYSU, 2017.

Outstanding Graduate, SYSU, 2017.

National Encouragement Scholarship, Ministry of Education of P.R. China, 2016.

National Scholarship, Ministry of Education of P.R. China, 2015.

National Scholarship, Ministry of Education of P.R. China, 2014.

Professional Activities

Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence.

Reviewer, IEEE Transactions on Image Processing.

Reviewer, IEEE Transactions on Neural Networks and Learning Systems.

Reviewer, IEEE Transactions on Circuits and Systems for Video Technology.

Reviewer, IEEE Transactions on Biometrics, Behavior, and Identity Science.

Reviewer, IEEE Transactions on Artificial Intelligence.

Reviewer, IEEE Transactions on Affective Computing.

Reviewer, IEEE Transactions on Cybernetics.

Reviewer, IEEE Transactions on Multimedia.

Reviewer, IEEE Signal Processing Letters.

Reviewer, International Journal of Computer Vision.

Reviewer, Pattern Recognition.

Reviewer, Neural Networks.

Reviewer, Neurocomputing.

Reviewer, Pattern Recognition Letters.

Reviewer, Journal of Visual Communication and Image Representation.

Reviewer, Knowledge-Based Systems.

Reviewer, Frontiers of Computer Science.

Reviewer, SIGGRAPH 2024.

Reviewer, International Conference on Computer Vision (ICCV), 2021-2023.

Reviewer, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022-2024.

Reviewer, European Conference on Computer Vision (ECCV), 2022-2024.

Reviewer, Conference on Neural Information Processing Systems (NeurIPS), 2023-2024.

PC member, AAAI Conference on Artificial Intelligence (AAAI), 2022-2024.

PC member, International Joint Conference on Artificial Intelligence (IJCAI), 2022-2023.

Reviewer, IEEE International Conference on Multimedia and Expo (ICME), 2019-2023.

Reviewer, IEEE International Conference on Image Processing (ICIP), 2018-2023.

Reviewer, International Conference on Pattern Recognition (ICPR), 2018-2022.

Reviewer, Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2021-2023.

Reviewer, IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2023-2024.
