Wanhua Li
I am currently a postdoctoral fellow at Harvard University supervised by Prof. Hanspeter Pfister.
Prior to that, I received my Ph.D. from the Department of Automation at Tsinghua University in 2022, advised by Prof. Jiwen Lu,
Prof. Jianjiang Feng, and Prof. Jie Zhou.
In 2017, I received my B.S. degree in computer science from Sun Yat-sen University, Guangzhou, China.
My research interests mainly include vision-language models, neural rendering, and 3D-aware synthesis.
I am on the academic job market! Please reach out if you're interested in my research and think I would be a good fit for your department!
Email: wanhua [AT] seas [DOT] harvard [DOT] edu
CV / Google Scholar / Twitter / GitHub
|
|
News
2025-03: I gave an invited talk on Vision Foundation Models at Princeton University.
2025-02: One paper on 4D LangSplat is accepted by CVPR 2025.
2025-02: One paper on video inpainting is accepted by Pattern Recognition (PR).
2025-02: Our SD-LoRA paper is selected as an ICLR oral paper.
2025-01: Two papers on Vision Language Models Prompting and Tuning are accepted by ICLR 2025.
2024-09: One paper on Vision Language Models Prompting is accepted by NeurIPS 2024.
2024-07: One paper on Video Temporal Grounding is accepted by ECCV 2024.
2024-06: One paper on IVF is accepted by MICCAI 2024.
2024-05: Congratulations to Karly Hou. Her undergraduate thesis, which I supervised, won Harvard's Hoopes Prize!
2024-05: One paper on multimodal learning is accepted early (top 11%) by MICCAI 2024.
2024-05: One paper on connectomics is accepted by TMI.
2024-04: One paper on Deepfake detection is accepted by Pattern Recognition (PR).
2024-04: Our LangSplat paper is selected as a CVPR Highlight paper.
2024-02: Two papers on 3D Gaussian splatting and multi-task learning are accepted by CVPR 2024.
|
Recent Selected Publications [ Full List ]
(*Equal Contribution, #Corresponding Author)
|
|
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li*, Renping Zhou*, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Website]
[arxiv]
[Video]
[Code]
We present 4D LangSplat, an approach to constructing a dynamic 4D language field in evolving scenes, leveraging Multimodal Large Language Models.
|
|
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
Wanhua Li*, Zibin Meng*, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister
Conference on Neural Information Processing Systems (NeurIPS), 2024
[Website]
[arxiv]
[Video]
[Code]
We present SocialGPT, a modular framework with greedy segment prompt optimization for social relation reasoning, which attains competitive results while also providing interpretable explanations.
|
|
LangSplat: 3D Language Gaussian Splatting
Minghan Qin*, Wanhua Li*#, Jiawei Zhou*, Haoqian Wang#, Hanspeter Pfister
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight)
[Website]
[arxiv]
[Video]
[Code]
We ground CLIP features into a set of 3D language Gaussians, which attains precise 3D language fields while being 199× faster than LERF.
|
|
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, and Donglai Wei
IEEE International Conference on Computer Vision (ICCV), 2023
[Website]
[arxiv]
[Code]
[Video]
To facilitate using pre-trained models in MMT, we propose CLIPTrans, which transfers the multimodal representations of M-CLIP into a multilingual mBART.
|
|
CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering
Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, and Jiwen Lu
IEEE International Conference on Computer Vision (ICCV), 2023
[Website]
[arxiv]
[Code]
[Video]
We propose an attribute hallucination framework named CLIP-Cluster to narrow the intraclass variance caused by different face attributes for face clustering.
|
|
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jie Zhou, and Jiwen Lu
Conference on Neural Information Processing Systems (NeurIPS), 2022
[Website]
[arxiv]
[Code]
[Explanation in Chinese]
We propose a language-powered paradigm for ordinal regression, which learns the rank concepts from the rich semantic CLIP latent space.
|
|
Label2Label: A Language Modeling Framework for Multi-Attribute Learning
Wanhua Li, Zhexuan Cao, Jianjiang Feng, Jie Zhou, and Jiwen Lu
European Conference on Computer Vision (ECCV), 2022
[Website]
[arxiv]
[Video]
[Code]
We propose a language modeling framework named Label2Label to model the complex instance-wise attribute relations, which regards each attribute label as a “word” and recovers the label “sentence” based on the masked one.
|
|
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis
Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, and Jiwen Lu
European Conference on Computer Vision (ECCV), 2022
[Website]
[arxiv]
[Video]
[Code]
We propose dynamic facial radiance fields conditioned on 3D-aware reference image features.
The facial field can rapidly generalize to novel identities with only a 15-second clip.
|
|
Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection
Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, and Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021
[Paper]
[bibtex]
We propose a Frequency-Aware Spatiotemporal Transformer for video inpainting detection, which simultaneously mines the traces of video inpainting from spatial, temporal, and frequency domains.
|
|
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
Wanhua Li, Xiaoke Huang, Jiwen Lu, Jianjiang Feng, and Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Website]
[arxiv]
[Video]
[Code]
We propose probabilistic ordinal embeddings to equip existing regression methods with uncertainty estimation.
|
|
Meta-Mining Discriminative Samples for Kinship Verification
Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, and Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Website]
[arxiv]
[Video]
[bibtex]
A Discriminative Sample Meta-Mining strategy is proposed to mine discriminative information from limited positive pairs and sufficient negative samples for kinship verification.
|
|
Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes
Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, and Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Website]
[arxiv]
[Code]
[Video]
It is the first face clustering method to train on a very large-scale graph with 20M nodes, and it achieves superior inference results on 12M testing data.
|
|
Graph-Based Social Relation Reasoning
Wanhua Li, Yueqi Duan, Jiwen Lu, Jianjiang Feng, and Jie Zhou
European Conference on Computer Vision (ECCV), 2020
[Website]
[arxiv]
[Video]
[Code]
A simpler, faster, and more accurate method for social relation recognition.
|
|
BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
[arXiv]
[PDF]
[bibtex]
We propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively.
|
Honors and Awards
NeurIPS Scholar Award, 2022.
ICCV Doctoral Consortium Travel Award, 2021.
Weihai Talent Scholarship, Tsinghua, 2021.
3rd Place in 2021 VIPriors Instance Segmentation Challenge @ICCV 2021.
Outstanding Oral Presentation at Beijing University Academic Forum on Artificial Intelligence, 2021.
2nd Place in ChaLearn LAP Large-scale Isolated Gesture Recognition Challenge @ICCV 2017.
Outstanding Undergraduate Thesis, SYSU, 2017.
Outstanding Graduate, SYSU, 2017.
National Encouragement Scholarship, Ministry of Education of P.R. China, 2016.
National Scholarship, Ministry of Education of P.R. China, 2015.
National Scholarship, Ministry of Education of P.R. China, 2014.
|
Professional Activities
Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence.
Reviewer, IEEE Transactions on Image Processing.
Reviewer, IEEE Transactions on Neural Networks and Learning Systems.
Reviewer, IEEE Transactions on Circuits and Systems for Video Technology.
Reviewer, IEEE Transactions on Biometrics, Behavior, and Identity Science.
Reviewer, IEEE Transactions on Artificial Intelligence.
Reviewer, IEEE Transactions on Affective Computing.
Reviewer, IEEE Transactions on Cybernetics.
Reviewer, IEEE Transactions on Multimedia.
Reviewer, IEEE Signal Processing Letters.
Reviewer, International Journal of Computer Vision.
Reviewer, Pattern Recognition.
Reviewer, Neural Networks.
Reviewer, Neurocomputing.
Reviewer, Pattern Recognition Letters.
Reviewer, Journal of Visual Communication and Image Representation.
Reviewer, Knowledge-Based Systems.
Reviewer, Frontiers of Computer Science.
Reviewer, SIGGRAPH 2024.
Reviewer, International Conference on Computer Vision (ICCV), 2021-2023.
Reviewer, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022-2024.
Reviewer, European Conference on Computer Vision (ECCV), 2022-2024.
Reviewer, Conference on Neural Information Processing Systems (NeurIPS), 2023-2024.
PC member, AAAI Conference on Artificial Intelligence (AAAI), 2022-2024.
PC member, International Joint Conference on Artificial Intelligence (IJCAI), 2022-2023.
Reviewer, IEEE International Conference on Multimedia and Expo (ICME), 2019-2023.
Reviewer, IEEE International Conference on Image Processing (ICIP), 2018-2023.
Reviewer, International Conference on Pattern Recognition (ICPR), 2018-2022.
Reviewer, Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2021-2023.
Reviewer, IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2023-2024.