Wanhua Li
I am currently a postdoctoral fellow at Harvard University supervised by Prof. Hanspeter Pfister.
Prior to that, I received my Ph.D. from the Department of Automation at Tsinghua University in 2022, advised by Prof. Jiwen Lu, Prof. Jianjiang Feng, and Prof. Jie Zhou.
In 2017, I received my B.S. degree in computer science from Sun Yat-sen University, Guangzhou, China.
My research interests mainly include vision-language models, neural rendering, and 3D-aware synthesis.
If you are interested in my research or would like to work with me as an intern at Harvard University, feel free to contact me. Remote collaboration is also welcome!
Email: wanhua [AT] seas [DOT] harvard [DOT] edu
CV / Google Scholar / Twitter / GitHub
News
2024-09: One paper on prompting vision-language models is accepted by NeurIPS 2024.
2024-07: One paper on Video Temporal Grounding is accepted by ECCV 2024.
2024-06: One paper on IVF is accepted by MICCAI 2024.
2024-05: Congratulations to Karly Hou, whose undergraduate thesis, supervised by me, won Harvard's Hoopes Prize!
2024-05: One paper on multimodal learning received early acceptance (top 11%) at MICCAI 2024.
2024-05: One paper on connectomics is accepted by TMI.
2024-04: One paper on Deepfake detection is accepted by Pattern Recognition (PR).
2024-04: Our LangSplat paper is selected as a CVPR Highlight paper.
2024-02: Two papers on 3D Gaussian splatting and multi-task learning are accepted by CVPR 2024.
2023-08: One paper on talking head synthesis is accepted by TMM.
2023-07: One paper on face clustering is accepted by T-PAMI.
2023-07: Two papers on face clustering and foundation models are accepted by ICCV 2023.
2023-06: One paper on deepfake detection is accepted by TIP.
2023-02: One paper on talking head synthesis is accepted to CVPR 2023.
2022-10: I joined Harvard as a postdoc!
2022-09: One paper on language-guided ordinal regression is accepted by NeurIPS 2022.
2022-07: Two papers on multi-attribute learning and talking head synthesis are accepted by ECCV 2022.
2022-06: One paper on age estimation is accepted by TIP.
2022-04: One paper on image inpainting is accepted by TMM.
2021-10: Our team won 3rd place in the 2021 VIPriors Instance Segmentation Challenge (ICCV 2021).
2021-07: One paper on video inpainting detection is accepted by ICCV 2021.
2021-04: One paper on kinship verification is accepted by TIP.
2021-03: Three papers on uncertainty learning, kinship verification, and face clustering are accepted to CVPR 2021.
|
Recent Selected Publications [ Full List ]
(*Equal Contribution, #Corresponding Author)
|
|
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
Wanhua Li*, Zibin Meng*, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister
Conference on Neural Information Processing Systems (NeurIPS), 2024
[Website]
[arxiv]
[Video]
[Code]
We present SocialGPT, a modular framework with greedy segment prompt optimization for social relation reasoning, which attains competitive results while also providing interpretable explanations.
|
|
LangSplat: 3D Language Gaussian Splatting
Minghan Qin*, Wanhua Li*#, Jiawei Zhou*, Haoqian Wang#, Hanspeter Pfister
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight)
[Website]
[arxiv]
[Video]
[Code]
We ground CLIP features into a set of 3D language Gaussians, which yields precise 3D language fields while being 199× faster than LERF.
|
|
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, and Donglai Wei
IEEE International Conference on Computer Vision (ICCV), 2023
[Website]
[arxiv]
[Code]
[Video]
To facilitate using pre-trained models in MMT, we propose CLIPTrans, which transfers the multimodal representations of M-CLIP into a multilingual mBART.
|
|
CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering
Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, and Jiwen Lu
IEEE International Conference on Computer Vision (ICCV), 2023
[Website]
[arxiv]
[Code]
[Video]
We propose an attribute hallucination framework named CLIP-Cluster to reduce the intra-class variance caused by different face attributes in face clustering.
|
|
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jie Zhou, and Jiwen Lu
Conference on Neural Information Processing Systems (NeurIPS), 2022
[Website]
[arxiv]
[Code]
[中文解读]
We propose a language-powered paradigm for ordinal regression, which learns the rank concepts from the rich semantic CLIP latent space.
|
|
Label2Label: A Language Modeling Framework for Multi-Attribute Learning
Wanhua Li, Zhexuan Cao, Jianjiang Feng, Jie Zhou, and Jiwen Lu
European Conference on Computer Vision (ECCV), 2022
[Website]
[arxiv]
[Video]
[Code]
We propose a language modeling framework named Label2Label to model the complex instance-wise attribute relations, which regards each attribute label as a “word” and recovers the label “sentence” based on the masked one.
|
|
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis
Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, and Jiwen Lu
European Conference on Computer Vision (ECCV), 2022
[Website]
[arxiv]
[Video]
[Code]
We propose dynamic facial radiance fields conditioned on 3D-aware reference image features. The facial field can rapidly generalize to novel identities with only a 15-second clip.
|
|
Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection
Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, and Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021
[Paper]
[bibtex]
We propose a Frequency-Aware Spatiotemporal Transformer for video inpainting detection, which simultaneously mines the traces of video inpainting from spatial, temporal, and frequency domains.
|
|
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
Wanhua Li, Xiaoke Huang, Jiwen Lu, Jianjiang Feng, and Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Website]
[arxiv]
[Video]
[Code]
We propose probabilistic ordinal embeddings to equip existing regression methods with the ability to estimate uncertainty.
|
|
Meta-Mining Discriminative Samples for Kinship Verification
Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, and Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Website]
[arxiv]
[Video]
[bibtex]
A Discriminative Sample Meta-Mining strategy is proposed to mine discriminative information from limited positive pairs and sufficient negative samples for kinship verification.
|
|
Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes
Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, and Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Website]
[arxiv]
[Code]
[Video]
This is the first face clustering method to train on a very large-scale graph with 20M nodes, and it achieves superior inference results on 12M testing data.
|
|
Graph-Based Social Relation Reasoning
Wanhua Li, Yueqi Duan, Jiwen Lu, Jianjiang Feng, and Jie Zhou
European Conference on Computer Vision (ECCV), 2020
[Website]
[arxiv]
[Video]
[Code]
A simpler, faster, and more accurate method for social relation recognition.
|
|
BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
[arXiv]
[PDF]
[bibtex]
We propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively.
|
Honors and Awards
NeurIPS Scholar Award, 2022.
ICCV Doctoral Consortium Travel Award, 2021.
Weihai Talent Scholarship, Tsinghua, 2021.
3rd Place in the 2021 VIPriors Instance Segmentation Challenge @ICCV 2021.
Outstanding Oral Presentation at the Beijing University Academic Forum on Artificial Intelligence, 2021.
2nd Place in the ChaLearn LAP Large-scale Isolated Gesture Recognition Challenge @ICCV 2017.
Outstanding Undergraduate Thesis, SYSU, 2017.
Outstanding Graduate, SYSU, 2017.
National Encouragement Scholarship, Ministry of Education of P.R. China, 2016.
National Scholarship, Ministry of Education of P.R. China, 2015.
National Scholarship, Ministry of Education of P.R. China, 2014.
|
Professional Activities
Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence.
Reviewer, IEEE Transactions on Image Processing.
Reviewer, IEEE Transactions on Neural Networks and Learning Systems.
Reviewer, IEEE Transactions on Circuits and Systems for Video Technology.
Reviewer, IEEE Transactions on Biometrics, Behavior, and Identity Science.
Reviewer, IEEE Transactions on Artificial Intelligence.
Reviewer, IEEE Transactions on Affective Computing.
Reviewer, IEEE Transactions on Cybernetics.
Reviewer, IEEE Transactions on Multimedia.
Reviewer, IEEE Signal Processing Letters.
Reviewer, International Journal of Computer Vision.
Reviewer, Pattern Recognition.
Reviewer, Neural Networks.
Reviewer, Neurocomputing.
Reviewer, Pattern Recognition Letters.
Reviewer, Journal of Visual Communication and Image Representation.
Reviewer, Knowledge-Based Systems.
Reviewer, Frontiers of Computer Science.
Reviewer, SIGGRAPH 2024.
Reviewer, International Conference on Computer Vision (ICCV), 2021-2023.
Reviewer, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022-2024.
Reviewer, European Conference on Computer Vision (ECCV), 2022-2024.
Reviewer, Conference on Neural Information Processing Systems (NeurIPS), 2023-2024.
PC member, AAAI Conference on Artificial Intelligence (AAAI), 2022-2024.
PC member, International Joint Conference on Artificial Intelligence (IJCAI), 2022-2023.
Reviewer, IEEE International Conference on Multimedia and Expo (ICME), 2019-2023.
Reviewer, IEEE International Conference on Image Processing (ICIP), 2018-2023.
Reviewer, International Conference on Pattern Recognition (ICPR), 2018-2022.
Reviewer, Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2021-2023.
Reviewer, IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2023-2024.