Wanhua Li

I am currently a postdoctoral fellow at Harvard University, supervised by Prof. Hanspeter Pfister. Prior to that, I received my Ph.D. from the Department of Automation at Tsinghua University in 2022, advised by Prof. Jiwen Lu, Prof. Jianjiang Feng, and Prof. Jie Zhou. In 2017, I received my B.S. degree in computer science from Sun Yat-sen University, Guangzhou, China.

My research interests mainly include vision-language models, neural rendering, and 3D-aware synthesis.

Email: wanhua [AT] seas [DOT] harvard [DOT] edu

CV  /  Google Scholar  /  Twitter  /  GitHub

News

  • 2025-02: Our S-LoRA paper is selected as an ICLR oral paper.
  • 2025-01: Two papers on Vision Language Models Prompting and Tuning are accepted by ICLR 2025.
  • 2024-09: One paper on Vision Language Models Prompting is accepted by NeurIPS 2024.
  • 2024-07: One paper on Video Temporal Grounding is accepted by ECCV 2024.
  • 2024-06: One paper on IVF is accepted by MICCAI 2024.
  • 2024-05: Congratulations to Karly Hou. Her undergraduate thesis supervised by me won Harvard's Hoopes Prize!
  • 2024-05: One paper on multimodal learning received early acceptance (top 11%) at MICCAI 2024.
  • 2024-05: One paper on connectomics is accepted by TMI.
  • 2024-04: One paper on Deepfake detection is accepted by Pattern Recognition (PR).
  • 2024-04: Our LangSplat paper is selected as a CVPR Highlight paper.
  • 2024-02: Two papers on 3D Gaussian splatting and multi-task learning are accepted by CVPR 2024.
  • 2023-08: One paper on talking head synthesis is accepted by TMM.
  • 2023-07: One paper on face clustering is accepted by T-PAMI.
  • 2023-07: Two papers on face clustering and foundation models are accepted by ICCV 2023.
  • 2023-06: One paper on deepfake detection is accepted by TIP.
  • 2023-02: One paper on talking head synthesis is accepted to CVPR 2023.
  • 2022-10: I joined Harvard as a postdoc!
  • 2022-09: One paper on language-guided ordinal regression is accepted by NeurIPS 2022.
  • 2022-07: Two papers on multi-attribute learning and talking head synthesis are accepted by ECCV 2022.
  • 2022-06: One paper on age estimation is accepted by TIP.
  • 2022-04: One paper on image inpainting is accepted by TMM.
  • 2021-10: Our team won 3rd place in the 2021 VIPriors Instance Segmentation Challenge (ICCV 2021).
  • 2021-07: One paper on video inpainting detection is accepted by ICCV 2021.
  • 2021-04: One paper on kinship verification is accepted by TIP.
  • 2021-03: Three papers on uncertainty learning, kinship verification, and face clustering are accepted to CVPR 2021.

Recent Selected Publications [ Full List ]

    (*Equal Contribution, #Corresponding Author)

    SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
    Wanhua Li*, Zibin Meng*, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister
    Conference on Neural Information Processing Systems (NeurIPS), 2024
    [Website] [arxiv] [Video] [Code]

    We present SocialGPT, a modular framework with greedy segment prompt optimization for social relation reasoning, which attains competitive results while also providing interpretable explanations.

    LangSplat: 3D Language Gaussian Splatting
    Minghan Qin*, Wanhua Li*#, Jiawei Zhou*, Haoqian Wang#, Hanspeter Pfister
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight)
    [Website] [arxiv] [Video] [Code]

    We ground CLIP features into a set of 3D language Gaussians, which attains precise 3D language fields while being 199× faster than LERF.

    CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
    Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, and Donglai Wei
    IEEE International Conference on Computer Vision (ICCV), 2023
    [Website] [arxiv] [Code] [Video]

    To facilitate using pre-trained models in MMT, we propose CLIPTrans, which transfers the multimodal representations of M-CLIP into a multilingual mBART.

    CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering
    Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, and Jiwen Lu 
    IEEE International Conference on Computer Vision (ICCV), 2023
    [Website] [arxiv] [Code] [Video]

    We propose an attribute hallucination framework named CLIP-Cluster to narrow the intraclass variance caused by different face attributes for face clustering.

    OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
    Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jie Zhou, and Jiwen Lu
    Conference on Neural Information Processing Systems (NeurIPS), 2022
    [Website] [arxiv] [Code] [中文解读]

    We propose a language-powered paradigm for ordinal regression, which learns the rank concepts from the rich semantic CLIP latent space.

    Label2Label: A Language Modeling Framework for Multi-Attribute Learning
    Wanhua Li, Zhexuan Cao, Jianjiang Feng, Jie Zhou, and Jiwen Lu
    European Conference on Computer Vision (ECCV), 2022
    [Website] [arxiv] [Video] [Code]

    We propose a language modeling framework named Label2Label to model the complex instance-wise attribute relations, which regards each attribute label as a “word” and recovers the label “sentence” based on the masked one.

    Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis
    Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, and Jiwen Lu
    European Conference on Computer Vision (ECCV), 2022
    [Website] [arxiv] [Video] [Code]

    We propose dynamic facial radiance fields conditioned on 3D-aware reference image features. The facial field can rapidly generalize to novel identities with only a 15s clip.

    Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection
    Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, and Jie Zhou
    IEEE International Conference on Computer Vision (ICCV), 2021
    [Paper] [bibtex]

    We propose a Frequency-Aware Spatiotemporal Transformer for video inpainting detection, which simultaneously mines the traces of video inpainting from spatial, temporal, and frequency domains.

    Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
    Wanhua Li, Xiaoke Huang, Jiwen Lu, Jianjiang Feng, and Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    [Website] [arxiv] [Video] [Code]

    We propose probabilistic ordinal embeddings to empower the present-day regression methods with the ability of uncertainty estimation.

    Meta-Mining Discriminative Samples for Kinship Verification
    Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, and Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    [Website] [arxiv] [Video] [bibtex]

    A Discriminative Sample Meta-Mining strategy is proposed to mine discriminative information from limited positive pairs and sufficient negative samples for kinship verification.

    Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes
    Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, and Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    [Website] [arxiv] [Code] [Video]

    It is the first face clustering method to train on a very large-scale graph with 20M nodes, and it achieves superior inference results on 12M testing data.

    Graph-Based Social Relation Reasoning
    Wanhua Li, Yueqi Duan, Jiwen Lu, Jianjiang Feng, and Jie Zhou
    European Conference on Computer Vision (ECCV), 2020
    [Website] [arxiv] [Video] [Code]

    A simpler, faster, and more accurate method for social relation recognition.

    BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
    Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
    [arXiv] [PDF] [bibtex]

    We propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively.

    Honors and Awards

  • NeurIPS Scholar Award, 2022.
  • ICCV Doctoral Consortium Travel Award, 2021.
  • Weihai Talent Scholarship, Tsinghua, 2021.
  • 3rd Place in 2021 VIPriors Instance Segmentation Challenge @ICCV 2021.
  • Outstanding Oral Presentation at Beijing University Academic Forum on Artificial Intelligence, 2021.
  • 2nd Place in ChaLearn LAP Large-scale Isolated Gesture Recognition Challenge @ICCV 2017.
  • Outstanding Undergraduate Thesis, SYSU, 2017.
  • Outstanding Graduate, SYSU, 2017.
  • National Encouragement Scholarship, Ministry of Education of P.R. China, 2016.
  • National Scholarship, Ministry of Education of P.R. China, 2015.
  • National Scholarship, Ministry of Education of P.R. China, 2014.

Professional Activities

  • Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Reviewer, IEEE Transactions on Image Processing.
  • Reviewer, IEEE Transactions on Neural Networks and Learning Systems.
  • Reviewer, IEEE Transactions on Circuits and Systems for Video Technology.
  • Reviewer, IEEE Transactions on Biometrics, Behavior, and Identity Science.
  • Reviewer, IEEE Transactions on Artificial Intelligence.
  • Reviewer, IEEE Transactions on Affective Computing.
  • Reviewer, IEEE Transactions on Cybernetics.
  • Reviewer, IEEE Transactions on Multimedia.
  • Reviewer, IEEE Signal Processing Letters.
  • Reviewer, International Journal of Computer Vision.
  • Reviewer, Pattern Recognition.
  • Reviewer, Neural Networks.
  • Reviewer, Neurocomputing.
  • Reviewer, Pattern Recognition Letters.
  • Reviewer, Journal of Visual Communication and Image Representation.
  • Reviewer, Knowledge-Based Systems.
  • Reviewer, Frontiers of Computer Science.
  • Reviewer, SIGGRAPH 2024.
  • Reviewer, International Conference on Computer Vision (ICCV), 2021-2023.
  • Reviewer, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022-2024.
  • Reviewer, European Conference on Computer Vision (ECCV), 2022-2024.
  • Reviewer, Conference on Neural Information Processing Systems (NeurIPS), 2023-2024.
  • PC member, AAAI Conference on Artificial Intelligence (AAAI), 2022-2024.
  • PC member, International Joint Conference on Artificial Intelligence (IJCAI), 2022-2023.
  • Reviewer, IEEE International Conference on Multimedia and Expo (ICME), 2019-2023.
  • Reviewer, IEEE International Conference on Image Processing (ICIP), 2018-2023.
  • Reviewer, International Conference on Pattern Recognition (ICPR), 2018-2022.
  • Reviewer, Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2021-2023.
  • Reviewer, IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2023-2024.
