Biography
I am a Research Scientist at Google DeepMind. I was a Postdoctoral Research Scientist at DVMM Lab, Columbia University, working with Prof. Shih-Fu Chang.
I received my Ph.D. from the Department of Computer Science and Technology at Tsinghua University in 2019 under the supervision of Prof. Chongrong Li. Before that, I received my bachelor’s degree from the Department of Computer Science and Technology, Beijing University of Posts and Telecommunications (BUPT) in 2014.
My current research focuses broadly on computer vision, multimedia, and deep learning. Ongoing projects include research on Multimodal Large Language Models, hallucinations, and few-shot/self-supervised learning. I am looking for student researchers to collaborate on any of the above topics. Feel free to contact me if you would like to learn more and collaborate!
News
- [06/2024] Serving as an Area Chair for ACM MM 2024 and BMVC 2024, and as a Senior PC member for AAAI 2025.
- [02/2024] Two papers on Multimodal Large Language Models accepted by CVPR 2024, and one selected as a highlight.
- [06/2023] One paper accepted by TCSVT 2023, and one accepted by Communications Medicine 2023.
- [02/2023] Two papers accepted by CVPR 2023 and one accepted by ICLR 2023.
- [10/2022] One paper accepted by EMNLP 2022.
- [07/2022] Two papers accepted by ECCV 2022.
- [06/2022] I will attend CVPR 2022 in person, and hope to see you guys in New Orleans.
- [03/2022] Two papers accepted by CVPR 2022 (including one first-author paper). Thanks to all the collaborators.
- [12/2021] One first-author paper accepted by AAAI 2022. Thanks to all the collaborators.
- [10/2021] One paper accepted by NeurIPS 2021 Datasets and Benchmarks Track. Congratulations to the amazing team!
- [07/2021] Two papers accepted by ICCV 2021 (including one first-author paper). Special thanks to all the collaborators.
ArXiv Preprints
- Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning
Dongmin Park*, Zhaofang Qian*, Guangxing Han, Ser-Nam Lim
arXiv preprint (arXiv)
[Paper]
- WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
Jiawei Ma, Yulei Niu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang
arXiv preprint (arXiv)
[Paper]
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting
Guangxing Han, Long Chen, Jiawei Ma, Shiyuan Huang, Rama Chellappa, Shih-Fu Chang
arXiv preprint (arXiv)
[Paper]
Selected Publications
- Few-Shot Object Detection with Foundation Models
Guangxing Han, Ser-Nam Lim
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA, 2024.
[Paper]
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick*, Guangxing Han*, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi
IEEE Conference on Computer Vision and Pattern Recognition (CVPR, Highlight). Seattle, WA, USA, 2024. (acceptance rate 2.8%)
[Paper][Project]
Shraman Pramanick was a Meta intern working with me.
- One-Shot Unsupervised Cross-Domain Person Re-Identification
Guangxing Han, Xuan Zhang, Chongrong Li
Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2023.
[Paper]
- Rapidly adaptable automated interpretation of point-of-care COVID-19 diagnostics
Siddarth Arumugam*, Jiawei Ma*, Uzay Macar, Guangxing Han, Kathrine McAulay, et al.
Accepted by Communications Medicine, 2023.
[Paper][Code]
- Supervised Masked Knowledge Distillation for Few-Shot Transformers
Han Lin*, Guangxing Han* (Corresponding Author), Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada, 2023.
[Paper][Code]
Han Lin was a Columbia CS master’s student working with me.
- DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada, 2023.
[Paper][Code]
- TempCLR: Temporal Alignment Representation with Contrastive Learning
Yuncong Yang*, Jiawei Ma*, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang
The Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 2023.
[Paper][Code]
- Weakly-Supervised Temporal Article Grounding
Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, and Shih-Fu Chang
The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, 2022.
[Paper][Code]
- Few-Shot End-to-End Object Detection via Constantly Concentrated Encoding across Heads
Jiawei Ma, Guangxing Han, Shiyuan Huang, Yuncong Yang, Shih-Fu Chang
European Conference on Computer Vision (ECCV), Tel-Aviv, 2022.
[Paper][Code]
- Explicit Image Caption Editing
Zhen Wang*, Long Chen*, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, and Jun Xiao
European Conference on Computer Vision (ECCV), Tel-Aviv, 2022.
[Paper][Code]
- Few-Shot Object Detection with Fully Cross-Transformer
Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, Louisiana, 2022. (Full paper, Oral, acceptance rate 4.2%)
[Paper][Code]
- Task-Adaptive Negative Class Envision for Few-Shot Open-Set Recognition
Shiyuan Huang*, Jiawei Ma*, Guangxing Han, Shih-Fu Chang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, Louisiana, 2022. (Full paper)
[Paper][Code]
- Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment
Guangxing Han, Shiyuan Huang, Jiawei Ma, Yicheng He, Shih-Fu Chang
Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI). Virtual, 2022. (Full paper, Oral, acceptance rate 4.6%)
[Paper][Code]
- The Met Dataset: Instance-level Recognition for Artworks
Nikolaos-Antonios Ypsilantis, Noa Garcia, Guangxing Han, Sarah Ibrahimi, Nanne Van Noord, Giorgos Tolias
NeurIPS 2021 Datasets and Benchmarks Track (NeurIPS). Virtual, 2021. (Full paper)
[Paper][Project][Data Collection Code]
A collaboration with researchers from multiple research institutions across the world. This paper introduces a new large-scale dataset for instance-level recognition (ILR) on artworks. My contribution was to prepare the dataset and set up the baseline experiments.
- Query Adaptive Few-Shot Object Detection with Heterogeneous Graph Convolutional Networks
Guangxing Han, Yicheng He, Shiyuan Huang, Jiawei Ma, Shih-Fu Chang
IEEE/CVF International Conference on Computer Vision (ICCV). Virtual, 2021. (Full paper, acceptance rate 25.9%)
[Paper][Code]
- Partner-Assisted Learning for Few-Shot Image Classification
Jiawei Ma*, Hanchen Xie*, Guangxing Han, Shih-Fu Chang, Aram Galstyan, Wael AbdAlmageed
IEEE/CVF International Conference on Computer Vision (ICCV). Virtual, 2021. (Full paper, acceptance rate 25.9%)
[Paper]
- COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, et al.
Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) Demonstrations. Virtual, 2021. Best Demo Paper Award
[Paper][Poster][Dataset]
A collaboration with researchers from multiple research institutions in the USA. This paper proposes a novel framework, COVID-KG, to extract fine-grained multimedia knowledge elements from scientific literature. As the lead author from the Columbia team, my contribution was to develop the tool for figure extraction and subfigure segmentation from PDF files, and to extend the text-based KG with cross-modal entity grounding.
- Semi-Supervised DFF: Decoupling Detection and Feature Flow for Video Object Detectors
Guangxing Han, Xuan Zhang, Chongrong Li
ACM International Conference on Multimedia (ACM MM). Seoul, Korea, 2018. (Full paper, acceptance rate 27.5%)
[Paper][Code]
- Revisiting Faster R-CNN: A Deeper Look at Region Proposal Network
Guangxing Han, Xuan Zhang, Chongrong Li
The 24th International Conference On Neural Information Processing (ICONIP). Guangzhou, China, 2017. (Full paper, Oral, acceptance rate 36.5%)
[Paper]
- Single Shot Object Detection with Top-Down Refinement
Guangxing Han, Xuan Zhang, Chongrong Li
IEEE International Conference on Image Processing (ICIP). Beijing, China, 2017. (Full paper, acceptance rate 45.3%)
[Paper][Code]
- Unsupervised Feature Propagation for Fast Video Object Detection using Generative Adversarial Networks
Xuan Zhang, Guangxing Han, Wenduo He
26th International Conference on Multimedia Modeling (MMM). Daejeon, Korea, 2020. (Full paper)
[Paper]
Student first-author
Professional Services
- Co-organizer of Instance-Level Recognition Workshop at ECCV’20 (Working on Artwork Recognition), ICCV’21 (Website Chair), ECCV’22 (Website Chair), and ECCV’24 (Website Chair).
- Conference Area Chair: ACM MM 2024, BMVC 2024
- Conference Senior PC member/Meta-Reviewer: AAAI 2025/2024/2023
- Conference PC member/Reviewer: CVPR 2024/2023/2022, ECCV 2024/2022, ICCV 2023, ICLR 2025/2024, NeurIPS 2024/2023, ACL 2024, ACM MM 2023/2022/2021/2020, ICASSP 2023, ACM MM Asia 2021, ECCV 2024 Instance-Level Recognition (ILR) workshop.
- Journal Reviewer: International Journal of Computer Vision (IJCV), IEEE Transactions on Image Processing (TIP), IEEE Transactions on Neural Networks and Learning Systems (TNNLS), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSMC), Pattern Recognition (PR), IEEE Signal Processing Letters (SPL), IEEE Access.
Selected Awards
- Best Demo Paper Award at NAACL 2021.
- First Prize Scholarship at Tsinghua University, 2017-2018.
- ACM MM 2018 Student Travel Grants.
- The 38th ACM-ICPC Asia Invitational Programming Contest, Nanjing Site, 2013, Gold Medal.
- The 38th ACM-ICPC Asia Regional Programming Contest, Changsha Site, 2013, Bronze Medal.