KTCN: Enhancing Open-World Object Detection with Knowledge Transfer and Class-Awareness Neutralization

Xing Xi, Yangyang Huang, Jinhao Lin, Ronghua Luo

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1462-1470. https://doi.org/10.24963/ijcai.2024/162

Open-World Object Detection (OWOD) has garnered widespread attention due to its ability to recall unannotated objects. Existing works generate pseudo-labels for the model using heuristic priors, which limits performance. In this paper, we leverage the knowledge of a large-scale visual model to provide supervision for unknown categories. Specifically, we use the Segment Anything Model (SAM) to generate raw pseudo-labels for potential objects and refine them using Intersection over Union (IoU) and the shortest bounding-box side length. Nevertheless, the abundance of pseudo-labels still exacerbates the competition problem in one-to-many label assignment. To address this, we propose the Dual Matching Label Assignment (DMLA) strategy. Furthermore, we propose the Class-Awareness Neutralizer (CAN) to reduce the model's bias towards known categories. Evaluation results on open-world object detection benchmarks, including MS COCO and Pascal VOC, show that our method achieves nearly twice the unknown recall of previous state-of-the-art (SOTA) methods, reaching 41.5 U-Recall. Additionally, our approach adds no extra parameters and retains the inference-speed advantage of Faster R-CNN, leading SOTA methods based on Deformable DETR by over 10 FPS. Our code is available at https://github.com/xxyzll/KTCN.
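
To make the refinement step concrete, the following is a minimal, hypothetical sketch (not the released KTCN code) of how raw SAM proposals could be filtered with the two criteria named in the abstract: overlap with known-class annotations (IoU) and the shortest bounding-box side length. The function names and thresholds (iou_thresh, min_side) are illustrative assumptions, not values taken from the paper.

    # Sketch of pseudo-label refinement: keep a SAM proposal only if its
    # shortest side is long enough and it does not strongly overlap any
    # known-class ground-truth box. Thresholds are placeholders.

    from typing import List, Tuple

    Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

    def iou(a: Box, b: Box) -> float:
        """Intersection over Union of two axis-aligned boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def refine_pseudo_labels(sam_boxes: List[Box],
                             known_boxes: List[Box],
                             iou_thresh: float = 0.5,
                             min_side: float = 32.0) -> List[Box]:
        """Filter raw SAM proposals into unknown-object pseudo-labels."""
        kept = []
        for box in sam_boxes:
            # Drop boxes whose shortest side is too small (likely fragments).
            if min(box[2] - box[0], box[3] - box[1]) < min_side:
                continue
            # Drop boxes that mostly cover an already-annotated known object.
            if any(iou(box, gt) >= iou_thresh for gt in known_boxes):
                continue
            kept.append(box)
        return kept

Rejecting proposals that overlap annotated known objects keeps the pseudo-labels focused on unannotated (potentially unknown) objects, while the minimum-side check removes small fragment boxes.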
Keywords:
Computer Vision: CV: Recognition (object detection, categorization)