GitHub - naver-ai/lut: [ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners"

Learning with Unmasked Tokens Drives Stronger Vision Learners | [ECCV 2024]

Taekyung Kim*, Sanghyuk Chun, Byeongho Heo, Dongyoon Han*
(*equal contribution)

Official PyTorch implementation of LUT, "Learning with Unmasked Tokens Drives Stronger Vision Learners" (arXiv).

Abstract

Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIM methods such as the Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens, letting the encoder process only the unmasked tokens while the decoder reconstructs the masked tokens of the input. However, MIM pre-trained encoders often exhibit a limited attention span, attributed to MIM's sole focus on regressing masked tokens, which may impede the encoder's broader context learning. To tackle this limitation, we improve MIM by explicitly incorporating unmasked tokens into the training process. Specifically, our method enables the encoder to learn from broader context supervision, allowing unmasked tokens to experience broader contexts while the decoder reconstructs masked tokens. Thus, the encoded unmasked tokens are equipped with extensive contextual information, empowering masked tokens to leverage the enhanced unmasked tokens for MIM. As a result, our simple remedy yields more discriminative representations, as shown by 84.2% top-1 accuracy with ViT-B on ImageNet-1K, a 0.6%p gain. We attribute the success to the enhanced pre-training method, as evidenced by the singular value spectrum and attention analyses. Finally, our models achieve significant performance gains on downstream semantic segmentation and fine-grained visual classification tasks, and on diverse robustness evaluation metrics.
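The masked/unmasked token split that the abstract refers to can be illustrated with a small, dependency-free sketch. It covers only the MAE-style random masking step, not LUT's additional supervision on unmasked tokens, and it is not taken from this repo's code. The function name `random_masking`, the 196-token count (a 224x224 image with 16x16 patches) and the 0.75 mask ratio (MAE's default) are assumptions for illustration; the settings used by this repo may differ.

```python
import random


def random_masking(num_tokens, mask_ratio, seed=None):
    """Split token indices into unmasked (kept) and masked sets, MAE-style."""
    rng = random.Random(seed)
    num_keep = int(num_tokens * (1 - mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    keep = sorted(order[:num_keep])    # unmasked tokens: the encoder's input
    masked = sorted(order[num_keep:])  # masked tokens: the decoder's targets
    return keep, masked


# Assumed setup: 14 * 14 = 196 patch tokens, 75% of them masked.
keep, masked = random_masking(num_tokens=196, mask_ratio=0.75, seed=0)
print(len(keep), len(masked))  # 49 147
```

In plain MIM only the `masked` indices contribute to the reconstruction loss; the method described above additionally gives the `keep` tokens a training signal.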

Framework overview

Updates

(07/2024) LUT was accepted to ECCV 2024.

Preparation

This repo is based on MAE.
This repo uses timm==0.4.12 and pytorch==1.13.0. Install the dependencies with:
pip install -r requirements.txt
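After installation, a quick check that the environment matches the pinned versions can save debugging time. The following is a minimal sketch, not part of this repo: the helper names `installed_version` and `check_pins` are hypothetical, and it assumes PyTorch is installed under the pip distribution name `torch`.

```python
from importlib import metadata

# Pins taken from the Preparation notes; "torch" is the pip name for PyTorch.
PINS = {"timm": "0.4.12", "torch": "1.13.0"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package name to (expected, found, ok)."""
    report = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu117" when comparing.
        base = found.split("+")[0] if found else None
        report[name] = (expected, found, base == expected)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {found} [{status}]")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.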

Training

For training commands, please refer to PRETRAIN.md and TRAINING.md.

Performances

Top-1 accuracy (%) on ImageNet-1K:

Method	ViT-S	ViT-B	ViT-L
MoCo v3	81.4	83.2	84.1
DINO	81.5	82.8	-
iBOT	82.04	84.0	84.8
MAE	81.4	83.7	85.6
LUT	82.0	84.2	86.0
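The per-backbone differences between LUT and the MAE baseline can be read off the table with a few lines of arithmetic. This is only a convenience sketch; the dictionary `ACC` and the function `gains` are hypothetical names, and the numbers are copied from the MAE and LUT rows above.

```python
# Accuracies copied from the MAE and LUT rows of the table above.
ACC = {
    "MAE": {"ViT-S": 81.4, "ViT-B": 83.7, "ViT-L": 85.6},
    "LUT": {"ViT-S": 82.0, "ViT-B": 84.2, "ViT-L": 86.0},
}


def gains(acc, method="LUT", baseline="MAE"):
    """Per-backbone difference in percentage points, rounded to one decimal."""
    return {
        backbone: round(acc[method][backbone] - acc[baseline][backbone], 1)
        for backbone in acc[method]
    }


print(gains(ACC))  # {'ViT-S': 0.6, 'ViT-B': 0.5, 'ViT-L': 0.4}
```

The ViT-B difference computed from this table is 0.5 points, while the abstract quotes a 0.6%p gain; the baseline figure used in the abstract may differ from the MAE entry listed here.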

Citation

@inproceedings{kim2023lut,
  title={Learning with Unmasked Tokens Drives Stronger Vision Learners},
  author={Kim, Taekyung and Chun, Sanghyuk and Heo, Byeongho and Han, Dongyoon},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}

License

LUT
Copyright (c) 2024-present NAVER Cloud Corp.
CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)

