
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

Yanghao Li*, Chao-Yuan Wu*, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer*

[arXiv] [BibTeX]

This repository provides detection configs and models for MViTv2 (CVPR 2022) in Detectron2. For image classification tasks, please refer to the MViTv2 repo.

Results and Pretrained Models

COCO

Name	pre-train	Method	epochs	box AP	mask AP	#params	FLOPs	model id	download
MViTV2-T	IN1K	Mask R-CNN	36	48.3	43.8	44M	279G	307611773	model
MViTV2-T	IN1K	Cascade Mask R-CNN	36	52.2	45.0	76M	701G	308344828	model
MViTV2-S	IN1K	Cascade Mask R-CNN	36	53.2	46.0	87M	748G	308344647	model
MViTV2-B	IN1K	Cascade Mask R-CNN	36	54.1	46.7	103M	814G	308109448	model
MViTV2-B	IN21K	Cascade Mask R-CNN	36	54.9	47.4	103M	814G	309003202	model
MViTV2-L	IN21K	Cascade Mask R-CNN	50	55.8	48.3	270M	1519G	308099658	model
MViTV2-H	IN21K	Cascade Mask R-CNN	36	56.1	48.5	718M	3084G	309013744	model

Note that the above models were trained and measured on 8 nodes with 64 NVIDIA A100 GPUs in total. The ImageNet pre-trained model weights are obtained from the MViTv2 repo.

Training

All configs can be trained with:

../../tools/lazyconfig_train_net.py --config-file configs/path/to/config.py

By default, training uses 64 GPUs with a batch size of 64.
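If fewer than 64 GPUs are available, the batch size usually has to shrink, and the learning rate is often scaled with it. The sketch below builds a training command for a smaller run. It rests on several assumptions that do not come from this README: the linear learning-rate scaling rule is a common heuristic rather than the authors' stated recipe, the override keys `dataloader.train.total_batch_size` and `optimizer.lr` should be checked against the config you actually use, and `reference_lr` is a placeholder for whatever learning rate that config defines. The helper names are hypothetical.

```python
import shlex

# Reference setup stated in this README: 64 GPUs, batch size 64.
REFERENCE_BATCH_SIZE = 64


def scaled_overrides(num_gpus, reference_lr, images_per_gpu=1):
    """Return key=value overrides for a smaller run.

    Assumption: learning rate scales linearly with batch size. This is a
    heuristic, not a recipe confirmed by the MViTv2 authors.
    """
    if num_gpus < 1 or images_per_gpu < 1:
        raise ValueError("num_gpus and images_per_gpu must be positive")
    batch_size = num_gpus * images_per_gpu
    lr = reference_lr * batch_size / REFERENCE_BATCH_SIZE
    return [
        # Assumed config keys; verify them in the config file you train with.
        f"dataloader.train.total_batch_size={batch_size}",
        f"optimizer.lr={lr:.6g}",
    ]


def train_command(config_file, num_gpus, reference_lr):
    """Build a shell-safe command line for lazyconfig_train_net.py."""
    args = [
        "../../tools/lazyconfig_train_net.py",
        "--config-file", config_file,
        "--num-gpus", str(num_gpus),
    ]
    return shlex.join(args + scaled_overrides(num_gpus, reference_lr))


if __name__ == "__main__":
    # reference_lr=1e-4 is a placeholder value, not taken from the configs.
    print(train_command("configs/path/to/config.py", num_gpus=8, reference_lr=1e-4))
```

Results obtained with a reduced batch size may differ from the numbers in the table above, since those were measured with the 64-GPU setup.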

Evaluation

Model evaluation can be done similarly:

../../tools/lazyconfig_train_net.py --config-file configs/path/to/config.py --eval-only train.init_checkpoint=/path/to/model_checkpoint
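For evaluation from a script or notebook, the same steps can be approximated with Detectron2's lazy-config API. This is a minimal single-process sketch, not a replacement for the training script: it skips the distributed setup and logging that the script performs, and it assumes the config defines `dataloader.test` and `dataloader.evaluator`. The function names `checkpoint_override` and `evaluate` are hypothetical helpers introduced here.

```python
def checkpoint_override(checkpoint_path):
    """Format the key=value override used to point at a checkpoint."""
    return f"train.init_checkpoint={checkpoint_path}"


def evaluate(config_file, checkpoint_path):
    """Single-process evaluation sketch; returns the evaluator's results."""
    # Deferred imports: Detectron2 (and PyTorch) are only needed when this runs.
    from detectron2.checkpoint import DetectionCheckpointer
    from detectron2.config import LazyConfig, instantiate
    from detectron2.evaluation import inference_on_dataset

    # Load the Python config and apply the checkpoint override to it.
    cfg = LazyConfig.load(config_file)
    cfg = LazyConfig.apply_overrides(cfg, [checkpoint_override(checkpoint_path)])

    # Build the model, move it to the configured device, and load the weights.
    model = instantiate(cfg.model)
    model.to(cfg.train.device)
    DetectionCheckpointer(model).load(cfg.train.init_checkpoint)

    # Assumes the config provides a test loader and an evaluator.
    return inference_on_dataset(
        model,
        instantiate(cfg.dataloader.test),
        instantiate(cfg.dataloader.evaluator),
    )
```

Calling `evaluate("configs/path/to/config.py", "/path/to/model_checkpoint")` requires Detectron2 and the COCO dataset to be set up as for the command-line run.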

Citing MViTv2

If you use MViTv2, please use the following BibTeX entry.

@inproceedings{li2021improved,
  title={MViTv2: Improved multiscale vision transformers for classification and detection},
  author={Li, Yanghao and Wu, Chao-Yuan and Fan, Haoqi and Mangalam, Karttikeya and Xiong, Bo and Malik, Jitendra and Feichtenhofer, Christoph},
  booktitle={CVPR},
  year={2022}
}
