Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Official implementation of 'Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders'.

The paper has been accepted by CVPR 2023 🔥.

News

The pre-training and fine-tuning code of I2P-MAE has been released.
The 3D-only variant of I2P-MAE is our previous work, Point-M2AE, accepted by NeurIPS 2022 and open-sourced. We have released its pre-training and fine-tuning code.
📣 Please check our latest work Point-NN, Parameter is Not All You Need, accepted by CVPR 2023, which, for the first time, achieves 3D understanding with no parameters or training. 💥
📣 Please check our latest work PiMAE accepted by CVPR 2023, which promotes 3D and 2D interaction to improve 3D object detection performance.

Introduction

Comparison with existing MAE-based 3D models on the three splits of ScanObjectNN:

Method	Parameters	GFlops	Extra Data	OBJ-BG	OBJ-ONLY	PB-T50-RS
Point-BERT	22.1M	4.8	-	87.43%	88.12%	83.07%
ACT	22.1M	4.8	2D	92.48%	91.57%	87.88%
Point-MAE	22.1M	4.8	-	90.02%	88.29%	85.18%
Point-M2AE	12.9M	3.6	-	91.22%	88.81%	86.43%
I2P-MAE	12.9M	3.6	2D	94.15%	91.57%	90.11%
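As a reading aid for the table above, the per-split gain of I2P-MAE over its 3D-only variant Point-M2AE (same parameter count and GFlops) can be computed directly from the reported accuracies. The snippet is only arithmetic on the numbers in the table; the dictionary and function names are illustrative and not part of this repo.

```python
# Accuracies (%) copied from the comparison table above.
SPLITS = ("OBJ-BG", "OBJ-ONLY", "PB-T50-RS")
ACCURACY = {
    "Point-M2AE": (91.22, 88.81, 86.43),
    "I2P-MAE": (94.15, 91.57, 90.11),
}


def gains(method, baseline):
    """Per-split accuracy difference in percentage points, rounded to 2 decimals."""
    return {
        split: round(a - b, 2)
        for split, a, b in zip(SPLITS, ACCURACY[method], ACCURACY[baseline])
    }


if __name__ == "__main__":
    for split, delta in gains("I2P-MAE", "Point-M2AE").items():
        print(f"{split}: +{delta:.2f} points")
```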

We propose an alternative way to obtain superior 3D representations from 2D pre-trained models via Image-to-Point Masked Autoencoders, named I2P-MAE. Through self-supervised pre-training, we leverage well-learned 2D knowledge to guide 3D masked autoencoding, which reconstructs the masked point tokens with an encoder-decoder architecture. Specifically, we use two image-to-point learning schemes: 2D-guided masking and 2D-semantic reconstruction. In this way, the 3D network can effectively inherit high-level 2D semantics learned from rich image data for discriminative 3D modeling.
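To make the idea of 2D-guided masking more concrete, here is a minimal, framework-free sketch. It assumes that each point token already has a scalar saliency score derived from a 2D model, and that tokens with higher saliency should be more likely to stay visible to the encoder. The function name, the `temperature` parameter, and the sampling scheme are hypothetical illustrations, not the code in `models/`.

```python
import math
import random


def guided_mask(saliency, mask_ratio=0.8, temperature=1.0, seed=None):
    """Split token indices into (visible, masked) by saliency-weighted sampling.

    Assumption: higher 2D-derived saliency means a token is more likely to
    stay visible. Scores and default values here are hypothetical.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    n = len(saliency)
    if n == 0:
        return [], []
    rng = random.Random(seed)
    num_visible = max(1, round(n * (1.0 - mask_ratio)))

    # Softmax-style weights, shifted by the max for numerical stability
    # and floored so that every rate passed to expovariate is positive.
    top = max(saliency)
    weights = [max(math.exp((s - top) / temperature), 1e-12) for s in saliency]

    # Weighted sampling without replacement via an "exponential race":
    # draw Exp(rate=w_i) per token and keep the tokens with the smallest draws.
    draws = sorted((rng.expovariate(w), i) for i, w in enumerate(weights))
    visible = sorted(i for _, i in draws[:num_visible])
    visible_set = set(visible)
    masked = [i for i in range(n) if i not in visible_set]
    return visible, masked


if __name__ == "__main__":
    scores = [0.1, 2.0, 0.3, 1.5, 0.2, 0.05, 0.9, 0.4, 0.6, 0.7]
    visible, masked = guided_mask(scores, mask_ratio=0.8, seed=0)
    print("visible tokens:", visible)
    print("masked tokens:", masked)
```

With uniform scores this reduces to plain random masking, which is the usual baseline for masked autoencoders.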

I2P-MAE Models

Pre-training

I2P-MAE is pre-trained on ShapeNet under the guidance of pre-trained CLIP, and then evaluated with a Linear SVM on the ModelNet40 and ScanObjectNN (OBJ-BG split) datasets, without downstream fine-tuning:

Task	Dataset	Config	MN40 Acc.	OBJ-BG Acc.	Ckpts	Logs
Pre-training	ShapeNet	i2p-mae.yaml	93.35%	87.09%	pre-train.pth	log

Fine-tuning

Synthetic shape classification on ModelNet40 with 1k points:

Task	Config	Acc.	Vote	Ckpts	Logs
Classification	modelnet40.yaml	93.67%	94.06%	modelnet40.pth	modelnet40.log

Real-world shape classification on ScanObjectNN:

Task	Split	Config	Acc.	Ckpts	Logs
Classification	PB-T50-RS	scan_pb.yaml	90.11%	scan_pd.pth	scan_pd.log
Classification	OBJ-BG	scan_obj-bg.yaml	94.15%	-	-
Classification	OBJ-ONLY	scan_obj.yaml	91.57%	-	-

Requirements

Installation

Create a conda environment and install basic dependencies:

git clone https://github.com/ZrrSkywalker/I2P-MAE.git
cd I2P-MAE

conda create -n i2pmae python=3.7
conda activate i2pmae

# Install matching versions of torch and torchvision
conda install pytorch torchvision cudatoolkit
# e.g., conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3

pip install -r requirements.txt
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
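The `torch-scatter` wheel index in the last command encodes both the torch version and the CUDA toolkit version, so it has to change if you install a different combination. The helper below is a small sketch that rebuilds the URL from those two values, following the pattern of the URL above. The function names are hypothetical, and a prebuilt wheel is not guaranteed to exist for every combination.

```python
def cuda_tag(cudatoolkit_version: str) -> str:
    """Turn a toolkit version such as "11.3" into a wheel tag such as "cu113"."""
    major, minor = cudatoolkit_version.split(".")[:2]
    return f"cu{major}{minor}"


def torch_scatter_index(torch_version: str, cudatoolkit_version: str) -> str:
    """Build the wheel index URL, mirroring the pattern used in the command above."""
    tag = cuda_tag(cudatoolkit_version)
    return f"https://data.pyg.org/whl/torch-{torch_version}+{tag}.html"


if __name__ == "__main__":
    url = torch_scatter_index("1.11.0", "11.3")
    print(f"pip install torch-scatter -f {url}")
```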

Install GPU-related packages:

# Chamfer Distance and EMD
cd ./extensions/chamfer_dist
python setup.py install --user
cd ../emd
python setup.py install --user

# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl

Datasets

For pre-training and fine-tuning, please follow DATASET.md to prepare the ShapeNet, ModelNet40, ScanObjectNN, and ShapeNetPart datasets, following Point-BERT. For Linear SVM evaluation specifically, download the official ModelNet40 dataset and put the unzipped folder under data/.

The final directory structure should be:

│I2P-MAE/
├──cfgs/
├──datasets/
├──data/
│   ├──ModelNet/
│   ├──ModelNetFewshot/
│   ├──modelnet40_ply_hdf5_2048/  # Specifically for Linear SVM
│   ├──ScanObjectNN/
│   ├──ShapeNet55-34/
│   ├──shapenetcore_partanno_segmentation_benchmark_v0_normal/
├──...
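Before launching a run, it can help to confirm that the layout above is in place. The following sketch only checks that the listed sub-folders of `data/` exist; it does not validate their contents. The function and constant names are hypothetical and not part of this repo.

```python
from pathlib import Path

# Sub-folders of data/ taken from the directory tree above.
EXPECTED_DATA_DIRS = [
    "ModelNet",
    "ModelNetFewshot",
    "modelnet40_ply_hdf5_2048",  # used for Linear SVM evaluation
    "ScanObjectNN",
    "ShapeNet55-34",
    "shapenetcore_partanno_segmentation_benchmark_v0_normal",
]


def missing_data_dirs(repo_root="."):
    """Return the expected dataset folders that are absent under <repo_root>/data."""
    data_root = Path(repo_root) / "data"
    return [name for name in EXPECTED_DATA_DIRS if not (data_root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs(".")
    if missing:
        print("Missing dataset folders under data/:")
        for name in missing:
            print(f"  - {name}")
    else:
        print("All expected dataset folders found.")
```

Not every folder is needed for every task, so a missing entry is only a problem if the config you run depends on it.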

Get Started

Pre-training

I2P-MAE is pre-trained on the ShapeNet dataset with the config file cfgs/pre-training/i2p-mae.yaml. Run:

CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/pre-training/i2p-mae.yaml --exp_name pre-train

To evaluate the pre-trained I2P-MAE with Linear SVM, create a folder ckpts/ and download pre-train.pth into it. Use the configs in cfgs/linear-svm/ and select the evaluation dataset with --test_svm.

For ModelNet40, run:

CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/linear-svm/modelnet40.yaml --test_svm modelnet40 --exp_name test_svm --ckpts ./ckpts/pre-train.pth

For ScanObjectNN (OBJ-BG split), run:

CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/linear-svm/scan_obj-bg.yaml --test_svm scan --exp_name test_svm --ckpts ./ckpts/pre-train.pth

Fine-tuning

Please create a folder ckpts/ and download pre-train.pth into it. The fine-tuning configs are in cfgs/fine-tuning/.

For ModelNet40, run:

CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/modelnet40.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth

For the three splits of ScanObjectNN, run:

CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/scan_pb.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth

CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/scan_obj.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth

CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/scan_obj-bg.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth
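The fine-tuning commands above differ only in the config file, so they can be generated from one template. The sketch below builds the command strings without executing them; it uses only the flags shown above, and the helper name and its default arguments are hypothetical.

```python
import shlex

# Config file names taken from the fine-tuning commands above.
FINETUNE_CONFIGS = ["modelnet40.yaml", "scan_pb.yaml", "scan_obj.yaml", "scan_obj-bg.yaml"]


def finetune_command(config_name, ckpt="ckpts/pre-train.pth", exp_name="finetune", gpu="0"):
    """Return the shell command for one fine-tuning run as a single string."""
    argv = [
        "python", "main.py",
        "--config", f"cfgs/fine-tuning/{config_name}",
        "--finetune_model",
        "--exp_name", exp_name,
        "--ckpts", ckpt,
    ]
    # shlex.quote keeps paths with spaces or special characters shell-safe.
    return f"CUDA_VISIBLE_DEVICES={gpu} " + " ".join(shlex.quote(a) for a in argv)


if __name__ == "__main__":
    for name in FINETUNE_CONFIGS:
        print(finetune_command(name))
```

Printing the commands first makes it easy to review them, or to pipe them into a job scheduler, before committing GPU time.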

Acknowledgement

This repo benefits from Point-M2AE, Point-BERT, Point-MAE, and CLIP. We thank the authors for their wonderful work.

Contact

If you have any questions about this project, please feel free to contact zhangrenrui@pjlab.org.cn.
