Junsheng Zhou1,2*, Jinsheng Wang1*, Baorui Ma1*, Yu-Shen Liu2, Tiejun Huang1,3, Xinlong Wang1
1BAAI, 2THU, 3PKU
* Equal Contribution
We extend Uni3D into a 3D metric for measuring semantic coherence in the text-to-3D task, lifting semantic measurement from 2D to 3D. Please refer to the GeoDream code repository for details.
We present Uni3D, a unified and scalable 3D pretraining framework for large-scale 3D representation learning, and explore its limits at the scale of one billion parameters. Uni3D uses a 2D-initialized ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features. With this simple architecture and pretext task, Uni3D can use abundant 2D pretrained models as initialization and image-text aligned models as the target, bringing 2D models and their scaling-up strategies to the 3D world. We efficiently scale up Uni3D to one billion parameters and set new records on a broad range of 3D tasks.
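The alignment objective described above can be illustrated with a minimal sketch. It assumes a CLIP-style symmetric contrastive loss between point cloud features and the paired image and text features; the text above states only the alignment goal, so the loss form, the function names (`contrastive_loss`, `alignment_loss`) and the temperature value are illustrative assumptions rather than the repository's implementation. Plain Python lists stand in for tensors to keep the example dependency-free.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_loss(feats_a, feats_b, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired features.

    Row i of feats_a is assumed to be paired with row i of feats_b.
    The temperature is an illustrative assumption.
    """
    a = [l2_normalize(v) for v in feats_a]
    b = [l2_normalize(v) for v in feats_b]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(u, w)) / temperature for w in b]
        for u in a
    ]

    def cross_entropy_on_diagonal(matrix):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            peak = max(row)
            log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)
    )


def alignment_loss(point_feats, image_feats, text_feats):
    """Pull point cloud features toward both image and text features."""
    return contrastive_loss(point_feats, image_feats) + contrastive_loss(
        point_feats, text_feats
    )
```

In this sketch the image and text features act as fixed targets, so only the point cloud encoder would receive gradients in a real training loop.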
Clone this repository and install the required packages:
git clone https://github.com/baaivision/Uni3D.git
cd Uni3D
conda create -n uni3d python=3.8
conda activate uni3d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
# install pointnet2 extensions from https://github.com/erikwijmans/Pointnet2_PyTorch
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
Core packages:
- PyTorch version 2.0.1
- open-clip-torch version 2.20.0
- timm version 0.9.7
- DeepSpeed version 0.10.3
- Open3D version 0.17.0
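To confirm that an environment matches the versions listed above, a small check along these lines may help. It is a sketch, not part of the repository: the `check_versions` helper is hypothetical, and the keys in `EXPECTED` assume the usual pip distribution names for these packages.

```python
from importlib import metadata

# Versions from the "Core packages" list; the keys are assumed pip names.
EXPECTED = {
    "torch": "2.0.1",
    "open-clip-torch": "2.20.0",
    "timm": "0.9.7",
    "deepspeed": "0.10.3",
    "open3d": "0.17.0",
}


def check_versions(expected):
    """Return {name: (installed, wanted, ok)} for each expected package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels may carry a local suffix such as "+cu118"; ignore it.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, ok)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

A mismatch is not necessarily fatal; the listed versions are simply the ones named above.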
| Model | Training Data | Objaverse-LVIS Top1 (Top5) | ModelNet40 Top1 (Top5) | ScanObjectNN Top1 (Top5) |
|---|---|---|---|---|
| Uni3d-g | Ensembled w/o LVIS | 47.2 (76.1) | 86.8 (98.4) | 66.5 (90.1) |
| Uni3d-g | Ensembled | 53.5 (82.0) | 87.3 (99.2) | 63.9 (91.7) |
| Uni3d-g 🔥 | Ensembled | 55.3 (82.9) | 88.2 (99.3) | 65.3 (92.7) |
We evaluate the zero-shot 3D classification performance on three datasets: Objaverse-LVIS, ModelNet40 and ScanObjectNN.
- Please refer to DATASETS.md for evaluation dataset preparation.
- [Recommended 🤗] Download the CLIP model and put it in the `/path/to/clip_model` folder.
- Download the model zoo weights and put them in the `/path/to/checkpoints` folder.
- Run `bash scripts/inference.sh` to evaluate the model on the above datasets.
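For readers unfamiliar with how the Top1 (Top5) numbers are obtained, the following sketch shows one common way to score zero-shot classification: each shape feature is compared with one text feature per class by cosine similarity, and a prediction counts as a hit if the true class is among the top k. The helper names and the toy features are hypothetical, and the actual evaluation in `scripts/inference.sh` may differ in detail.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def rank_classes(shape_feat, class_text_feats):
    """Return class indices sorted from most to least similar."""
    scores = [cosine(shape_feat, t) for t in class_text_feats]
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)


def topk_accuracy(shape_feats, labels, class_text_feats, k):
    """Percentage of shapes whose true label is among the k best classes."""
    hits = sum(
        1
        for feat, label in zip(shape_feats, labels)
        if label in rank_classes(feat, class_text_feats)[:k]
    )
    return 100.0 * hits / len(labels)


# Toy data: three classes and three shapes (made up for illustration).
class_text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shape_feats = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.5, 0.0]]
labels = [0, 1, 1]

top1 = topk_accuracy(shape_feats, labels, class_text_feats, k=1)
top2 = topk_accuracy(shape_feats, labels, class_text_feats, k=2)
```

Here the third shape is closest to class 0 but labeled 1, so it counts as a miss at k=1 and a hit at k=2.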
- Please refer to DATASETS.md for pre-train dataset preparation.
- [Recommended 🤗] Download the CLIP model and put it in the `/path/to/clip_model` folder.
- [Recommended 🤗] Download the initialization model and put it in the `/path/to/init_model` folder.
- Run `bash scripts/pretrain.sh` to pre-train the model on the ensemble datasets.
We are committed to open-sourcing Uni3D related materials, including:
- The weights of Uni3D-g
- Evaluation code
- Evaluation data
- Pretraining code
- Pretraining data
We hope to foster the growth of our community through open-sourcing and promoting collaboration👬. Let's step towards multimodal intelligence together🍻.
Uni3D is built using the awesome EVA, OpenCLIP, timm, DeepSpeed, ULIP and OpenShape.
@article{uni3d,
title={Uni3D: Exploring Unified 3D Representation at Scale},
author={Zhou, Junsheng and Wang, Jinsheng and Ma, Baorui and Liu, Yu-Shen and Huang, Tiejun and Wang, Xinlong},
journal={arXiv preprint arXiv:2310.06773},
year={2023}
}