Uni3D: Exploring Unified 3D Representation at Scale

Junsheng Zhou¹,²*, Jinsheng Wang¹*, Baorui Ma¹*, Yu-Shen Liu², Tiejun Huang¹,³, Xinlong Wang¹

¹BAAI, ²THU, ³PKU
* Equal contribution

We have extended Uni3D into a 3D metric for the text-to-3D task that measures semantic coherence, lifting semantic measurement from 2D to 3D. Please refer to the GeoDream code repository.

We present Uni3D, a unified and scalable 3D pretraining framework for large-scale 3D representation learning, and explore its limits at the scale of one billion parameters. Uni3D uses a 2D-initialized ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features. Thanks to this simple architecture and pretext task, Uni3D can use abundant 2D pretrained models as initialization and image-text aligned models as the target, unlocking the potential of 2D models and scaling-up strategies for the 3D world. We efficiently scale Uni3D up to one billion parameters and set new records on a broad range of 3D tasks.
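The alignment objective described above can be illustrated with a CLIP-style symmetric contrastive loss. This is a minimal sketch, not the repository's training code: the choice of loss, the temperature value, and every function name here are assumptions made for illustration, and plain Python lists stand in for tensors so the example runs without dependencies.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_alignment_loss(point_feats, target_feats, temperature=0.07):
    """Symmetric contrastive loss between point-cloud and target features.

    point_feats[i] and target_feats[i] are assumed to describe the same object;
    target_feats stands in for the image-text aligned features (hypothetical).
    """
    p = [l2_normalize(v) for v in point_feats]
    t = [l2_normalize(v) for v in target_feats]
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t] for pi in p]
    logits_t = [list(col) for col in zip(*logits)]  # target-to-point direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Matching pairs should give a lower loss than the same targets in swapped order.
aligned = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.1], [0.1, 3.0]])
shuffled = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 3.0], [2.0, 0.1]])
print(aligned < shuffled)
```

In a real training loop the target features would come from a frozen image-text model and the point features from the 3D encoder being trained; only the latter would receive gradients.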

Installation

Clone this repository and install the required packages:

git clone https://github.com/baaivision/Uni3D.git
cd Uni3D

conda create -n uni3d python=3.8
conda activate uni3d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

pip install -r requirements.txt

# install pointnet2 extensions from https://github.com/erikwijmans/Pointnet2_PyTorch
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

Core packages:

PyTorch version 2.0.1
open-clip-torch version 2.20.0
timm version 0.9.7
DeepSpeed version 0.10.3
Open3D version 0.17.0
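To confirm that an environment matches the versions listed above, a small check along the following lines may help. It is a sketch rather than part of the repository: the PyPI distribution names used as dictionary keys and the helper names are assumptions, and it relies only on the standard-library importlib.metadata module.

```python
from importlib import metadata

# Versions copied from the "Core packages" list. The keys are the PyPI
# distribution names, which are an assumption of this sketch.
PINNED = {
    "torch": "2.0.1",
    "open-clip-torch": "2.20.0",
    "timm": "0.9.7",
    "deepspeed": "0.10.3",
    "open3d": "0.17.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Local build tags such as "2.0.1+cu118" still count as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the listed version. Nearby versions may also work, but that is not something this check can tell you.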

Model Zoo

Model	Training Data	Objaverse-LVIS Top1 (Top5)	ModelNet40 Top1 (Top5)	ScanObjectNN Top1 (Top5)
Uni3d-g	Ensembled w/o LVIS	47.2 (76.1)	86.8 (98.4)	66.5 (90.1)
Uni3d-g	Ensembled	53.5 (82.0)	87.3 (99.2)	63.9 (91.7)
Uni3d-g 🔥	Ensembled	55.3 (82.9)	88.2 (99.3)	65.3 (92.7)

Evaluation of Zero-shot 3D classification

We evaluate the zero-shot 3D classification performance on three datasets: Objaverse-LVIS, ModelNet40 and ScanObjectNN.

Please refer to DATASETS.md for evaluation dataset preparation.
[Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
Download the Model Zoo weights and put them in the /path/to/checkpoints folder.
Run bash scripts/inference.sh to evaluate the model on the above datasets.
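For readers new to zero-shot classification, the Top1 and Top5 numbers reported in the Model Zoo can be understood through the sketch below: each point-cloud embedding is compared with the text embedding of every class name, and a sample counts as correct when its true label is among the k most similar classes. The function names and the toy 2-D embeddings are hypothetical; the actual evaluation is performed by scripts/inference.sh.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def topk_classes(point_embedding, class_text_embeddings, k):
    """Rank class names by cosine similarity to the point-cloud embedding."""
    ranked = sorted(
        class_text_embeddings,
        key=lambda name: cosine(point_embedding, class_text_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


def topk_accuracy(point_embeddings, labels, class_text_embeddings, k):
    """Percentage of samples whose true label is among the k most similar classes."""
    hits = sum(
        label in topk_classes(emb, class_text_embeddings, k)
        for emb, label in zip(point_embeddings, labels)
    )
    return 100.0 * hits / len(labels)


# Toy 2-D embeddings; real ones would come from the text and point encoders.
classes = {"chair": [1.0, 0.0], "table": [0.0, 1.0], "lamp": [-1.0, 0.0]}
points = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.7]]
labels = ["chair", "table", "chair"]

top1 = topk_accuracy(points, labels, classes, k=1)
top2 = topk_accuracy(points, labels, classes, k=2)
print(f"top-1: {top1:.1f}  top-2: {top2:.1f}")
```

In this toy data the third sample is closer to "table" than to its true label "chair", so it is missed at k=1 but recovered at k=2.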

Pre-training

Please refer to DATASETS.md for pre-training dataset preparation.
[Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
[Recommended 🤗] Download the initialization model and put it in the /path/to/init_model folder.
Run bash scripts/pretrain.sh to pre-train the model on the ensembled datasets.
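Before launching a long pre-training run, it can be worth checking that the folders named in the steps above exist and are not empty. The following is a hypothetical pre-flight helper, not something shipped with the repository; the paths are the same placeholders used above and must be replaced with real locations.

```python
from pathlib import Path


def missing_dirs(required):
    """Return the labels of required folders that do not exist or are empty."""
    missing = []
    for label, folder in required.items():
        path = Path(folder)
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(label)
    return missing


if __name__ == "__main__":
    # Placeholder paths from the steps above; replace them with real locations.
    required = {
        "CLIP model": "/path/to/clip_model",
        "initialization model": "/path/to/init_model",
    }
    for label in missing_dirs(required):
        print(f"Missing or empty folder for: {label}")
```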

Visualization

[Figures omitted: Open-world Understanding, One-shot Part Segmentation, Point Cloud Painting, Cross-modal Retrieval]

Schedule

We are committed to open-sourcing Uni3D-related materials.

We hope to foster the growth of our community through open-sourcing and promoting collaboration👬. Let's step towards multimodal intelligence together🍻.

Acknowledgement

Uni3D is built using the awesome EVA, OpenCLIP, timm, DeepSpeed, ULIP and OpenShape.

Citation

@article{uni3d,
  title={Uni3D: Exploring Unified 3D Representation at Scale},
  author={Zhou, Junsheng and Wang, Jinsheng and Ma, Baorui and Liu, Yu-Shen and Huang, Tiejun and Wang, Xinlong},
  journal={arXiv preprint arXiv:2310.06773},
  year={2023}
}
