This repository contains the implementation of the following paper:
"D2-Net: A Trainable CNN for Joint Detection and Description of Local Features".
M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler. CVPR 2019.
Python 3.6+ is recommended for running our code. Conda can be used to install the required packages:
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
conda install h5py imageio imagesize matplotlib numpy scipy tqdm
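A quick way to confirm the environment is usable before continuing (a minimal sanity check; it only reports the installed versions and whether a CUDA build of PyTorch is active):

```python
import torch
import torchvision

# Report installed versions and whether CUDA is available.
print('torch', torch.__version__, '| torchvision', torchvision.__version__)
print('CUDA available:', torch.cuda.is_available())
```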
The off-the-shelf Caffe VGG16 weights and their tuned counterpart can be downloaded by running:
mkdir models
wget https://dusmanu.com/files/d2-net/d2_ots.pth -O models/d2_ots.pth
wget https://dusmanu.com/files/d2-net/d2_tf.pth -O models/d2_tf.pth
wget https://dusmanu.com/files/d2-net/d2_tf_no_phototourism.pth -O models/d2_tf_no_phototourism.pth
Update - 23 May 2019: We have added a new set of weights trained on MegaDepth without the PhotoTourism scenes (sagrada_familia - 0019, lincoln_memorial_statue - 0021, british_museum - 0024, london_bridge - 0025, us_capitol - 0078, mount_rushmore - 1589). Our initial results show similar performance. In order to use these weights at test time, you should add --model_file models/d2_tf_no_phototourism.pth.
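The downloaded weights are standard PyTorch .pth files. A minimal sketch for inspecting one (the internal layout of the checkpoint is an assumption; adjust the key handling to what you actually find):

```python
import torch

# Load a downloaded checkpoint on CPU and print its top-level structure.
checkpoint = torch.load('models/d2_tf.pth', map_location='cpu')
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])
else:
    print(type(checkpoint))
```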
extract_features.py can be used to extract D2 features for a given list of images. Single-scale feature extraction requires less than 6 GB of VRAM for 1200x1600 images. The --multiscale flag can be used to extract multi-scale features; for this, we recommend at least 12 GB of VRAM.
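The image list passed via --image_list_file is a plain-text file with one image path per line. A minimal sketch for generating it (the image directory and glob pattern below are placeholders):

```python
from pathlib import Path

# Collect image paths and write them, one per line, to images.txt.
image_dir = Path('/path/to/images')      # placeholder directory
paths = sorted(image_dir.glob('*.jpg'))  # placeholder pattern
Path('images.txt').write_text('\n'.join(str(p) for p in paths) + '\n')
```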
The output format can be either npz or mat. In either case, the feature files encapsulate three arrays:

- keypoints: [N x 3] array containing the positions of keypoints x, y and the scales s. The positions follow the COLMAP format, with the X axis pointing to the right and the Y axis to the bottom.
- scores: [N] array containing the activations of keypoints (higher is better).
- descriptors: [N x 512] array containing the L2-normalized descriptors.
python extract_features.py --image_list_file images.txt (--multiscale)
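After extraction, the npz feature files can be read back with NumPy. A minimal sketch, assuming the array names match the fields listed above; the exact output path and extension depend on your extraction settings, so treat the file name below as a placeholder:

```python
import numpy as np

# Load the features written for one image and check the array shapes.
feats = np.load('/path/to/image.jpg.d2-net')  # placeholder path to an npz-formatted feature file
keypoints = feats['keypoints']      # N x 3: x, y, scale (COLMAP convention)
scores = feats['scores']            # N: detection scores, higher is better
descriptors = feats['descriptors']  # N x 512: L2-normalized descriptors
print(keypoints.shape, scores.shape, descriptors.shape)
```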
Kapture is a pivot file format, based on text and binary files, used to describe SFM (Structure From Motion) and more generally sensor-acquired data.
It is available at https://github.com/naver/kapture. It provides conversion tools for popular formats, and several popular datasets are directly available in kapture.
It can be installed with:
pip install kapture
Datasets can be downloaded with:
kapture_download_dataset.py update
kapture_download_dataset.py list
# e.g.: install mapping and query of Extended-CMU-Seasons_slice22
kapture_download_dataset.py install "Extended-CMU-Seasons_slice22_*"
If you want to convert your own dataset into kapture, please refer to the examples in the kapture repository.
Once installed, you can extract keypoints for your kapture dataset with:
python extract_kapture.py --kapture-root pathto/yourkapturedataset (--multiscale)
Run python extract_kapture.py --help for more information on the extraction parameters.
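Before running the extraction, it can help to verify that the dataset loads correctly with the kapture Python API. A minimal sketch based on the kapture library's kapture_from_dir loader; treat the exact attribute names as assumptions and check the kapture documentation:

```python
from kapture.io.csv import kapture_from_dir

# Load the kapture dataset metadata and list what it contains.
kapture_data = kapture_from_dir('pathto/yourkapturedataset')  # same path as above
print(kapture_data.sensors)         # declared cameras/sensors
print(kapture_data.records_camera)  # image records that features will be extracted for
```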
The training pipeline provided here is a PyTorch implementation of the TensorFlow code that was used to train the model available to download above.
Update - 05 June 2019 We have fixed a bug in the dataset preprocessing - retraining now yields similar results to the original TensorFlow implementation.
Update - 07 August 2019 We have released an updated, more accurate version of the training dataset - training is more stable and significantly faster for equal performance.
For this part, COLMAP should be installed. Please refer to the official website for installation instructions.
After downloading the entire MegaDepth dataset (including SfM models), the first step is generating the undistorted reconstructions. This can be done by calling undistort_reconstructions.py as follows:
python undistort_reconstructions.py --colmap_path /path/to/colmap/executable --base_path /path/to/megadepth
Next, preprocess_undistorted_megadepth.sh can be used to retrieve the camera parameters and compute the overlap between images for all scenes.
bash preprocess_undistorted_megadepth.sh /path/to/megadepth /path/to/output/folder
After downloading and preprocessing MegaDepth, the training can be started right away:
python train.py --use_validation --dataset_path /path/to/megadepth --scene_info_path /path/to/preprocessing/output
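Before launching a long run, a quick way to confirm that the preprocessing output is where train.py expects it (a minimal sketch; the per-scene file layout is an assumption, so only the file listing is checked here):

```python
from pathlib import Path

# List the per-scene files produced by the MegaDepth preprocessing step.
scene_info_path = Path('/path/to/preprocessing/output')  # same path passed to train.py
scene_files = sorted(scene_info_path.glob('*.npz'))
print(f'{len(scene_files)} scene info files found')
for path in scene_files[:5]:
    print(' ', path.name)
```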
If you use this code in your project, please cite the following paper:
@InProceedings{Dusmanu2019CVPR,
author = {Dusmanu, Mihai and Rocco, Ignacio and Pajdla, Tomas and Pollefeys, Marc and Sivic, Josef and Torii, Akihiko and Sattler, Torsten},
title = {{D2-Net: A Trainable CNN for Joint Detection and Description of Local Features}},
booktitle = {Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2019},
}