GitHub - joonson/voxceleb_unsupervised: Augmentation adversarial training for self-supervised speaker recognition

Unsupervised VoxCeleb trainer

This repository contains the training code for the models described in the paper 'Augmentation adversarial training for self-supervised speaker recognition'.

Dependencies

pip install -r requirements.txt

Data preparation

The VoxCeleb datasets are used for these experiments. Follow the instructions on this page to download and prepare the data for training.

In addition, you need to download the MUSAN noise corpus.

First, download and extract the files, then use the command

python ./process_musan.py /parent/dir/of/musan/

to split the audio files into short segments for faster random access.
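
As a rough illustration of what this kind of splitting involves, the sketch below cuts one WAV file into consecutive fixed-length segments using only the Python standard library. It is not the code in process_musan.py: the helper name split_wav, the 5-second default segment length, and the output naming scheme are all assumptions made for illustration.

```python
import os
import wave


def split_wav(src_path, out_dir, segment_seconds=5.0):
    """Split one WAV file into consecutive fixed-length segments.

    Hypothetical helper: the segment length and file naming are
    illustrative assumptions, not the behaviour of process_musan.py.
    Returns the list of segment paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(params.framerate * segment_seconds)
        index = 0
        while True:
            # readframes returns fewer frames at the end of the file,
            # so the last segment may be shorter than the others.
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out_path = os.path.join(out_dir, f"{stem}_{index:05d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

Short segments help because a data loader can then open one small file per noise sample instead of seeking inside a long recording.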

Training example

python ./trainSpeakerNet.py --log_input True --save_path data/exp1 --augment_anchor True --augment_type 2 --train_list /path/to/voxcelebs/train_list.txt --test_list /path/to/voxcelebs/test_list.txt --train_path /path/to/voxcelebs/voxceleb2 --test_path /path/to/voxcelebs/voxceleb1 --musan_path /path/to/musan_split

The arguments can also be supplied in a configuration file passed as --config path_to_config.yaml. Note that values in the configuration file override arguments passed on the command line.
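
The precedence rule can be sketched as follows. This is a minimal illustration, not the repository's argument handling: merge_config is a hypothetical helper, the defaults given to the parser are made up, and the config mapping stands in for the contents of the YAML file after it has been loaded.

```python
import argparse


def merge_config(args, config):
    """Overwrite parsed command-line values with values from a config mapping.

    `config` stands in for the loaded YAML file; the loading step is
    omitted. Rejecting unknown keys is an assumption of this sketch.
    """
    for key, value in config.items():
        if not hasattr(args, key):
            raise KeyError(f"unknown option in config: {key}")
        setattr(args, key, value)
    return args


parser = argparse.ArgumentParser()
# Defaults here are illustrative only.
parser.add_argument("--augment_type", type=int, default=0)
parser.add_argument("--save_path", type=str, default="data/exp1")

# The command line asks for augment_type 2, but the config says 3.
args = parser.parse_args(["--augment_type", "2"])
args = merge_config(args, {"augment_type": 3})
print(args.augment_type)  # 3: the config value wins
print(args.save_path)     # data/exp1: not in the config, so unchanged
```

In practice this means a value set in the configuration file cannot be changed from the command line without editing the file.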

Pretrained model

A baseline model trained using the parameters above can be downloaded from here.

You can check that the following script returns: EER 11.8134.

python ./trainSpeakerNet.py --eval --log_input True --save_path data/test --test_list /path/to/voxcelebs/test_list.txt --test_path /path/to/voxcelebs/voxceleb1 --initial_model baseline_unsuper.model

A model trained using --augment_type 3 --env_iteration 1 --alpha 3 in addition to the parameters above can be downloaded from here.

You can check that the evaluation command above, with --initial_model pointing to this model, returns: EER 8.4995.
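
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is one simple way to compute it and is not the code in tuneThreshold.py. It assumes that higher scores mean "more likely the same speaker" and that labels are 1 for same-speaker trials and 0 otherwise. When no threshold makes the two rates exactly equal, it averages them at the closest point, which is an approximation.

```python
def equal_error_rate(scores, labels):
    """Return the EER in percent for similarity scores and 0/1 labels.

    labels: 1 for a same-speaker (target) trial, 0 for a different-speaker
    trial. A trial is accepted when its score is >= the threshold.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    # Try every observed score as a threshold, plus one above the maximum.
    thresholds = sorted(set(scores)) + [max(scores) + 1.0]
    for t in thresholds:
        false_accepts = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        false_rejects = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return 100.0 * eer
```

This version is quadratic in the number of trials, which is fine for a demonstration but slow for a full test list; a sorted sweep would be the usual optimisation.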

Implemented loss functions

Prototypical (proto)
Angular Prototypical (angleproto)
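
The two losses share the same structure: each query embedding is scored against every prototype, and cross-entropy pushes it towards its own prototype. The sketch below shows that structure in plain Python and is not the repository's implementation. It assumes the usual formulations, in which the prototypical loss scores with negative squared Euclidean distance and the angular variant scores with cosine similarity passed through a scale and bias. Those two parameters are normally learnable; here they are fixed numbers chosen for illustration. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neg_squared_distance(u, v):
    """Negative squared Euclidean distance (larger means closer)."""
    return -sum((a - b) ** 2 for a, b in zip(u, v))


def prototypical_loss(queries, prototypes, similarity, scale=1.0, bias=0.0):
    """Mean cross-entropy of each query against all prototypes.

    queries[i] and prototypes[i] are assumed to come from the same
    source (speaker or utterance); every other prototype is a negative.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * similarity(q, p) + bias for p in prototypes]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[0.9, 0.1], [0.1, 0.9]]
proto = prototypical_loss(queries, prototypes, neg_squared_distance)
# scale and bias values are illustrative, not taken from this repository
angleproto = prototypical_loss(queries, prototypes, cosine, scale=10.0, bias=-5.0)
```

Because the softmax is invariant to a constant shift, the bias does not change the loss value in this particular form; only the scale affects how sharply the matching prototype is preferred.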

Implemented models and encoders

Note that the model definitions are not compatible with those in voxceleb_trainer, since the spectrograms are extracted in the data loader.

ResNetSE34L (SAP)

Citation

Please cite the following if you make use of the code.

@inproceedings{huh2020augmentation,
  title={Augmentation adversarial training for unsupervised speaker recognition},
  author={Huh, Jaesung and Heo, Hee Soo and Kang, Jingu and Watanabe, Shinji and Chung, Joon Son},
  booktitle={Workshop on Self-Supervised Learning for Speech and Audio Processing, NeurIPS},
  year={2020}
}
