A custom DiffWave reimplementation for the Practical Work in AI course at JKU.
- Create the conda environment with "conda env create -f environment.yml"
- Activate conda environment with "conda activate practical_work"
- If on macOS/Linux, install sox (a dependency of torchaudio) with "conda install -c conda-forge sox"
By default, "source/data_prep.py" chunks full-length audio files into fixed-length samples of 4 seconds (the length is configurable), and a mel spectrogram is computed for each chunk. The default input folder is "raw_samples", the default output folder for chunked audio is "data/chunked_audio", and the default output folder for mel spectrograms is "data/mel_spectrograms".
- Set up a folder "raw_samples" in the project root containing the audio files (.mp3 or .wav) to be used as training data
- Make sure the folders "data/chunked_audio" and "data/mel_spectrograms" exist; if not, create them
- Run "python source/data_prep.py" to chop the samples in "raw_samples" into 4-second snippets, saving the audio to "data/chunked_audio" and the spectrograms to "data/mel_spectrograms"
- Optional: to use different input/output folders, run "python source/data_prep.py [path to audio_folder] [path to output_folder]"
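The chunking step above can be sketched as follows. This is a simplified illustration, not the actual code in "source/data_prep.py": the function name and the remainder-dropping behavior are assumptions, and the real script additionally computes a mel spectrogram per chunk (e.g. via torchaudio).

```python
import numpy as np

def chunk_waveform(waveform: np.ndarray, sample_rate: int, chunk_seconds: float = 4.0):
    """Split a 1-D waveform into fixed-length chunks, dropping the trailing remainder."""
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(waveform) // chunk_len
    return [waveform[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

# 10 seconds of audio at 22050 Hz -> two full 4-second chunks, 2 s remainder dropped
audio = np.zeros(10 * 22050, dtype=np.float32)
chunks = chunk_waveform(audio, sample_rate=22050)
```

Dropping (rather than padding) the last partial chunk is one way to guarantee that every training sample ends up with the exact same length, which the training step requires.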
All samples used for training must be of the SAME length and in the same folder (default: "data/chunked_audio"). Samples must be either .mp3 or .wav files.
- Set desired config parameters in "source/config.py"
- Run "python source/main.py [path to data_folder] [path to conditional input (i.e. spectrograms)]" to start training. Both path arguments are optional; the defaults are "data/chunked_audio" and "data/mel_spectrograms"
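The optional-argument behavior can be illustrated with a small sketch. `resolve_paths` is a hypothetical helper, not necessarily how "source/main.py" parses its arguments:

```python
import sys

DEFAULT_DATA = "data/chunked_audio"
DEFAULT_COND = "data/mel_spectrograms"

def resolve_paths(argv):
    """Return (data_folder, conditional_folder), falling back to the defaults
    when the positional arguments are omitted."""
    data = argv[1] if len(argv) > 1 else DEFAULT_DATA
    cond = argv[2] if len(argv) > 2 else DEFAULT_COND
    return data, cond

# e.g. resolve_paths(["main.py"]) uses both defaults,
#      resolve_paths(["main.py", "my_audio", "my_mels"]) uses the given folders
```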
When using different audio datasets, the mel spectrograms generated by "source/data_prep.py" can have different dimensions, which causes a shape error in the ConvTranspose2d layers of SpectrogramConditioner. To train the model, the kernel_size, stride, padding, and output_padding of the ConvTranspose2d layers have to be adjusted. See the PyTorch docs for details on how to calculate the correct parameters: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html
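When picking those parameters, the output-size formula from the PyTorch ConvTranspose2d docs can be evaluated per dimension. The kernel/stride values below are purely illustrative, not the ones used in SpectrogramConditioner:

```python
def conv_transpose_out(size_in, kernel_size, stride=1, padding=0,
                       output_padding=0, dilation=1):
    """Output size along one dimension of ConvTranspose2d,
    per the formula in the PyTorch documentation."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

# Illustrative: kernel_size=32, stride=16, padding=8 upsamples a dimension
# by exactly 16x, e.g. 64 -> 1024
out = conv_transpose_out(64, kernel_size=32, stride=16, padding=8)
```

Checking the target audio length against this formula for each layer (time and frequency axes separately) is usually faster than trial-and-error with shape errors.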
- Make sure you have an account on the JKU CP student1 server
- Run "scp -r username@student1.cp.jku.at:../davidh/data/chunked_audio ./" to download the audio dataset folder to your current directory. Make sure to replace "username" with your username.
- Run "scp -r username@student1.cp.jku.at:../davidh/data/mel_spectrograms ./" to download the mel spectrogram dataset folder to your current directory. Again, replace "username" with your username.
- Move the downloaded folders to the "data" folder in the root directory of the project