A custom DiffWave reimplementation for the Practical Work in AI course at JKU.
- Create the conda environment with "conda env create -f environment.yml"
- Activate conda environment with "conda activate practical_work"
- If on macOS/Linux, install sox (a dependency of torchaudio) with "conda install -c conda-forge sox"
By default, "source/data_prep.py" chunks full-length audio files into fixed-length samples of 4 seconds (the length is configurable), and a mel spectrogram is computed for each chunk. The default input folder is "raw_samples", the default output folder for chunked audio is "data/chunked_audio", and the default output folder for mel spectrograms is "data/mel_spectrograms".
- Set up a folder "raw_samples" in the project root containing the audio files (.mp3 or .wav) to be used as training data
- Make sure the folders "data/chunked_audio" and "data/mel_spectrograms" exist; if not, create them
- Run "python source/data_prep.py" to chop the samples in "raw_samples" into 4-second snippets, saving the audio to "data/chunked_audio" and the spectrograms to "data/mel_spectrograms"
- Optional: to use different input/output folders, run "python source/data_prep.py [path to audio_folder] [path to output_folder]"
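The chunking step above can be sketched as follows. This is a simplified illustration, not the actual code in "source/data_prep.py": the function name and the remainder-dropping behavior are assumptions, and the real script additionally computes a mel spectrogram per chunk (e.g. via torchaudio).

```python
import numpy as np

def chunk_waveform(waveform: np.ndarray, sample_rate: int, chunk_seconds: float = 4.0):
    """Split a 1-D waveform into fixed-length chunks, dropping the trailing remainder."""
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(waveform) // chunk_len
    return [waveform[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

# 10 seconds of audio at 22050 Hz -> two full 4-second chunks, 2 s remainder dropped
audio = np.zeros(10 * 22050, dtype=np.float32)
chunks = chunk_waveform(audio, sample_rate=22050)
```

Dropping (rather than padding) the last partial chunk is one way to guarantee that every training sample ends up with the exact same length, which the training step requires.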
All samples used for training must be of the SAME length and in the same folder (default: "data/chunked_audio"). Samples must be either .mp3 or .wav files.
- Set desired config parameters in "source/config.py"
- Run "python source/main.py [path to data_folder] [path to conditional input (i.e. spectrograms)]" to start training. Both path arguments are optional; the defaults are "data/chunked_audio" and "data/mel_spectrograms"
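The optional-argument behavior can be illustrated with a small sketch. `resolve_paths` is a hypothetical helper, not necessarily how "source/main.py" parses its arguments:

```python
import sys

DEFAULT_DATA = "data/chunked_audio"
DEFAULT_COND = "data/mel_spectrograms"

def resolve_paths(argv):
    """Return (data_folder, conditional_folder), falling back to the defaults
    when the positional arguments are omitted."""
    data = argv[1] if len(argv) > 1 else DEFAULT_DATA
    cond = argv[2] if len(argv) > 2 else DEFAULT_COND
    return data, cond

# e.g. resolve_paths(["main.py"]) uses both defaults,
#      resolve_paths(["main.py", "my_audio", "my_mels"]) uses the given folders
```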
When using different audio datasets, the mel spectrograms generated by "source/data_prep.py" can have different dimensions, which causes a shape error in the ConvTranspose2d layers of SpectrogramConditioner. To train the model, the kernel_size, stride, padding, and output_padding of the ConvTranspose2d layers have to be adjusted. See the PyTorch docs for details on how to calculate the correct parameters: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html
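When picking those parameters, the output-size formula from the PyTorch ConvTranspose2d docs can be evaluated per dimension. The kernel/stride values below are purely illustrative, not the ones used in SpectrogramConditioner:

```python
def conv_transpose_out(size_in, kernel_size, stride=1, padding=0,
                       output_padding=0, dilation=1):
    """Output size along one dimension of ConvTranspose2d,
    per the formula in the PyTorch documentation."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

# Illustrative: kernel_size=32, stride=16, padding=8 upsamples a dimension
# by exactly 16x, e.g. 64 -> 1024
out = conv_transpose_out(64, kernel_size=32, stride=16, padding=8)
```

Checking the target audio length against this formula for each layer (time and frequency axes separately) is usually faster than trial-and-error with shape errors.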
- Make sure you have an account on the JKU CP student1 server
- Run "scp -r username@student1.cp.jku.at:../davidh/data/chunked_audio ./" to download the audio dataset folder to your current directory. Make sure to replace "username" with your username.
- Run "scp -r username@student1.cp.jku.at:../davidh/data/mel_spectrograms ./" to download the mel spectrogram dataset folder to your current directory. Again, replace "username" with your username.
- Move the downloaded folders to the "data" folder in the root directory of the project