
Keras rewrite: Attention loss goes up after 5 epochs? #67

Open
dimasikson opened this issue Feb 15, 2020 · 0 comments
dimasikson commented Feb 15, 2020

Hi, I attempted to rewrite this repo in Keras to migrate it to TF 2.0. In short, I need some help with the training process: my attention loss goes up over time, which degrades the quality of the mel output, and in the output the attention line is scattered.

Here is the repo: https://github.com/dimasikson/dc_tts_keras

In the Text2Mel model, my attention loss goes up after 4-7 epochs, depending on the hyperparams.
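For context, the attention loss here is the guided attention loss from the DC-TTS paper, which penalizes attention mass far from the diagonal. A minimal sketch of what I mean (simplified; the variable names and max lengths are illustrative, the exact code is in the repo):

```python
import numpy as np
import tensorflow as tf

def guided_attention_weights(max_n=180, max_t=210, g=0.2):
    # Penalty matrix W[n, t] = 1 - exp(-((t/T - n/N)^2) / (2 g^2)).
    # Near-diagonal (monotonic) attention is barely penalized; far-off positions approach 1.
    W = np.zeros((max_n, max_t), dtype=np.float32)
    for n in range(max_n):
        for t in range(max_t):
            W[n, t] = 1.0 - np.exp(-((t / max_t - n / max_n) ** 2) / (2 * g * g))
    return W

def attention_loss(A, W):
    # A: (batch, max_n, max_t) attention matrix from Text2Mel.
    # The loss is the mean attention mass that falls far from the diagonal.
    return tf.reduce_mean(A * W[None, :, :])
```

The real code also masks out padding positions, but the idea is the same: when this term goes up, the alignment is drifting away from a monotonic diagonal.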

The batch size in my model is 8 because my GPU cannot fit 32 in one go, but I did try the original model with B=4 and it was totally fine after 20 epochs, so I doubt this has to do with batch size.

Here is the attention loss (moving average) with 'vanilla' hyperparams, i.e. exactly as found in the original repo (except for the batch size, as mentioned above): 1638 steps per epoch, 15 epochs, 2,500-step two-sided moving average.

[Chart: attention loss, 2,500-step moving average, vanilla hyperparams]

Here it is after I randomized the batch order between epochs AND increased the LR decay in the 'utils' file: 8 epochs, same moving average.

[Chart: attention loss, 2,500-step moving average, shuffled batches and increased LR decay]

The increased decay makes the effect appear later in training, but the loss still goes up fairly deterministically.
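To be concrete about the LR decay: the schedule in the original 'utils' file is a Noam-style warm-up/decay, roughly like the sketch below (the default values are from memory, so treat them as illustrative); "increasing the decay" means making this curve fall off faster.

```python
import tensorflow as tf

class NoamSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    # Noam-style schedule: linear warm-up for `warmup_steps`, then ~1/sqrt(step) decay.
    def __init__(self, init_lr=0.001, warmup_steps=4000.0):
        super().__init__()
        self.init_lr = init_lr
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32) + 1.0
        return self.init_lr * self.warmup_steps ** 0.5 * tf.minimum(
            step * self.warmup_steps ** -1.5, step ** -0.5)

# In TF 2.0 this can be passed straight to the optimizer:
# optimizer = tf.keras.optimizers.Adam(NoamSchedule())
```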

In the grand scheme of things, the overall loss goes down just fine, but this attention loss kind of screws up the output. Here is the total loss after 8 epochs (2nd model):

[Chart: total loss after 8 epochs, 2nd model]

What results is output like this (2nd model, 8 epochs). Below is the attention plot from the synthesis stage; I purposefully turned off mono attention for the sake of the visual.

[Image: attention plot at synthesis, 2nd model, 8 epochs]
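For anyone unfamiliar with the "mono attention" flag: at synthesis time the original repo forces the attention to advance incrementally, which normally hides a scattered alignment like the one above. Roughly (a sketch only; the window size is an assumption on my part):

```python
import numpy as np

def force_monotonic(logits, prev_pos, win=3):
    # Forced incremental attention at synthesis: before the softmax over text
    # positions, mask out anything that would move the attention backwards or
    # jump more than `win` characters ahead of the previous peak.
    masked = np.full_like(logits, -np.inf)
    hi = min(prev_pos + win + 1, len(logits))
    masked[prev_pos:hi] = logits[prev_pos:hi]
    return masked
```

With this constraint disabled, the plot shows what the alignment actually looks like, which is why I turned it off for the visual.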

And below are epochs 3, 4, and 5 from the 1st model, which is roughly where it breaks down.

Epoch 3:
[Image: attention plot, epoch 3]

Epoch 4:
[Image: attention plot, epoch 4]

Epoch 5:
[Image: attention plot, epoch 5]

What I would like to understand:

  • Did I copy the model one-to-one?
  • What can I try to fix this?

Thanks in advance!
