One reason I can think of: multi-token similarity might produce scores that differ significantly between positive and negative items for a given query after only a few training steps (positives getting higher scores). Dividing the logits by a temperature below 1 would further sharpen the distribution, making the softmax probability for the positive query-item pair very high and the loss very low. The model would then behave as if it were already producing meaningful embeddings and stop learning efficiently.
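To illustrate the effect, here is a minimal sketch (the similarity scores are made up for illustration, with the positive pair at index 0): dividing logits by a small temperature sharpens the softmax and collapses the cross-entropy loss toward zero, even when the raw score margin is modest.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical query-item similarity scores; index 0 is the positive pair.
logits = [5.0, 3.0, 2.5]

for temp in (1.0, 0.1):
    probs = softmax([l / temp for l in logits])
    loss = -math.log(probs[0])  # cross-entropy against the positive
    print(f"temp={temp}: p(positive)={probs[0]:.6f}, loss={loss:.6f}")
```

With temperature 1.0 the positive gets a moderate probability and a non-trivial loss; at temperature 0.1 the same logits yield a probability near 1 and a near-zero loss, so the gradient signal almost vanishes.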
Hey folks,
Have we tested the use of a temperature parameter? I don't see it in the loss function or in ColBERT's official implementation.