One reason I can think of: multi-token similarity might produce scores that differ significantly between positive and negative items for a given query after only a few training steps (positives getting higher scores). Dividing the logits by a temperature below 1 would further sharpen the distribution, making the softmax probability for the positive query-item pair very high and the loss very low. The model would then behave as if it were already producing meaningful embeddings and stop learning efficiently.
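To illustrate the effect, here is a minimal sketch (the similarity scores are made up for illustration, with the positive pair at index 0): dividing logits by a small temperature sharpens the softmax and collapses the cross-entropy loss toward zero, even when the raw score margin is modest.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical query-item similarity scores; index 0 is the positive pair.
logits = [5.0, 3.0, 2.5]

for temp in (1.0, 0.1):
    probs = softmax([l / temp for l in logits])
    loss = -math.log(probs[0])  # cross-entropy against the positive
    print(f"temp={temp}: p(positive)={probs[0]:.6f}, loss={loss:.6f}")
```

With temperature 1.0 the positive gets a moderate probability and a non-trivial loss; at temperature 0.1 the same logits yield a probability near 1 and a near-zero loss, so the gradient signal almost vanishes.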
Hey folks,
Have we tested the use of a temperature parameter? I don't see it in the loss function or in ColBERT's official implementation.