Towards a Deeper Understanding of Neural Language Generation
T He - 2022 - dspace.mit.edu
In recent years, the field of language modelling has witnessed exciting developments. In particular, thanks to large-scale data, powerful model architectures, and high-speed parallel computing devices, researchers are able to train language models that can generate realistic text. However, our understanding of these powerful language models remains shallow. Which aspects of a language model work well, and which need to be improved? These are the key questions behind this thesis. This thesis presents a set of behavior analyses of language models (LMs) with a focus on generation. We also propose methods to alleviate some of the identified problems. The four high-level topics are: (1) the general sampling behavior of an auto-regressive LM, with a closer look at the popular sampling algorithms; (2) whether the LM is vulnerable to adversarial attacks, and how to make it more robust; (3) the LM’s ability to remember knowledge learned from data and, relatedly, the best way to expose this learned knowledge; and (4) how to gain more fine-grained control over the model’s generation.