Transformers for NLP

AI, ML & Data Science

The Transformer architecture was proposed by researchers at Google in 2017, in the paper “Attention Is All You Need”, to process sequential data. Transformers are used in Natural Language Processing (NLP) and, more recently, in Computer Vision applications.

The Transformer architecture is based on the concept of ‘Self-Attention’. Transformers have largely replaced RNN/LSTM architectures in NLP. Their major advantages are speed and bidirectional context: the input text is fed into the architecture in parallel rather than token by token, which allows much faster processing.
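The core of self-attention can be illustrated in a few lines of NumPy. This is a simplified sketch: a real Transformer applies learned query/key/value projection matrices and uses multiple attention heads, whereas here the raw embeddings serve all three roles.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d) matrix of token embeddings. For simplicity, X is used
    directly as queries, keys, and values (no learned projections).
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # (seq_len, seq_len) pairwise similarity
    # Softmax over each row, so every token gets a probability
    # distribution over all tokens in the sequence
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all token vectors,
    # computed for the whole sequence in parallel
    return weights @ X, weights

X = np.random.rand(4, 8)  # 4 tokens, 8-dimensional embeddings
out, w = self_attention(X)
```

Because the attention weights for every token are computed as one matrix operation, the whole sequence is processed in parallel, which is the source of the speed advantage mentioned above.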

The leading language models BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are built upon the Transformer architecture. BERT was proposed by Google, and GPT-1/2/3 were proposed by OpenAI. The BERT language model is used in the Google Search Engine. The Hugging Face web portal provides many popular Transformers in different flavors.

Transformers can be used for virtually all Natural Language Processing (NLP) applications, such as sentiment analysis, translation, auto-completion, named entity recognition, automatic question-answering and many more. Transformers can also generate artificial text that is often hard to distinguish from text written by humans.
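As a taste of how accessible these applications are, a sentiment-analysis example using the Hugging Face `transformers` library takes only a few lines. This sketch assumes the library is installed (`pip install transformers`); the first call downloads a default model checkpoint from the Hugging Face Hub.

```python
from transformers import pipeline

# "sentiment-analysis" selects a default fine-tuned model; a specific
# checkpoint can be chosen via the `model` argument instead.
classifier = pipeline("sentiment-analysis")

result = classifier("Transformers make NLP applications much easier to build.")
print(result)  # a list of {"label": ..., "score": ...} dicts
```

The same `pipeline` API covers other tasks from the list above, e.g. `pipeline("translation_en_to_fr")` or `pipeline("ner")`.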

This talk will briefly cover the theory of Transformers. It will then focus on how to fine-tune a standard pretrained Transformer model (downloaded from the Hugging Face portal) for a specific application.
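The fine-tuning workflow can be sketched with the `transformers` Trainer API. This is a minimal illustration, not the talk's actual material: the checkpoint name, the two-sentence toy dataset, and the hyperparameters are all placeholder assumptions, and it requires `transformers` and `torch` to be installed.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

checkpoint = "distilbert-base-uncased"  # assumed checkpoint; any Hub model works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy labeled data for illustration only; a real task would use a proper dataset
texts = ["great movie", "terrible plot"]
labels = [1, 0]
enc = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps tokenizer output and labels in the format Trainer expects."""
    def __init__(self, enc, labels):
        self.enc, self.labels = enc, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

args = TrainingArguments(output_dir="out",
                         num_train_epochs=1,
                         per_device_train_batch_size=2)
trainer = Trainer(model=model, args=args, train_dataset=ToyDataset(enc, labels))
trainer.train()  # fine-tunes the pretrained weights on the task data
```

The key idea is that only a small labeled dataset and a few epochs are needed, because the pretrained model already encodes general language knowledge.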