Generative Pre-trained Transformer (GPT)

What is Generative Pre-trained Transformer (GPT)?

Generative Pre-trained Transformer (GPT) refers to a family of large language models developed by OpenAI. The name describes how they work: 'Generative' means they create new, human-like text; 'Pre-trained' indicates they are first trained on a vast corpus of unlabeled text to learn grammar, facts, and reasoning abilities; and 'Transformer' refers to the underlying deep learning architecture, which uses self-attention mechanisms to model relationships between the tokens in a sequence.
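The self-attention idea can be illustrated with a minimal NumPy sketch. This is a toy, single-head version for intuition only, not OpenAI's implementation; the weight matrices here are random stand-ins for learned parameters, and the causal mask reflects the left-to-right generation that makes the model 'generative'.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token affinities
    # Causal mask: each position may only attend to itself and earlier
    # tokens, so generation proceeds strictly left to right.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores)  # each row is a distribution over positions
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # illustrative sizes, far smaller than any GPT
X = rng.normal(size=(seq_len, d_model))   # stand-in token embeddings
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one contextualized vector per input position
```

Because of the causal mask, the first position can attend only to itself, so its output is just its own value vector; later positions mix in information from everything before them.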

Where did the term "Generative Pre-trained Transformer (GPT)" come from?

The first GPT model was introduced by OpenAI in their 2018 paper, 'Improving Language Understanding by Generative Pre-Training.' This paper demonstrated the effectiveness of a two-stage process: unsupervised pre-training on a diverse text corpus followed by supervised fine-tuning for specific tasks. This set the stage for subsequent, much larger models in the series.
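The two-stage recipe can be sketched with a deliberately tiny character-level model. A bigram counter is a hypothetical stand-in for next-token prediction (real GPTs use neural networks and far larger corpora), and the 'fine-tuning' stage mirrors the 2018 paper's trick of casting labeled tasks as text sequences; the example strings are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(model, text):
    """Update next-character counts: a toy stand-in for next-token prediction."""
    for a, b in zip(text, text[1:]):
        model[a][b] += 1

def generate(model, start, n):
    """Greedily extend `start` by the most frequent next character."""
    out = start
    for _ in range(n):
        followers = model[out[-1]]
        if not followers:
            break
        out += followers.most_common(1)[0][0]
    return out

model = defaultdict(Counter)
# Stage 1: unsupervised pre-training on raw, unlabeled text.
train_bigram(model, "the cat sat on the mat. the dog sat on the log.")
# Stage 2: supervised fine-tuning, here framed as continued training on
# task-formatted examples (GPT-1 cast classification as text sequences).
train_bigram(model, "review: great film => positive. review: dull plot => negative.")
print(generate(model, "th", 6))
```

The point of the sketch is the division of labor: stage 1 learns general statistics of language from unlabeled text, and stage 2 nudges the same model toward a specific task without starting from scratch.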

How is "Generative Pre-trained Transformer (GPT)" used today?

The release of GPT-2, and especially GPT-3, marked a significant turning point in both the capabilities of language models and the public's perception of AI. However, it was the launch of ChatGPT (based on the GPT-3.5 and later GPT-4 architectures) in late 2022 that brought the technology into the global mainstream. The term 'GPT' has become almost synonymous with advanced conversational AI and has spurred a massive wave of investment and development in generative AI across the tech industry.

Related Terms