A Large Language Model (LLM) is a type of artificial intelligence trained on vast amounts of text data to understand, generate, and manipulate human language. Built on the Transformer architecture, LLMs learn statistical relationships between tokens (words or word fragments) in order to predict the next token in a sequence. The 'large' refers both to the massive size of their training datasets and to the billions of parameters (weights) they contain, which allow them to exhibit emergent capabilities such as reasoning, coding, and translation.
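The next-token prediction described above can be sketched in miniature: a model assigns a raw score (logit) to every candidate token, a softmax turns those scores into a probability distribution, and the most likely token can then be selected. The tiny vocabulary and logit values below are invented purely for illustration, not taken from any real model.

```python
import math

# Hypothetical vocabulary and model scores (assumed values for illustration).
vocab = ["the", "cat", "sat", "mat"]
logits = [1.0, 3.5, 0.2, 2.1]  # raw scores for each candidate next token

def softmax(scores):
    """Convert raw scores into a probability distribution that sums to 1."""
    m = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
# Greedy decoding: pick the single most probable token.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "cat", the token with the highest logit
```

In practice, production LLMs repeat this step token by token, often sampling from the distribution (with parameters like temperature) rather than always taking the single most probable token.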
The modern era of LLMs began with the introduction of the Transformer architecture by Google in 2017 ('Attention Is All You Need') and scaled up with OpenAI's GPT series (2018+).
LLMs have revolutionized AI, powering widely used tools like ChatGPT, Claude, and GitHub Copilot, and transforming industries from software development to content creation.