Training Data

What is Training Data?

Training data is the collection of examples (text, images, code, etc.) used to teach a machine learning model. The model learns patterns and associations from this data, so whatever the data contains, including its gaps and skews, shapes what the model can do. The quality, diversity, and size of the training dataset are among the most critical factors determining a model's performance and bias.
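As a rough illustration of "learning patterns from examples", the sketch below fits a straight line to a handful of (input, label) pairs using ordinary least squares. The dataset and the names training_data, fit_line, and predict are hypothetical, chosen only for this example. Real models and datasets are far larger, but the basic relationship is the same: the learned parameters come entirely from the examples.

```python
# Hypothetical toy dataset: (input, label) pairs that follow y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def fit_line(examples):
    """Fit y = slope * x + intercept to the examples by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    # Closed-form least-squares solution for a single input feature.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": the parameters are derived from the examples alone.
slope, intercept = fit_line(training_data)


def predict(x):
    """Apply the learned parameters to an input not seen during training."""
    return slope * x + intercept


print(slope, intercept, predict(4.0))
```

Swapping in different examples changes the learned slope and intercept, which is a small-scale version of how the contents of a training dataset determine a model's behavior.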

Where did the term "Training Data" come from?

The term is a fundamental concept in supervised and self-supervised learning, where a model's parameters are fit to ("trained" on) a set of examples.

How is "Training Data" used today?

Massive text datasets such as Common Crawl and The Pile have helped enable the rise of large language models (LLMs).

Related Terms