Inference

What is Inference?

In machine learning, inference is the process of using a trained model to make predictions on new, unseen data. After a model has been trained on a dataset to learn patterns and relationships, it enters the inference phase, where it applies this learned knowledge to generate outputs. This is the operational or 'production' stage of the machine learning lifecycle, where the model provides value by making decisions, classifying data, or generating content.
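The training/inference split can be sketched with a few lines of code. This is a minimal illustration, not a prescribed workflow; scikit-learn and the toy dataset here are arbitrary choices to make the two phases concrete.

```python
# Minimal sketch of the two phases of the ML lifecycle
# (scikit-learn is an illustrative choice; any framework has the same split).
from sklearn.linear_model import LogisticRegression

# --- Training phase: the model learns patterns from labeled data ---
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression()
model.fit(X_train, y_train)

# --- Inference phase: the trained model predicts on new, unseen inputs ---
X_new = [[0.5], [2.5]]
predictions = model.predict(X_new)
```

After `fit` completes, the learned parameters are frozen; `predict` only applies them, which is why inference is typically much cheaper per call than training.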

Where did the term "Inference" come from?

The distinction between training and inference (sometimes called 'prediction' or 'scoring') has been a fundamental part of the machine learning workflow since its early days. As models began to be deployed in real-world applications, a term was needed for the act of using a trained model to make predictions; 'inference' was adopted, emphasizing that the model is 'inferring' outputs from new inputs based on its training.

How is "Inference" used today?

Inference is performed in a wide variety of environments, from low-power edge devices like smartphones to large-scale cloud infrastructure. The choice of deployment environment depends on factors like latency requirements, data privacy concerns, and computational cost. As AI models have become larger and more complex, the cost of inference has become a significant factor in the overall cost of running AI-powered services. This has led to the development of specialized hardware (like TPUs and GPUs) and of model formats and runtimes (like ONNX and TensorRT) that optimize the performance and efficiency of inference.
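One common serving-side optimization behind those tools is batching: grouping many inference requests into a single call so the hardware processes them together. The sketch below stands in for a real model with a single dense layer in NumPy; the layer size, batch size, and timing approach are all illustrative assumptions.

```python
# Hedged sketch: why batching improves inference throughput.
# The "model" is one frozen dense layer (a NumPy matmul); real serving
# systems batch requests for the same reason.
import time
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)  # frozen weights

def infer(batch):
    # Forward pass for a batch of shape (n, 512)
    return batch @ W

inputs = rng.standard_normal((256, 512)).astype(np.float32)

# Serve one request at a time
t0 = time.perf_counter()
for row in inputs:
    infer(row[None, :])
t_single = time.perf_counter() - t0

# Serve all 256 requests in one batched call
t0 = time.perf_counter()
batched_out = infer(inputs)
t_batched = time.perf_counter() - t0
```

On typical hardware the batched call is substantially faster than the loop, because per-call overhead is paid once and the matrix multiply uses the hardware more efficiently; this is the same trade-off (throughput vs. per-request latency) that inference servers tune.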

Related Terms