Latency is the delay between a user's action and the time it takes for the system to respond. In the context of large language models, it is the time it takes for the model to generate a response after receiving a prompt.
A core concept in computer science and networking.
A key metric for evaluating the performance of AI systems.