A self-supervised learning technique where a model learns representations by pulling positive pairs (e.g., two different crops of the same image) closer together in the embedding space and pushing negative pairs (e.g., different images) apart.
Popularized by frameworks like SimCLR (Google) and MoCo (Facebook AI Research).
The basis for powerful unsupervised vision models (like CLIP) and modern text embedding models.