Patch Embedding

What is Patch Embedding?

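Patch embedding is the process of slicing an image into a grid of small, non-overlapping squares (e.g., 16x16 pixels), flattening each square into a vector, and linearly projecting that vector to the model's embedding dimension, effectively 'tokenizing' the image for a Transformer.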
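The steps in this definition can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `patchify` and `patch_embed`, the 224x224 RGB input, the embedding size of 768, and the random projection weights are all assumptions chosen for the example. In a real model the weights are learned.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Split both spatial axes into (grid index, offset inside the patch).
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # Bring the two grid axes together: (gh, gw, patch_size, patch_size, c).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into a sequence and each patch into a vector.
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def patch_embed(image: np.ndarray, patch_size: int,
                weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Project each flattened patch to the embedding dimension."""
    return patchify(image, patch_size) @ weight + bias


# Hypothetical example values: a random "image" and random projection weights.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patch_size, embed_dim = 16, 768
weight = rng.normal(scale=0.02, size=(patch_size * patch_size * 3, embed_dim))
bias = np.zeros(embed_dim)

tokens = patch_embed(image, patch_size, weight, bias)
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, one vector each
```

Each of the 196 rows plays the role of one token in the sequence passed to the Transformer.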

Where did the term "Patch Embedding" come from?

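The term was popularized by the Vision Transformer (ViT) paper, "An Image is Worth 16x16 Words" (Dosovitskiy et al., 2020). It is ViT's equivalent of word tokenization: each image patch is mapped to an embedding vector, just as each word token is in a language model.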

How is "Patch Embedding" used today?

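Patch embedding is the standard input stage of Vision Transformers and related architectures, converting an image into the token sequence the Transformer processes. In practice it is often implemented as a single convolution whose kernel size and stride both equal the patch size.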

Related Terms

Vision Transformer (ViT)
Embedding