The set of assumptions built into a learning algorithm to help it generalize beyond its training data. CNNs have a 'strong' inductive bias for locality (nearby pixels matter more than distant ones); Transformers have a 'weak' bias, making them more data-hungry but more flexible.
Fundamental concept in learning theory.
Helps explain why CNNs often outperform Transformers on small datasets: the built-in assumptions substitute for data the model would otherwise need.
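A minimal sketch of the locality bias in terms of parameter counts (the numbers here are illustrative assumptions, not from any specific model): a convolution applies one small shared kernel everywhere, so each output pixel depends only on its neighbourhood, while a fully connected layer lets every output pixel see every input pixel.

```python
# Compare parameter counts: a local, weight-shared 3x3 convolution vs a
# dense layer on the same 32x32 image. The tiny conv count encodes the
# CNN's locality assumption; the dense layer makes no such assumption.
H = W = 32  # image height/width (hypothetical example size)
K = 3       # conv kernel size: each output depends on a 3x3 neighbourhood

conv_params = K * K               # one shared 3x3 kernel, reused at every position
dense_params = (H * W) * (H * W)  # every output pixel connects to every input pixel

print(conv_params)   # 9
print(dense_params)  # 1048576
```

The roughly 100,000x gap is the inductive bias made concrete: the convolution commits to locality and weight sharing up front, so it has far fewer parameters to fit and generalizes from less data, at the cost of flexibility.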