Jailbreaking

What is Jailbreaking?

Jailbreaking is the act of bypassing a model's safety guardrails (e.g., 'Do not create nsfw content') to generate prohibited outputs, often via clever prompting.

Where did the term "Jailbreaking" come from?

Cat-and-mouse game between users and developers.

How is "Jailbreaking" used today?

Major concern for AI safety and enterprise deployment.

Related Terms

safety
Alignment
Prompt Engineering