Jailbreaking is the act of bypassing a model's safety guardrails (e.g., 'Do not create nsfw content') to generate prohibited outputs, often via clever prompting.
Cat-and-mouse game between users and developers.
Major concern for AI safety and enterprise deployment.