AI model hacking Archives

Anthropic Offers $15K to Jailbreak AI Safety System

Can you jailbreak Anthropic's latest AI safety measure? Researchers want you to try, and are offering up to $15,000 if you succeed. On Monday, the company released a new paper outlining an AI safety system based on Constitutional Classifiers. The process is based on Constitutional AI, a system Anthropic used to make Claude “harmless,” in which one AI […]

