OpenAI Launches Safety Bug Bounty

OpenAI launches a new Safety Bug Bounty program to identify AI abuse and safety risks beyond traditional security vulnerabilities.

2 min read
OpenAI logo with a shield icon indicating security
Image credit: OpenAI News

OpenAI is expanding its bug bounty efforts with a new program specifically targeting AI abuse and safety risks across its suite of products. This initiative, announced by OpenAI News, aims to proactively identify and mitigate potential misuse of its rapidly evolving AI technologies.

The OpenAI Safety Bug Bounty program will complement the existing security bounty by accepting reports on issues that could lead to tangible harm, even if they don't represent a traditional security vulnerability. This move acknowledges that as AI capabilities grow, so do the avenues for potential misuse, pushing the company to broaden its safety net.

Focus Areas for Safety Bugs

The program outlines several key areas of concern. These include 'Agentic Risks,' such as prompt injection and data exfiltration where an attacker could hijack an AI agent to perform harmful actions or leak sensitive user data. OpenAI specifies that such behaviors must be reproducible at least 50% of the time. Researchers are also encouraged to report instances where an agentic product performs disallowed actions on OpenAI's website at scale or engages in other potentially harmful actions, provided they demonstrate plausible and material harm. Any testing for these risks must adhere to third-party terms of service.

Another category covers 'OpenAI Proprietary Information,' including model generations that reveal internal reasoning or expose sensitive company data. 'Account and Platform Integrity' vulnerabilities are also in scope, encompassing issues like bypassing anti-automation controls or manipulating trust signals, though specific vulnerabilities like bypassing account restrictions should be reported to the Security Bug Bounty program.

While general content-policy bypasses without demonstrable safety impact, such as "jailbreaks" leading to rude language or easily searchable information, are out of scope, OpenAI notes that it periodically runs private campaigns for specific harm types. Researchers who identify flaws leading to direct user harm with actionable remediation steps may be considered for rewards on a case-by-case basis.

How to Participate

Interested researchers can apply to the Safety Bug Bounty program via OpenAI's dedicated portal. The company emphasizes its commitment to collaborating with the safety and security community to foster a more secure AI ecosystem. This initiative is a crucial step in addressing the complex challenges of AI governance, particularly concerning novel threats like prompt injection data exfiltration and broader AI abuse and safety risks, including those related to Agentic Risks MCP.