Darius Baruo
Mar 18, 2026 17:55
OpenAI explains why Codex Security uses AI constraint reasoning instead of traditional static analysis, aiming to cut false positives in code security scanning.
OpenAI has published a technical deep-dive explaining why its Codex Security tool deliberately avoids traditional static application security testing (SAST), instead using AI-driven constraint reasoning to find vulnerabilities that conventional scanners miss.
The March 17, 2026 blog post arrives as the SAST market, valued at $554 million in 2025 and projected to hit $1.5 billion by 2030, faces growing questions about its effectiveness against sophisticated attack vectors.
The Core Problem with Traditional SAST
OpenAI’s argument centers on a fundamental limitation: SAST tools excel at tracking data flow from untrusted inputs to sensitive sinks, but they struggle to determine whether security checks actually work.
“There’s a big difference between ‘the code calls a sanitizer’ and ‘the system is safe,’” the company wrote.
The post cites CVE-2024-29041, an Express.js open redirect vulnerability, as a real-world example. Traditional SAST could trace the dataflow easily enough. The actual bug? Malformed URLs bypassed allowlist implementations because validation ran before URL decoding, a subtle ordering problem that source-to-sink analysis couldn’t catch.
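The ordering problem is easier to see in a small example. The Python sketch below is illustrative only: it is not the Express.js code and does not reproduce the exact mechanics of CVE-2024-29041. The allowlist, host names, and function names are all hypothetical. It shows how a check that runs on the raw value can pass while the decoded value, the one actually used, points somewhere else.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = {"example.com"}


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def check_then_decode(raw: str) -> tuple[bool, str]:
    """Flawed ordering: validate the raw value, then decode it for use."""
    allowed = host_of(raw) in ALLOWED_HOSTS
    return allowed, unquote(raw)


def decode_then_check(raw: str) -> tuple[bool, str]:
    """Safer ordering: decode first, then validate the value that is used."""
    decoded = unquote(raw)
    return host_of(decoded) in ALLOWED_HOSTS, decoded


if __name__ == "__main__":
    # Before decoding, "evil.test%2F" looks like userinfo in front of
    # "@example.com". After decoding, "%2F" becomes "/", which ends the
    # authority early and leaves "evil.test" as the host.
    raw = "https://evil.test%2F@example.com/"
    print(check_then_decode(raw))
    print(decode_then_check(raw))
```

Both functions call the same validator on the same data, so a source-to-sink trace looks identical for each. Only the order of the two steps differs, which is the kind of property the post argues needs reasoning about behavior rather than pattern matching.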
How Codex Security Works Differently
Rather than importing a SAST report and triaging findings, Codex Security starts from the repository itself, examining architecture, trust boundaries, and intended behavior before validating what it finds.
The system employs several techniques:
Full repository context analysis, reading code paths the way a human security researcher would. The AI doesn’t automatically trust comments: adding “//this isn’t a bug” above vulnerable code won’t fool it.
Micro-fuzzer generation for isolated code slices, testing transformation pipelines around single inputs.
Constraint reasoning across transformations using z3-solver when needed, particularly useful for integer overflow bugs on non-standard architectures.
Sandboxed execution to distinguish “could be a problem” from “is a problem” with actual proof-of-concept exploits.
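To make the micro-fuzzer idea concrete, here is a minimal Python sketch of fuzzing one isolated transformation pipeline around a single input. It is not how Codex Security builds or runs its fuzzers, which the post does not detail. The pipeline functions, the allowlist, and the fragment list are hypothetical. The fuzzer checks one invariant: if the pipeline accepts an input, the URL it ends up using must still point at an allowed host.

```python
import random
from urllib.parse import unquote, urlsplit

# Hypothetical allowlist and input fragments, for illustration only.
ALLOWED_HOSTS = {"example.com"}
FRAGMENTS = ["example.com", "evil.test", "@", "%2F", "/"]


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if unparseable."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def validate_then_decode(raw):
    """Hypothetical slice under test: checks the raw value, uses the decoded one."""
    if host_of(raw) not in ALLOWED_HOSTS:
        return None
    return unquote(raw)


def decode_then_validate(raw):
    """Same slice with the two steps in the opposite order."""
    decoded = unquote(raw)
    if host_of(decoded) not in ALLOWED_HOSTS:
        return None
    return decoded


def random_url(rng):
    """Build a candidate URL from one to four random fragments."""
    parts = [rng.choice(FRAGMENTS) for _ in range(rng.randint(1, 4))]
    return "https://" + "".join(parts) + "/"


def fuzz(pipeline, iterations=5000, seed=0):
    """Return (raw, used) pairs where an accepted input leaves the allowlist."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    found = set()
    for _ in range(iterations):
        raw = random_url(rng)
        used = pipeline(raw)
        if used is not None and host_of(used) not in ALLOWED_HOSTS:
            found.add((raw, used))
    return sorted(found)


if __name__ == "__main__":
    for raw, used in fuzz(validate_then_decode)[:5]:
        print(f"accepted {raw!r} but used {used!r}")
    print("violations after reordering:", len(fuzz(decode_then_validate)))
```

Each counterexample is a concrete input that can be replayed, which is the difference between a finding that “could be a problem” and one that “is a problem”. A constraint solver such as z3 could look for the same kind of violating input symbolically instead of by random sampling.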
Why Not Use Both?
OpenAI addressed the obvious question: why not seed the AI with SAST findings and reason deeper from there?
The company names three failure modes. First, premature narrowing: a SAST report biases the system toward areas already examined, potentially missing entire bug classes. Second, implicit assumptions about sanitization and trust boundaries that are hard to unwind when wrong. Third, evaluation difficulty: separating what the agent discovered independently from what it inherited makes measuring improvement nearly impossible.
Competitive Landscape Heating Up
The announcement comes amid intensifying competition in AI-powered code security. Just one day later, on March 18, Korean security firm Theori launched Xint Code, its own AI platform targeting vulnerability detection in large codebases. The timing suggests a race to define how AI transforms application security.
OpenAI was careful not to dismiss SAST entirely. “SAST tools can be excellent at what they’re designed for: enforcing secure coding standards, catching straightforward source-to-sink issues, and detecting known patterns at scale,” the post stated.
But for finding the bugs that cost security teams the most time (workflow bypasses, authorization gaps, and state-related vulnerabilities), OpenAI is betting that starting fresh with AI reasoning beats building on top of traditional tooling.
Documentation for Codex Security is available at developers.openai.com/codex/security/.

