Zach Anderson
Apr 10, 2026 23:18
Anthropic releases security guidelines as Project Glasswing reveals frontier AI models can now find and exploit vulnerabilities faster than human defenders can respond.
Anthropic dropped a sobering assessment this week: within two years, AI models will uncover vast numbers of software vulnerabilities that have sat unnoticed in code for years—and chain them into working exploits. The company’s security teams released detailed defensive recommendations alongside Project Glasswing, its initiative to deploy Claude Mythos Preview’s capabilities for cyber defense.
The math here isn’t complicated. If attackers can use frontier models to automate vulnerability discovery and exploit generation, the window between a patch dropping and a working exploit appearing shrinks dramatically. Anthropic’s security engineers have watched this happen in their own testing.
What Their Research Actually Found
According to Anthropic’s technical findings, AI models excel at recognizing signatures of known vulnerabilities in unpatched systems. Reversing a patch into a working exploit—exactly the kind of mechanical analysis these models handle well—used to require specialized skills. Now it’s becoming automated.
The company noted that publicly available models below Mythos capability levels can already find serious vulnerabilities that traditional code reviews missed for extended periods. Mozilla Firefox vulnerabilities discovered through AI scanning serve as one documented example.
The Defensive Playbook
Anthropic’s recommendations prioritize controls that hold even against attackers with unlimited patience and AI assistance. Friction-based security measures—extra pivot hops, rate limits, non-standard ports—lose effectiveness when adversaries can grind through tedious steps automatically.
Their top priorities:
Patch velocity matters more than ever. Internet-facing applications should receive patches within 24 hours of an exploit becoming available. The CISA Known Exploited Vulnerabilities catalog should be treated as an emergency queue. Anthropic recommends using EPSS (Exploit Prediction Scoring System) for prioritizing everything else.
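That prioritization order can be sketched in a few lines. This is a minimal illustration of the triage logic the guidance implies—KEV entries as an emergency queue, everything else ranked by EPSS probability. The CVE IDs, scores, and KEV flags below are made-up sample data, not real vulnerability records.

```python
# Triage sketch: CISA KEV membership outranks EPSS score, since KEV entries
# are already exploited in the wild. Sample data is illustrative only.
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    epss: float   # EPSS probability of exploitation in the next 30 days (0.0-1.0)
    in_kev: bool  # listed in the CISA Known Exploited Vulnerabilities catalog

def patch_order(findings: list[Finding]) -> list[str]:
    """KEV entries first, then everything else by descending EPSS score."""
    ranked = sorted(findings, key=lambda f: (not f.in_kev, -f.epss))
    return [f.cve for f in ranked]

backlog = [
    Finding("CVE-2026-0001", epss=0.02, in_kev=False),
    Finding("CVE-2026-0002", epss=0.91, in_kev=False),
    Finding("CVE-2026-0003", epss=0.10, in_kev=True),
]
print(patch_order(backlog))  # the KEV item leads despite its low EPSS score
```

Note the design choice: a low-EPSS vulnerability still jumps the queue if it appears in KEV, because observed exploitation trumps predicted exploitation.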
Prepare for 10x vulnerability report volume. Over the next two years, intake and triage processes will face pressure they’ve never experienced. Organizations still running weekly spreadsheet meetings won’t keep pace.
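One way to absorb that volume is automated intake dedup before any human looks at a report. The sketch below assumes reports arrive as (title, affected path) pairs—a simplification, since real intake formats vary—and collapses trivially reworded duplicates by fingerprint.

```python
# Intake dedup sketch: normalize and hash each report so near-identical
# automated submissions land in the same bucket before human triage.
import hashlib

def fingerprint(title: str, path: str) -> str:
    """Lowercase, collapse whitespace, and hash so reworded duplicates collide."""
    norm = " ".join(title.lower().split()) + "|" + path.strip().lower()
    return hashlib.sha256(norm.encode()).hexdigest()[:12]

def dedupe(reports: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    buckets: dict[str, list[tuple[str, str]]] = {}
    for report in reports:
        buckets.setdefault(fingerprint(*report), []).append(report)
    return buckets

reports = [
    ("SQL injection in  login form", "auth/login.py"),
    ("sql injection in login form", "auth/login.py"),
    ("XSS in comments", "web/comments.py"),
]
print(len(dedupe(reports)))  # 2 -- the two SQLi reports share a bucket
```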
Scan your own code with frontier models before attackers do. This was Anthropic’s single most emphasized recommendation. Legacy code that predates current review practices—especially code whose original authors have moved on—represents the highest-value target for proactive scanning.
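A proactive scan of that kind might look like the sketch below, which sends one file to a model through the Anthropic Messages API. The model name is a placeholder and the prompt framing is my own—the article does not describe how Project Glasswing actually scans code—and the call requires `pip install anthropic` plus an `ANTHROPIC_API_KEY` in the environment.

```python
# Hedged sketch of single-file vulnerability scanning via the Anthropic
# Messages API. Model name and prompt wording are illustrative placeholders.
from pathlib import Path

def build_scan_prompt(source: str, filename: str) -> str:
    """Frame one file for security review; prompt wording is an assumption."""
    return (
        f"Review the following file ({filename}) for security vulnerabilities. "
        "Report each finding with severity, the affected lines, and a fix.\n\n"
        f"```\n{source}\n```"
    )

def scan_file(path: str, model: str = "claude-model-placeholder") -> str:
    # Network call: requires the `anthropic` package and ANTHROPIC_API_KEY.
    from anthropic import Anthropic
    client = Anthropic()
    msg = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user",
                   "content": build_scan_prompt(Path(path).read_text(), path)}],
    )
    return msg.content[0].text

prompt = build_scan_prompt("eval(user_input)", "legacy/handler.py")
print(prompt.splitlines()[0])
```

In practice you would batch files and feed findings into the same triage queue as external reports; the point is simply that the scanning loop itself is a few dozen lines.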
Zero Trust Gets Real
The guidance pushes hard toward hardware-bound credentials and identity-based service isolation. A compromised build server shouldn’t reach production databases. A compromised laptop shouldn’t touch build infrastructure.
Static API keys, embedded credentials, and shared service-account passwords are described as “among the first things an attacker with model-assisted code analysis will find.”
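The reason those credentials fall first is that they are findable with even crude pattern matching—model-assisted analysis only lowers the bar further. A minimal sweep might look like this; the patterns are illustrative, not exhaustive, and real secret scanners use far larger rule sets.

```python
# Sketch of a static-credential sweep. Patterns are illustrative samples of
# what even regex-level scanning surfaces in committed code.
import re

SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)\b(?:api[_-]?key|secret)\b\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def find_secrets(text: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

sample = 'API_KEY = "sk_live_0123456789abcdef01"'
print(find_secrets(sample))  # ['generic_api_key']
```

The defensive corollary from the guidance: anything such a sweep finds in your repositories, an attacker's model will find too, so rotate to hardware-bound or short-lived credentials rather than merely relocating the strings.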
For Smaller Operations
Organizations without dedicated security teams got specific advice: enable automatic updates everywhere, prefer managed services over self-hosting, use passkeys or hardware security keys, and turn on free security tooling from code hosts like GitHub’s Dependabot and CodeQL.
Open-source maintainers should expect increased vulnerability report volume—some valuable, some automated noise. Publishing a SECURITY.md with clear intake processes helps separate signal from spam.
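A minimal SECURITY.md along those lines might read as follows; the contact address and response window are placeholders, not anything the article prescribes.

```markdown
# Security Policy

## Reporting a Vulnerability

Email security@example.org (placeholder address) and include:

- Affected version and component
- Reproduction steps or a proof of concept
- Whether the report was generated or assisted by automated tooling

We acknowledge reports within 5 business days. Automated submissions
without a working reproduction may be closed without a detailed response.
```

Asking reporters to disclose automated tooling up front is one practical way to route likely noise into a lower-priority queue.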
Anthropic committed to updating this guidance as Project Glasswing progresses. For enterprises tracking SOC 2 and ISO 27001 compliance, most recommendations map directly to existing controls. The difference now is urgency.