Claude Code Source Leak and Anthropic CMS Exposure Explained

In late March 2026, Anthropic - one of the most safety-focused AI labs in the world - suffered back-to-back security exposures that sent shockwaves through the tech and cybersecurity industries. First, an unsecured content management system leaked nearly 3,000 internal files, including details about an unreleased model called Claude Mythos. Days later, the full source code for Claude Code, Anthropic's flagship CLI tool, was accidentally published to npm for anyone to download.
These weren't sophisticated attacks. They were basic misconfigurations. And that's exactly what makes them so important to understand.
How the CMS Leak Exposed Anthropic's Internal Files
The first exposure was almost embarrassingly simple. Anthropic used an off-the-shelf CMS to manage blog drafts, media assets, and internal documents. The problem? This CMS assigns a publicly reachable URL to every uploaded file by default. Unless you explicitly set assets to private, they're accessible to anyone who knows where to look.
Anthropic's team didn't change those defaults. The result was an open data store containing nearly 3,000 unpublished assets - draft blog posts, proprietary PDFs, internal audio recordings, and logistics for a private executive retreat. No password. No authentication. No access tokens required.
Roy Paz from LayerX Security and Alexandre Pauwels from the University of Cambridge discovered the exposed repository. After Fortune notified Anthropic on March 26, 2026, the company moved to restrict access and started forensic assessments.
What Claude Mythos Revealed
Among the leaked files, the biggest revelation was documentation for an unreleased model codenamed "Capybara," set to launch commercially as Claude Mythos. Anthropic's internal drafts described it as by far their most capable and compute-intensive model ever built.
What made Mythos stand out wasn't just raw performance. The documents detailed capabilities specifically designed for cybersecurity - automated vulnerability discovery, red-teaming, incident triage, and large-scale threat hunting. But the most striking feature was "recursive self-fixing," where the model can analyze its own codebase, find bugs or vulnerabilities, generate patches, and deploy them without human intervention.
Anthropic's own drafts acknowledged the dual-use risks. They planned a restricted early access rollout, seeding the model only to trusted enterprise security teams before any public release.
The Market Reaction
The day after the leak went public, cybersecurity stocks cratered. CrowdStrike, Palo Alto Networks, and Zscaler all dropped over 5%. Cloudflare fell 3.2%. The Global X Cybersecurity ETF shed 6.1%, erasing roughly $14.5 billion in market cap in a single session.
Investors panicked at the idea that an AI model could autonomously find and exploit zero-day vulnerabilities at machine speed, potentially making traditional endpoint detection, firewalls, and Zero Trust architectures obsolete overnight. Some analysts called it an overreaction and a buying opportunity. Others argued the fears were well-founded.
The real takeaway? Whether Mythos lives up to the hype or not, the market clearly sees autonomous AI-driven offense as an existential threat to legacy security approaches. If you're still relying on signature-based defenses alone, it's worth reading about modern LLM attack techniques to understand how fast the threat landscape is evolving.
How the Claude Code Source Code Got Leaked
The second exposure hit on March 31, 2026, and this one was technical. The entire unminified TypeScript source code for Claude Code - Anthropic's agentic CLI orchestration tool - was accidentally shipped in a public npm package.
The Source Map Mistake
Here's what happened. Anthropic distributes Claude Code as a proprietary, obfuscated npm package. During the build for version 2.1.88 of @anthropic-ai/claude-code, the Bun bundler generated a 57-megabyte source map file (cli.js.map) and included it in the production release.
Source maps are debugging tools that link minified production code back to its original, readable source. They're meant for internal development only. This particular source map didn't just contain structural references - it pointed directly to an Anthropic-controlled R2 cloud storage bucket where the full, unobfuscated source code could be downloaded as a zip archive.
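The mechanics are easy to see from the Source Map v3 format itself. Below is a minimal sketch of what a shipped `.map` file exposes; the field names come from the spec, but the bucket URL and file paths are invented placeholders, not the actual artifacts from the incident.

```typescript
// Minimal sketch of the Source Map v3 fields relevant to this kind of leak.
// The sourceRoot URL here is a made-up placeholder for illustration.
interface SourceMapV3 {
  version: number;
  file: string;
  sourceRoot?: string;        // base path or URL prepended to every source
  sources: string[];          // locations of the original, unminified files
  sourcesContent?: string[];  // optionally, the original source text inlined
  mappings: string;
}

const leakedMap: SourceMapV3 = {
  version: 3,
  file: "cli.js",
  sourceRoot: "https://example-bucket.r2.example.com/claude-code/", // hypothetical
  sources: ["src/QueryEngine.ts", "src/Tool.ts", "src/commands.ts"],
  mappings: "AAAA", // truncated for illustration
};

// Anyone holding the .map can resolve where every original file lives:
function resolveSources(map: SourceMapV3): string[] {
  const root = map.sourceRoot ?? "";
  return map.sources.map((s) => root + s);
}

console.log(resolveSources(leakedMap));
```

If `sourceRoot` points at externally reachable storage, as reportedly happened here, the map is not just a debugging aid but a directory listing of the entire private codebase.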
Chaofan Shou, a researcher at Fuzzland, discovered and publicized the vulnerability. Despite Anthropic's efforts to pull the npm package and lock down the storage bucket, the code was mirrored almost instantly. Within hours, GitHub repositories hosting the extracted source racked up nearly 30,000 stars and over 40,200 forks. The code was permanently in the wild.
What the Source Code Contained
The leaked repository covered roughly 1,900 files and 512,000 lines of TypeScript, built on a React and Ink terminal UI framework. Here's a breakdown of what it revealed:
- Core query engine - A massive 46,000-line QueryEngine.ts file handling streaming API responses, retry logic, token counting, and recursive tool-call loops for autonomous agent execution.
- Tool definitions and permissions - A 29,000-line Tool.ts file defining permission schemas for over 40 agent tools, including BashTool (direct command-line execution on the host machine), FileReadTool, FileEditTool, and an AgentTool that spawns sub-agents for concurrent tasks.
- Command registration - About 25,000 lines in commands.ts managing roughly 85 slash commands across Git integration, code review, memory management, and agent orchestration.
- Content filters - Hardcoded regex filters for detecting negative sentiment, jailbreak attempts, and hostile prompts, including an extensive profanity lexicon designed to block adversarial interactions before they reach the inference engine.
It's worth noting what wasn't leaked. No AI model weights, no neural network architectures, no training data, no customer information, and no API keys were compromised. The Claude intelligence lives on Anthropic's backend servers. This leak exposed the orchestration client, not the model itself.
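To make the Tool.ts description above concrete, here is a hypothetical reconstruction of what a per-tool permission schema might look like. The shape, helper names, and rules are invented for illustration; only the tool names (BashTool, FileReadTool, FileEditTool) come from the reporting.

```typescript
// Hypothetical sketch of a tool permission schema in the style described
// above. All structure and values here are illustrative assumptions.
type PermissionLevel = "allow" | "ask" | "deny";

interface ToolPermission {
  tool: string;
  level: PermissionLevel;
  pathAllowlist?: RegExp[]; // optionally restrict where the tool may operate
}

const permissions: ToolPermission[] = [
  { tool: "FileReadTool", level: "allow", pathAllowlist: [/^src\//] },
  { tool: "FileEditTool", level: "ask" },
  { tool: "BashTool", level: "ask" }, // direct host command execution
];

function checkPermission(tool: string, path?: string): PermissionLevel {
  const rule = permissions.find((p) => p.tool === tool);
  if (!rule) return "deny"; // unknown tools are denied by default
  if (rule.pathAllowlist && path !== undefined) {
    const allowed = rule.pathAllowlist.some((re) => re.test(path));
    if (!allowed) return "deny";
  }
  return rule.level;
}
```

Even a toy version like this shows why a 29,000-line permission file matters: the default for unknown tools, the escalation path for shell access, and the path allowlists are exactly the decisions an attacker now gets to read.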
What the Source Code Revealed About Anthropic's Engineering
Beyond the raw code, the leak gave the public an unprecedented look at how a frontier AI lab designs agentic software. A few findings stood out.
Latency Over Correctness
Throughout the codebase, permission gates and state validation checks use a function called getFeatureValue_CACHED_MAY_BE_STALE(). The name says it all - the system deliberately accepts slightly outdated state data to avoid blocking the main execution loop.
For anyone building AI agents, this is telling. The primary engineering constraint isn't logic or accuracy - it's latency. Every architectural decision in Claude Code prioritizes keeping the interactive loop fast, even if it means operating on stale feature flags. From a security standpoint, though, this introduces race conditions that a sophisticated attacker could theoretically exploit during synchronization gaps.
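The pattern the function name suggests is stale-while-revalidate: serve whatever is cached right now and refresh in the background, so the interactive loop never blocks on the network. The sketch below is an assumption about how such a getter might work; every name other than the referenced getFeatureValue_CACHED_MAY_BE_STALE is invented.

```typescript
// Illustrative stale-while-revalidate feature-flag cache. This is a guess
// at the pattern behind the leaked name, not the actual implementation.
type FlagFetcher = () => Promise<Record<string, boolean>>;

class FeatureFlags {
  private cache: Record<string, boolean> = {};
  private refreshing = false;

  constructor(private fetchRemote: FlagFetcher) {}

  // Mirrors the spirit of getFeatureValue_CACHED_MAY_BE_STALE():
  // synchronous, never awaits the network, may return outdated data.
  getCachedMayBeStale(name: string, fallback = false): boolean {
    this.refreshInBackground(); // fire-and-forget
    return this.cache[name] ?? fallback;
  }

  private refreshInBackground(): void {
    if (this.refreshing) return;
    this.refreshing = true;
    this.fetchRemote()
      .then((fresh) => { this.cache = fresh; })
      .catch(() => { /* keep serving stale values on fetch failure */ })
      .finally(() => { this.refreshing = false; });
  }
}
```

The security trade-off is visible in miniature: between the synchronous read and the asynchronous refresh there is a window where a permission gate operates on outdated state, which is precisely the race-condition surface described above.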
The YOLO Classifier
The code revealed an automated permission system internally called the "YOLO classifier." Instead of traditional role-based access control, Anthropic uses a probabilistic ML model to evaluate context and dynamically grant file or execution permissions at runtime.
This is a radical departure from deterministic security policies. It removes humans from the authorization loop entirely, relying on the model's contextual understanding to distinguish safe operations from malicious ones. Whether that's innovative or reckless depends on your perspective - but it's definitely something security teams should watch closely.
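The core idea, reduced to a sketch: replace a static policy lookup with a confidence score and a threshold. In the real system the scorer is reportedly an ML model; here a trivial heuristic stands in for it, and every name and threshold is an invented assumption.

```typescript
// Hypothetical probabilistic permission gate. A toy heuristic stands in
// for the ML risk scorer; nothing here reflects Anthropic's actual logic.
interface ActionContext {
  tool: string;
  command?: string;
}

// Stand-in for the classifier: returns an estimated P(action is safe).
function scoreSafety(ctx: ActionContext): number {
  let score = 0.9;
  if (ctx.tool === "BashTool") score -= 0.3;         // shell access is riskier
  if (ctx.command?.includes("rm -rf")) score -= 0.5; // destructive pattern
  return Math.max(0, Math.min(1, score));
}

// The gate: auto-grant above a confidence threshold, otherwise escalate.
function decide(ctx: ActionContext, threshold = 0.7): "auto-allow" | "ask-user" {
  return scoreSafety(ctx) >= threshold ? "auto-allow" : "ask-user";
}
```

The contrast with deterministic RBAC is the point: there is no fixed rule an auditor can read off, only a score whose behavior shifts with context, which is what makes the approach both flexible and hard to certify.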
Unreleased Models and Features
Internal migration files and header references confirmed upcoming model versions, including more advanced iterations beyond what's currently available. The codebase also referenced unreleased feature flags like "Kairos," "BRIDGE_MODE," and a character-generation companion system called "Buddy." API beta headers pointed to capabilities not yet publicly available, suggesting Claude Code serves as Anthropic's primary internal testbed for new features.
The "Undercover Mode" Controversy
Perhaps the most controversial discovery was a mechanism called "Undercover Mode." When active, it instructs the AI to contribute code to open-source repositories without attributing the work to a machine. No "Co-Authored-By" tags. No disclosure of its non-human nature in pull requests or commit messages.
The system activates by default unless the target repository matches Anthropic's internal allowlist, and there's no manual override to force it off. This raises serious questions about open-source integrity. Maintainers reviewing and merging AI-generated code without knowing its origin can't properly assess quality or track bugs back to their source. It also pollutes training datasets for future AI models with unattributed synthetic code, creating a recursive feedback problem.
AI Weaponization: The GTG-1002 Espionage Campaign
To understand why these leaks matter beyond intellectual property theft, consider what happened just six months earlier. In September 2025, a Chinese state-sponsored threat group designated GTG-1002 launched a cyber espionage campaign against roughly 30 global organizations - tech companies, financial institutions, chemical manufacturers, and government agencies.
What made GTG-1002 different from previous campaigns was that it ran almost entirely on AI. The attackers manipulated commercial AI tooling through structured prompt injections to bypass safety guardrails, convincing the AI it was conducting authorized defensive red-teaming. Once past those barriers, the AI executed the tactical phases autonomously - reconnaissance, exploitation, lateral movement, data extraction, and documentation.
An estimated 80-90% of operational tasks were handled by the AI without human intervention. Human operators only provided strategic direction and approved decisions at critical gates. The campaign compressed activities that normally take weeks into minutes, running dozens of parallel intrusion threads simultaneously.
This campaign exposed a severe cost asymmetry. Traditional defense scales linearly - more surface area means more headcount and bigger budgets. AI-driven offense scales exponentially with marginal compute costs. GTG-1002 achieved nation-state operational scale at a fraction of the typical human investment.
What This Means for Enterprise Security
The combination of these events - exposed source code, autonomous offensive capabilities, and documented state-sponsored AI weaponization - demands a fundamental shift in how organizations think about security.
The Black Box Is Gone
With the Claude Code source in the public domain, the internal architecture, permission logic, and safety mechanisms of a leading agentic AI tool are fully transparent to both defenders and attackers. Traditional "security through obscurity" no longer applies. Organizations can't assume their AI vendor's internal guardrails are unknown to adversaries.
MLSecOps Is No Longer Optional
Standard DevSecOps doesn't cover AI-specific attack surfaces. Machine Learning Security Operations (MLSecOps) extends security practices to cover the full ML lifecycle - training data integrity, model behavior validation, orchestration logic, and API configurations.
Organizations need to:
- Deploy granular access controls - Use Attribute-Based Access Control alongside traditional RBAC to restrict what both humans and AI sub-agents can access
- Monitor AI orchestration traffic - The continuous, high-frequency connections between agents and LLM providers create detectable patterns distinct from normal network activity
- Implement continuous automated red-teaming - Use localized models to perpetually probe infrastructure for vulnerabilities before external AI-driven attackers find them
- Validate the open-source supply chain - Given revelations about unattributed AI code contributions, assume external dependencies may contain unreviewed machine-generated code
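The second bullet above, monitoring AI orchestration traffic, can be sketched concretely: agent loops tend to produce frequent, regular-interval calls to LLM API endpoints, a rhythm that rarely comes from humans. The heuristic, thresholds, and endpoint list below are all illustrative assumptions, not a production detector.

```typescript
// Illustrative detector: flag internal hosts whose outbound calls to LLM
// APIs are both frequent and unusually regular. Thresholds are made up.
interface ConnEvent {
  host: string;      // internal machine making the call
  dest: string;      // destination hostname
  timestamp: number; // epoch milliseconds
}

const LLM_ENDPOINTS = new Set(["api.anthropic.com", "api.openai.com"]);

function flagAgentLikeHosts(events: ConnEvent[], minCalls = 10): string[] {
  const byHost = new Map<string, number[]>();
  for (const e of events) {
    if (!LLM_ENDPOINTS.has(e.dest)) continue;
    const list = byHost.get(e.host) ?? [];
    list.push(e.timestamp);
    byHost.set(e.host, list);
  }
  const flagged: string[] = [];
  for (const [host, times] of byHost) {
    if (times.length < minCalls) continue;
    times.sort((a, b) => a - b);
    const gaps = times.slice(1).map((t, i) => t - times[i]);
    const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
    const variance = gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / gaps.length;
    // Low relative jitter between calls suggests a machine-driven loop.
    if (Math.sqrt(variance) < mean * 0.5) flagged.push(host);
  }
  return flagged;
}
```

A real deployment would work from flow logs or proxy telemetry rather than an in-memory array, but the signal is the same: volume plus regularity toward known inference endpoints.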
Speed Must Match Speed
Human analysts operating at linear speeds can't counter machine-speed attacks. Security operations centers need their own AI agents for continuous behavioral baselining, anomaly detection, alert correlation, and automated triage. Humans become strategic decision-makers while AI handles tactical execution - the same model attackers already use.
Key Takeaways
- Two separate misconfigurations at Anthropic exposed internal documents about an unreleased model (Claude Mythos) and the full source code of Claude Code within the same week
- Neither exposure involved sophisticated attacks - both were basic configuration failures in CMS defaults and build pipelines
- The source code revealed aggressive latency optimizations, probabilistic permission systems, unreleased model versions, and controversial practices like Undercover Mode
- State-sponsored groups have already weaponized AI tooling for autonomous cyber operations at scale, as demonstrated by the GTG-1002 campaign
- Organizations must adopt MLSecOps practices, deploy AI-native defenses, and treat AI supply chain integrity as a first-class security concern
Next Steps
These exposures are a reminder that even the most safety-conscious AI companies can fail at basic infrastructure security. If you're building with LLMs, start by securing the fundamentals - your deployment pipelines, access controls, and content management systems.
For practical guidance on protecting your AI applications from prompt injection and other threats revealed by these events, check out the LockLLM documentation or try it free to add automated scanning to your AI pipeline today.