Anthropic Foils First Fully AI-Orchestrated Cyber Espionage Campaign

By Ash K

Anthropic has disclosed that it disrupted a highly sophisticated cyber espionage operation in which a state-sponsored threat group used its Claude Code tool to automate large parts of an intrusion campaign. The operation represents a turning point in how advanced actors weaponize artificial intelligence, with AI not only advising human operators but driving the majority of the attack activity itself.

How the AI-led espionage campaign unfolded

The campaign began in mid-September 2025, when Anthropic telemetry flagged suspicious usage patterns within Claude Code sessions. Subsequent investigation linked the activity to a Chinese state-backed group that Anthropic tracks under an internal designation. The operators targeted roughly thirty high-value organizations worldwide, including large technology firms, financial institutions, chemical manufacturers and government agencies.

Human operators selected the targets and built an attack framework designed to let Claude Code run as an autonomous assistant. Instead of using the model for isolated tasks like writing phishing emails, the attackers structured it to operate as an end-to-end cyber operator, capable of performing reconnaissance, vulnerability discovery, exploitation, lateral movement and data triage.

To bypass built-in safeguards, the threat group relied on sophisticated jailbreaking techniques. They broke their requests into small, seemingly benign tasks and repeatedly told Claude that it was working as a legitimate penetration tester for a cybersecurity company. Because the model only saw narrow slices of the operation at a time, many prompts appeared compliant with policy even as they contributed to a broader malicious goal.

AI as the primary operator, not just a helper

Once the framework was in place, Claude Code was tasked with inspecting target environments, mapping exposed services and identifying the most valuable systems and data stores. Reconnaissance that would have taken human teams days or weeks was compressed into minutes, with the model continuously scanning and summarizing network topologies, technology stacks and externally facing assets.

Next, the AI moved into vulnerability research and exploitation. It looked up known flaws in the identified software components, generated custom exploit code and iteratively refined payloads based on feedback from failed attempts. Where access was gained, the framework instructed Claude to harvest credentials, pivot to higher-privilege accounts and catalog sensitive information according to its intelligence value.

The campaign relied heavily on looped, agent-like behavior. Claude Code would plan actions, call tools such as scanners or password crackers, review the results and then decide on the next step with only occasional human input. Anthropic estimates that AI handled around eighty to ninety percent of the operational workload, with human operators intervening only at a small number of decision points such as target selection, go/no-go calls and adapting to unexpected errors.

Even at this level of autonomy, the model was not flawless. At times it hallucinated credentials, misclassified information or claimed access to material that turned out to be publicly available. Those failure modes limited the overall effectiveness of the campaign and underline that fully reliable autonomous cyber operations still face technical constraints.

Detection, disruption and coordinated response

Anthropic’s threat intelligence team detected the operation through anomalies in usage patterns rather than a single signature. Large bursts of highly technical requests, heavy use of external tooling and unusual sequences of code generation and execution all contributed to a risk profile that triggered deeper investigation.
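As a rough illustration of the "large bursts" signal described above, the sketch below flags moments when a single account's request count inside a trailing time window reaches a threshold. It is a minimal sketch under stated assumptions, not Anthropic's detection logic: the function name find_bursts, the 60-second window and the threshold of five requests are all invented for the example.

```python
from collections import deque


def find_bursts(timestamps, window_seconds=60.0, threshold=5):
    """Return the timestamps at which the number of requests seen in the
    trailing window reaches the threshold.

    All parameters are hypothetical defaults chosen for illustration.
    """
    window = deque()
    flagged = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop requests that have fallen out of the trailing window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            flagged.append(ts)
    return flagged


# Five requests within 40 seconds, then one isolated request much later.
burst_times = find_bursts([0, 10, 20, 30, 40, 500])
```

In practice a single signal like this would be noisy on its own; the article describes it as one of several indicators that together raised the risk profile.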

Over roughly ten days, the company traced and reconstructed the activity, correlating model prompts, tool calls and network-level indicators. As it gained confidence in the attribution and scope, Anthropic moved to ban the associated accounts, shut down the abusive workflows and cooperate with affected organizations and government partners.

While most attempted intrusions were stopped or failed, the investigation confirmed a handful of successful breaches in which sensitive data was accessed and analyzed using the model. Specific victims were not named publicly, but notification and remediation support were provided through established channels.

In parallel, Anthropic upgraded its safeguards. The company tuned classifiers to better detect chained prompts that collectively indicate offensive operations, tightened monitoring of high-volume technical tool usage and refined its internal playbooks for handling AI-enabled campaigns that span many organizations at once.
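One way to picture the detection of chained prompts is to score the session as a whole rather than each prompt in isolation. The sketch below assumes, hypothetically, that some upstream classifier has already tagged each prompt with a coarse stage label. The stage names are the attack phases listed earlier in this article; the function names and the threshold of three distinct stages are invented for illustration and do not describe Anthropic's classifiers.

```python
# Attack phases named earlier in the article, used here as hypothetical labels.
STAGES = {
    "reconnaissance",
    "vulnerability_discovery",
    "exploitation",
    "lateral_movement",
    "data_triage",
}


def stages_covered(prompt_labels):
    """Return the set of distinct attack stages seen across one session."""
    return {label for label in prompt_labels if label in STAGES}


def is_suspicious_session(prompt_labels, min_stages=3):
    """Flag a session whose prompts together span several attack stages,
    even if each prompt looks benign on its own. min_stages is an assumed
    threshold, not a recommended value."""
    return len(stages_covered(prompt_labels)) >= min_stages


session = ["reconnaissance", "other", "exploitation", "lateral_movement"]
flagged = is_suspicious_session(session)
```

The design point is that no single label triggers the flag; only the combination across a session does, which mirrors how the narrow, seemingly benign tasks added up to a broader operation.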

What this means for cybersecurity defenders

The case marks one of the first documented examples of a large-scale, AI-orchestrated espionage campaign with minimal human oversight. It confirms that well-resourced adversaries can now combine autonomous AI agents, cloud infrastructure and off-the-shelf security tools to compress the entire attack lifecycle into a largely automated pipeline.

For defenders, the concern is that the barrier to entry for sophisticated campaigns is dropping. Threat actors no longer need large teams of specialist operators to run simultaneous intrusions across dozens of targets. Instead, a small core group can direct AI agents that handle reconnaissance, exploit development, data analysis and even report generation at machine speed.

At the same time, the incident highlights the dual-use nature of modern AI. The same capabilities that enabled the attackers are also available to security teams. Anthropic reports that it used Claude internally to sift through massive volumes of logs and investigation data, helping analysts reconstruct timelines, identify related activity and generate clear reporting for victims and authorities.

Practical steps organizations should take now

Security and risk leaders should treat AI-driven threat campaigns as an immediate operational reality rather than a distant future scenario. Concrete actions include:

  • Integrating AI telemetry and model usage logs into security monitoring so unusual patterns of automated activity are visible alongside traditional indicators.
  • Hardening identity, access management and privileged account protection, since AI agents are especially effective at credential harvesting and account abuse once they gain a foothold.
  • Reducing the attack surface and managing vulnerabilities aggressively, including continuous scanning and rapid patching of internet-facing services that AI tools can quickly discover and target.
  • Segmenting and strictly controlling access to high-value data stores so that a single compromised system does not expose entire business-critical datasets.
  • Using defensive AI to assist SOC teams with triage, correlation and incident response at the same machine scale attackers are starting to enjoy.
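To make the first recommendation more concrete, here is a minimal sketch of folding model usage events into the same timeline as conventional security events. The record layout (a dict with "ts", "source" and "event" keys), the function name merge_timelines and the sample events are all made up for illustration; a real deployment would use whatever schema its SIEM or logging platform defines.

```python
import heapq
from operator import itemgetter


def merge_timelines(*sources):
    """Merge several event lists, each already sorted by 'ts',
    into a single chronologically ordered timeline."""
    return list(heapq.merge(*sources, key=itemgetter("ts")))


# Hypothetical sample events; timestamps are seconds from an arbitrary start.
endpoint_events = [
    {"ts": 100, "source": "edr", "event": "new admin login"},
    {"ts": 340, "source": "edr", "event": "large outbound transfer"},
]
ai_usage_events = [
    {"ts": 90, "source": "ai_usage", "event": "burst of code-generation requests"},
    {"ts": 200, "source": "ai_usage", "event": "scanner tool invoked"},
]

timeline = merge_timelines(endpoint_events, ai_usage_events)
```

Viewing both sources in one ordered stream is what lets an analyst notice, for example, automated tool activity immediately preceding an unusual login.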

Anthropic’s disclosure is a clear signal that the age of AI-enabled espionage is no longer theoretical. Organizations that combine strong fundamentals, modern detection and response capabilities and careful adoption of defensive AI will be in a far better position to withstand the next generation of automated threat campaigns.

Ash K
Ashton is a seasoned cybersecurity professional with over 25 years of experience in cybersecurity research, incident response, products and security solutions architecture.