Hacker's Corner: Has Agentic AI Really Entered the Cyberhack Chat?
- Heather Pennel

- 41 minutes ago
- 3 min read

The most significant cybersecurity hack of 2025 made headlines in the final quarter of the year, and it appears to be the first of its kind: a state-sponsored, AI-agent-driven cyber espionage attack. Anthropic reported in November that in September 2025 its Claude coding tool had been manipulated to target roughly 30 global organizations. The attack is believed to have been orchestrated by a Chinese state-sponsored threat actor group, and Anthropic claims the group used AI not only to advise on the operation but to execute the attack itself.
Anthropic's Claude, reportedly named after Claude Shannon, is an advanced conversational AI built around the company's stated mission of safety and interpretability. It operates much like GPT-style models but emphasizes ethical alignment: designed to be honest and helpful, it uses a training method Anthropic calls Constitutional AI to guide its behavior and minimize harmful outputs.
Anthropic reported the alleged September 2025 espionage attack two months later, in November, claiming it had uncovered a highly sophisticated campaign likely carried out by a Chinese state-sponsored group it tracks as GTG-1002. Roughly 30 global organizations were targeted across sectors including technology, finance, chemical manufacturing, and government. This may well be the first reported AI-orchestrated cyber attack; however, some experts are questioning the evidence and warning against the hype (BBC, Joe Tidy, 14 Nov 2025; Axios, 13 Nov 2025; Wired, Nov 2025).
Anthropic broke the attack down into four phases (its full report is available on its website):
Phase 1: Target Selection & Jailbreaking - Humans chose targets and built an attack framework for autonomous compromise. They jailbroke Claude by breaking tasks into small, benign steps and pretending to be a legitimate cybersecurity firm conducting defensive tests.
Phase 2: Reconnaissance - Claude scanned systems, mapped infrastructure, and identified high-value databases - tasks completed in minutes instead of days.
Phase 3: Exploitation & Credential Harvesting - Claude wrote exploit code, tested vulnerabilities, harvested credentials, escalated privileges, and exfiltrated sensitive data.
Phase 4: Persistence & Documentation - Claude created backdoors and produced detailed documentation of the stolen credentials and analyzed systems to support future operations.
Anthropic noted that the threat actor used AI to perform 80-90% of the campaign, with humans intervening only sporadically. What makes this critical is the speed at which the attack was carried out; the same work would have taken a human team far longer. At its peak, the operation generated thousands of requests, often several per second, a pace human attackers could not match.
Another point of interest is that Anthropic says Claude made mistakes, often "hallucinating" credentials or claiming to have extracted secret information that was in fact publicly available, which remains an obstacle to fully automated attacks. Anthropic said, "This campaign has substantial implications for cybersecurity in the age of AI 'agents' - systems that can be run autonomously for long periods of time and that complete complex tasks largely independently of human intervention. Agents are valuable for everyday work and productivity - but in the wrong hands, they can substantially increase the viability of large-scale cyberattacks."
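To make the "agent" idea concrete, here is a minimal, hypothetical sketch of an agentic loop in Python: a model is repeatedly asked for the next small step toward a goal, a tool executes it, and the result feeds the next decision. The stub functions and the toy plan are illustrative assumptions, not Anthropic's framework or any real API; the point is the structure Anthropic describes - many small, individually benign-looking steps chained together with little human involvement.

```python
# A minimal, hypothetical sketch of an agentic loop. The "model" and "tool"
# below are toy stand-ins, not Anthropic's framework or any real API; the point
# is the structure: small steps, chosen and executed in a loop, with no human
# in between.

def ask_model(goal: str, history: list[str]) -> dict:
    # Stand-in for a language-model call that picks the next small step.
    plan = ["inventory exposed services", "review access logs", "summarize findings"]
    if len(history) < len(plan):
        return {"action": plan[len(history)]}
    return {"action": "done"}

def run_tool(action: str) -> str:
    # Stand-in for actually executing one small task (a scan, a query, a script).
    return f"completed: {action}"

def agent_loop(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        step = ask_model(goal, history)      # the model decides the next step
        if step["action"] == "done":
            break
        result = run_tool(step["action"])    # a tool carries it out
        history.append(result)               # the result informs the next decision
    return history

if __name__ == "__main__":
    for line in agent_loop("routine security review of a test network"):
        print(line)
```

In a real agent the stubs would be a model API call and actual tooling; run unattended, such a loop can issue requests far faster than any human operator, which is the capability Anthropic says was abused here.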
Experts have mixed views. Ars Technica reported that outside researchers are skeptical of Anthropic's claim that the attack was roughly 90% autonomous, arguing that AI's ability to automate complex chains of tasks remains limited and comparable to established hacking tools such as Metasploit. They also question why these attackers would achieve results others cannot, and note that only a handful of the 30 targeted organizations were successfully breached. China has denied involvement, calling the claims unfounded. Anthropic's own admission that Claude hallucinated credentials and overstated what it had extracted has added to doubts about the true scale of the attack.
Cybersecurity is evolving at a rapid pace. Security teams should experiment with AI for defense, invest in safeguards to prevent misuse, and prepare for increasingly autonomous threats. The race between attackers and defenders is accelerating, and AI is at the center of it.
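One concrete, low-tech place to start is the tempo Anthropic described: thousands of requests at rates of several per second is a pace no human operator sustains. Below is a rough, hypothetical sketch of that idea - flagging sessions in an access log whose sustained request rate exceeds a human-plausible threshold. The log format, field names, and threshold are illustrative assumptions, not any particular product's schema.

```python
# Hypothetical sketch: flag log sessions whose sustained request rate looks
# machine-driven rather than human. The (timestamp, session_id) log format and
# the threshold are illustrative assumptions.
from collections import defaultdict
from datetime import datetime

def flag_fast_sessions(events: list[tuple[str, str]],
                       max_requests_per_minute: float = 60.0) -> set[str]:
    """events: (ISO timestamp, session_id) pairs pulled from an access log."""
    stamps_by_session: dict[str, list[datetime]] = defaultdict(list)
    for ts, session in events:
        stamps_by_session[session].append(datetime.fromisoformat(ts))

    flagged = set()
    for session, stamps in stamps_by_session.items():
        stamps.sort()
        # Measure the session's duration in minutes; floor at one second so a
        # burst of near-simultaneous requests still yields a finite rate.
        duration_min = max((stamps[-1] - stamps[0]).total_seconds(), 1.0) / 60.0
        if len(stamps) / duration_min > max_requests_per_minute:
            flagged.add(session)  # sustained pace beyond the human-plausible threshold
    return flagged

if __name__ == "__main__":
    sample = [
        ("2025-09-15T10:00:00.000", "a"),
        ("2025-09-15T10:00:00.300", "a"),
        ("2025-09-15T10:00:00.600", "a"),
        ("2025-09-15T10:05:00.000", "b"),
    ]
    print(flag_fast_sessions(sample))  # {'a'}: several requests per second
```

Rate alone is a crude signal, of course - legitimate automation is everywhere - but paired with identity and behavioral context it is one way defenders can begin watching for machine-speed activity.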