Cybersecurity is undergoing a quiet revolution, driven by rapid advances in artificial intelligence and machine learning. Researchers at the UK AI Security Institute (AISI) have been tracking how well large language models (LLMs) handle cybersecurity tasks. Their findings are intriguing and potentially concerning for the future of the cybersecurity industry.
The Benchmarking Race
AISI has developed a "time window benchmark for cybersecurity," which measures how long a task an AI model can complete, expressed as the time a human expert would need for the same work. The results are eye-opening. For instance, with a token limit of 2.5 million, Claude Sonnet 4.5 can complete tasks that would take a human cybersecurity expert about 16 minutes, and it does so with an 80% success rate. This is a significant improvement on previous estimates, and it is happening at an astonishing pace.
The human-comparable task time, as measured by AISI, is growing rapidly. With larger or unlimited token budgets, AI models could potentially handle even longer tasks. This rapid progress has led AISI to adjust its expectations. In February 2026, it reduced the estimated doubling period for this task time from 8 to 4.7 months, and now, with the release of Anthropic Mythos Preview and OpenAI GPT-5.5, it has had to reassess again.
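To make the difference between an 8-month and a 4.7-month doubling period concrete, here is a minimal Python sketch of the arithmetic. It assumes a smooth exponential trend and uses the 16-minute figure purely as an illustrative starting point. The function name `projected_horizon` is hypothetical and is not part of any AISI tooling.

```python
def projected_horizon(start_minutes: float, months: float, doubling_months: float) -> float:
    """Task length (in human-expert minutes) after `months` of steady doubling.

    Assumes an idealised exponential trend: the value doubles every
    `doubling_months` months. Real measurements are noisier than this.
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return start_minutes * 2 ** (months / doubling_months)


# Illustrative starting point taken from the article's 16-minute figure.
START_MINUTES = 16.0

# Compare the older 8-month estimate with the revised 4.7-month estimate.
for doubling in (8.0, 4.7):
    after_year = projected_horizon(START_MINUTES, 12, doubling)
    print(f"doubling every {doubling} months: about {after_year:.0f} minutes after 12 months")
```

Under these assumptions, the shorter doubling period roughly doubles the projected task length after one year compared with the older estimate, which is why a revision from 8 to 4.7 months matters.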
The Accelerating Doubling Time
The latest models have outpaced the previous trend, leading AISI to conclude that the doubling time for AI cyber capabilities is even shorter than previously estimated. AISI doesn't provide a specific value, but it references METR's software engineering measurements, which suggest a consistent doubling time of 4.2 months since late 2024. With the latest Mythos Preview checkpoint, this time is closer to 4 months.
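For readers wondering where a doubling time comes from, the following sketch derives one from two measurements. It is a simplification: it assumes exponential growth between just two data points, whereas AISI and METR presumably fit a trend across many models. The function name and the sample numbers are hypothetical and are not taken from either organisation's data.

```python
import math


def doubling_time_months(h_start: float, h_end: float, months_between: float) -> float:
    """Doubling time implied by two task-length measurements.

    Solves h_end = h_start * 2 ** (months_between / T) for T.
    """
    if h_start <= 0 or h_end <= h_start or months_between <= 0:
        raise ValueError("need positive inputs with h_end greater than h_start")
    return months_between * math.log(2) / math.log(h_end / h_start)


# Hypothetical example: a task length that grows from 10 to 40 minutes
# over 8 months has doubled twice, so the doubling time is 4 months.
print(round(doubling_time_months(10, 40, 8), 2))
```

A single strong model release raises the later measurement, which shrinks the implied doubling time. That is the mechanism behind the repeated downward revisions described here.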
It's important to note that this benchmark is specific to cybersecurity tasks and is not a comprehensive assessment of AI capabilities. AI models are not becoming twice as capable in all areas; what doubles is the length of the security-related tasks, measured in human expert time, that they can complete.
Real-World Implications
The implications of these advancements are far-reaching. AI models are now capable of completing complex tasks, such as solving a 32-step simulated corporate network attack and a previously unsolved seven-step industrial control system attack. When compared to Opus 4.6, the latest models show significant improvements in these areas.
However, the real-world impact remains uncertain. The curl project's experience with Mythos, which found only one confirmed vulnerability in its codebase, serves as a reminder that AI's capabilities in bug hunting and security are still evolving. As AI continues to advance, it will be crucial to ensure that these tools are used responsibly and ethically to enhance, not replace, human expertise.