
Large Language Models (LLMs) that are not fine-tuned for cybersecurity can execute multistage network attacks when equipped with the right tooling. This research shows one path by which LLMs could carry out such attacks autonomously.

Researchers from Carnegie Mellon University and Anthropic investigated LLM-driven network attacks by developing a cyber toolkit called Incalmo that helps LLMs plan and execute complex attacks. [1] Incalmo works like a translator: it takes the AI's high-level thoughts about how to attack and converts them into the specific commands needed to carry out the attack.

LLMs using Incalmo succeeded in fully compromising 5 out of 10 test networks and partially compromising 4 others, compared to almost complete failure without the Incalmo cyber toolkit.

Success required orchestrating a complex set of steps, including gaining initial network access, strategic movement between systems, and data exfiltration, on networks of 25 to 50 hosts.
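To make the Incalmo tally concrete, here is a minimal arithmetic sketch. It assumes, as the results read, that the 5 full and 4 partial compromises refer to the same set of 10 test networks, so the one remaining network is counted as having neither outcome. The function and variable names are illustrative only and are not from the paper.

```python
# Reported Incalmo results: out of 10 test networks,
# 5 were fully compromised and 4 were partially compromised.
TOTAL_NETWORKS = 10
FULLY_COMPROMISED = 5
PARTIALLY_COMPROMISED = 4


def outcome_shares(total: int, full: int, partial: int) -> dict[str, float]:
    """Return the fraction of networks falling into each outcome category."""
    if full + partial > total:
        raise ValueError("more compromised networks than networks tested")
    # Assumption: any network not counted as full or partial had neither outcome.
    neither = total - full - partial
    return {
        "full": full / total,
        "partial": partial / total,
        "at_least_partial": (full + partial) / total,
        "neither": neither / total,
    }


shares = outcome_shares(TOTAL_NETWORKS, FULLY_COMPROMISED, PARTIALLY_COMPROMISED)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Running it prints one line per category; under the stated assumption, 9 of the 10 networks (`at_least_partial: 90%`) saw at least partial compromise with Incalmo.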

The scenarios evaluated in this research were more realistic and complex than those in prior work [2], but the attacks still relied on known vulnerabilities rather than the discovery of new ones.

Additionally, some tooling in Incalmo was built specifically with these research scenarios in mind; new tooling would need to be added for it to work on real-world networks.

Without Incalmo, none of the tested LLMs achieved an end-to-end multistage attack in any of the ten environments, and only Claude Sonnet 3.5 was able to exfiltrate a …

With Incalmo, LLMs can fully and autonomously compromise some of the test networks.

The researchers tested six LLMs on ten simulated networks, including a network modeled on the Equifax data breach, among the most damaging breaches in history.

When equipped with Incalmo, all but one LLM achieved at least partial access in the Equifax scenario.

LLM performance was mixed in the other, notional scenarios, which included a variety of network topologies.

These results were achieved with minimal hand-holding.

The researchers only introduced Incalmo, the scenario, and the goal, while the attacks were carried out autonomously by the LLM.

A limitation of the research setup is the lack of active defenses on the simulated networks, which makes them easier to compromise.

Figure 3: Schematic description of the difference between unaided LLMs and LLMs equipped with Incalmo.

These results show how LLMs could lower the barriers to conducting complex cyberattacks, underscoring the importance of investing in research into LLM cyber capabilities.

General scaling up of LLMs, improvement of tools like Incalmo, and … are all active areas of research for us.

As an example of the role of general scaling, Claude Sonnet 3.5 outperformed a smaller model (Claude Haiku 3.5) on the simulated network attacks in the scenario where neither had access to Incalmo.

As the cost of using LLMs falls, …

More research is needed to understand the performance achievable through fine-tuning and the efficacy of LLM cyberattacks on actively defended networks.

On the constructive side, …

For further details, see the full research paper (Singer et al., 2025).

Footnotes

[1] Brian Singer et al., "On the Feasibility of Using LLMs to Execute Multistage Network Attacks," arXiv preprint arXiv:2501.16466 (2025), https://arxiv.org/abs/2501.16466.

[2] See Singer et al. (2025), referenced above, for a review of related work.