

The AI gold rush is in full swing and everyone is trying to jump on the bandwagon and build the next great thing. Cyber security is not excluded from this, and we are seeing a wave of "AI penetration testing and Red Teaming" tools pop up. But are we really ready to let AI run wild with exploits in our production environments?
In this post, I'm going to talk about some of the things AI can do well, some it cannot, and why I believe we are not yet ready for full-blown autonomous penetration testing tooling. I'll also add some insight into how I believe a penetration testing tool could be built if you're determined to do so, and no, it doesn't remove humans entirely...
With our knowledge of LLMs as they currently stand, there are real reservations as to whether they can be trusted to perform end-to-end penetration testing. That does not mean they are useless in our field, however: inside tightly controlled frameworks they can act as reasoning assistants that support our testing. Let's take a look at a couple of the core concepts we as penetration testers rely on:
In short, yes. LLMs are actually quite good at this, and it's probably the strongest point in this post. However, there are some caveats that need to be considered...
There are some pros here if we wanted to build full-blown tooling that could test end-to-end, but the cons far outweigh them. Respecting scope, making safe decisions so we don't bring down prod, and understanding impact are things we are trusted to uphold as penetration testers. Right now, LLMs cannot do this.
If you have an understanding of how LLMs work and how to test them (be it simple prompt injection or full-blown ecosystem testing), then some of the points below will resonate and provide further reasons why we should not yet let LLMs conduct full-scale penetration tests without human intervention.
If a response is ambiguous, they may invent theories or examples:
We've seen this many times, not only in our own use of LLM-enabled environments to assist with testing, but also in bug bounty submissions (where we are seeing more AI slop than ever before).
Because of these two issues, hallucination and the lack of any ground truth, an AI model is generally not capable of:
Akin to the above, there are also further issues with LLMs' tendency to over-suggest, as in the following examples:
Due to the issues we experience with LLMs, we cannot rely on prompt-based scoping or testing alone. LLMs used for triage or hypothesis generation could aid us in our testing, but end-to-end testing with these issues is something I would not let anywhere near a production environment unmanaged.
No. I believe that, used correctly, we can utilise LLMs to speed up our testing process and assist as reasoning agents. Let's take analysing web requests as an example; there are some things LLMs can assist with here:
In our testing of LLMs as reasoning agents, we have found that they are quite good at pattern-based recognition:
We have also used them for multi-step reasoning and hypothesis generation, such as:
We have also used LLMs in other areas, such as report generation and attack surface monitoring, but in none of this testing have we allowed direct exploit automation or unrestricted probing.
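To make that concrete, here's a minimal sketch in Python of the kind of reasoning-assistant usage I mean. `ask_llm` is a hypothetical stand-in for whatever model client you use; the important part is that the model only returns observations and hypotheses for a human to review, and never touches the target itself.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around your model provider's chat API."""
    raise NotImplementedError("plug in your own LLM client here")

def review_response(request_summary: str, response_headers: dict, response_body: str) -> str:
    # Constrain the model to observations and hypotheses only; the human
    # tester decides what, if anything, gets tested next.
    prompt = (
        "You are assisting a human penetration tester on an authorised engagement.\n"
        "Do NOT propose exploit payloads. Only list observations and hypotheses.\n\n"
        f"Request: {request_summary}\n"
        f"Response headers: {response_headers}\n"
        f"Response body (truncated): {response_body[:2000]}\n\n"
        "List: (1) notable patterns (error strings, stack traces, verbose headers), "
        "(2) hypotheses worth a human verifying, (3) what evidence would confirm each."
    )
    return ask_llm(prompt)
```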
Yes and no. We certainly wouldn't place any trust in a fully autonomous penetration testing architecture today. However, structured correctly, with decent guardrails, and used as a reasoning co-pilot that understands methodology and scope, it could be powerful. Here's how I would look to build something like this.
I'm going to touch on this subject lightly (otherwise we will be here for hours). If I were to build this, there are some core design principles that would need to be in play:
For the workflow, it may look something like this:
User (Pentester)
↓
Orchestrator / Controller
↓
Policy & Scope Enforcement Layer
↓
Tool Execution Layer (Scanners, Browsers, APIs, Proxies)
↓
Target Systems (In-Scope Assets Only)
Cognitive Layer (in parallel):
LLM Reasoning Engine + Memory + Vulnerability Analyzer

The human control layer is where we would define the engagement scope, align any methodologies, approve actions, override agent decisions and review logs.
This would be some form of deterministic controller that maintains state, manages agent loops, stores session context, handles tool routing and enforces workflows.
This layer is non-LLM, deterministic and tamper-resistant. It would perform the scope enforcement defined by the human control layer, apply restrictions to any LLMs in the workflow and sanitise payloads where required.
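As a rough illustration, the scope check could be as dull as an allow-list lookup. The hostnames, blocked verbs and `ProposedAction` shape below are assumptions for the sketch; the point is that none of this involves an LLM.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Populated from the engagement scope agreed with the client (illustrative values).
IN_SCOPE_HOSTS = {"app.example-client.com", "api.example-client.com"}
FORBIDDEN_METHODS = {"DELETE"}  # e.g. destructive verbs blocked outright

@dataclass(frozen=True)
class ProposedAction:
    tool: str     # e.g. "http_probe"
    method: str   # HTTP verb
    url: str

def enforce_scope(action: ProposedAction) -> tuple[bool, str]:
    host = urlparse(action.url).hostname or ""
    if host not in IN_SCOPE_HOSTS:
        return False, f"host {host!r} is out of scope"
    if action.method.upper() in FORBIDDEN_METHODS:
        return False, f"method {action.method} is not permitted by policy"
    return True, "allowed"

# The orchestrator calls enforce_scope() before anything reaches the tool layer.
allowed, reason = enforce_scope(ProposedAction("http_probe", "GET", "https://app.example-client.com/login"))
```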
The LLM would never be allowed to execute tools directly. Every action would go through a flow like: LLM proposes action > Orchestrator validates action > Policy engine approves/rejects > Tool layer executes.
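A rough sketch of that loop, building on the `ProposedAction` and `enforce_scope` pieces from the previous snippet. `llm_propose_action`, `audit_log`, the `tools` registry and `human_approves` are hypothetical names passed in as callables; the model's output is data to be validated, never a direct call.

```python
def run_step(context: dict, llm_propose_action, audit_log, tools: dict, human_approves) -> None:
    action = llm_propose_action(context)       # 1. LLM proposes a structured action
    allowed, reason = enforce_scope(action)    # 2. Policy engine approves/rejects (previous snippet)
    audit_log(action, allowed, reason)         # 3. Every proposal is logged, accepted or not
    if not allowed:
        return
    if not human_approves(action):             # 4. The tester can still veto anything
        audit_log(action, False, "rejected by tester")
        return
    result = tools[action.tool].run(action)    # 5. Only the tool layer ever touches targets
    context.setdefault("observations", []).append(result)
```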
Here I would host several agents performing reasoning. These agents would hypothesise based on information received from initial recon, assess potential vulnerabilities and attack chains, and perform compliance checking.
These would be some form of structured objects that log everything: discovered endpoints, parameters, auth roles, tokens, cookies, etc., which could then be used in attack graphs to show potential chains of compromise.
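As an illustration, the memory could be as simple as plain dataclasses. The field names here are my own assumptions, but keeping this as structured data (rather than free text inside the LLM's context window) is what makes it auditable and usable in an attack graph later.

```python
from dataclasses import dataclass, field

@dataclass
class DiscoveredEndpoint:
    url: str
    method: str
    parameters: list[str] = field(default_factory=list)
    auth_role_required: str | None = None   # e.g. "anonymous", "user", "admin"
    notes: str = ""

@dataclass
class EngagementMemory:
    endpoints: list[DiscoveredEndpoint] = field(default_factory=list)
    tokens_seen: dict[str, str] = field(default_factory=dict)   # label -> where it was observed
    cookies_seen: dict[str, str] = field(default_factory=dict)

memory = EngagementMemory()
memory.endpoints.append(DiscoveredEndpoint("/api/v1/users", "GET", ["id"], "user"))
```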
Here we would log all LLM prompts and outputs, proposed actions, rejected actions (and why they were rejected), reasoning chains and scope validation. This would provide us with legal defensibility, client reporting, internal QA and methodology validation.
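A minimal sketch of what an audit record could look like: append-only JSON lines written somewhere the agent itself cannot modify. The fields and file name are assumptions for the sketch, not a prescribed format.

```python
import json
import time

def audit_log(path: str, event: dict) -> None:
    # Append-only JSON-lines record; in a real deployment, ship these to a
    # write-only remote store so the agent cannot tamper with its own history.
    record = {"timestamp": time.time(), **event}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

audit_log("engagement_audit.jsonl", {
    "type": "action_rejected",
    "proposed": {"tool": "http_probe", "url": "https://out-of-scope.example.com/"},
    "reason": "host out of scope",
    "prompt_ref": "<sha256 of the LLM prompt stored elsewhere>",
})
```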
We would need to embed methodologies into our testing; no slinging packets around here. We would look to align with professional workflows, for example web penetration testing.
This is where we would deploy and harden: running in sandboxed VMs, network ingress/egress restrictions, internal vaults for storing creds, separating the LLMs from prod networks, immutable logs, etc.
This is, of course, my finger-in-the-air view of how I would construct a core model for this. It would need to be properly thought through, scoped as a project and tested throughout development. It's also a lot of work to potentially work myself out of a job, haha.
Is it possible to use AI for penetration testing? - Yes, if it is used as a reasoning agent with strict controls and boundaries
Is it wise to use AI for penetration testing without constraints and hardening around the integration? - No
Would I design and use something like this with the way LLM's currently are? - Probably not!
That said, I'm hoping this post gives some brief insight into the dangers of using AI without constraints in penetration tests, and I'm hoping it sparks some debate in the comments on my Twitter post. Let me know if you'd like to hear more from me on this; I'm also keen to hear other people's thoughts on the subject!