

The AI gold rush is in full swing and everyone is trying to jump on the bandwagon and build the next great thing. Cyber security is not excluded from this, and we are seeing a wave of "AI penetration testing and Red Teaming" tools pop up. But are we really ready to let AI run wild with exploits in our production environments?
In this post, I'm going to talk about some of the things AI can do well, some it cannot, and why I believe we are not yet ready for full-blown autonomous penetration testing tooling. I'll also add some insight into how I believe a penetration testing tool could be built if you're determined to do so, and no, it doesn't remove humans entirely...
With our knowledge of LLMs as they currently stand, there are real reservations as to whether they can be trusted to perform end-to-end penetration testing. That does not mean they are useless in our field, however: inside tightly controlled frameworks they can act as reasoning assistants that support our testing. Let's take a look at a couple of the core concepts we as penetration testers rely on:
In short, yes. LLMs are actually quite good at this, and it's probably the strongest point in this post. However, there are some caveats that need to be considered...
There are some pros here if we wanted to build full-blown tooling that could test end-to-end, but the cons far outweigh them. Respecting scope, making safe decisions so we don't bring down prod, and understanding impact are things we are trusted to uphold as penetration testers. Right now, LLMs cannot do this.
If you have an understanding of how LLMs work and how to test them (be it simple prompt injection or full-blown ecosystem testing), then some of the points below will resonate and provide further reasons why we should not yet let LLMs conduct full-scale penetration tests without human intervention.
If a response is ambiguous, they may invent theories or examples:
We've seen this many times, not only in our own use of LLM-enabled environments to assist with testing, but also in bug bounty submissions (where we are seeing more AI slop than ever before).
Because of these two issues, hallucination and the lack of any ground truth, an AI model is generally not capable of:
Akin to the above, there are also further issues with LLMs' tendency to over-suggest, as in the following examples:
Due to the issues we experience with LLMs, we cannot rely on prompt-based scoping or testing alone. LLMs used for triage or hypothesis generation could aid us in our testing, but end-to-end testing with these issues is something I would not let anywhere near a production environment unmanaged.
No. I believe that, used correctly, we can utilise LLMs to speed up our testing process and assist as reasoning agents. Let's take analysing web requests as an example; there are some things LLMs can assist with here:
In our testing of LLMs as reasoning agents, we have found that they are quite good at pattern-based recognition:
We have also used them for multi-step reasoning and hypothesis generation, such as:
We have also used LLMs in other areas, such as report generation and attack surface monitoring, but in none of this testing have we allowed direct exploit automation or unrestricted probing.
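To make that concrete, here's a minimal sketch in Python of the kind of reasoning-assistant usage I mean. `ask_llm` is a hypothetical stand-in for whatever model client you use; the important part is that the model only returns observations and hypotheses for a human to review, and never touches the target itself.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around your model provider's chat API."""
    raise NotImplementedError("plug in your own LLM client here")

def review_response(request_summary: str, response_headers: dict, response_body: str) -> str:
    # Constrain the model to observations and hypotheses only; the human
    # tester decides what, if anything, gets tested next.
    prompt = (
        "You are assisting a human penetration tester on an authorised engagement.\n"
        "Do NOT propose exploit payloads. Only list observations and hypotheses.\n\n"
        f"Request: {request_summary}\n"
        f"Response headers: {response_headers}\n"
        f"Response body (truncated): {response_body[:2000]}\n\n"
        "List: (1) notable patterns (error strings, stack traces, verbose headers), "
        "(2) hypotheses worth a human verifying, (3) what evidence would confirm each."
    )
    return ask_llm(prompt)
```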
Yes and no. We certainly wouldn't place any trust in a fully autonomous penetration testing architecture today. However, structured correctly, with decent guardrails, and used as a reasoning co-pilot that understands methodology and scope, it could be powerful. Here's how I would look to build something like this.
I'm going to touch on this subject lightly (otherwise we will be here for hours). If I were to build this, there are some core design principles that would need to be in play:
For the workflow, it may look something like this:
User (Pentester)
↓
Orchestrator / Controller
↓
Policy & Scope Enforcement Layer
↓
Tool Execution Layer (Scanners, Browsers, APIs, Proxies)
↓
Target Systems (In-Scope Assets Only)
Cognitive Layer (in parallel):
LLM Reasoning Engine + Memory + Vulnerability Analyzer

The human control layer is where we would define the engagement scope, align any methodologies, approve actions, override agent decisions and review logs.
This would be some form of deterministic controller that maintains state, manages agent loops, stores session context, handles tool routing and enforces workflows.
This layer is non-LLM, deterministic and tamper-resistant. It would perform the scope enforcement defined by the human control layer, apply restrictions to any LLMs in the workflow and sanitise payloads where required.
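As a rough illustration, the scope check could be as dull as an allow-list lookup. The hostnames, blocked verbs and `ProposedAction` shape below are assumptions for the sketch; the point is that none of this involves an LLM.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Populated from the engagement scope agreed with the client (illustrative values).
IN_SCOPE_HOSTS = {"app.example-client.com", "api.example-client.com"}
FORBIDDEN_METHODS = {"DELETE"}  # e.g. destructive verbs blocked outright

@dataclass(frozen=True)
class ProposedAction:
    tool: str     # e.g. "http_probe"
    method: str   # HTTP verb
    url: str

def enforce_scope(action: ProposedAction) -> tuple[bool, str]:
    host = urlparse(action.url).hostname or ""
    if host not in IN_SCOPE_HOSTS:
        return False, f"host {host!r} is out of scope"
    if action.method.upper() in FORBIDDEN_METHODS:
        return False, f"method {action.method} is not permitted by policy"
    return True, "allowed"

# The orchestrator calls enforce_scope() before anything reaches the tool layer.
allowed, reason = enforce_scope(ProposedAction("http_probe", "GET", "https://app.example-client.com/login"))
```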
The LLM would never be allowed to execute tools directly. Every action would go through a flow like: LLM proposes action > Orchestrator validates action > Policy engine approves/rejects > Tool layer executes.
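A rough sketch of that loop, building on the `ProposedAction` and `enforce_scope` pieces from the previous snippet. `llm_propose_action`, `audit_log`, the `tools` registry and `human_approves` are hypothetical names passed in as callables; the model's output is data to be validated, never a direct call.

```python
def run_step(context: dict, llm_propose_action, audit_log, tools: dict, human_approves) -> None:
    action = llm_propose_action(context)       # 1. LLM proposes a structured action
    allowed, reason = enforce_scope(action)    # 2. Policy engine approves/rejects (previous snippet)
    audit_log(action, allowed, reason)         # 3. Every proposal is logged, accepted or not
    if not allowed:
        return
    if not human_approves(action):             # 4. The tester can still veto anything
        audit_log(action, False, "rejected by tester")
        return
    result = tools[action.tool].run(action)    # 5. Only the tool layer ever touches targets
    context.setdefault("observations", []).append(result)
```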
Here I would host several agents performing reasoning. These agents would hypothesise based on information received from initial recon, assess potential vulnerabilities and attack chains, and perform compliance checking.
These would be some form of structured objects that log everything: discovered endpoints, parameters, auth roles, tokens, cookies, etc., which could then be used in attack graphs to show potential chains of compromise.
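As an illustration, the memory could be as simple as plain dataclasses. The field names here are my own assumptions, but keeping this as structured data (rather than free text inside the LLM's context window) is what makes it auditable and usable in an attack graph later.

```python
from dataclasses import dataclass, field

@dataclass
class DiscoveredEndpoint:
    url: str
    method: str
    parameters: list[str] = field(default_factory=list)
    auth_role_required: str | None = None   # e.g. "anonymous", "user", "admin"
    notes: str = ""

@dataclass
class EngagementMemory:
    endpoints: list[DiscoveredEndpoint] = field(default_factory=list)
    tokens_seen: dict[str, str] = field(default_factory=dict)   # label -> where it was observed
    cookies_seen: dict[str, str] = field(default_factory=dict)

memory = EngagementMemory()
memory.endpoints.append(DiscoveredEndpoint("/api/v1/users", "GET", ["id"], "user"))
```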
Here we would log all LLM prompts and outputs, proposed actions, rejected actions (and why they were rejected), reasoning chains and scope validation. This would provide us with legal defensibility, client reporting, internal QA and methodology validation.
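A minimal sketch of what an audit record could look like: append-only JSON lines written somewhere the agent itself cannot modify. The fields and file name are assumptions for the sketch, not a prescribed format.

```python
import json
import time

def audit_log(path: str, event: dict) -> None:
    # Append-only JSON-lines record; in a real deployment, ship these to a
    # write-only remote store so the agent cannot tamper with its own history.
    record = {"timestamp": time.time(), **event}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

audit_log("engagement_audit.jsonl", {
    "type": "action_rejected",
    "proposed": {"tool": "http_probe", "url": "https://out-of-scope.example.com/"},
    "reason": "host out of scope",
    "prompt_ref": "<sha256 of the LLM prompt stored elsewhere>",
})
```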
We would need to embed methodologies into our testing; no slinging packets around here. We would look to align with professional workflows, for example web penetration testing.
This is where we would deploy and harden: running in sandboxed VMs, network ingress/egress restrictions, internal vaults for storing creds, separating the LLMs from prod networks, immutable logs, etc.
This is, of course, my finger-in-the-air view of how I would construct a core model for this. It would need to be properly thought through, scoped as a project and tested throughout development. It's also a lot of work to potentially work myself out of a job, haha.
Is it possible to use AI for penetration testing? - Yes, if it is used as a reasoning agent with strict controls and boundaries
Is it wise to use AI for penetration testing without constraints and hardening around the integration? - No
Would I design and use something like this with the way LLM's currently are? - Probably not!
That said, I'm hoping this post gives some brief insight into the dangers of using AI without constraints in penetration tests, and I'm hoping it sparks some debate in the comments on my Twitter post. Let me know if you'd like to hear more from me on this; I'm also keen to hear other people's thoughts on the subject!