Prompt to Pivot: Penetration Testing AI Ecosystems

Artificial intelligence systems are no longer standalone models responding to prompts. Modern deployments increasingly resemble complex ecosystems composed of models, retrieval pipelines, orchestration layers, external tools, and data infrastructure. These architectures expand the attack surface far beyond traditional software systems.

For security practitioners, this shift introduces an uncomfortable reality: the logic of the system is partially probabilistic and controlled through language rather than code. Attackers can influence behavior not only through API parameters or malformed input but through adversarial instructions, poisoned data, and manipulated context.

In this post, I'm going to discuss how to approach penetration testing for AI ecosystems, breaking down the architecture, identifying attack surfaces, and outlining practical testing strategies for modern AI applications.

AI Red Teaming vs Penetration testing

LLM's are not new in this world, infact, the first documented LLM was at MIT in 1964, a chat bot named ELIZA which was created to explore communication between humans and machines. LLM's have been used in institutions and the military for years, and in 2020 we started to see the first interfaced chat bots that we are seeing a rise in now.

Testing of LLM's has typically only been around a few years that we know of, known as "red teaming" the model, this consisted of prompt injection and jailbreaking techniques, but this has evolved with agentic AI becoming more frequent and introducing the need for penetration testing methodologies as ecosystems become more intertwined with our traditional IT stack.

What is an AI Ecosystem?

We've all seen AI based chat bots, they've been around for some time now, but as AI gets introduced and further integrated into corporate infrastructure, they evolve from being a simple chat interface, to a whole ecosystem.

An AI ecosystem refers to the broader environment that surrounds and enables a model to operate in production. While a language model may be the core component, its capabilities are typically extended through multiple surrounding systems, examples being:

Application orchestration frameworks
Vector databases used for retrieval pipelines
External tool integrations and APIs
Prompt templates and system instructions
Agentic frameworks
Retrieval-augmented generation (RAG) pipelines
Cache-augmented generation (CAG)
Guardrail and moderation layers
Deployment and CI/CD infrastructure

Together these components form a distributed stack, where the model is only one element in a larger ecosystem of other elements and tooling. Unlike traditional application stacks, AI ecosystems introduce several unique properties:

Non-deterministic behavior — identical inputs may produce different outputs
Prompt-based control planes — natural language instructions can change system behavior
Opaque reasoning processes — internal model logic cannot easily be inspected
Data-dependent execution paths — retrieved information influences runtime decisions

These characteristics create entirely new security challenges that traditional penetration testing methodologies were not designed to address. An example of AI ecosystem, would be Microsofts Copilot, as shown below:

Infographic of Microsoft Copilot Ecosystem

Threat Modeling AI Ecosystems

The first step in testing an AI ecosystem is somewhat similar to traditional testing, we need to perform threat modelling (scoping) of the implementation to understand as much as we can about the system(s). This becomes key when assessing AI-enabled technology, as there is a lot of ambiguity with these systems, that when combined with the first-try-fallacy issues of LLM's, could make the difference between a test being a couple of weeks long to a couple of months without proper understanding of the architecture and its intricacies.

When threat modelling an AI-driven system we are looking for what the inputs are, intuit what technologies are involved and what the risks are to the implementation. There are two separate paths I like to take here, the first being to perform a more generic scope, where we would assess things like:

General information - Purpose of system, responsibility, primary use case, deployment status (QA, prod etc)
Model information - What model is in use, what system/developer prompt is used, does it use a RAG system, is it agentic etc
Data security - Where is data stored, are there any databases in use, data encryption
Interfacing - How do you interact with the AI, is there any prompt sanitisation, rate limiting, authentication
Network architecture - Is the system internet facing or internal, integrations with external applications, use of open-source software etc
Safety - Is bias a concern for the organisation, are there potential legal/financial impact if compromised, what safeguards are in place

This is just a brief list that would give me an initial view of how the AI is implemented and it's surrounding integrations. The next step here would be to threat model the AI-driven system, where I would be looking at the following areas:

What system inputs and entry points are available (ie. how do we interact with the LLM)
Are there any potential ecosystem vulnerabilities (third-party components, libraries, networks)
What does the model security look like, proprietary/open-source/third-party LLM, Is the model susceptible to adversarial attacks or jailbreak techniques, Are there protections against model inference manipulations
Are there any preventions in place to prevent prompt engineering the model
Can the model access any sensitive data?
What does the application layer look like, do integrations have authentication, rate limitiing, logging etc
Is there a potential to pivot through the model to laterally move inside the organisation
Is there any monitoring and response in place, security events, logging etc

These are some high level examples of things we look at when threat modelling AI-driven systems to gain a better idea of how we can begin to think about attacking the model, its trust boundaries and any integrated tooling.

Testing Methodologies

As with any penetration testing, we need a structured framework to guide us through a systematic process to identify, exploit and report vulnerabilities. These methodologies ensure consistency, thoroughness, and compliance with industry standards.

There are plenty of frameworks out there for penetration testing, such as PTES, OWASP testing guide, NIST SP 800-115, OSSTMM etc, but none of these cover testing AI ecosystems. More recently OWASP have released v1 of the OWASP AI testing guide:

https://owasp.org/www-project-ai-testing-guide/

There is also the Arcanum LLM assessment methodology (shout out to Jason Haddix and his Attacking AI course), which is typically what I follow when testing:

This is still a new space in the industry and so far there are only a couple of methodologies out there we can follow. As AI penetesting becomes more popular, I expect to see further methodologies come to fruition, digging deeper into the areas of AI penetesting.

The Assessment

Now that we have threat modelled and have some methodologies we can use, it's on to assessing the implementation. I'll keep this section fairly brief as there are a whole host of attacks throughout these models and i'll dig into those in future posts.

Identifying the inputs:

In most current cases when testing LLM integrations, we will see a web application enabled by LLM. As this is the case, we might look for some of the following inputs:

API integrations
Web form integrations
File uploads
Chat windows

There are a whole host of other inputs we might see, such as indirect inputs (think Copilot and it's integrations into email, sharepoint, teams, graph etc). Some of these inputs will use traditional web methodologies of testing, while others will be via a form of direct or indirect prompt injections.

Attacking the ecosystem:

AI ecosystems introduce further attack surface outside of the typical chat windows and LLM integrations. When we look at attacking the ecosystem we are looking at the surrounding infrastructure that is present with hosting a LLM based service. This might take form as one (or more) of the following:

Traditional web application integrations (via API's)
Tools that interact with internal development implementations (CI/CD pipelines for example)
MCP servers
RAG and CAG databases

There are several ways in which we can attack the backend infrastructure, for example, prompt injection that leads to blind XSS, path traversals, auth bypasses, SSRF etc. These web vulnerabilities in systems that surround the LLM can be used to pivot through the LLM and into an organisation.

Attacking the model:

When we talk about attacking the model, we turn back to the AI red teaming phase. This is where we typically attempt prompt injections and/or jailbreaks to get the model to illicit bias or harm (for example, getting it to provide tutorials on how to do naughty things, like making bombs). Frontier models such as Claude are getting harder to prompt inject or jailbreak due to their continuously updated protections, but custom models are more prone to these types of attacks.

When attacking the model we also need to understand what guardrails are in place, be it at the prompt level of the LLM, or some form of security around the model. There are a whole bunch of evasion techniques that can be used to get around these, maybe we can explore those in a future blog post!

Attacking the prompt engineering:

In this phase we are looking to leak any system or developer prompts from the model. These prompts contain valuable information that provides us with an understanding of how the model works. By performing prompt injection to get the system or developer prompts, we can expose any security implementations, sensitive internal data like URL's or API keys, API endpoints etc. If you're interested to learn more about how we might extract this information and bypass any security implementations, Jason Haddix has created the Arcanum PI Taxonomy, which has a whole host of valuable information used for prompt injecting LLM's:

https://arcanum-sec.github.io/arc_pi_taxonomy/

Attacking the data:

This is an important part of the assessment as it's where we start to go after sensitive company data. This could be customer data, employee information, intellectual property, technical specs etc. We are slowly seeing implementations of LLM's integrated with data stores and data retrieval systems. This could take the form of a database, S3 buckets, RAG vector stores and API endpoints and we have a number of opportunities:

Returning sensitive data
- Typically here we would look to extract sensitive data via the LLM. This might be possible due to poor permissions, the model being trained on sensitive data, API access or enrichment from a database etc
Writing data into an application in the backend
- If we come across an API that can write into a backend service, we can prompt inject into that service (think confluence, salesforce etc)
Poisoning the data / ecosystem
- If the model is self-training, or user based submissions are used for training the model, we might attempt to poison thedata set

Attacking the application:

This part is where we start to see traditional web vulnerabilities come into play. AI Ecosystems are more frequently using applications as part of their stack now and as such this opens up the attack surface to traditional attacks, such as:

Server side request forgery
Cross-site scripting and Blind Cross-site scripting
Remote code execution
File upload vulns
Insecure direct object references

To name a few. If we can find and exploit these vulnerabilities, it leads nicely into the final part of the methodology, which is to pivot through the model and into the organisation.

Pivoting:

Once you can influence the model’s output, you can often influence the systems it has access to, internal APIs, plugins, retrieval systems, SaaS tools, and automation agents. This turns prompt injection from a “weird AI bug” into a lateral movement vector.

In mature environments, the LLM becomes a soft entry point into hard systems, data stores, ticketing systems, email, CRM, code repositories, and internal knowledge bases. The real impact of an AI compromise is therefore not misinformation, it’s unauthorised actions and lateral movement deeper into the organisation. That’s the pivot that organisations are currently underestimating, and it’s why AI testing must be treated as application and infrastructure security, not just model safety.

Conclusion

AI systems are not just models, they are complex ecosystems made up of models, prompts, data pipelines, APIs, plugins, agents, and internal integrations. When organisations deploy an LLM, they are not just deploying a chatbot, they are deploying a new interface to their internal systems and data.

That interface can be manipulated, influenced, and abused in ways traditional applications cannot. If we do not test these systems thoroughly, we risk creating highly trusted, highly connected systems that can be socially engineered through text alone. Penetration testing AI ecosystems is therefore not a niche exercise, it is a necessary evolution of application security. The organisations that understand this early will prevent the next generation of breaches, while those that treat AI as just another feature will end up exposing their internal systems through the very technology they adopted to move faster.

I hope this has given some insight into the ways in which we can test these fast moving AI integrations. More to come soon!