AI Agent Attack Surfaces

AI agents introduce unique attack surfaces beyond traditional web applications. This document maps the primary threats and references industry research.

OWASP LLM Top 10

The OWASP Top 10 for LLM Applications identifies the most critical risks:

#	Risk	Agent Relevance
LLM01	Prompt Injection	Critical - agents execute tools based on LLM output; injected instructions can trigger unintended actions
LLM02	Insecure Output Handling	High - agent responses may be rendered in chat UIs or forwarded to other systems
LLM03	Training Data Poisoning	Medium - primarily a provider-side risk, but affects agent behavior
LLM04	Model Denial of Service	High - large inputs can exhaust context windows and provider budgets
LLM05	Supply Chain Vulnerabilities	High - MCP servers, plugins, and skills are all supply chain vectors
LLM06	Sensitive Information Disclosure	Critical - agents access API keys, user data, and internal systems
LLM07	Insecure Plugin Design	High - plugins with ambient authority can be exploited via prompt injection
LLM08	Excessive Agency	Critical - agents with tool access can take real-world actions (shell commands, file writes, API calls)
LLM09	Overreliance	Medium - users may trust agent output without verification
LLM10	Model Theft	Low - agents use hosted models via API

Prompt Injection

The most critical and least-solved attack surface for AI agents.

Direct injection: Attacker crafts input that overrides the system prompt, causing the agent to ignore instructions, reveal configuration, or execute unintended tools.

Indirect injection: Malicious content embedded in data the agent processes (web pages fetched by tools, documents read from disk, MCP server responses) that hijacks agent behavior.

Mitigations:

Pattern-based input filtering (14 patterns in OpenCrust)
System prompt hardening with explicit boundary instructions
Tool result sandboxing and size limits
Human-in-the-loop for destructive operations

References:

Greshake et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023)
Simon Willison, "Prompt injection explained" (2023)

Tool Abuse

Agents with tool access can be manipulated into:

Shell injection: Running arbitrary commands via bash tools
File exfiltration: Reading sensitive files and including content in responses
SSRF: Fetching internal URLs via web fetch tools
Recursive scheduling: Creating infinite task loops via scheduling tools

Mitigations:

Tool iteration limits (max 10 round-trips)
Recursive scheduling prevention (heartbeat context blocks re-scheduling)
File path validation and sandboxing
Private IP blocking for outbound requests

Credential Leakage

AI agents are high-value targets because they aggregate credentials:

LLM provider API keys (Anthropic, OpenAI)
Channel bot tokens (Discord, Telegram, Slack)
Webhook secrets (WhatsApp)
User pairing codes

Attack vectors:

Plaintext config files accessible to local users or leaked in backups
Log output containing API keys (accidental logging)
Prompt injection exfiltrating credentials via tool responses
Memory/history containing credentials from prior conversations

Mitigations:

Encrypted vault (AES-256-GCM) for all credentials
Automatic log redaction for known key patterns
Input sanitization preventing credential injection
Session history bounded and cleaned up

Supply Chain

MCP Servers

MCP servers are external processes that the agent trusts to provide tools. A compromised MCP server can:

Return malicious tool schemas that trick the LLM into dangerous actions
Exfiltrate data passed as tool arguments
Exploit the agent host via process-level access (stdio transport)

Plugins (WASM)

Despite WASM sandboxing, plugins still present risks:

Excessive host function imports granting unintended capabilities
Resource exhaustion (memory, CPU)
Side-channel attacks (timing)

Skills

Skills are Markdown files injected into the system prompt. A malicious skill can:

Override agent behavior via prompt injection in the skill body
Introduce tool-use patterns that exfiltrate data

Industry Research

Belgium CCB (2024): Guidelines on securing AI systems, emphasizing input validation and output filtering for LLM-integrated applications.
Dutch DPA (2024): Guidance on AI and GDPR, covering data minimization requirements relevant to agent memory and logging.
Sophos (2024): "The lethal trifecta" - compromised credentials, tool abuse, and prompt injection as the three converging attack vectors against AI agents.
SecurityScorecard (2025): Supply chain risk analysis showing third-party integrations (MCP servers, plugins) as the fastest-growing attack surface for AI applications.

OpenCrust Documentation