AI Agent Attack Surfaces

AI agents introduce unique attack surfaces beyond traditional web applications. This document maps the primary threats and references industry research.

OWASP LLM Top 10

The OWASP Top 10 for LLM Applications identifies the most critical risks:

  • LLM01 Prompt Injection (Critical): agents execute tools based on LLM output; injected instructions can trigger unintended actions
  • LLM02 Insecure Output Handling (High): agent responses may be rendered in chat UIs or forwarded to other systems
  • LLM03 Training Data Poisoning (Medium): primarily a provider-side risk, but affects agent behavior
  • LLM04 Model Denial of Service (High): large inputs can exhaust context windows and provider budgets
  • LLM05 Supply Chain Vulnerabilities (High): MCP servers, plugins, and skills are all supply chain vectors
  • LLM06 Sensitive Information Disclosure (Critical): agents access API keys, user data, and internal systems
  • LLM07 Insecure Plugin Design (High): plugins with ambient authority can be exploited via prompt injection
  • LLM08 Excessive Agency (Critical): agents with tool access can take real-world actions (shell commands, file writes, API calls)
  • LLM09 Overreliance (Medium): users may trust agent output without verification
  • LLM10 Model Theft (Low): agents use hosted models via API

Prompt Injection

Prompt injection is the most critical and least-solved attack surface for AI agents.

Direct injection: Attacker crafts input that overrides the system prompt, causing the agent to ignore instructions, reveal configuration, or execute unintended tools.

Indirect injection: Malicious content embedded in data the agent processes (web pages fetched by tools, documents read from disk, MCP server responses) that hijacks agent behavior.

Mitigations:

  • Pattern-based input filtering (14 patterns in OpenCrust)
  • System prompt hardening with explicit boundary instructions
  • Tool result sandboxing and size limits
  • Human-in-the-loop for destructive operations
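Pattern-based filtering can be sketched as a regex pass over inbound text. The patterns below are a small illustrative sample, not OpenCrust's actual 14:

```python
import re

# Illustrative injection patterns -- examples only, not OpenCrust's actual set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?system prompt", re.I),
    re.compile(r"you are now\b", re.I),
    re.compile(r"reveal (your )?(system prompt|instructions|configuration)", re.I),
]

def flag_injection(text: str) -> list[str]:
    """Return the patterns that matched; empty means the input looks clean."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
```

Matches are typically treated as a signal to reject the message or route it for review, not as proof of attack, since benign text can trip broad patterns.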

References:

  • Greshake et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023)
  • Simon Willison, "Prompt injection explained" (2023)

Tool Abuse

Agents with tool access can be manipulated into:

  • Shell injection: Running arbitrary commands via bash tools
  • File exfiltration: Reading sensitive files and including content in responses
  • SSRF: Fetching internal URLs via web fetch tools
  • Recursive scheduling: Creating infinite task loops via scheduling tools

Mitigations:

  • Tool iteration limits (max 10 round-trips)
  • Recursive scheduling prevention (heartbeat context blocks re-scheduling)
  • File path validation and sandboxing
  • Private IP blocking for outbound requests
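The path-validation and private-IP mitigations can be sketched with the standard library; `is_private_url` and `resolve_in_sandbox` are hypothetical helper names, not OpenCrust's actual API:

```python
import ipaddress
import socket
from pathlib import Path
from urllib.parse import urlparse

def is_private_url(url: str) -> bool:
    """True if the URL's host resolves to a private, loopback, or link-local address."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable -> reject
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable -> reject rather than risk SSRF
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return True
    return False

def resolve_in_sandbox(sandbox: Path, requested: str) -> Path:
    """Resolve a tool-supplied path, refusing anything that escapes the sandbox."""
    target = (sandbox / requested).resolve()
    if not target.is_relative_to(sandbox.resolve()):
        raise PermissionError(f"path escapes sandbox: {requested}")
    return target
```

Resolving the hostname before the check matters: a DNS name that points at 127.0.0.1 bypasses naive string-based blocklists. A full defense would also re-check after redirects, which this sketch omits.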

Credential Leakage

AI agents are high-value targets because they aggregate credentials:

  • LLM provider API keys (Anthropic, OpenAI)
  • Channel bot tokens (Discord, Telegram, Slack)
  • Webhook secrets (WhatsApp)
  • User pairing codes

Attack vectors:

  • Plaintext config files accessible to local users or leaked in backups
  • Log output containing API keys (accidental logging)
  • Prompt injection exfiltrating credentials via tool responses
  • Memory/history containing credentials from prior conversations

Mitigations:

  • Encrypted vault (AES-256-GCM) for all credentials
  • Automatic log redaction for known key patterns
  • Input sanitization preventing credential injection
  • Session history bounded and cleaned up
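Log redaction for known key patterns can be sketched as a substitution pass applied before any line reaches the log sink. The token shapes below are assumptions for illustration, not authoritative formats:

```python
import re

# Assumed credential shapes -- illustrative, not an exhaustive or exact list.
REDACTION_PATTERNS = [
    (re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"), "[REDACTED:anthropic]"),
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED:openai]"),
    (re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"), "[REDACTED:slack]"),
]

def redact(line: str) -> str:
    """Replace anything that looks like a known credential before logging it."""
    for pattern, placeholder in REDACTION_PATTERNS:
        line = pattern.sub(placeholder, line)
    return line
```

Ordering matters: the more specific `sk-ant-` pattern runs before the generic `sk-` pattern so provider-specific keys are labeled correctly.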

Supply Chain

MCP Servers

MCP servers are external processes that the agent trusts to provide tools. A compromised MCP server can:

  • Return malicious tool schemas that trick the LLM into dangerous actions
  • Exfiltrate data passed as tool arguments
  • Exploit the agent host via process-level access (stdio transport)

Plugins (WASM)

Despite WASM sandboxing, plugins still present risks:

  • Excessive host function imports granting unintended capabilities
  • Resource exhaustion (memory, CPU)
  • Side-channel attacks (timing)

Skills

Skills are Markdown files injected into the system prompt. A malicious skill can:

  • Override agent behavior via prompt injection in the skill body
  • Introduce tool-use patterns that exfiltrate data

Industry Research

  • Belgium CCB (2024): Guidelines on securing AI systems, emphasizing input validation and output filtering for LLM-integrated applications.
  • Dutch DPA (2024): Guidance on AI and GDPR, covering data minimization requirements relevant to agent memory and logging.
  • Sophos (2024): "The lethal trifecta" - compromised credentials, tool abuse, and prompt injection as the three converging attack vectors against AI agents.
  • SecurityScorecard (2025): Supply chain risk analysis showing third-party integrations (MCP servers, plugins) as the fastest-growing attack surface for AI applications.