The Setup
We finished building a Gemini image generation skill for Claude Code. It worked. You could type "generate a hero image for my blog post" and get back a polished PNG in seconds. The API calls were clean, the file handling was solid, the UX felt right.
So we shipped it. Just kidding — we attacked it first.
Why Red-Team Your Own Skills?
AI agent skills run with your permissions. They read files, write files, execute shell commands, and call external APIs. A skill with a vulnerability isn't just buggy — it's a door into your entire development environment.
The problem is that when you build something, you test the happy path. You check that the skill generates images, saves them correctly, and handles API errors. You don't think about what happens when someone passes a prompt containing shell metacharacters. You don't think about where the API key lives at rest. You don't think about what happens when the output path contains ../.
That's exactly what adversarial review is for.
The Two-Agent Approach
Instead of reviewing the skill ourselves, we dispatched two AI agents in parallel — each with a different adversarial mandate:
- Claude Opus — Focused on code-level security: injection vectors, secret handling, input validation, file system safety
- OpenAI Codex — Focused on architecture-level risks: trust boundaries, failure modes, dependency chain issues
Both agents received the full skill source code and a structured prompt asking them to find every way the skill could be exploited, misused, or made to fail dangerously.
They ran simultaneously. Within minutes, both came back with findings.
The Findings
Here's the actual results table from the session:
| # | Vulnerability | Severity | Found By | Description |
|---|-------------|----------|----------|-------------|
| 1 | Hardcoded API Key | Critical | Opus | The Gemini API key was stored as a string literal in the skill source rather than read from an environment variable. Anyone with access to the skill file gets the key. |
| 2 | Shell Injection via Prompt | Critical | Opus + Codex | User-provided prompt text was interpolated directly into a shell command string. A prompt like "test"; rm -rf / would execute arbitrary commands. |
| 3 | Path Injection in Batch Helper | High | Codex | The batch processing helper accepted output directory paths without sanitization. Paths containing ../ could write files anywhere on the filesystem. |
Both agents independently flagged the shell injection. Opus caught the API key issue first. Codex caught the path traversal. Neither alone found everything.
How We Fixed Each Issue
Fix 1: Environment Variable for API Key
Before:

```js
const API_KEY = "AIzaSy..."
```

After:

```js
const API_KEY = process.env.GEMINI_API_KEY
if (!API_KEY) {
  throw new Error("GEMINI_API_KEY environment variable is not set")
}
```

Simple, but easy to forget when you're prototyping fast and copy-pasting from API docs.
Fix 2: Parameterized Command Execution
Before:

```js
execSync(`curl -X POST "https://api.example.com/generate?prompt=${prompt}"`)
```

After:

```js
const response = await fetch(apiUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt })
})
```

We replaced the shell-out entirely with a native HTTP call. No shell means no shell injection. When you must use shell commands, use execFile with argument arrays instead of execSync with string interpolation.
Fix 3: Path Sanitization
Before:

```js
const outputPath = path.join(userProvidedDir, filename)
```

After:

```js
const resolved = path.resolve(userProvidedDir, filename)
const safeBase = path.resolve(process.cwd())
// Check against the base plus a separator so sibling directories
// like "project-other" can't slip past the prefix test.
if (!resolved.startsWith(safeBase + path.sep)) {
  throw new Error("Output path must be within the project directory")
}
const outputPath = resolved
```

The fix ensures every output path resolves to somewhere inside the project root. Any ../ traversal that escapes the project directory gets rejected.
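Wrapped in a helper, the behavior is easy to sanity-check. A quick sketch, where resolveOutputPath is a hypothetical name of ours rather than part of the skill:

```js
const path = require("path")

// Hypothetical wrapper around the check above, for illustration only.
function resolveOutputPath(userProvidedDir, filename) {
  const resolved = path.resolve(userProvidedDir, filename)
  const safeBase = path.resolve(process.cwd())
  if (!resolved.startsWith(safeBase + path.sep)) {
    throw new Error("Output path must be within the project directory")
  }
  return resolved
}

resolveOutputPath("images", "hero.png")     // stays inside the project: returns the path
resolveOutputPath("../../tmp", "evil.png")  // escapes the project root: throws
```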
What Surprised Us
Speed. The entire red-team cycle — dispatching both agents, reviewing findings, implementing fixes, and re-verifying — took under 15 minutes. A manual security review of the same code would take longer and likely miss the path traversal.
Complementary findings. Opus and Codex have different "instincts." Opus zeroed in on the code-level details (the hardcoded string, the string interpolation). Codex flagged the architectural issue (trust boundary between user input and filesystem). Running both gave better coverage than either alone.
The obvious stuff is the dangerous stuff. None of these vulnerabilities were exotic. Hardcoded secrets, shell injection, and path traversal are well-known patterns. But when you're deep in feature development, your brain optimizes for "does it work?" not "can it be exploited?" That's why automated adversarial review matters — it catches the things you know about but aren't thinking about.
The Red-Team Skill
We packaged this entire workflow into a reusable Claude Code skill called red-team. Point it at any skill or codebase and it:
1. Analyzes the code for 10 security categories (injection, secrets, file access, network calls, permissions, dependencies, error handling, data leakage, resource limits, trust boundaries)
2. Dispatches parallel adversarial reviews to multiple models (see the sketch after this list)
3. Merges findings into a prioritized report
4. Suggests specific fixes with code snippets
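Steps 2 and 3 boil down to fan out, then fold the findings together. A rough sketch of the idea in plain Node, with hypothetical helper names (this is not the skill's actual implementation):

```js
// Stand-in for however each model's API gets called. Returns an array of
// findings shaped like { title, severity, description }. Illustrative only.
async function reviewWith(model, source) {
  /* send `source` to `model` with an adversarial review prompt */
  return []
}

async function redTeam(source) {
  // Fan out: both reviews run concurrently.
  const [opusFindings, codexFindings] = await Promise.all([
    reviewWith("claude-opus", source),
    reviewWith("codex", source),
  ])

  // Fold: merge both lists and sort by severity (de-duplication omitted here).
  const order = { Critical: 0, High: 1, Medium: 2, Low: 3 }
  return [...opusFindings, ...codexFindings]
    .sort((a, b) => order[a.severity] - order[b.severity])
}
```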
You can install it from the Prompte Skill Shed:
claude "install skill from prompte.app/skill-shed/red-team"The Takeaway
Building AI skills is fast. Shipping them safely requires a different mindset — one that assumes your code will receive hostile input. Multi-agent red-teaming automates that mindset.
Before you ship your next skill, throw two AIs at it. You'll be surprised what they find.
