The Setup
We finished building a Gemini image generation skill for Claude Code. It worked. You could type "generate a hero image for my blog post" and get back a polished PNG in seconds. The API calls were clean, the file handling was solid, the UX felt right.
So we shipped it. Just kidding — we attacked it first.
Why Red-Team Your Own Skills?
AI agent skills run with your permissions. They read files, write files, execute shell commands, and call external APIs. A skill with a vulnerability isn't just buggy — it's a door into your entire development environment.
The problem is that when you build something, you test the happy path. You check that the skill generates images, saves them correctly, and handles API errors. You don't think about what happens when someone passes a prompt containing shell metacharacters. You don't think about where the API key lives at rest. You don't think about what happens when the output path contains ../.
That's exactly what adversarial review is for.
The Two-Agent Approach
Instead of reviewing the skill ourselves, we dispatched two AI agents in parallel — each with a different adversarial mandate:
- Claude Opus — Focused on code-level security: injection vectors, secret handling, input validation, file system safety
- OpenAI Codex — Focused on architecture-level risks: trust boundaries, failure modes, dependency chain issues
Both agents received the full skill source code and a structured prompt asking them to find every way the skill could be exploited, misused, or made to fail dangerously.
They ran simultaneously. Within minutes, both came back with findings.
The Findings
Here's the actual results table from the session:
| # | Vulnerability | Severity | Found By | Description |
|---|-------------|----------|----------|-------------|
| 1 | Hardcoded API Key | Critical | Opus | The Gemini API key was stored as a string literal in the skill source rather than read from an environment variable. Anyone with access to the skill file gets the key. |
| 2 | Shell Injection via Prompt | Critical | Opus + Codex | User-provided prompt text was interpolated directly into a shell command string. A prompt like "test"; rm -rf / would execute arbitrary commands. |
| 3 | Path Injection in Batch Helper | High | Codex | The batch processing helper accepted output directory paths without sanitization. Paths containing ../ could write files anywhere on the filesystem. |
Both agents independently flagged the shell injection. Opus caught the API key issue first. Codex caught the path traversal. Neither alone found everything.
How We Fixed Each Issue
Fix 1: Environment Variable for API Key
Before:

```js
const API_KEY = "AIzaSy..."
```

After:

```js
const API_KEY = process.env.GEMINI_API_KEY
if (!API_KEY) {
  throw new Error("GEMINI_API_KEY environment variable is not set")
}
```

Simple, but easy to forget when you're prototyping fast and copy-pasting from API docs.
Fix 2: Parameterized Command Execution
Before:

```js
execSync(`curl -X POST "https://api.example.com/generate?prompt=${prompt}"`)
```

After:

```js
const response = await fetch(apiUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt })
})
```

We replaced the shell-out entirely with a native HTTP call. No shell means no shell injection. When you must use shell commands, use execFile with argument arrays instead of execSync with string interpolation.
Fix 3: Path Sanitization
Before:

```js
const outputPath = path.join(userProvidedDir, filename)
```

After:

```js
const resolved = path.resolve(userProvidedDir, filename)
const safeBase = path.resolve(process.cwd())
// Check against the base plus a separator so sibling directories
// like "project-other" can't slip past the prefix test.
if (!resolved.startsWith(safeBase + path.sep)) {
  throw new Error("Output path must be within the project directory")
}
const outputPath = resolved
```

The fix ensures every output path resolves to somewhere inside the project root. Any ../ traversal that escapes the project directory gets rejected.
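Wrapped in a helper, the behavior is easy to sanity-check. A quick sketch, where resolveOutputPath is a hypothetical name of ours rather than part of the skill:

```js
const path = require("path")

// Hypothetical wrapper around the check above, for illustration only.
function resolveOutputPath(userProvidedDir, filename) {
  const resolved = path.resolve(userProvidedDir, filename)
  const safeBase = path.resolve(process.cwd())
  if (!resolved.startsWith(safeBase + path.sep)) {
    throw new Error("Output path must be within the project directory")
  }
  return resolved
}

resolveOutputPath("images", "hero.png")     // stays inside the project: returns the path
resolveOutputPath("../../tmp", "evil.png")  // escapes the project root: throws
```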
What Surprised Us
Speed. The entire red-team cycle — dispatching both agents, reviewing findings, implementing fixes, and re-verifying — took under 15 minutes. A manual security review of the same code would take longer and likely miss the path traversal.
Complementary findings. Opus and Codex have different "instincts." Opus zeroed in on the code-level details (the hardcoded string, the string interpolation). Codex flagged the architectural issue (trust boundary between user input and filesystem). Running both gave better coverage than either alone.
The obvious stuff is the dangerous stuff. None of these vulnerabilities were exotic. Hardcoded secrets, shell injection, and path traversal are well-known patterns. But when you're deep in feature development, your brain optimizes for "does it work?" not "can it be exploited?" That's why automated adversarial review matters — it catches the things you know about but aren't thinking about.
The Red-Team Skill
We packaged this entire workflow into a reusable Claude Code skill called red-team. Point it at any skill or codebase and it:
1. Analyzes the code for 10 security categories (injection, secrets, file access, network calls, permissions, dependencies, error handling, data leakage, resource limits, trust boundaries)
2. Dispatches parallel adversarial reviews to multiple models (see the sketch after this list)
3. Merges findings into a prioritized report
4. Suggests specific fixes with code snippets
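Steps 2 and 3 boil down to fan out, then fold the findings together. A rough sketch of the idea in plain Node, with hypothetical helper names (this is not the skill's actual implementation):

```js
// Stand-in for however each model's API gets called. Returns an array of
// findings shaped like { title, severity, description }. Illustrative only.
async function reviewWith(model, source) {
  /* send `source` to `model` with an adversarial review prompt */
  return []
}

async function redTeam(source) {
  // Fan out: both reviews run concurrently.
  const [opusFindings, codexFindings] = await Promise.all([
    reviewWith("claude-opus", source),
    reviewWith("codex", source),
  ])

  // Fold: merge both lists and sort by severity (de-duplication omitted here).
  const order = { Critical: 0, High: 1, Medium: 2, Low: 3 }
  return [...opusFindings, ...codexFindings]
    .sort((a, b) => order[a.severity] - order[b.severity])
}
```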
You can install it from the Prompte Skill Shed:
claude "install skill from prompte.app/skill-shed/red-team"The Takeaway
Building AI skills is fast. Shipping them safely requires a different mindset — one that assumes your code will receive hostile input. Multi-agent red-teaming automates that mindset.
Before you ship your next skill, throw two AIs at it. You'll be surprised what they find.
