---
name: red-team
description: >
Use when a design spec, architecture document, implementation plan, or configuration
needs adversarial review before implementation or deployment. Triggers on: "red-team
this", "review for failure modes", "what could go wrong", "stress test this design",
"find holes in this", "devil's advocate", "challenge this", or when completing a
brainstorming/design phase and needing validation before proceeding. Also use
proactively before any major implementation begins.
---
# Red-Team Review
Dual-model adversarial review using Opus 4.6 and Codex GPT-5.3 in parallel. Two different model families attack the same artifact from different angles, then findings are merged into a single prioritized report.
## Why Two Models
Single-model review has blind spots. Opus excels at strategic/architectural reasoning, security implications, and system-level failure modes. Codex excels at technical feasibility, protocol correctness, cost math, race conditions, and integration gaps. Running both catches issues neither would find alone.
## Process
1. Identify artifact to review
2. Gather context files
3. Preflight check
4. Dispatch Opus + Codex in parallel
5. Collect both results (handle partial failures)
6. Merge + deduplicate findings
7. Present combined report
8. Fix critical issues if any, update artifact
## Step 1 — Identify the Artifact
What are you red-teaming? The artifact determines the review angles.
| Artifact Type | Opus Focus | Codex Focus |
|--------------|------------|-------------|
| Design spec | Failure modes, contradictions, security blast radius, missing details, bootstrap problems | Protocol feasibility, cost math, config schema, race conditions, cold start, integration gaps |
| Implementation plan | Sequencing errors, dependency gaps, risk underestimation, scope creep | Build feasibility, toolchain issues, missing steps, environment assumptions |
| Architecture doc | System-level failures, scalability limits, operational gaps, who watches the watcher | API compatibility, data flow correctness, latency math, resource contention |
| Configuration | Security exposure, drift risks, missing validation | Schema correctness, default values, cross-config consistency |
For code PRs, use a dedicated code-review skill instead of this one.
## Step 2 — Gather Context
The artifact alone is not enough. Both reviewers need surrounding context to find real issues (not theoretical ones). Identify 2-5 context files:
- Current system architecture (CLAUDE.md, README, etc.)
- Related configs or schemas
- Existing protocols the artifact must comply with
- Prior art or decisions that constrain the design
**Security:** Before dispatching, redact any secrets, API keys, or credentials from content you will inline into Codex prompts. Codex worker prompts are sent to OpenAI's API.
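A minimal pre-dispatch scan can catch obvious leaks before content is inlined. This is a sketch, not a complete secret scanner: the demo file and regex patterns below are illustrative only.
```bash
# Sketch: flag likely secrets in a context file before inlining it into a
# Codex prompt. The demo file and the patterns are illustrative only.
ARTIFACT="/tmp/demo_context.md"
printf 'endpoint: https://example.com\napi_key: sk-123\n' > "$ARTIFACT"

if grep -n -i -E '(api[_-]?key|secret|token|password)[[:space:]]*[:=]' "$ARTIFACT"; then
  echo "Possible secrets found: redact before dispatch" >&2
fi
```
A hit prints the offending line with its line number, so you know exactly what to redact.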
## Step 3 — Preflight Check
Before dispatching, verify both lanes are available:
- **Opus lane:** Confirm the Agent tool is available and `model: opus` is supported in your session
- **Codex lane:** Verify the `codex` binary exists (`which codex`) and auth is valid (`codex --version`)
- **If Codex is unavailable:** Fall back to Opus-only review. Note the gap in the report.
- **If Opus is unavailable:** Fall back to Codex-only review. Note the gap in the report.
- **If both are unavailable:** Use a single `general-purpose` Agent subagent as the sole reviewer.
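The Codex half of this preflight can be scripted directly; the Opus half is a tool-availability check inside the session and has no shell equivalent. A minimal sketch:
```bash
# Sketch: probe the Codex lane before dispatch. Binary present and
# `codex --version` succeeding is treated as "auth is probably valid".
if command -v codex >/dev/null 2>&1 && codex --version >/dev/null 2>&1; then
  CODEX_LANE=up
else
  CODEX_LANE=down   # fall back to Opus-only and note the gap in the report
fi
echo "codex lane: $CODEX_LANE"
```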
## Step 4 — Dispatch Both Reviewers in Parallel
### Opus Reviewer (Agent tool)
Dispatch as a background `general-purpose` Agent subagent with `model: opus`:
```
Agent tool:
subagent_type: general-purpose
model: opus
run_in_background: true
prompt: [OPUS REVIEW PROMPT — see template below]
```
**Opus subagents can read files directly — provide file paths, not inlined content.**
### Codex Reviewer (Codex Workers)
Dispatch via Bash using `codex exec`. **Prefer inlining critical context for reliability.**
For small-to-medium artifacts (<50KB), inline directly:
```bash
SESSION_ID=$(date +%s)
codex exec --full-auto --ephemeral --skip-git-repo-check \
  -m gpt-5.3-codex \
  -c model_reasoning_effort=xhigh \
  -o "/tmp/codex_redteam_${SESSION_ID}.txt" \
  "RED-TEAM REVIEW PROMPT WITH ARTIFACT CONTENTS INLINED" &
wait
```
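For larger artifacts, one possible variant (an assumption, not a documented pattern: it relies on the Codex worker being able to read files under the working directory, and `docs/design-spec.md` is a hypothetical path) is to reference the file instead of inlining it:
```bash
# Sketch for artifacts over ~50KB: point the worker at a file path rather
# than inlining the contents. Guard against a missing binary.
SESSION_ID=$(date +%s)
ARTIFACT="docs/design-spec.md"   # hypothetical path
if command -v codex >/dev/null 2>&1; then
  codex exec --full-auto --ephemeral --skip-git-repo-check \
    -m gpt-5.3-codex \
    -o "/tmp/codex_redteam_${SESSION_ID}.txt" \
    "RED-TEAM REVIEW: read the artifact at ${ARTIFACT} and report findings." &
  wait
else
  echo "codex unavailable: fall back to Opus-only review" >&2
fi
```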
**Both dispatches happen in one message** — a single response containing one Agent tool call (Opus, background) and one Bash tool call (Codex worker). They run in parallel.
## Review Prompt Templates
### Opus Prompt Template
```
RED-TEAM REVIEW of [ARTIFACT NAME].
You are an adversarial reviewer. Your job is to find everything that could go wrong,
break, contradict, or be underspecified BEFORE implementation begins.
The artifact is at: [FILE PATH]
Context files to read: [LIST PATHS]
Review for:
1. Failure modes — What breaks? Cascading failures? Single points of failure?
2. Contradictions — Does anything conflict with existing systems or its own claims?
3. Missing details — What will an implementer hit that the spec doesn't cover?
4. Security/blast radius — What's the worst case if something goes wrong?
5. Bootstrap/cold start — Does day 1 actually work?
6. Operational gaps — Monitoring, recovery, degradation paths?
7. Cost/budget — Are estimates realistic?
8. Compliance with existing protocols — Does it actually fit the current system?
Rules:
- Every finding must reference a specific file, section, or line
- Do not invent files, APIs, or scripts that may not exist — verify by reading
- Confidence-tag uncertain findings as [UNVERIFIED]
Format:
## Critical Issues (must fix before implementation)
## Warnings (should address, judgment call)
## Observations (future consideration)
```
### Codex Prompt Template
```
RED-TEAM REVIEW of [ARTIFACT NAME].
You are a technical feasibility reviewer. Your job is to find protocol gaps,
math errors, race conditions, schema mismatches, and integration failures.
Here is the full artifact:
[INLINE COMPLETE FILE CONTENTS]
Review for:
1. Protocol gaps — Do message formats, handoff patterns, and APIs actually exist?
2. Config/schema correctness — Do field names, value types, and structures match?
3. Cost math — Are budget estimates calculated correctly?
4. Scaling/growth — Do files, logs, or state grow unbounded?
5. Race conditions — Can concurrent processes corrupt shared state?
6. Integration — Can the proposed tools/scripts actually be invoked as described?
7. Cold start — What happens on first run with no prior state?
8. Missing tooling — What scripts/tools are assumed but don't exist?
Format:
## Critical Issues
## Warnings
## Observations
```
## Step 5 — Collect and Merge
### Error Handling
If one reviewer fails (timeout, crash, empty output, auth error):
1. Retry the failed reviewer once
2. If it fails again, proceed with the successful reviewer's findings
3. Note in the report header: "Single-model review — [reason]"
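The retry-once rule can be wrapped in a small helper. `run_with_retry` is a hypothetical name, and this sketch treats a missing or empty output file as failure regardless of exit code:
```bash
# Sketch: run a reviewer command, retrying once if its output file is
# missing or empty afterward.
run_with_retry() {
  out_file=$1
  shift
  "$@" || true                  # first attempt; tolerate a nonzero exit
  if [ ! -s "$out_file" ]; then
    "$@" || echo "reviewer failed twice; proceed single-model" >&2
  fi
}
```
Usage would look like `run_with_retry "/tmp/codex_redteam_${SESSION_ID}.txt" codex exec …` with the full argument list from Step 4.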
### Merge Rules
1. **Read both results** — Opus from the Agent result, Codex from the output file
2. **Classify each finding** into Critical / Warning / Observation (take the higher severity if reviewers disagree)
3. **Merge overlapping findings** — match on component + failure mode. Combine into one entry, note "Both" as source
4. **Preserve unique findings** — tag each with its source (Opus/Codex)
5. **Count totals** — N critical, N warnings, N observations
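If both result sets are first normalized into one `severity|component|summary` line per finding (a hypothetical intermediate format, not something the reviewers emit on their own), overlap candidates for rule 3 can be surfaced mechanically:
```bash
# Sketch: list components reported by both reviewers, given findings
# normalized to "severity|component|summary" lines (demo data below).
printf 'critical|auth|token never expires\n' > /tmp/opus_findings.txt
printf 'critical|Auth|tokens lack expiry\nwarning|logging|log growth unbounded\n' \
  > /tmp/codex_findings.txt

cut -d'|' -f2 /tmp/opus_findings.txt /tmp/codex_findings.txt \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -d              # prints "auth": flagged by both reviewers
```
Matching on component alone over-triggers; it only narrows the list a human (or the merging model) then confirms by comparing failure modes.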
### Combined Report Format
```markdown
## Combined Red-Team Report: [ARTIFACT NAME]
### Critical Issues (N — must fix)
| # | Issue | Opus | Codex | Fix |
|---|-------|:----:|:-----:|-----|
| C1 | [Description] | X | X | [Specific fix] |
### Warnings (N — should address)
| # | Issue | Opus | Codex | Fix |
|---|-------|:----:|:-----:|-----|
| W1 | [Description] | X | X | [Specific fix] |
### Observations (N — future consideration)
| # | Issue | Source |
|---|-------|--------|
| O1 | [Description] | Both/Opus/Codex |
```
## Step 6 — Clean Up
```bash
rm -f "/tmp/codex_redteam_${SESSION_ID}.txt"
```
## When NOT to Use
- **Code review** — use a dedicated code-review skill instead
- **Quick sanity check** — use gemini-advisor or kimi-advisor for a second opinion
- **Trivial artifact** — a 10-line config change doesn't need dual-model adversarial review
- **Work in progress** — red-team complete artifacts, not drafts