
Red-Team Review

Prompte · 19 March 2026 · Advanced · Claude Code

red-team · adversarial-review · dual-model · architecture · design-review · failure-modes · security · multi-agent

What This Skill Does

Runs two different AI model families against the same artifact in parallel — Opus 4.6 (strategic/architectural reasoning, security, system-level failure modes) and Codex GPT-5.3 (technical feasibility, protocol correctness, cost math, race conditions, integration gaps). Findings are merged, deduplicated, and prioritized into a single adversarial report. Two models catch issues neither would find alone.

When to Use It

  • Before implementing a design spec, architecture doc, or configuration change
  • When you need adversarial review: "red-team this", "what could go wrong", "find holes in this"
  • After completing a brainstorming/design phase and before proceeding to implementation
  • For stress-testing deployment plans, migration strategies, or infrastructure changes
  • Say "red-team this", "devil's advocate", "challenge this", or use /red-team

How It Works

1. Identify the artifact to review (spec, plan, architecture doc, config)

2. Gather 2-5 context files for surrounding system knowledge

3. Preflight check — verify both Opus and Codex are available

4. Dispatch both reviewers in parallel (Agent tool for Opus, Bash/codex exec for Codex)

5. Collect results, handle partial failures (retry once, then proceed with one model)

6. Merge + deduplicate findings, take higher severity on disagreements

7. Present combined report with Critical / Warnings / Observations

8. Fix critical issues if any

Review Angles

| Artifact Type | Opus Focus | Codex Focus |
|---------------|------------|-------------|
| Design spec | Failure modes, contradictions, security blast radius, bootstrap problems | Protocol feasibility, cost math, race conditions, integration gaps |
| Implementation plan | Sequencing errors, dependency gaps, risk underestimation | Build feasibility, toolchain issues, missing steps |
| Architecture doc | System-level failures, scalability limits, operational gaps | API compatibility, data flow correctness, latency math |
| Configuration | Security exposure, drift risks, missing validation | Schema correctness, default values, cross-config consistency |

Requirements

  • Claude Code (Anthropic CLI) with Opus model access
  • Codex CLI (npm install -g @openai/codex)
  • OpenAI API key (via codex login or environment variable)
  • macOS or Linux

Source

Open source under the Unlicense — [github.com/Pricing-Logic/claude-red-team-skill](https://github.com/Pricing-Logic/claude-red-team-skill)

Skill File

red-team-review.skill.md
---
name: red-team
description: >
  Use when a design spec, architecture document, implementation plan, or configuration
  needs adversarial review before implementation or deployment. Triggers on: "red-team
  this", "review for failure modes", "what could go wrong", "stress test this design",
  "find holes in this", "devil's advocate", "challenge this", or when completing a
  brainstorming/design phase and needing validation before proceeding. Also use
  proactively before any major implementation begins.
---

# Red-Team Review

Dual-model adversarial review using Opus 4.6 and Codex GPT-5.3 in parallel. Two different model families attack the same artifact from different angles, then findings are merged into a single prioritized report.

## Why Two Models

Single-model review has blind spots. Opus excels at strategic/architectural reasoning, security implications, and system-level failure modes. Codex excels at technical feasibility, protocol correctness, cost math, race conditions, and integration gaps. Running both catches issues neither would find alone.

## Process

1. Identify artifact to review
2. Gather context files
3. Preflight check
4. Dispatch Opus + Codex in parallel
5. Collect both results (handle partial failures)
6. Merge + deduplicate findings
7. Present combined report
8. Fix critical issues if any, update artifact

## Step 1 — Identify the Artifact

What are you red-teaming? The artifact determines the review angles.

| Artifact Type | Opus Focus | Codex Focus |
|--------------|------------|-------------|
| Design spec | Failure modes, contradictions, security blast radius, missing details, bootstrap problems | Protocol feasibility, cost math, config schema, race conditions, cold start, integration gaps |
| Implementation plan | Sequencing errors, dependency gaps, risk underestimation, scope creep | Build feasibility, toolchain issues, missing steps, environment assumptions |
| Architecture doc | System-level failures, scalability limits, operational gaps, who watches the watcher | API compatibility, data flow correctness, latency math, resource contention |
| Configuration | Security exposure, drift risks, missing validation | Schema correctness, default values, cross-config consistency |

For code PRs, use a dedicated code-review skill instead of this one.

## Step 2 — Gather Context

The artifact alone is not enough. Both reviewers need surrounding context to find real issues (not theoretical ones). Identify 2-5 context files:

- Current system architecture (CLAUDE.md, README, etc.)
- Related configs or schemas
- Existing protocols the artifact must comply with
- Prior art or decisions that constrain the design

**Security:** Before dispatching, redact any secrets, API keys, or credentials from content you will inline into Codex prompts. Codex worker prompts are sent to OpenAI's API.
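One rough way to scrub a file before inlining (a sketch only; `artifact.md` stands in for your artifact, and the keyword patterns are illustrative, not exhaustive, so audit the output by hand):

```bash
# Redact common secret-looking assignments before inlining into a Codex prompt.
# The keyword list is illustrative only: review the redacted copy manually.
sed -E 's/(API_KEY|SECRET|TOKEN|PASSWORD|PASSPHRASE)([[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1\2[REDACTED]/g' \
  artifact.md > /tmp/artifact_redacted.md
```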

## Step 3 — Preflight Check

Before dispatching, verify both lanes are available:

- **Opus lane:** Confirm the Agent tool is available and `model: opus` is supported in your session
- **Codex lane:** Verify the `codex` binary exists (`which codex`) and auth is valid (`codex --version`); a shell sketch of this check follows the list
- **If Codex is unavailable:** Fall back to Opus-only review. Note the gap in the report.
- **If Opus is unavailable:** Fall back to Codex-only review. Note the gap in the report.
- **If both unavailable:** Use a single `general-purpose` Agent subagent as the sole reviewer.
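A minimal shell sketch of the Codex-lane check (the Opus lane is verified in-session, not from the shell):

```bash
# Check that the codex binary is on PATH and responds; if not, note the gap
# in the report and fall back to an Opus-only review.
if which codex >/dev/null 2>&1 && codex --version >/dev/null 2>&1; then
  echo "Codex lane available"
else
  echo "Codex unavailable: proceeding Opus-only"
fi
```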

## Step 4 — Dispatch Both Reviewers in Parallel

### Opus Reviewer (Agent tool)

Dispatch as a background `general-purpose` Agent subagent with `model: opus`:

```
Agent tool:
  subagent_type: general-purpose
  model: opus
  run_in_background: true
  prompt: [OPUS REVIEW PROMPT — see template below]
```

**Opus subagents can read files directly — provide file paths, not inlined content.**

### Codex Reviewer (Codex Workers)

Dispatch via Bash using `codex exec`. **Prefer inlining critical context for reliability.**

For small-to-medium artifacts (<50KB), inline directly:

```bash
SESSION_ID=$(date +%s)

codex exec --full-auto --ephemeral --skip-git-repo-check \
  -m gpt-5.3-codex \
  -c model_reasoning_effort=xhigh \
  -o "/tmp/codex_redteam_${SESSION_ID}.txt" \
  "RED-TEAM REVIEW PROMPT WITH ARTIFACT CONTENTS INLINED" &

wait
```
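One way to build that inlined prompt is command substitution over the artifact file (a sketch; `/tmp/artifact_redacted.md` is the scrubbed copy from the Step 2 sketch, and the review instructions are abbreviated):

```bash
# Splice the redacted artifact into the worker prompt, then pass "$PROMPT"
# as the final argument to codex exec in place of the placeholder above.
PROMPT="RED-TEAM REVIEW of design-spec.md.

$(cat /tmp/artifact_redacted.md)"
```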

**Both dispatches happen in one message** — a single response containing one Agent tool call (Opus, background) and one Bash tool call (Codex worker). They run in parallel.

## Review Prompt Templates

### Opus Prompt Template

```
RED-TEAM REVIEW of [ARTIFACT NAME].

You are an adversarial reviewer. Your job is to find everything that could go wrong,
break, contradict, or be underspecified BEFORE implementation begins.

The artifact is at: [FILE PATH]
Context files to read: [LIST PATHS]

Review for:
1. Failure modes — What breaks? Cascading failures? Single points of failure?
2. Contradictions — Does anything conflict with existing systems or its own claims?
3. Missing details — What will an implementer hit that the spec doesn't cover?
4. Security/blast radius — What's the worst case if something goes wrong?
5. Bootstrap/cold start — Does day 1 actually work?
6. Operational gaps — Monitoring, recovery, degradation paths?
7. Cost/budget — Are estimates realistic?
8. Compliance with existing protocols — Does it actually fit the current system?

Rules:
- Every finding must reference a specific file, section, or line
- Do not invent files, APIs, or scripts that may not exist — verify by reading
- Confidence-tag uncertain findings as [UNVERIFIED]

Format:
## Critical Issues (must fix before implementation)
## Warnings (should address, judgment call)
## Observations (future consideration)
```

### Codex Prompt Template

```
RED-TEAM REVIEW of [ARTIFACT NAME].

You are a technical feasibility reviewer. Your job is to find protocol gaps,
math errors, race conditions, schema mismatches, and integration failures.

Here is the full artifact:
[INLINE COMPLETE FILE CONTENTS]

Review for:
1. Protocol gaps — Do message formats, handoff patterns, and APIs actually exist?
2. Config/schema correctness — Do field names, value types, and structures match?
3. Cost math — Are budget estimates calculated correctly?
4. Scaling/growth — Do files, logs, or state grow unbounded?
5. Race conditions — Can concurrent processes corrupt shared state?
6. Integration — Can the proposed tools/scripts actually be invoked as described?
7. Cold start — What happens on first run with no prior state?
8. Missing tooling — What scripts/tools are assumed but don't exist?

Format:
## Critical Issues
## Warnings
## Observations
```

## Step 5 — Collect and Merge

### Error Handling

If one reviewer fails (timeout, crash, empty output, auth error):
1. Retry the failed reviewer once
2. If it fails again, proceed with the successful reviewer's findings
3. Note in the report header: "Single-model review — [reason]"
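As a shell sketch, the Codex-lane retry might look like this (`run_codex_review` is a hypothetical helper wrapping the `codex exec` invocation from Step 4):

```bash
# Hypothetical helper: wraps the Step 4 codex exec call, writing findings
# to the output file in $1 using the review prompt in $2.
run_codex_review() {
  codex exec --full-auto --ephemeral --skip-git-repo-check \
    -m gpt-5.3-codex \
    -c model_reasoning_effort=xhigh \
    -o "$1" "$2"
}

OUT="/tmp/codex_redteam_${SESSION_ID}.txt"
run_codex_review "$OUT" "$PROMPT"

# Retry once on empty output (timeout, crash, auth error); if it fails again,
# proceed with the Opus findings and flag the gap in the report header.
[ -s "$OUT" ] || run_codex_review "$OUT" "$PROMPT"
[ -s "$OUT" ] || echo "Single-model review — Codex lane failed twice"
```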

### Merge Rules

1. **Read both results** — Opus from the Agent result, Codex from the output file
2. **Classify each finding** into Critical / Warning / Observation (take the higher severity if reviewers disagree)
3. **Merge overlapping findings** — match on component + failure mode. Combine into one entry, note "Both" as source
4. **Preserve unique findings** — tag each with its source (Opus/Codex)
5. **Count totals** — N critical, N warnings, N observations

### Combined Report Format

```markdown
## Combined Red-Team Report: [ARTIFACT NAME]

### Critical Issues (N — must fix)

| # | Issue | Opus | Codex | Fix |
|---|-------|:----:|:-----:|-----|
| C1 | [Description] | X | X | [Specific fix] |

### Warnings (N — should address)

| # | Issue | Opus | Codex | Fix |
|---|-------|:----:|:-----:|-----|
| W1 | [Description] | X | X | [Specific fix] |

### Observations (N — future consideration)

| # | Issue | Source |
|---|-------|--------|
| O1 | [Description] | Both/Opus/Codex |
```

## Step 6 — Clean Up

```bash
rm -f "/tmp/codex_redteam_${SESSION_ID}.txt"
```
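If you made the redacted copy from the Step 2 sketch, remove that too:

```bash
rm -f /tmp/artifact_redacted.md
```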

## When NOT to Use

- **Code review** — use a dedicated code-review skill instead
- **Quick sanity check** — use gemini-advisor or kimi-advisor for a second opinion
- **Trivial artifact** — a 10-line config change doesn't need dual-model adversarial review
- **Work in progress** — red-team complete artifacts, not drafts

Install

Claude Code

Save to your project's .claude/skills/ directory. Claude Code picks it up automatically.

Save to:
.claude/skills/red-team-review.skill.md
Or use the command line:
mkdir -p .claude/skills/ && curl -o .claude/skills/red-team-review.skill.md https://prompte.app/skill-shed/red-team-review/raw
