
Codex Plan Review

Prompte · 27 February 2026 · Intermediate · Claude Code
Tags: codex · plan-review · cross-ai · blind-spots · code-review · second-opinion · multi-agent

What This Skill Does

Pairs two AI systems via multi-agent orchestration: Claude Code assesses plan complexity and selects the right Codex model, then Codex spawns parallel sub-agents — an explorer that reads actual repo files to validate feasibility, and an architecture reviewer that evaluates design, sequencing, and risk. Claude Code then applies a two-layer noise filter to discard over-engineered suggestions and surface only critical findings. v2.0 adds automatic model fallback (gpt-5.4-codex to gpt-5.3-codex) and graceful degradation from multi-agent to single-shot mode.

When to Use It

  • Before executing any multi-step implementation plan that touches 3+ files
  • When you want a second opinion on architecture decisions
  • Before risky changes (data models, auth, migrations, cross-system work)
  • Say "review my plan", "second opinion", "blind spot check", or use /codex-plan-review

How It Works

1. You submit or finalize a plan

2. Claude Code evaluates complexity (LOW/MEDIUM/HIGH) and picks the right Codex model

3. Codex spawns parallel sub-agents: an explorer for code-reality validation and an architecture reviewer for design and risk analysis

4. Both sub-agents report findings, which are synthesized into a single verdict

5. Two-layer filtering removes noise — prompt-level filtering (~60% reduction) plus a safety net that deduplicates sub-agent echo

6. You get a clean report: critical findings with file:line evidence, or "plan is clean"

Requirements

  • Claude Code (Anthropic CLI)
  • Codex CLI >= 0.104.0 (npm install -g @openai/codex)
  • OpenAI API key (via codex login or environment variable)
  • macOS, Linux, or Windows via WSL
  • The multi_agent feature must be available (experimental as of March 2026) — skill auto-falls back to single-shot if unavailable

Source

Open source under the Unlicense — [github.com/Pricing-Logic/Claude-et-Codex](https://github.com/Pricing-Logic/Claude-et-Codex)

Skill File

codex-plan-review.skill.md
---
name: codex-plan-review
version: 2.0.0
description: "Cross-check implementation plans against Codex CLI using multi-agent review to catch blind spots before coding. Use this skill BEFORE executing any implementation plan — when you've drafted a plan for a new feature, a refactor, a bug fix, or any multi-file change. Also use when the user says things like 'plan review', 'check with codex', 'second opinion', 'blind spot check', 'review my plan', or 'cross-check this'. The whole point is to get a second AI perspective on architectural decisions and catch things Claude Code might miss, then filter the feedback to only what actually matters. v2.0 uses Codex's multi_agent feature to spawn parallel sub-agents for deeper code-reality validation and architecture review."
---

# Codex Plan Review — Multi-Agent Blind Spot Detection

You have a plan (or are about to finalize one). Before executing it, you're going to get a second opinion from OpenAI's Codex CLI to catch blind spots. Codex thinks differently than you do — it'll spot things you miss. But it also tends to over-engineer security and add unnecessary complexity. Your job is to be a smart filter: extract the critical insights, discard the noise.

**v2.0** uses Codex's `multi_agent` feature to spawn parallel sub-agents — an **explorer** that reads actual repo files to validate feasibility, and an **architecture reviewer** that evaluates design, sequencing, and risk — then synthesizes both into a single verdict. This produces significantly deeper reviews than single-shot mode.

**Requirements:** Codex CLI >= 0.104.0 (`codex -V` to check). Install with `npm install -g @openai/codex` if missing. The `multi_agent` feature must be available (experimental as of March 2026).

**Privacy note:** This skill gives Codex read-only access to project files in the working directory via `-C` and `--sandbox read-only`. For sensitive or proprietary repos, confirm with the user before proceeding.

**Platform:** macOS and Linux. Temp files use `/tmp/`. Windows users need WSL or equivalent.

## When This Triggers

- You've just written or finalized an implementation plan
- The user asks for a "plan review", "second opinion", or "blind spot check"
- Before executing a multi-step plan that touches 3+ files
- The user explicitly invokes `/codex-plan-review`

## The Workflow

### Step 0: Preflight Check

Before anything else, verify Codex is installed, meets the minimum version, and has multi_agent available:

```bash
# Ensure npm/node global bins are on PATH (Claude Code bash may not inherit full shell profile)
for p in "$HOME/.local/bin" "$HOME/.npm-global/bin" "/usr/local/bin"; do
  [ -d "$p" ] && case ":$PATH:" in *":$p:"*) ;; *) export PATH="$p:$PATH" ;; esac
done
for p in $(find "$HOME/.nvm/versions/node" -maxdepth 2 -name bin -type d 2>/dev/null); do
  case ":$PATH:" in *":$p:"*) ;; *) export PATH="$p:$PATH" ;; esac
done
CODEX_RAW=$(codex -V 2>/dev/null) || { echo "ERROR: Codex CLI not installed. Install with: npm install -g @openai/codex"; exit 1; }
CODEX_VERSION=$(echo "$CODEX_RAW" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
if [ -z "$CODEX_VERSION" ]; then echo "ERROR: Could not parse Codex version from: $CODEX_RAW"; exit 1; fi
echo "Found: codex-cli $CODEX_VERSION"
MIN_VERSION="0.104.0"
if [ "$(printf '%s\n' "$MIN_VERSION" "$CODEX_VERSION" | sort -V | head -n1)" != "$MIN_VERSION" ]; then
  echo "WARNING: Codex CLI $CODEX_VERSION is below minimum $MIN_VERSION — multi_agent may not be available. Will attempt single-shot fallback."
fi
# Check if multi_agent feature is available
MULTI_AGENT_STATUS=$(codex features list 2>/dev/null | grep -E "^multi_agent" | awk '{print $NF}')
echo "multi_agent feature: ${MULTI_AGENT_STATUS:-unknown}"
```

If `codex exec` later fails with an authentication error, tell the user to run `codex login` or set the `OPENAI_API_KEY` environment variable.

### Step 1: Assess Complexity & Select Configuration

Before calling Codex, assess the plan's complexity to determine model, reasoning effort, and multi-agent concurrency. This should happen automatically — don't ask the user unless they've expressed a preference.

**Complexity signals:**

| Signal | LOW | MEDIUM | HIGH |
|--------|-----|--------|------|
| Files changed | 1-3 | 4-8 | 9+ |
| Scope | UI/styling, copy, config | New endpoints, logic changes, refactors | Data model, auth, migrations, cross-system |
| Risk | Reversible, no data impact | Could break features | Data loss, security, breaking changes |
| Dependencies | Self-contained | Touches shared code | Cross-package, external APIs |

**Model and multi-agent configuration by complexity:**

| Complexity | Model | Reasoning | Max Threads | Max Depth | Timeout (s) |
|------------|-------|-----------|-------------|-----------|-------------|
| LOW | spark | medium | 2 | 1 | 600 |
| MEDIUM | codex-5.3 | high | 3 | 1 | 900 |
| HIGH | codex-5.4 (fallback: codex-5.3) | xhigh | 4 | 1 | 1200 |

Rationale: Spark handles simple checklist-style reviews well. Codex-5.3 reasons more carefully about architecture and edge cases — use it for anything beyond trivial changes. Codex-5.4 (`gpt-5.4-codex`) is the strongest model for HIGH-complexity reviews when available, but it is not yet supported on all account types — the skill automatically falls back to `gpt-5.3-codex` if the model is unavailable (see model availability check below). `max_depth=1` prevents recursive sub-agent spawning (a safety guard). Thread count scales with complexity because higher-complexity plans benefit from more parallel exploration.

**Auto-escalation triggers:** If a LOW plan touches any of these, auto-upgrade to MEDIUM routing: auth/permissions, schema/data contracts, migrations, background jobs, concurrency, external API contracts, or unclear ownership boundaries.

**Note:** Model names evolve. If `gpt-5.3-codex-spark` or `gpt-5.3-codex` are no longer available, check `codex` docs for current model IDs and substitute accordingly. You can run `codex models list` to see which models are available on your account.

If the user explicitly requests a model or reasoning effort (e.g., "use o3", "use spark", "xhigh thinking", "high reasoning"), always respect that over the auto-selection. Valid reasoning effort values: `low`, `medium`, `high`, `xhigh`.
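The routing above can be sketched as a small shell helper — illustrative only: the real assessment is a judgment call, and the thresholds here are just the table's file-count column plus the auto-escalation rule.

```shell
#!/bin/sh
# Illustrative complexity router. Thresholds mirror the table above;
# "risky" stands in for any auto-escalation trigger (auth, schema,
# migrations, concurrency, external API contracts, ...).
assess_complexity() {
  files=$1   # number of files the plan touches
  risky=$2   # 1 if any auto-escalation trigger applies, else 0
  if [ "$risky" -eq 1 ]; then
    # Risky scope never routes LOW
    if [ "$files" -ge 9 ]; then echo HIGH; else echo MEDIUM; fi
  elif [ "$files" -le 3 ]; then
    echo LOW
  elif [ "$files" -le 8 ]; then
    echo MEDIUM
  else
    echo HIGH
  fi
}

assess_complexity 2 0   # LOW
assess_complexity 5 0   # MEDIUM
assess_complexity 2 1   # MEDIUM (auto-escalated)
```

In practice Claude Code weighs all four signal rows, not just the file count, but the escalation behavior is exactly this: a risky LOW plan is routed as MEDIUM.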

### Step 2: Prepare the Plan Document

Assemble a clear plan document. If you haven't already written one, draft one now using this minimum template:

```markdown
## Objective
[What are we building/changing and why]

## Key Files to Examine
[3-5 most important files Codex should read for context — these are the files
that contain the code most affected by or relevant to the plan]
- `path/to/file1.ts` — [why it matters]
- `path/to/file2.ts` — [why it matters]

## Files to Modify
[Every file path with a 1-2 line summary of changes]

## Files to Create
[Any new files with their purpose — omit if none]

## Architecture Decisions
[Key design choices and why]

## Constraints / Non-Goals
[What this plan explicitly does NOT do — helps Codex avoid suggesting out-of-scope work]

## Sequence
[What order to make changes]
```

The "Key Files to Examine" and "Constraints / Non-Goals" sections are important — they focus Codex on what matters and prevent it from wandering into irrelevant files or suggesting out-of-scope additions.

If a `writing-plans` skill is available, use it to produce the plan.
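A filled-in example showing the expected level of detail (hypothetical project; all paths invented):

```markdown
## Objective
Add rate limiting to the public /search endpoint to stop scraper abuse.

## Key Files to Examine
- `src/middleware/auth.ts` — existing middleware chain the limiter must slot into
- `src/routes/search.ts` — the endpoint being protected

## Files to Modify
- `src/routes/search.ts` — wrap the handler with the rate-limit middleware

## Files to Create
- `src/middleware/rate-limit.ts` — token-bucket limiter with in-memory store

## Architecture Decisions
In-memory store (single instance today); swap to Redis only if we scale out.

## Constraints / Non-Goals
No per-user quotas, no admin bypass UI, no Redis dependency.

## Sequence
1. Add middleware file  2. Wire into route  3. Add config flag
```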

### Steps 3-5: Build Request, Call Codex (Multi-Agent), Read Output

These steps MUST run as a single bash command because Claude Code runs each command in a separate shell — variables like `$PLAN_FILE` would be lost between steps. Run the entire block below as one command.

Choose the configuration based on your Step 1 assessment:

**LOW complexity:**
```
-m gpt-5.3-codex-spark -c model_reasoning_effort="medium" -c agents.max_threads=2 -c agents.max_depth=1 -c agents.job_max_runtime_seconds=600
```

**MEDIUM complexity:**
```
-m gpt-5.3-codex -c model_reasoning_effort="high" -c agents.max_threads=3 -c agents.max_depth=1 -c agents.job_max_runtime_seconds=900
```

**HIGH complexity:**
```
-m gpt-5.4-codex -c model_reasoning_effort="xhigh" -c agents.max_threads=4 -c agents.max_depth=1 -c agents.job_max_runtime_seconds=1200
```
*(If `gpt-5.4-codex` is unavailable, fall back to `-m gpt-5.3-codex` — see model availability check in the bash block below.)*

```bash
# --- Ensure npm/node global bins are on PATH ---
for p in "$HOME/.local/bin" "$HOME/.npm-global/bin" "/usr/local/bin"; do
  [ -d "$p" ] && case ":$PATH:" in *":$p:"*) ;; *) export PATH="$p:$PATH" ;; esac
done
for p in $(find "$HOME/.nvm/versions/node" -maxdepth 2 -name bin -type d 2>/dev/null); do
  case ":$PATH:" in *":$p:"*) ;; *) export PATH="$p:$PATH" ;; esac
done

# --- Create unique temp files (mktemp without extension, then rename for .md) ---
PLAN_FILE=$(mktemp /tmp/codex-plan-XXXXXX) && mv "$PLAN_FILE" "${PLAN_FILE}.md" && PLAN_FILE="${PLAN_FILE}.md"
REQUEST_FILE=$(mktemp /tmp/codex-request-XXXXXX) && mv "$REQUEST_FILE" "${REQUEST_FILE}.md" && REQUEST_FILE="${REQUEST_FILE}.md"
OUTPUT_FILE=$(mktemp /tmp/codex-output-XXXXXX) && mv "$OUTPUT_FILE" "${OUTPUT_FILE}.md" && OUTPUT_FILE="${OUTPUT_FILE}.md"

# --- Model availability check ---
# For HIGH complexity, test if gpt-5.4-codex is available before committing to it.
# ChatGPT accounts may not have access; fall back to gpt-5.3-codex if so.
# Replace SELECTED_MODEL below with the model chosen in Step 1.
SELECTED_MODEL="gpt-5.3-codex"  # default; override to gpt-5.4-codex for HIGH complexity
# Uncomment the next line for HIGH complexity:
# SELECTED_MODEL="gpt-5.4-codex"
if [ "$SELECTED_MODEL" = "gpt-5.4-codex" ]; then
  echo "Testing model availability: $SELECTED_MODEL ..."
  MODEL_TEST=$(echo "Reply with OK" | codex exec --ephemeral -m "$SELECTED_MODEL" - 2>&1)
  MODEL_TEST_EXIT=$?
  if [ $MODEL_TEST_EXIT -ne 0 ] && echo "$MODEL_TEST" | grep -qE "(not supported|invalid.*model|400)"; then
    echo "WARNING: $SELECTED_MODEL is not available on this account (got: model not supported). Falling back to gpt-5.3-codex."
    SELECTED_MODEL="gpt-5.3-codex"
  elif [ $MODEL_TEST_EXIT -ne 0 ]; then
    echo "WARNING: Model test for $SELECTED_MODEL failed (exit $MODEL_TEST_EXIT). Falling back to gpt-5.3-codex."
    echo "Test output: $MODEL_TEST"
    SELECTED_MODEL="gpt-5.3-codex"
  else
    echo "Model $SELECTED_MODEL is available."
  fi
fi

# --- Write the plan ---
cat > "$PLAN_FILE" << 'PLAN_EOF'
[Your assembled plan goes here]
PLAN_EOF

# --- Build combined request (multi-agent instructions + plan in one file) ---
cat > "$REQUEST_FILE" << 'INSTRUCTIONS_EOF'
You are reviewing an implementation plan created by another AI assistant (Claude Code). Your job is to find CRITICAL blind spots — things that will cause bugs, data loss, race conditions, breaking changes, or architectural problems.

You have access to the project files in this directory.

## Mandatory Multi-Agent Workflow

You MUST follow this workflow — do not skip to a single-pass review:

1) Spawn an `explorer` sub-agent for code-reality validation.
   - Read all files listed under "Key Files to Examine" in the plan.
   - Read all files listed under "Files to Modify" in the plan.
   - Check imports, dependencies, and integration points in each file.
   - Output ONLY: concrete mismatches between the plan and actual code, missing dependencies, broken imports, incorrect API assumptions, and file-level risks. Include exact file paths and line numbers.

2) Spawn a second sub-agent for architecture and risk review.
   - Evaluate system boundaries, component coupling, and sequencing.
   - Check for migration/rollback risks, data consistency issues, and race conditions.
   - Verify the plan's sequence won't create intermediate broken states.
   - Check for missing env variables, config changes, or deployment steps.
   - Output ONLY: architectural risks, missing design decisions, sequencing problems, and safer alternatives. Be specific — cite plan sections.

3) Wait for both agents to complete, then synthesize their findings.

## Review Standards (apply to all agents)

DO NOT suggest:
- Minor style improvements
- Additional logging or monitoring
- Extra validation that isn't strictly necessary
- Security hardening beyond what the context requires
- Performance optimizations unless there's a clear bottleneck
- Additional abstractions or design patterns
- Type annotations or docstrings
- Error handling for impossible scenarios
- Anything listed under "Constraints / Non-Goals" in the plan

ONLY flag issues that would:
1. Cause runtime errors or crashes
2. Break existing functionality (check the actual code to verify)
3. Create data inconsistency or loss
4. Miss a required integration point (check imports, routes, configs)
5. Have incorrect assumptions about APIs, schemas, or data flow
6. Create race conditions or deadlocks
7. Violate constraints the plan doesn't account for
8. Miss a migration, env variable, or deployment step

## Output Format

Return EXACTLY this structure:

**Verdict:** APPROVE | APPROVE_WITH_CHANGES | BLOCK

**Critical Findings** (ordered by severity):
1. [Finding] — [Why it matters] — Evidence: `file:line`

**Medium Risks** (worth noting but not blocking):
- [Risk] — [Context]

**Missing Steps** (required additions to the plan):
- [Step] — [Why it's needed]

**Assumptions / Open Questions**:
- [Assumption that could not be verified] — [What to check]

Limit Critical Findings to the TOP 5 most severe. If nothing critical was found, say so explicitly.

---

The plan to review follows below.

---

INSTRUCTIONS_EOF
cat "$PLAN_FILE" >> "$REQUEST_FILE"

# --- Build single-shot fallback prompt (strips multi-agent instructions) ---
FALLBACK_FILE=$(mktemp /tmp/codex-fallback-XXXXXX) && mv "$FALLBACK_FILE" "${FALLBACK_FILE}.md" && FALLBACK_FILE="${FALLBACK_FILE}.md"
cat > "$FALLBACK_FILE" << 'FALLBACK_INSTRUCTIONS_EOF'
You are reviewing an implementation plan created by another AI assistant (Claude Code). Your job is to find CRITICAL blind spots — things that will cause bugs, data loss, race conditions, breaking changes, or architectural problems.

You have access to the project files in this directory. Prioritize reading files listed under "Key Files to Examine" in the plan, then check other referenced files as needed.

DO NOT suggest:
- Minor style improvements
- Additional logging or monitoring
- Extra validation that isn't strictly necessary
- Security hardening beyond what the context requires
- Performance optimizations unless there's a clear bottleneck
- Additional abstractions or design patterns
- Type annotations or docstrings
- Error handling for impossible scenarios
- Anything listed under "Constraints / Non-Goals" in the plan

ONLY flag issues that would:
1. Cause runtime errors or crashes
2. Break existing functionality (check the actual code to verify)
3. Create data inconsistency or loss
4. Miss a required integration point (check imports, routes, configs)
5. Have incorrect assumptions about APIs, schemas, or data flow
6. Create race conditions or deadlocks
7. Violate constraints the plan doesn't account for
8. Miss a migration, env variable, or deployment step

Return EXACTLY this structure:

**Verdict:** APPROVE | APPROVE_WITH_CHANGES | BLOCK

**Critical Findings** (ordered by severity):
1. [Finding] — [Why it matters] — Evidence: `file:line`

**Medium Risks** (worth noting but not blocking):
- [Risk] — [Context]

**Missing Steps** (required additions to the plan):
- [Step] — [Why it's needed]

**Assumptions / Open Questions**:
- [Assumption that could not be verified] — [What to check]

Limit Critical Findings to the TOP 5 most severe. If nothing critical was found, say so explicitly.

---

The plan to review follows below.

---

FALLBACK_INSTRUCTIONS_EOF
cat "$PLAN_FILE" >> "$FALLBACK_FILE"

# --- Call Codex with multi-agent enabled (adjust model flags per Step 1) ---
# NOTE: Do NOT use --full-auto — it overrides --sandbox read-only with workspace-write.
# In exec mode, approval defaults to 'never', so --full-auto is unnecessary.
# $SELECTED_MODEL is set by the availability check above (gpt-5.4-codex or gpt-5.3-codex for HIGH,
# gpt-5.3-codex for MEDIUM, gpt-5.3-codex-spark for LOW). Adjust reasoning/threads per Step 1.
cat "$REQUEST_FILE" | codex exec \
  --ephemeral \
  --enable multi_agent \
  --sandbox read-only \
  -m "$SELECTED_MODEL" \
  -c model_reasoning_effort="high" \
  -c agents.max_threads=3 \
  -c agents.max_depth=1 \
  -c agents.job_max_runtime_seconds=900 \
  -C "$(pwd)" \
  -o "$OUTPUT_FILE" \
  -

CODEX_EXIT=$?

# --- Check for empty output (silent false-positive guard) ---
if [ $CODEX_EXIT -eq 0 ] && [ ! -s "$OUTPUT_FILE" ]; then
  echo "WARNING: Codex exited successfully but produced no output. Treating as failure."
  CODEX_EXIT=1
fi

# --- Check for multi-agent failure and fallback to single-shot ---
if [ $CODEX_EXIT -ne 0 ]; then
  echo "WARNING: Multi-agent review failed (exit code $CODEX_EXIT). Falling back to single-shot mode..."
  # Use the fallback prompt that doesn't reference sub-agent spawning
  cat "$FALLBACK_FILE" | codex exec \
    --ephemeral \
    --sandbox read-only \
    -m "$SELECTED_MODEL" \
    -c model_reasoning_effort="high" \
    -C "$(pwd)" \
    -o "$OUTPUT_FILE" \
    -
  CODEX_EXIT=$?
  if [ $CODEX_EXIT -eq 0 ] && [ ! -s "$OUTPUT_FILE" ]; then
    echo "WARNING: Codex exited successfully but produced no output. Treating as failure."
    CODEX_EXIT=1
  fi
  if [ $CODEX_EXIT -ne 0 ]; then
    echo "ERROR: Codex review failed in both multi-agent and single-shot mode (exit code $CODEX_EXIT)."
    echo "Possible causes:"
    echo "  - Auth: run 'codex login' or set OPENAI_API_KEY"
    echo "  - Repo trust: run 'codex' in this directory once to trust it"
    echo "  - Model unavailable: check 'codex' docs for current model IDs"
    echo "  - Not a git repo: try adding --skip-git-repo-check"
    rm -f "$PLAN_FILE" "$REQUEST_FILE" "$FALLBACK_FILE" "$OUTPUT_FILE"
    exit 1
  fi
  echo "NOTE: Review completed in single-shot fallback mode (multi_agent was unavailable)."
fi

# --- Read and display the output ---
echo "=== CODEX REVIEW OUTPUT ==="
cat "$OUTPUT_FILE"
echo "=== END OUTPUT ==="

# --- Cleanup ---
rm -f "$PLAN_FILE" "$REQUEST_FILE" "$FALLBACK_FILE" "$OUTPUT_FILE"
```

Set the Bash tool timeout to at least `600000` (10 minutes) when running this command — multi-agent reviews take longer than single-shot — and raise it above the tier's `job_max_runtime_seconds` for HIGH complexity (1200 s). If it times out, clean up orphaned files with: `rm -f /tmp/codex-plan-*.md /tmp/codex-request-*.md /tmp/codex-fallback-*.md /tmp/codex-output-*.md`

**Important notes on the Codex call:**
- `--enable multi_agent` activates Codex's sub-agent orchestration (spawns parallel worker threads)
- `--sandbox read-only` restricts Codex to reading files only — appropriate for a review task. Use `--sandbox workspace-write` only if you need Codex to run build/test commands
- Do NOT use `--full-auto` — it silently overrides `--sandbox read-only` with `workspace-write`. In exec mode, approval is already set to `never` by default, so `--full-auto` is unnecessary
- `--ephemeral` avoids polluting session history with review artifacts
- `-C "$(pwd)"` gives Codex access to the project files so it can verify assumptions against actual code
- `-o` (`--output-last-message`) captures Codex's final synthesized response to a file for reliable reading
- `-c agents.max_threads=N` controls how many sub-agents can run in parallel
- `-c agents.max_depth=1` prevents sub-agents from spawning their own sub-agents (safety guard)
- `-c agents.job_max_runtime_seconds=N` sets per-worker timeout to prevent runaway agents
- The prompt explicitly instructs Codex to spawn two sub-agents — this is the first layer of control. Codex's multi_agent orchestrator handles the actual spawning, waiting, and result merging
- The fallback logic catches cases where multi_agent is unavailable (older Codex versions, feature disabled, or runtime errors) and retries as a standard single-shot review
- The skill always explicitly selects a model — never defers to user config defaults — for deterministic review quality
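If a user would rather persist the agent settings than pass `-c` overrides on every call, the equivalents could live in `~/.codex/config.toml`. This is a sketch that assumes the config keys mirror the CLI flag names above — verify against your Codex version's docs:

```toml
# Assumed config.toml equivalents of the per-call -c overrides (MEDIUM tier)
[features]
multi_agent = true

[agents]
max_threads = 3
max_depth = 1
job_max_runtime_seconds = 900
```

Note that the skill still passes `-m` and the reasoning effort explicitly on each call, per the point above about never deferring to user config defaults.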

**If Codex fails with an auth error:** Tell the user to run `codex login` or set `OPENAI_API_KEY`, then retry.

### Step 6: Filter — The Safety Net

Even with multi-agent review, Codex has a well-known tendency to:

- **Over-securitize**: Adding auth checks, input validation, and error handling everywhere, even for internal functions that receive trusted data. If the plan is for an internal tool or a prototype, most security suggestions are noise.
- **Over-abstract**: Suggesting interfaces, factories, and extra layers for things that don't need them yet. One concrete implementation beats a premature abstraction.
- **Scope creep**: Suggesting adjacent features, "while you're at it" additions, or "best practice" extras that expand the scope beyond what was asked.
- **Framework orthodoxy**: Insisting on patterns that a framework supports but the project doesn't use. If the codebase has its own conventions, those win.
- **Sub-agent echo**: With multi-agent mode, both sub-agents may flag the same issue from different angles. Deduplicate findings — present each issue once with the strongest evidence.

**Your filter criteria — only keep findings that are:**

| Keep | Discard |
|------|---------|
| Will cause a runtime error | "Consider adding" suggestions |
| Breaks existing functionality | Additional error handling for unlikely cases |
| Data loss or corruption risk | Extra validation on trusted internal data |
| Missing required step (migration, env var, etc.) | Security hardening beyond the threat model |
| Wrong assumption about an API or schema | Architectural preferences/patterns |
| Race condition or state management bug | Performance suggestions without evidence |
| Forgotten dependency or import | Additional logging/monitoring |
| Duplicate findings from sub-agents | Repeated issues already covered |
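Deduplication is mostly semantic work you do while reading, but exact evidence collisions — both sub-agents citing the same `file:line` — can be illustrated mechanically. A toy sketch, not part of the skill:

```shell
# Keep only the first finding per "Evidence: file:line" key.
printf '%s\n' \
  'Missing migration — Evidence: db/schema.sql:42' \
  'Schema not migrated — Evidence: db/schema.sql:42' \
  'Broken import — Evidence: src/api.ts:7' |
awk -F'Evidence: ' '!seen[$2]++'
# Prints the first and third lines; the second duplicates the first's evidence.
```

Findings that describe the same issue with *different* evidence still need manual merging — keep the phrasing with the strongest evidence.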

### Step 7: Present Findings to the User

Present the filtered results in this format:

---

**Codex Plan Review — Multi-Agent Blind Spot Report**

**Mode:** Multi-agent (explorer + architecture reviewer) | *or* Single-shot fallback
**Model used:** [model name + reasoning effort]
**Complexity assessed:** [LOW / MEDIUM / HIGH — with brief rationale]
**Plan reviewed:** [brief description]

**Verdict:** APPROVE | APPROVE_WITH_CHANGES | BLOCK

**Critical Findings** (integrate these before executing):
1. [Finding] — [Why it matters and what to change] — *Evidence: `file:line`*
2. [Finding] — [Why it matters and what to change] — *Evidence: `file:line`*

**Medium Risks** (awareness, may need action):
- [Risk] — [Context]

**Missing Steps** (add to plan before executing):
- [Step] — [Why needed]

**Noted but Non-Critical** (filtered out, awareness only):
- [Brief note if anything was borderline worth knowing]

---

If Codex found nothing critical, say so clearly: "Codex multi-agent review came back clean — no critical blind spots detected. Both explorer and architecture sub-agents validated the plan. Good to execute."

### Step 8: Integrate and Proceed

If there were critical findings or a BLOCK verdict:
1. Update the plan to address each critical finding
2. Show the user what changed and why
3. Ask if they want to re-review the updated plan or proceed

If the verdict was APPROVE or APPROVE_WITH_CHANGES with no critical findings:
1. Proceed directly to execution
2. Note in the plan that it was cross-checked via multi-agent review

## Edge Cases

**Codex is unavailable or errors out:**
- Preflight check (Step 0) catches installation and version issues early
- If `codex exec` fails with an auth error, tell the user to run `codex login` or set `OPENAI_API_KEY`
- If multi-agent mode fails, the skill automatically falls back to single-shot review
- If both modes fail, show the error and ask if the user wants to proceed without review

**Multi-agent not available:**
- If Codex version is below 0.104.0 or `multi_agent` feature is not enabled, the fallback runs automatically
- The single-shot fallback uses a separate prompt without multi-agent instructions (avoids impossible "spawn sub-agents" directives)
- Report to user: "Ran in single-shot fallback mode — multi_agent was unavailable"

**Codex takes too long:**
- The Bash tool timeout (600000ms / 10 minutes) will kill the process — multi-agent reviews need more time than single-shot
- Individual sub-agents are bounded by `job_max_runtime_seconds` (configurable per complexity tier)
- If it times out, report it and clean up: `rm -f /tmp/codex-plan-*.md /tmp/codex-request-*.md /tmp/codex-fallback-*.md /tmp/codex-output-*.md`

**Codex returns only non-critical noise:**
- This is fine and expected. Report: "Codex multi-agent review complete — no critical issues found. All suggestions were non-critical (security hardening, additional validation) and filtered out."

**Sub-agent echo / duplicates:**
- Multi-agent mode can produce duplicate findings from both sub-agents. Always deduplicate in Step 6 before presenting to the user.

**User wants to use a different model:**
- Always respect explicit user model requests: `codex exec -m [model] ...`
- Common options: `o3`, `o4-mini`, `gpt-5.3-codex-spark`, `gpt-5.3-codex`, `gpt-5.4-codex`
- Model availability varies by account type (ChatGPT vs API). Some models (e.g., `gpt-5.4-codex`) may not be available on ChatGPT accounts. The skill's model availability check will automatically detect this and fall back to `gpt-5.3-codex` with a warning.
- To check which models your account can access, run `codex models list` (or `codex models` depending on CLI version)

**User wants single-shot mode:**
- If the user says "skip multi-agent", "single shot", or "no sub-agents", remove the `--enable multi_agent` flag and agent config overrides from the `codex exec` call. The rest of the workflow remains the same.

**Legacy `collab` configuration:**
- The `multi_agent` feature was previously called `collab`. If you see `[collab]` in a user's `~/.codex/config.toml`, advise them to rename it to `[features]` with `multi_agent = true` to avoid deprecation warnings. The CLI flag is `--enable multi_agent` (not `--enable collab`).
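The rename looks like this in `~/.codex/config.toml` (a sketch — the deprecated `[collab]` section's key names may differ by version):

```toml
# Before (deprecated):
#   [collab]
#   enabled = true      # key name illustrative

# After:
[features]
multi_agent = true
```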

Install

Claude Code

Save to your project's .claude/skills/ directory. Claude Code picks it up automatically.

Save to:
.claude/skills/codex-plan-review.skill.md
Or use the command line:
mkdir -p .claude/skills/ && curl -o .claude/skills/codex-plan-review.skill.md https://prompte.app/skill-shed/codex-plan-review/raw
