Production tier · Reference card · Agents Academy

Production.
One sheet.

Deploy gate, the metrics every running agent emits, kill-switch shapes, prompt-injection defense layers, the trace-replay audit pattern, and the recovery decision tree. Pin it; print it; come back to it.

Back to Module 16

Deploy gate

M11

Tests green. Unit + integration + 1 replay test. See card 04.
Risk caps in code, not prompt. Verify place_limit_order rejects oversize even if prompt says yes.
Kill switch verified. Touch $ACADEMY_DATA_DIR/kill_switch.flag, watch loop refuse to act within 1 cycle.
Trace redaction tested. Run with a fake API key; confirm it doesn’t appear in $ACADEMY_DATA_DIR/state/traces/.
Health monitor wired. Cost / latency / decision-rate / error-rate page on threshold.
Cancel-on-disconnect on. Drop the WS; resting orders are removed exchange-side.
Run-book exists. 1 page: how to stop, how to flatten, who to call. Pinned in the team channel.

Monitoring metrics

M12

Metric	Page on
cost_per_run	2× baseline
tool_calls_per_run	> 30
latency_p95	2× baseline
error_rate_5m	> 1%
kill_switch_active	any
chain_rest_drift	> dust

Test types

M13

Unit	Each tool, in isolation, with mocked deps
Integration	Loop + tools against testnet / mock exchange
Replay	Re-run an NDJSON trace; assert decisions match
Adversarial	Inject prompt-injection payloads; confirm no escape
Smoke	30s live run on testnet; confirm zero crashes

Replay audit

M13

// Daily cron 1. read state/traces/yesterday 2. for each runId: - replay against same model + tools (mocked) - assert decisions match - flag drift > 0 3. write replay-report.md

Catches model-version drift, prompt regressions, and silent strategy breakage.

Kill switch shapes

M14

Shape	Trigger	Effect
File flag	`touch $ACADEMY_DATA_DIR/kill_switch.flag` (or Module 02 panel tap)	Loop top check; refuse to act in next iteration
Daily-loss	NAV drop > 3%	Cancel resting, flatten, halt 24h
Cost cap	$X/day spent on inference	Halt; alert
Tool error rate	> 5% over 5m	Halt; alert
Chain drift	REST ≠ chain > dust	Halt; reconcile by hand

All five wire to the SAME halt path, cancel resting, flag $ACADEMY_DATA_DIR/kill_switch.flag, alert. One halt path = one thing to test.

Prompt-injection defense

M15

Layer 1. Sanitise tool output before feeding back, strip control sequences, <script>, “ignore previous instructions”.
Layer 2. Risk caps in code, not prompt. The model can’t talk its way past maxSize in place_limit_order.
Layer 3. Allowlist tools per loop. The agent that browses markets shouldn’t have place_limit_order available.
Layer 4. Human approval for new market types / oversize trades.

Recovery tree

M16

Crash mid-run. Restart from atomic state. Replay last 60s of trace before resuming.
Hung tool. 30s timeout aborts. Skip iteration; alert.
Cost spike. Halt. Inspect last trace for runaway loop.
Suspicious output. Halt. Rotate any key that appeared in trace.
Chain drift. Halt new orders. Source of truth = chain.

Production pitfalls

Cross-module

Runaway loops. Agent gets stuck calling tools forever. Always set maxSteps; alert on cap.
Kill-switch race conditions. Loop checks $ACADEMY_DATA_DIR/kill_switch.flag at the top, but the order is already in-flight. Cancel-on-disconnect + idempotent retries close the gap.
Prompt injection via tool output. A scraped market description containing “ignore previous instructions”. Sanitise.
Monitoring gaps that hide bad behavior. If you only watch P&L, the agent can lose money slowly and pass cost caps. Watch decision rate, error rate, latency.
Trace logs leak secrets. Pre-redact; keep $ACADEMY_DATA_DIR/state/traces/ off the deploy image and out of shared backups.
State drift between dev and prod. Two writers, atomic-rename isn’t enough. Add flock; only one process holds the file.
Model drift. Same prompt, different decisions next month. Replay audit catches it; freeze model version in config.