Production tier · Reference card · API Academy

Production.
One sheet.

Market-making spread + skew, the triangular-arb formula, signal-score normalisation, the deploy gate every production bot crosses, kill-switch shapes, and the recovery decision tree. Pin it; print it; come back to it.

Back to Module 18

MM quote model

M15

bid = mid − spread/2 − skew ask = mid + spread/2 − skew skew = k · (inventory / cap)

Spread covers risk + fees. Skew nudges quotes against your inventory: long → lower bid, lower ask (encourage selling).

Defaults. spread = 2×fee + 1σ_1m; k = 0.3 at |inv| = cap.

Triangular arb

M16

For mutually exclusive markets that should sum to 1:

edge = 1 − Σ ask_i − round_trip_cost

If Σ ask < 1 − cost, buy all sides; payout is exactly 1 share at settlement.

Caveat. Liquidity has to support all legs simultaneously. Walk depth, fill IOC; if any leg misses, immediately hedge the others.

Signal scoring

M17

z = (x − μ_rolling) / σ_rolling score = clip(z, −3, +3)

Why z-score. Lets you combine signals on different scales (volume, sentiment, momentum) without one drowning out the others.

Combine. Weighted sum, then re-z-score the combined signal so risk caps stay scale-invariant.

Latency budget

M18

Hop	Typical
WS event → agent	5–30 ms
Agent decide	5–100 ms
Sign + submit	20–80 ms
Exchange ack	50–200 ms
Total RTT (p50)	~200 ms

If your strategy needs < 50ms RTT, you’re competing with infra you don’t have. Pick a slower-edge strategy.

Kill switches

M18

Three layers, every production bot needs all three:

Layer	Trigger	Effect
Hard cap	Per-trade size, per-day loss, position limit	Refuse new orders; hold existing.
Drawdown	NAV drop > threshold	Cancel resting orders, flatten gradually, alert.
Manual flag	File `state/KILL` exists	Hot loop checks at top of every iteration; refuses to act.

Rule. Risk caps must live in code, NOT in the agent prompt. The agent obeys; risk doesn’t depend on it agreeing.

Deploy gate

M18

OOS Sharpe > 1. On data the strategy never saw.
Paper for 2 weeks. Live data, no real money. Diff against backtest.
Reconciliation hard-stop wired. Chain ≠ REST → halt.
Kill switch verified. Touch state/KILL, watch the bot stop within 1 cycle.
Alerting on. Page on PnL spike, latency spike, error spike, cost spike.
Run-book exists. 1-page doc: how to stop, how to flatten, who to call.

Recovery tree

M18

Bot crashed. Restart from atomic-saved state. Reconcile chain. Replay last 60s of trace before resuming.
State file corrupt. Restore from atomic .tmp if present, else last hourly snapshot. Halt; reconcile by hand.
Chain ≠ REST. Halt new orders. Source of truth = chain. Update state, then resume.
Daily loss cap hit. Auto-flatten. Don’t restart same day.
Latency p95 doubled. Reduce order rate by 50%. If it persists, halt.

Production pitfalls

Cross-module

Slippage tolerance drift. Tightening slippage caps to chase backtest numbers means orders silently fail in volatile regimes. Re-fit slippage budget per regime.
Reconciliation skipped because “it always passes.” When it fails, you find out hours later.
Risk caps in the prompt. The model can talk itself out of them. Caps belong in place_order, not in the system prompt.
Resting orders not cancelled on disconnect. WS dies, your last quote is still on the book. Always cancel-on-disconnect (CoD).
Single-key compromise drains everything. Sign with a hot key that has narrow risk caps; cold key for moving funds.
Logs without alerting. A trace nobody reads is a trace that doesn’t exist. Page on PnL spike, error spike, cost spike.