Production tier · Reference card · API Academy

Production.
One sheet.

Market-making spread + skew, the triangular-arb formula, signal-score normalisation, the deploy gate every production bot crosses, kill-switch shapes, and the recovery decision tree. Pin it; print it; come back to it.

01

MM quote model

M15
bid = mid − spread/2 − skew ask = mid + spread/2 − skew skew = k · (inventory / cap)

Spread covers risk + fees. Skew nudges quotes against your inventory: long → lower bid, lower ask (encourage selling).

Defaults. spread = 2×fee + 1σ_1m; k = 0.3 at |inv| = cap.

02

Triangular arb

M16

For mutually exclusive markets that should sum to 1:

edge = 1 − Σ ask_i − round_trip_cost

If Σ ask < 1 − cost, buy all sides; payout is exactly 1 share at settlement.

Caveat. Liquidity has to support all legs simultaneously. Walk depth, fill IOC; if any leg misses, immediately hedge the others.

03

Signal scoring

M17
z = (x − μ_rolling) / σ_rolling score = clip(z, −3, +3)

Why z-score. Lets you combine signals on different scales (volume, sentiment, momentum) without one drowning out the others.

Combine. Weighted sum, then re-z-score the combined signal so risk caps stay scale-invariant.

04

Latency budget

M18
HopTypical
WS event → agent5–30 ms
Agent decide5–100 ms
Sign + submit20–80 ms
Exchange ack50–200 ms
Total RTT (p50)~200 ms

If your strategy needs < 50ms RTT, you’re competing with infra you don’t have. Pick a slower-edge strategy.

05

Kill switches

M18

Three layers, every production bot needs all three:

LayerTriggerEffect
Hard capPer-trade size, per-day loss, position limitRefuse new orders; hold existing.
DrawdownNAV drop > thresholdCancel resting orders, flatten gradually, alert.
Manual flagFile state/KILL existsHot loop checks at top of every iteration; refuses to act.

Rule. Risk caps must live in code, NOT in the agent prompt. The agent obeys; risk doesn’t depend on it agreeing.

06

Deploy gate

M18
  1. OOS Sharpe > 1. On data the strategy never saw.
  2. Paper for 2 weeks. Live data, no real money. Diff against backtest.
  3. Reconciliation hard-stop wired. Chain ≠ REST → halt.
  4. Kill switch verified. Touch state/KILL, watch the bot stop within 1 cycle.
  5. Alerting on. Page on PnL spike, latency spike, error spike, cost spike.
  6. Run-book exists. 1-page doc: how to stop, how to flatten, who to call.
07

Recovery tree

M18
  • Bot crashed. Restart from atomic-saved state. Reconcile chain. Replay last 60s of trace before resuming.
  • State file corrupt. Restore from atomic .tmp if present, else last hourly snapshot. Halt; reconcile by hand.
  • Chain ≠ REST. Halt new orders. Source of truth = chain. Update state, then resume.
  • Daily loss cap hit. Auto-flatten. Don’t restart same day.
  • Latency p95 doubled. Reduce order rate by 50%. If it persists, halt.
08

Production pitfalls

Cross-module
  1. Slippage tolerance drift. Tightening slippage caps to chase backtest numbers means orders silently fail in volatile regimes. Re-fit slippage budget per regime.
  2. Reconciliation skipped because “it always passes.” When it fails, you find out hours later.
  3. Risk caps in the prompt. The model can talk itself out of them. Caps belong in place_order, not in the system prompt.
  4. Resting orders not cancelled on disconnect. WS dies, your last quote is still on the book. Always cancel-on-disconnect (CoD).
  5. Single-key compromise drains everything. Sign with a hot key that has narrow risk caps; cold key for moving funds.
  6. Logs without alerting. A trace nobody reads is a trace that doesn’t exist. Page on PnL spike, error spike, cost spike.