Production.
One sheet.
Market-making spread + skew, the triangular-arb formula, signal-score normalisation, the deploy gate every production bot crosses, kill-switch shapes, and the recovery decision tree. Pin it; print it; come back to it.
Spread covers risk + fees. Skew nudges quotes against your inventory: long → lower bid, lower ask (encourage selling).
Defaults. spread = 2×fee + 1σ_1m; k = 0.3 at |inv| = cap.
For mutually exclusive markets that should sum to 1:
edge = 1 − Σ ask_i − round_trip_costIf Σ ask < 1 − cost, buy all sides; payout is exactly 1 share at settlement.
Caveat. Liquidity has to support all legs simultaneously. Walk depth, fill IOC; if any leg misses, immediately hedge the others.
Why z-score. Lets you combine signals on different scales (volume, sentiment, momentum) without one drowning out the others.
Combine. Weighted sum, then re-z-score the combined signal so risk caps stay scale-invariant.
| Hop | Typical |
|---|---|
| WS event → agent | 5–30 ms |
| Agent decide | 5–100 ms |
| Sign + submit | 20–80 ms |
| Exchange ack | 50–200 ms |
| Total RTT (p50) | ~200 ms |
If your strategy needs < 50ms RTT, you’re competing with infra you don’t have. Pick a slower-edge strategy.
Three layers, every production bot needs all three:
| Layer | Trigger | Effect |
|---|---|---|
| Hard cap | Per-trade size, per-day loss, position limit | Refuse new orders; hold existing. |
| Drawdown | NAV drop > threshold | Cancel resting orders, flatten gradually, alert. |
| Manual flag | File state/KILL exists | Hot loop checks at top of every iteration; refuses to act. |
Rule. Risk caps must live in code, NOT in the agent prompt. The agent obeys; risk doesn’t depend on it agreeing.
- OOS Sharpe > 1. On data the strategy never saw.
- Paper for 2 weeks. Live data, no real money. Diff against backtest.
- Reconciliation hard-stop wired. Chain ≠ REST → halt.
- Kill switch verified. Touch
state/KILL, watch the bot stop within 1 cycle. - Alerting on. Page on PnL spike, latency spike, error spike, cost spike.
- Run-book exists. 1-page doc: how to stop, how to flatten, who to call.
- Bot crashed. Restart from atomic-saved state. Reconcile chain. Replay last 60s of trace before resuming.
- State file corrupt. Restore from atomic
.tmpif present, else last hourly snapshot. Halt; reconcile by hand. - Chain ≠ REST. Halt new orders. Source of truth = chain. Update state, then resume.
- Daily loss cap hit. Auto-flatten. Don’t restart same day.
- Latency p95 doubled. Reduce order rate by 50%. If it persists, halt.
Production pitfalls
Cross-module- Slippage tolerance drift. Tightening slippage caps to chase backtest numbers means orders silently fail in volatile regimes. Re-fit slippage budget per regime.
- Reconciliation skipped because “it always passes.” When it fails, you find out hours later.
- Risk caps in the prompt. The model can talk itself out of them. Caps belong in
place_order, not in the system prompt. - Resting orders not cancelled on disconnect. WS dies, your last quote is still on the book. Always cancel-on-disconnect (CoD).
- Single-key compromise drains everything. Sign with a hot key that has narrow risk caps; cold key for moving funds.
- Logs without alerting. A trace nobody reads is a trace that doesn’t exist. Page on PnL spike, error spike, cost spike.