Data. One sheet.
Historical-data shapes, the backtest event loop, the cost model that makes a backtest honest, Sharpe / drawdown / Kelly formulas, and the bias checklist that decides whether your numbers mean anything. Pin it; print it; come back to it.
| Granularity | Use for |
|---|---|
| Tick (every fill) | execution audit, slippage |
| 1m / 5m / 1h bars | strategy signals |
| Daily snapshots | regime detection |
| Order-book L1 (BBO) | fast signals, market making |
| Order-book L2 (depth) | impact / liquidity studies |
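The 1m / 5m / 1h bars can be built from tick data by bucketing fills on a fixed time grid. The sketch below is a minimal illustration. `Tick`, `Bar`, and `ticks_to_bars` are hypothetical names, and timestamps are assumed to be epoch seconds. It does not reflect any Limitless endpoint shape.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    ts: int        # epoch seconds (assumed)
    price: float
    size: float


@dataclass
class Bar:
    start: int     # bar open time; the bar is only complete at start + width
    open: float
    high: float
    low: float
    close: float
    volume: float


def ticks_to_bars(ticks: list[Tick], width: int) -> list[Bar]:
    """Bucket ticks into fixed-width OHLCV bars; buckets with no ticks are skipped."""
    bars: list[Bar] = []
    for t in sorted(ticks, key=lambda x: x.ts):  # stable sort keeps same-ts order
        start = t.ts - t.ts % width              # floor to the bar boundary
        if bars and bars[-1].start == start:
            b = bars[-1]
            b.high = max(b.high, t.price)
            b.low = min(b.low, t.price)
            b.close = t.price
            b.volume += t.size
        else:
            bars.append(Bar(start, t.price, t.price, t.price, t.price, t.size))
    return bars
```

For example, `ticks_to_bars(ticks, 60)` gives 1m bars and `ticks_to_bars(ticks, 3600)` gives 1h bars. A bar stamped with its start time is only final at `start + width`. Feed it to a strategy no earlier than that, or the close leaks into the past.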
TODO: confirm Limitless historical-endpoint shapes and pagination.
One pass, time-ordered. Never read future data. Strategies see prices in the order they happened, with no peeking ahead.
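A minimal event-loop sketch follows. It makes one pass over time-ordered events and shows the strategy only the history up to the current event. `Event`, `Strategy`, and `run_backtest` are hypothetical names, and returning a target position is an assumed interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Event:
    ts: int        # timestamp (assumed epoch seconds)
    price: float   # price observed at ts


# Hypothetical strategy signature: history so far -> target position.
Strategy = Callable[[tuple[Event, ...]], float]


def run_backtest(events: Iterable[Event], strategy: Strategy) -> list[tuple[int, float]]:
    """Single time-ordered pass; the strategy only ever sees events up to `now`."""
    history: list[Event] = []
    decisions: list[tuple[int, float]] = []
    last_ts: Optional[int] = None
    for ev in events:
        if last_ts is not None and ev.ts < last_ts:
            raise ValueError(f"out-of-order event at ts={ev.ts}")
        last_ts = ev.ts
        history.append(ev)
        # Hand over an immutable snapshot: nothing after `ev` exists yet.
        decisions.append((ev.ts, strategy(tuple(history))))
    return decisions
```

A common convention is to fill each decision at the next event's price rather than at the price that produced it. Copying the history on every step is O(n²). That is fine for a sketch but worth replacing for long tick series.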
Annualising Sharpe: with daily returns, multiply by sqrt(252); with hourly returns, multiply by sqrt(252·24).
Targets. Sharpe > 1.5 is good, > 2 is great, > 3 is suspicious. For retail, keep MaxDD under 20%.
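The Sharpe and max-drawdown numbers above can be computed as in the minimal sketch below. It assumes a risk-free rate of 0 and uses the sample standard deviation. `periods_per_year` is 252 for daily returns and 252·24 for hourly returns, following the annualisation rule above. The function names are illustrative.

```python
import math
import statistics


def sharpe(returns: list[float], periods_per_year: float = 252) -> float:
    """Annualised Sharpe of per-period returns (risk-free rate assumed 0)."""
    sd = statistics.stdev(returns)  # sample standard deviation; needs >= 2 points
    if sd == 0:
        raise ValueError("zero volatility: Sharpe undefined")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough fall of an equity curve, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, the equity curve `[100, 120, 90, 130, 104]` has a MaxDD of 0.25: the fall from 120 to 90 is deeper than the later fall from 130 to 104.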
Every fill in the backtest pays:

    cost = size × spread/2               // half the bid-ask
         + slippage(size, depth)         // book-walk
         + fee_bps / 10000 × notional    // exchange fee
         + funding_per_period

Default slippage model: walk the L2 book against your size. If there is no L2 data, use k × sqrt(size / median_depth) with k ≈ 0.3.
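The minimal sketch below implements the per-fill cost model for a buy. It covers both slippage paths: walking L2 ask levels, or the sqrt fallback. Several things are assumptions. The sqrt term is read as a fraction of the mid price, because the sheet does not state its units. Book-walk slippage is measured from the best ask so the half-spread is not counted twice. `walk_book`, `sqrt_slippage`, and `fill_cost` are hypothetical names.

```python
import math
from typing import Optional, Sequence


def walk_book(asks: Sequence[tuple[float, float]], size: float) -> float:
    """Average fill price for a buy of `size`, walking ask levels (price, qty) best-first."""
    if size <= 0:
        raise ValueError("size must be positive")
    remaining, spent = size, 0.0
    for price, qty in asks:
        take = min(remaining, qty)
        spent += take * price
        remaining -= take
        if remaining <= 0:
            return spent / size
    raise ValueError("order is larger than the visible depth")


def sqrt_slippage(size: float, median_depth: float, k: float = 0.3) -> float:
    """No-L2 fallback: k * sqrt(size / median_depth), read as a fraction of mid (assumption)."""
    return k * math.sqrt(size / median_depth)


def fill_cost(size: float, mid: float, spread: float, fee_bps: float,
              funding: float = 0.0,
              asks: Optional[Sequence[tuple[float, float]]] = None,
              median_depth: Optional[float] = None) -> float:
    """Total cost of one buy fill, in the same currency units as price x size."""
    notional = size * mid
    half_spread = size * spread / 2                  # half the bid-ask
    if asks:
        # Book-walk measured from the best ask, so the half-spread is not counted twice.
        slippage = size * (walk_book(asks, size) - asks[0][0])
    elif median_depth:
        slippage = notional * sqrt_slippage(size, median_depth)
    else:
        raise ValueError("need either L2 asks or a median_depth")
    fee = notional * fee_bps / 10_000                # exchange fee; bps -> fraction
    return half_spread + slippage + fee + funding
```

Here is an example. Buying 150 contracts at mid 0.51 with a 0.02 spread and a 10 bps fee, against asks `[(0.52, 100), (0.53, 100), (0.55, 500)]`, costs about 1.5 (spread) + 0.5 (book-walk) + 0.0765 (fee) = 2.0765.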
If you skip costs, your backtest looks 2–5× better than reality.
| Cap | Default |
|---|---|
| Per-trade size | ≤ 5% bankroll |
| Open positions | ≤ 30% bankroll |
| Daily loss | ≤ 3% bankroll |
| Correlated cluster | ≤ 10% per cluster |

| Split | Share / rule |
|---|---|
| In-sample (IS) | ~70% of data, fit + tune |
| Out-of-sample (OOS) | ~30% held out, no fitting |
| Walk-forward | roll IS forward, retest OOS each step |
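The intro lists Kelly among the formulas. The minimal sizing sketch below computes a Kelly stake and then applies the per-trade and open-position caps from the risk table. It assumes a binary contract bought at price p that pays 1, and your estimated probability q of YES. Under those assumptions full Kelly is f* = (q - p) / (1 - p). The half-Kelly default (`kelly_mult=0.5`) is a common convention, not a rule from this sheet, and the function names are hypothetical.

```python
def kelly_fraction(q: float, p: float) -> float:
    """Full-Kelly bankroll fraction for buying YES at price p (pays 1) with belief P(YES) = q."""
    if not 0 < p < 1:
        raise ValueError("price must be in (0, 1)")
    return max(0.0, (q - p) / (1 - p))  # no edge -> no bet


def capped_stake(bankroll: float, q: float, p: float, open_exposure: float,
                 kelly_mult: float = 0.5, per_trade_cap: float = 0.05,
                 open_cap: float = 0.30) -> float:
    """Fractional-Kelly stake, clipped by the per-trade and open-position caps."""
    stake = bankroll * kelly_mult * kelly_fraction(q, p)
    stake = min(stake, per_trade_cap * bankroll)              # <= 5% per trade
    headroom = max(0.0, open_cap * bankroll - open_exposure)  # <= 30% open in total
    return min(stake, headroom)
```

The daily-loss and correlated-cluster caps need portfolio-level state such as realised P&L and a cluster map, so they are left out here.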
Rule. Look at OOS performance ONCE. If you tune to OOS results, OOS becomes IS and you have no holdout left.
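Walk-forward splits can be generated as index ranges over time-ordered rows. The sketch below assumes a fixed-length rolling IS window that advances by one OOS block per step. `walk_forward_splits` is a hypothetical helper, and it does not enforce the look-once rule. That part is on you.

```python
def walk_forward_splits(n: int, is_len: int, oos_len: int) -> list[tuple[range, range]]:
    """Rolling (IS, OOS) index ranges over n time-ordered rows; OOS always follows IS."""
    if is_len <= 0 or oos_len <= 0:
        raise ValueError("window lengths must be positive")
    splits: list[tuple[range, range]] = []
    start = 0
    while start + is_len + oos_len <= n:
        is_rng = range(start, start + is_len)
        oos_rng = range(start + is_len, start + is_len + oos_len)
        splits.append((is_rng, oos_rng))
        start += oos_len  # roll forward by one OOS block
    return splits
```

With 100 rows, `walk_forward_splits(100, 70, 10)` gives three steps: IS 0–69 / OOS 70–79, IS 10–79 / OOS 80–89, and IS 20–89 / OOS 90–99. The first step is the ~70/30 proportion from the table applied to the first 80 rows.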
- Look-ahead bias. Did your signal use any data the live strategy wouldn’t have had? Forward-fill / reindex with `method='ffill'`, which only carries past values forward. Never back-fill, which passes future values backward.
- Survivorship bias. Are you only backtesting markets that exist today? Closed / delisted markets have to be in the dataset.
- Selection bias. Did you pick the time window because it worked? Test on adjacent windows.
- Optimisation bias / overfit. Did you tune so many knobs that the backtest is fitting noise? Keep the number of tuned parameters below sqrt(N_trades).
- Cost-skip bias. Did you include fee + spread + slippage at fill time? See card 04.
- Restart bias. Did you stop running variants that didn’t work and only keep the survivors? Track every variant; report the median, not the max.
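The look-ahead rule comes down to which observation you attach to each timestamp. The stdlib-only sketch below contrasts a backward-only "as-of" lookup with a back-fill lookup. The as-of lookup has the same semantics as pandas `reindex(..., method='ffill')` or `merge_asof` with its default `direction='backward'`. The back-fill lookup grabs the next price at or after each time. The function names are illustrative.

```python
import bisect
from typing import Optional


def asof_prices(price_ts: list[int], prices: list[float],
                query_ts: list[int]) -> list[Optional[float]]:
    """Safe: last price observed at or before each query time (None if none yet)."""
    out: list[Optional[float]] = []
    for t in query_ts:
        i = bisect.bisect_right(price_ts, t)  # count of observations with ts <= t
        out.append(prices[i - 1] if i > 0 else None)
    return out


def backfill_prices(price_ts: list[int], prices: list[float],
                    query_ts: list[int]) -> list[Optional[float]]:
    """LEAKY: first price at or after each query time, i.e. data from the future."""
    out: list[Optional[float]] = []
    for t in query_ts:
        i = bisect.bisect_left(price_ts, t)  # first observation with ts >= t
        out.append(prices[i] if i < len(prices) else None)
    return out
```

Take prices observed at times 10, 20, 30 and query times 5, 10, 25, 35. The as-of lookup returns `[None, 0.4, 0.5, 0.6]`. The back-fill lookup returns `[0.4, 0.4, 0.6, None]`: at time 5 and time 25 it uses prices that had not happened yet.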
Is this backtest real? (cross-module checklist)
- Costs included? If no → not real. Apply fees + spread + slippage at every fill.
- Look-ahead checked? Walk the code. Any future data slips in → not real.
- OOS untouched after first look? If you re-tuned to OOS → not real. Cut a fresh OOS slice.
- Sharpe believable? > 4 with realistic costs is suspicious. Check for survivorship + optimisation bias.
- MaxDD believable? < 5% on a multi-year window is suspicious. You probably skipped a bad regime.
- Tested on adjacent windows? If only one window works, it’s probably noise.
- All boxes ticked? Paper-trade for 2 weeks before risking real money. Backtests lie; live data doesn’t.
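The checklist can also live in code next to a backtest report, so a run cannot be promoted to paper trading while any box is unticked. The sketch below is minimal. `BacktestReport` and `red_flags` are hypothetical names, and reading "multi-year" as two years or more is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BacktestReport:
    costs_included: bool
    lookahead_checked: bool
    oos_untouched: bool
    sharpe: float               # annualised, after realistic costs
    max_drawdown: float         # fraction, e.g. 0.12 for 12%
    years: float                # length of the backtest window
    adjacent_windows_pass: bool


def red_flags(r: BacktestReport) -> list[str]:
    """Reasons not to trust the backtest yet; an empty list means move on to paper trading."""
    flags: list[str] = []
    if not r.costs_included:
        flags.append("costs missing: apply fees + spread + slippage at every fill")
    if not r.lookahead_checked:
        flags.append("look-ahead not checked")
    if not r.oos_untouched:
        flags.append("OOS re-tuned: cut a fresh OOS slice")
    if r.sharpe > 4:
        flags.append("Sharpe > 4: check survivorship + optimisation bias")
    if r.years >= 2 and r.max_drawdown < 0.05:  # 'multi-year' read as >= 2 years (assumption)
        flags.append("MaxDD < 5% over multiple years: a bad regime may be missing")
    if not r.adjacent_windows_pass:
        flags.append("only one window works: probably noise")
    return flags
```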