Paper Trading and Backtesting Infrastructure
This post covers the implementation of ADR-014 (Paper Trading and Backtesting Architecture) for the Arbiter-Bot statistical arbitrage engine.
The Problem
Before deploying capital, we need to validate strategies without financial risk. The challenges:
- Identical interfaces: Simulated execution must use the same traits as production
- Realistic fills: Mid-price fills are optimistic; real orders cross the book
- Deterministic replay: The same data must produce identical results
- Performance measurement: Sharpe ratio, max drawdown, win rate
Clock Abstraction
Time control is fundamental. The Clock trait abstracts over real and simulated time:
pub trait Clock: Send + Sync {
    fn now(&self) -> DateTime<Utc>;
    fn advance(&self, duration: Duration);
    fn is_simulated(&self) -> bool;
}
RealClock wraps Utc::now(). SimulatedClock uses AtomicI64 for lock-free updates:
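pub struct SimulatedClock {
    nanos_since_epoch: AtomicI64,
}
impl SimulatedClock {
    pub fn advance(&self, duration: Duration) {
        // Clamp negative durations to zero so simulated time never moves backward;
        // durations too large for i64 nanoseconds fall back to i64::MAX.
        let nanos = duration.num_nanoseconds().unwrap_or(i64::MAX).max(0);
        self.nanos_since_epoch.fetch_add(nanos, Ordering::SeqCst);
    }
}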
Key design decisions:
- Monotonic guarantees: advance() only moves forward
- SeqCst ordering: Ensures visibility across threads
- Direct time setting: set() for replay positioning
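The design decisions above can be sketched with the standard library alone. This is a minimal illustration, not the Arbiter-Bot implementation: it uses `std::time::Duration` (which cannot be negative) instead of chrono's `Duration`, and `new`, `now_nanos`, and the saturating behavior are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::time::Duration;

/// Simulated clock storing nanoseconds since the Unix epoch.
pub struct SimulatedClock {
    nanos_since_epoch: AtomicI64,
}

impl SimulatedClock {
    /// Hypothetical constructor: start the clock at a given timestamp.
    pub fn new(start_nanos: i64) -> Self {
        Self { nanos_since_epoch: AtomicI64::new(start_nanos) }
    }

    /// Current simulated time in nanoseconds since the epoch.
    pub fn now_nanos(&self) -> i64 {
        self.nanos_since_epoch.load(Ordering::SeqCst)
    }

    /// Move time forward. std's Duration is never negative, so this cannot
    /// move the clock backward; overflow saturates instead of wrapping.
    pub fn advance(&self, duration: Duration) {
        let nanos = i64::try_from(duration.as_nanos()).unwrap_or(i64::MAX);
        let _ = self.nanos_since_epoch.fetch_update(
            Ordering::SeqCst,
            Ordering::SeqCst,
            |t| Some(t.saturating_add(nanos)),
        );
    }

    /// Jump directly to a timestamp (may move backward, e.g. when seeking in a replay).
    pub fn set(&self, nanos: i64) {
        self.nanos_since_epoch.store(nanos, Ordering::SeqCst);
    }
}
```

Note that only `advance()` is monotonic in this sketch; `set()` deliberately is not, since replay positioning needs to be able to rewind.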
SimulatedExchangeClient
The client implements the existing ExchangeClient trait. Zero changes to strategy code:
#[async_trait]
impl ExchangeClient for SimulatedExchangeClient {
    async fn place_order(&self, order: OrderRequest) -> Result<FillDetails, ExecutionError> {
        // Optional latency injection
        if let Some(latency) = self.config.simulated_latency {
            tokio::time::sleep(latency).await;
        }
        // Match against order book.
        // `fee_calculator`: fee closure for the configured fee model
        // (its construction is omitted in this excerpt)
        let result = self.matching_engine
            .match_order(order.side, order.size, Some(order.price), fee_calculator)
            .map_err(|e| ExecutionError::Rejected(e.to_string()))?;
        Ok(FillDetails {
            order_id: order.order_id,
            venue_order_id: format!("sim-{}", Uuid::new_v4()),
            price: result.average_price,
            size: result.filled_quantity,
            timestamp: self.clock.now(),
            fee: result.fee,
        })
    }
}
Configuration includes:
- Fidelity level: Basic (mid-price) or Realistic (book crossing)
- Latency injection: Simulate network delays
- Fee models: Kalshi's 7% formula or Polymarket's 0%
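To make the fee models concrete, the sketch below assumes the "7% formula" means fee = 0.07 × contracts × price × (1 − price), rounded up to the next cent. That reading should be checked against the venue's current fee schedule. It works in integer cents so the round-up is exact; `FeeModel`, `KalshiStyle`, `ZeroFee`, and `fee_cents` are hypothetical names, not the engine's actual API.

```rust
/// Fee model selector (hypothetical names for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FeeModel {
    /// Assumed formula: ceil(0.07 * contracts * P * (1 - P)), P in dollars.
    KalshiStyle,
    /// No trading fee.
    ZeroFee,
}

impl FeeModel {
    /// Fee in cents for `contracts` contracts priced at `price_cents` (0..=100).
    pub fn fee_cents(self, price_cents: u64, contracts: u64) -> u64 {
        match self {
            FeeModel::KalshiStyle => {
                let p = price_cents.min(100);
                // 0.07 * C * (p/100) * (1 - p/100) dollars
                //   == 7 * C * p * (100 - p) / 10_000 cents
                let numerator = 7 * contracts * p * (100 - p);
                // Integer ceiling division: round up to the next whole cent.
                (numerator + 9_999) / 10_000
            }
            FeeModel::ZeroFee => 0,
        }
    }
}
```

Under this assumed formula the fee peaks at a price of 50 cents and shrinks toward the extremes, so a fee-aware backtest penalizes trades near even odds the most.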
MatchingEngine
Two fidelity levels address different use cases:
Level 1 - Basic: Instant fill at mid-price. Fast validation, optimistic assumptions.
fn match_basic(
    &self,
    quantity: f64,
    fee_calculator: impl Fn(f64, f64) -> f64,
) -> Result<MatchResult, MatchError> {
    let mid = self.mid_price().ok_or(MatchError::NoLiquidity)?;
    Ok(MatchResult {
        filled_quantity: quantity,
        average_price: mid,
        fully_filled: true,
        fills: vec![FillLeg { price: mid, quantity }],
        fee: fee_calculator(mid, quantity),
    })
}
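For reference, a mid-price lookup along the lines of `mid_price()` might look like the following. The `Level` and `OrderBook` types here are simplified stand-ins assumed for illustration, with the best bid and best ask stored first in each vector.

```rust
/// One price level in the book (simplified stand-in type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Level {
    pub price: f64,
    pub size: f64,
}

/// Minimal order book: best (highest) bid first, best (lowest) ask first.
#[derive(Debug, Default)]
pub struct OrderBook {
    pub bids: Vec<Level>,
    pub asks: Vec<Level>,
}

impl OrderBook {
    /// Midpoint of best bid and best ask; None if either side is empty.
    pub fn mid_price(&self) -> Option<f64> {
        let best_bid = self.bids.first()?.price;
        let best_ask = self.asks.first()?.price;
        Some((best_bid + best_ask) / 2.0)
    }
}
```

With a best bid of 0.25 and a best ask of 0.75, a Basic buy fills at 0.50 even though the cheapest real offer is 0.75, which is why this level is described as optimistic.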
Level 2 - Realistic: Crosses the order book with partial fills.
fn match_realistic(
    &self,
    side: OrderSide,
    quantity: f64,
    limit_price: Option<f64>,
    fee_calculator: impl Fn(f64, f64) -> f64,
) -> Result<MatchResult, MatchError> {
    let levels = match side {
        OrderSide::Buy => &orderbook.asks,  // Buy crosses asks
        OrderSide::Sell => &orderbook.bids, // Sell crosses bids
    };
    let mut remaining = quantity;
    let mut fills = Vec::new();
    for level in levels {
        if remaining <= 0.0 { break; }
        // Respect limit price
        if let Some(limit) = limit_price {
            let crosses = match side {
                OrderSide::Buy => level.price <= limit,
                OrderSide::Sell => level.price >= limit,
            };
            if !crosses { break; }
        }
        let fill_qty = remaining.min(level.size);
        fills.push(FillLeg { price: level.price, quantity: fill_qty });
        remaining -= fill_qty;
    }
    // ... calculate VWAP and fees
}
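The elided VWAP step is the quantity-weighted average of the fill legs. A minimal sketch follows, assuming a `FillLeg` shape like the one used above; `summarize_fills` is a hypothetical helper, not part of the engine.

```rust
/// One partial fill at a single price level (stand-in for the engine's type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FillLeg {
    pub price: f64,
    pub quantity: f64,
}

/// Returns (filled_quantity, vwap), or None when nothing was filled.
pub fn summarize_fills(fills: &[FillLeg]) -> Option<(f64, f64)> {
    let filled: f64 = fills.iter().map(|f| f.quantity).sum();
    if filled <= 0.0 {
        return None; // avoid dividing by zero on an empty fill list
    }
    // VWAP = sum(price * quantity) / sum(quantity)
    let notional: f64 = fills.iter().map(|f| f.price * f.quantity).sum();
    Some((filled, notional / filled))
}
```

Comparing the returned filled quantity with the requested quantity then tells the caller whether the order was fully or only partially filled.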
PositionTracker
Thread-safe position management with RwLock:
pub struct PositionTracker {
    positions: RwLock<HashMap<MarketId, Position>>,
    trades: RwLock<Vec<Trade>>,
    limits: PositionLimits,
}
Each position tracks:
- Net size: Positive = long, negative = short
- Entry VWAP: Volume-weighted average price
- Realized PnL: Closed portion of position
- Unrealized PnL: Calculated from current market price
Position flip-through handles crossing from long to short (or vice versa):
// Example: Position is 100 long, sell 150
// -> Close 100 (realize PnL), open 50 short
if new_size.signum() != old_size.signum() {
    // Realized PnL on the closed portion (the entire old position);
    // side_multiplier is +1 when closing a long, -1 when closing a short
    let closed_pnl = old_size.abs() * (exit_price - entry_price) * side_multiplier;
    // The remainder opens a new position at the current price
    position.entry_price = current_price;
}
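A fuller, self-contained version of the flip-through logic might look like this. It is a sketch under simplifying assumptions: sizes are signed integer contracts, prices are integer cents, and `Position` and `apply_fill` are illustrative names rather than the tracker's real types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Position {
    pub size: i64,           // signed contracts: positive = long, negative = short
    pub entry_cents: i64,    // entry VWAP in cents (integer division truncates)
    pub realized_cents: i64, // realized PnL in cents
}

/// Apply a signed fill (`qty > 0` buys, `qty < 0` sells) at `price_cents`.
pub fn apply_fill(pos: &mut Position, qty: i64, price_cents: i64) {
    let old = pos.size;
    let new = old + qty;
    if old == 0 || old.signum() == qty.signum() {
        // Opening or adding: blend the entry price by size.
        let notional = old.abs() * pos.entry_cents + qty.abs() * price_cents;
        pos.entry_cents = if new != 0 { notional / new.abs() } else { 0 };
    } else {
        // Reducing, closing, or flipping: realize PnL on the closed part only.
        let closed = old.abs().min(qty.abs());
        pos.realized_cents += closed * (price_cents - pos.entry_cents) * old.signum();
        if new == 0 {
            pos.entry_cents = 0; // flat: no entry price
        } else if new.signum() != old.signum() {
            pos.entry_cents = price_cents; // remainder opens at the fill price
        }
        // Otherwise the position only shrank and keeps its entry price.
    }
    pos.size = new;
}
```

Starting 100 long at 40 cents and selling 150 at 50 cents realizes 100 × 10 = 1,000 cents and leaves a 50-contract short entered at 50 cents. A production tracker would use a decimal type rather than integer cents for the entry VWAP.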
Position limits enforce risk controls before execution:
pub struct PositionLimits {
    pub max_position_size: Decimal,     // Per-market
    pub max_open_positions: usize,      // Portfolio-wide
    pub max_notional_exposure: Decimal, // Total $
}
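A pre-trade check against these limits could be sketched as follows. The example swaps `Decimal` for integer contracts and cents to stay dependency-free, and `ProjectedState`, `LimitViolation`, and `check` are hypothetical names introduced for illustration.

```rust
/// Which limit an order would breach (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum LimitViolation {
    PositionSize,
    OpenPositions,
    NotionalExposure,
}

pub struct PositionLimits {
    pub max_position_size: i64,     // contracts, per market
    pub max_open_positions: usize,  // portfolio-wide
    pub max_notional_exposure: i64, // total exposure in cents
}

/// The portfolio as it would look if the order were filled in full.
pub struct ProjectedState {
    pub market_position: i64, // signed size in the order's market
    pub open_positions: usize,
    pub notional_cents: i64,
}

impl PositionLimits {
    /// Reject the order before execution if any limit would be exceeded.
    pub fn check(&self, projected: &ProjectedState) -> Result<(), LimitViolation> {
        if projected.market_position.abs() > self.max_position_size {
            return Err(LimitViolation::PositionSize);
        }
        if projected.open_positions > self.max_open_positions {
            return Err(LimitViolation::OpenPositions);
        }
        if projected.notional_cents > self.max_notional_exposure {
            return Err(LimitViolation::NotionalExposure);
        }
        Ok(())
    }
}
```

Checking the projected state rather than the current one is what lets the limits apply before execution instead of after the fact.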
Historical Storage
Trades and market data are stored in SQLite.
Schema optimizations:
- Indexes on timestamp and market_id for efficient range queries
- Decimal as TEXT: Preserves precision without floating point issues
- RFC3339 timestamps: Human-readable, sortable
Query patterns support backtesting needs:
pub fn query_market_data(
    &self,
    from: DateTime<Utc>,
    to: DateTime<Utc>,
    market_id: Option<&str>,
) -> Result<Vec<MarketDataRecord>, StorageError>
DataReplayer
Deterministic replay with event ordering:
pub struct DataReplayer {
    storage: Arc<TradeStorage>,
    clock: Arc<SimulatedClock>,
    market_data: Vec<MarketDataRecord>, // Loaded upfront
    current_index: usize,
}
Key behaviors:
- Upfront loading: All data loaded at construction for determinism
- Clock advancement: Each next_event() sets clock to event timestamp
- Seek/pause/reset: Full replay control
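pub fn next_event(&mut self) -> Result<ReplayEvent, ReplayError> {
    // Return an error instead of panicking once the data is exhausted.
    // `ReplayError::EndOfData` is a placeholder name; use the variant the crate defines.
    let event = self
        .market_data
        .get(self.current_index)
        .cloned()
        .ok_or(ReplayError::EndOfData)?;
    self.clock.set(event.timestamp); // Advance simulated time
    self.current_index += 1;
    Ok(ReplayEvent::MarketData(event))
}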
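The replay behaviors above can be sketched without storage or a shared clock. In this illustration `Replayer` holds a plain `clock_nanos` field as a stand-in for the `SimulatedClock`, and the stable sort by timestamp is an assumption about how equal-timestamp events are kept in a deterministic order.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct MarketDataRecord {
    pub timestamp_nanos: i64,
    pub market_id: String,
    pub price_cents: i64,
}

pub struct Replayer {
    market_data: Vec<MarketDataRecord>, // loaded upfront
    current_index: usize,
    clock_nanos: i64, // stand-in for the shared SimulatedClock
}

impl Replayer {
    pub fn new(mut market_data: Vec<MarketDataRecord>) -> Self {
        // A stable sort keeps input order for equal timestamps,
        // so every run sees the same event sequence.
        market_data.sort_by_key(|r| r.timestamp_nanos);
        Self { market_data, current_index: 0, clock_nanos: 0 }
    }

    /// Next event, or None once the data is exhausted.
    pub fn next_event(&mut self) -> Option<MarketDataRecord> {
        let event = self.market_data.get(self.current_index)?.clone();
        self.clock_nanos = event.timestamp_nanos; // move simulated time to the event
        self.current_index += 1;
        Some(event)
    }

    /// Rewind to the start of the data.
    pub fn reset(&mut self) {
        self.current_index = 0;
        self.clock_nanos = 0;
    }

    pub fn now_nanos(&self) -> i64 {
        self.clock_nanos
    }
}
```

Because the data is fixed at construction and the ordering is stable, replaying after `reset()` yields the same sequence, which is the property a backtest relies on.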
PerformanceMetrics
Standard financial metrics using rust_decimal for precision:
Sharpe Ratio: Risk-adjusted return
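pub fn sharpe_ratio(&self) -> Result<Decimal, MetricsError> {
    // `returns`: per-period returns derived from the trade history (derivation omitted here)
    let mean_return = self.mean(&returns);
    let std_dev = self.std_dev(&returns)?;
    // Convert the annual risk-free rate to a per-period rate
    let excess_return = mean_return - (self.risk_free_rate / self.periods_per_year);
    // Annualize by the square root of the number of periods per year
    Ok((excess_return / std_dev) * self.periods_per_year.sqrt())
}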
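As a standalone illustration of the same formula, the following sketch uses `f64` instead of `rust_decimal` and the sample standard deviation (n − 1 denominator). Both are assumptions for the example, as is returning `None` when the ratio is undefined.

```rust
/// Annualized Sharpe ratio from per-period returns.
/// Returns None with fewer than two returns or zero variance.
pub fn sharpe_ratio(
    returns: &[f64],
    annual_risk_free_rate: f64,
    periods_per_year: f64,
) -> Option<f64> {
    if returns.len() < 2 {
        return None;
    }
    let n = returns.len() as f64;
    let mean = returns.iter().sum::<f64>() / n;
    // Sample variance (n - 1 denominator).
    let variance = returns.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return None; // ratio is undefined when returns never vary
    }
    // Per-period excess return over the per-period risk-free rate.
    let excess = mean - annual_risk_free_rate / periods_per_year;
    // Annualize by sqrt(periods per year).
    Some(excess / std_dev * periods_per_year.sqrt())
}
```

For example, quarterly returns of 1% and 3% with a zero risk-free rate give a mean of 0.02 and a sample standard deviation of about 0.01414, so the annualized ratio is roughly 1.414 × √4 ≈ 2.83.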
Max Drawdown: Largest peak-to-trough decline
pub fn max_drawdown(&self) -> Result<Decimal, MetricsError> {
    let mut cumulative = dec!(1);
    let mut peak = dec!(1);
    let mut max_dd = dec!(0);
    for trade in &self.trades {
        cumulative = cumulative * (dec!(1) + trade.return_pct);
        peak = peak.max(cumulative);
        let drawdown = (peak - cumulative) / peak;
        max_dd = max_dd.max(drawdown);
    }
    Ok(max_dd)
}
Trade Statistics: Win rate, profit factor, average P&L
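These statistics are simple aggregates over per-trade P&L. A minimal sketch, assuming `f64` P&L values and the usual definitions (win rate = winning trades / all trades, profit factor = gross profit / gross loss); `TradeStats` and `trade_stats` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
pub struct TradeStats {
    pub win_rate: f64,
    pub profit_factor: Option<f64>, // None when there are no losing trades
    pub average_pnl: f64,
}

/// Aggregate statistics over per-trade P&L; None for an empty trade list.
pub fn trade_stats(pnls: &[f64]) -> Option<TradeStats> {
    if pnls.is_empty() {
        return None;
    }
    let n = pnls.len() as f64;
    let wins = pnls.iter().filter(|p| **p > 0.0).count() as f64;
    let gross_profit: f64 = pnls.iter().filter(|p| **p > 0.0).sum();
    // Losses are summed as positive magnitudes.
    let gross_loss: f64 = pnls.iter().filter(|p| **p < 0.0).map(|p| -p).sum();
    Some(TradeStats {
        win_rate: wins / n,
        profit_factor: if gross_loss > 0.0 {
            Some(gross_profit / gross_loss)
        } else {
            None
        },
        average_pnl: pnls.iter().sum::<f64>() / n,
    })
}
```

For P&L values of 10, −5, 20, and −5, half the trades win, gross profit is 30 against gross loss of 10 (profit factor 3), and the average P&L is 5.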
Module Structure
arbiter-engine/src/
├── clock/
│   ├── mod.rs              # Exports
│   └── clock.rs            # Clock trait, RealClock, SimulatedClock
├── simulation/
│   ├── mod.rs              # Exports
│   ├── client.rs           # SimulatedExchangeClient
│   ├── config.rs           # SimulationConfig, FidelityLevel
│   └── matching_engine.rs  # Fill simulation
├── position/
│   ├── mod.rs              # Exports
│   └── tracker.rs          # PositionTracker, PnL
├── history/
│   ├── mod.rs              # Exports
│   ├── storage.rs          # SQLite storage
│   └── replayer.rs         # DataReplayer
└── analytics/
    ├── mod.rs              # Exports
    └── metrics.rs          # PerformanceMetrics
Test Coverage
85 new tests covering:
| Module | Tests |
|---|---|
| clock | 11 |
| simulation/client | 11 |
| simulation/config | 7 |
| simulation/matching_engine | 11 |
| position/tracker | 11 |
| history/storage | 10 |
| history/replayer | 10 |
| analytics/metrics | 14 |
Integration with Kalshi Demo
For the safest testing experience, combine paper trading with Kalshi's demo environment.
This provides real market data from Kalshi's demo environment (which mirrors production) while using simulated order execution. See ADR-015: Kalshi Demo Environment for details.
Future Work
- Level 3 fidelity: Queue position modeling for HFT
- Parquet export: Large-scale tick data analysis
- Multi-strategy comparison: A/B testing infrastructure
- Automated hyperparameter tuning: Grid search over strategy params