
Paper Trading and Backtesting Infrastructure

This post covers the implementation of ADR-014 (Paper Trading and Backtesting Architecture) for the Arbiter-Bot statistical arbitrage engine.

The Problem

Before deploying capital, we need to validate strategies without financial risk. The challenges:

  1. Identical interfaces - Simulated execution must use the same traits as production
  2. Realistic fills - Mid-price fills are optimistic; real orders cross the book
  3. Deterministic replay - Same data must produce identical results
  4. Performance measurement - Sharpe ratio, max drawdown, win rate

Clock Abstraction

Time control is fundamental. The Clock trait abstracts over real and simulated time:

pub trait Clock: Send + Sync {
    fn now(&self) -> DateTime<Utc>;
    fn advance(&self, duration: Duration);
    fn is_simulated(&self) -> bool;
}

RealClock wraps Utc::now(). SimulatedClock uses AtomicI64 for lock-free updates:

pub struct SimulatedClock {
    nanos_since_epoch: AtomicI64,
}

impl SimulatedClock {
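    pub fn advance(&self, duration: Duration) {
        // Clamp negative durations to zero so time never moves backward
        let nanos = duration.num_nanoseconds().unwrap_or(i64::MAX).max(0);
        // Saturate rather than wrap if the addition would overflow i64
        let _ = self.nanos_since_epoch.fetch_update(
            Ordering::SeqCst,
            Ordering::SeqCst,
            |t| Some(t.saturating_add(nanos)),
        );
    }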
}

Key design decisions:

  • Monotonic guarantees: advance() only moves time forward
  • SeqCst ordering: All threads observe clock updates in one consistent order
  • Direct time setting: set() jumps to an absolute timestamp for replay positioning
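
The design decisions above can be sketched with the standard library alone. This is a minimal illustration, not the engine's actual code: it swaps chrono's types for a raw i64 nanosecond count and std::time::Duration, and the now_nanos() and new() names are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::time::Duration;

/// Simulated clock storing nanoseconds since the Unix epoch.
pub struct SimulatedClock {
    nanos_since_epoch: AtomicI64,
}

impl SimulatedClock {
    /// Hypothetical constructor: start the clock at an absolute timestamp.
    pub fn new(start_nanos: i64) -> Self {
        SimulatedClock { nanos_since_epoch: AtomicI64::new(start_nanos) }
    }

    /// Current simulated time in nanoseconds since the epoch.
    pub fn now_nanos(&self) -> i64 {
        self.nanos_since_epoch.load(Ordering::SeqCst)
    }

    /// Move time forward. std's Duration is unsigned, so this cannot go
    /// backward; saturating arithmetic avoids wrap-around on overflow.
    pub fn advance(&self, duration: Duration) {
        let nanos = duration.as_nanos().min(i64::MAX as u128) as i64;
        let _ = self.nanos_since_epoch.fetch_update(
            Ordering::SeqCst,
            Ordering::SeqCst,
            |t| Some(t.saturating_add(nanos)),
        );
    }

    /// Jump to an absolute timestamp (replay positioning). Unlike
    /// advance(), this may move time backward, e.g. when a replay resets.
    pub fn set(&self, nanos: i64) {
        self.nanos_since_epoch.store(nanos, Ordering::SeqCst);
    }
}
```

Note that set() is the one operation that can break monotonicity, so in this sketch the guarantee applies to advance() only.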

SimulatedExchangeClient

The client implements the existing ExchangeClient trait, so strategy code needs no changes:

#[async_trait]
impl ExchangeClient for SimulatedExchangeClient {
    async fn place_order(&self, order: OrderRequest) -> Result<FillDetails, ExecutionError> {
        // Optional latency injection
        if let Some(latency) = self.config.simulated_latency {
            tokio::time::sleep(latency).await;
        }

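        // Match against order book. `fee_calculator` is not defined in this
        // excerpt; presumably it is built from the configured fee model.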
        let result = self.matching_engine
            .match_order(order.side, order.size, Some(order.price), fee_calculator)
            .map_err(|e| ExecutionError::Rejected(e.to_string()))?;

        Ok(FillDetails {
            order_id: order.order_id,
            venue_order_id: format!("sim-{}", Uuid::new_v4()),
            price: result.average_price,
            size: result.filled_quantity,
            timestamp: self.clock.now(),
            fee: result.fee,
        })
    }
}

Configuration includes:

  • Fidelity level: Basic (mid-price) or Realistic (book crossing)
  • Latency injection: Simulate network delays
  • Fee models: Kalshi's 7% formula or Polymarket's 0%

MatchingEngine

Two fidelity levels address different use cases:

Level 1 - Basic: Instant fill at mid-price. Fast validation, optimistic assumptions.

fn match_basic(&self, quantity: f64, fee_calculator: impl Fn(f64, f64) -> f64) -> Result<MatchResult, MatchError> {
    let mid = self.mid_price().ok_or(MatchError::NoLiquidity)?;
    Ok(MatchResult {
        filled_quantity: quantity,
        average_price: mid,
        fully_filled: true,
        fills: vec![FillLeg { price: mid, quantity }],
        fee: fee_calculator(mid, quantity),
    })
}

Level 2 - Realistic: Crosses the order book with partial fills.

fn match_realistic(&self, side: OrderSide, quantity: f64, limit_price: Option<f64>, fee_calculator: impl Fn(f64, f64) -> f64) -> Result<MatchResult, MatchError> {
    let levels = match side {
        OrderSide::Buy => &self.orderbook.asks,   // Buy crosses asks (best ask first)
        OrderSide::Sell => &self.orderbook.bids,  // Sell crosses bids (best bid first)
    };

    let mut remaining = quantity;
    let mut fills = Vec::new();

    for level in levels {
        if remaining <= 0.0 { break; }

        // Respect limit price
        if let Some(limit) = limit_price {
            let crosses = match side {
                OrderSide::Buy => level.price <= limit,
                OrderSide::Sell => level.price >= limit,
            };
            if !crosses { break; }
        }

        let fill_qty = remaining.min(level.size);
        fills.push(FillLeg { price: level.price, quantity: fill_qty });
        remaining -= fill_qty;
    }
    // ... calculate VWAP and fees
}
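
The elided VWAP and fee step might look like the following self-contained sketch. It is an assumption-laden illustration rather than the engine's code: the book is passed in as a slice of levels sorted best price first, the Level type is hypothetical, and the function returns None where the original would presumably return a MatchError.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OrderSide {
    Buy,
    Sell,
}

/// One price level of the book (hypothetical type for illustration).
#[derive(Debug, Clone, Copy)]
pub struct Level {
    pub price: f64,
    pub size: f64,
}

#[derive(Debug)]
pub struct MatchResult {
    pub filled_quantity: f64,
    pub average_price: f64,
    pub fully_filled: bool,
    pub fee: f64,
}

/// Walk `levels` (best price first); returns None if nothing fills.
pub fn match_realistic(
    side: OrderSide,
    levels: &[Level],
    quantity: f64,
    limit_price: Option<f64>,
    fee_calculator: impl Fn(f64, f64) -> f64,
) -> Option<MatchResult> {
    let mut remaining = quantity;
    let mut notional = 0.0; // sum of price * quantity over all fill legs
    for level in levels {
        if remaining <= 0.0 {
            break;
        }
        if let Some(limit) = limit_price {
            let crosses = match side {
                OrderSide::Buy => level.price <= limit,
                OrderSide::Sell => level.price >= limit,
            };
            if !crosses {
                break;
            }
        }
        let fill_qty = remaining.min(level.size);
        notional += level.price * fill_qty;
        remaining -= fill_qty;
    }
    let filled = quantity - remaining;
    if filled <= 0.0 {
        return None;
    }
    // VWAP = total notional / total filled quantity
    let vwap = notional / filled;
    Some(MatchResult {
        filled_quantity: filled,
        average_price: vwap,
        fully_filled: remaining <= 0.0,
        fee: fee_calculator(vwap, filled),
    })
}
```

For example, buying 150 against asks of 100 @ 0.50 and 100 @ 0.52 fills 100 at 0.50 and 50 at 0.52, for a VWAP of 76 / 150 ≈ 0.5067, noticeably worse than the best ask that a naive fill model would assume.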

PositionTracker

Thread-safe position management with RwLock:

pub struct PositionTracker {
    positions: RwLock<HashMap<MarketId, Position>>,
    trades: RwLock<Vec<Trade>>,
    limits: PositionLimits,
}

Each position tracks:

  • Net size: Positive = long, negative = short
  • Entry VWAP: Volume-weighted average price
  • Realized PnL: Closed portion of position
  • Unrealized PnL: Calculated from current market price

Position flip-through handles crossing from long to short (or vice versa):

// Example: Position is 100 long, sell 150
// -> Close 100 (realize PnL), open 50 short
if new_size.signum() != old_size.signum() {
    // Realize PnL on the whole old position, which is fully closed.
    // side_multiplier is +1 when closing a long and -1 when closing a short.
    let closed_pnl = old_size.abs() * (exit_price - entry_price) * side_multiplier;
    // The remainder opens a new position at the current (fill) price
    position.entry_price = current_price;
}
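
A fuller version of the same idea, covering opening, adding, reducing, and flipping, could be sketched as below. This is an illustration under assumptions: it uses f64 and a signed fill quantity for brevity, and the apply_fill name and Position layout are hypothetical rather than taken from the tracker.

```rust
#[derive(Debug, Default)]
pub struct Position {
    pub size: f64,         // signed: positive = long, negative = short
    pub entry_price: f64,  // VWAP of the currently open position
    pub realized_pnl: f64, // PnL locked in by closed quantity
}

impl Position {
    /// Apply a signed fill (positive = buy, negative = sell) at `price`.
    pub fn apply_fill(&mut self, signed_qty: f64, price: f64) {
        if signed_qty == 0.0 {
            return;
        }
        let old = self.size;
        let new = old + signed_qty;
        if old == 0.0 || old.signum() == signed_qty.signum() {
            // Opening or adding: blend the fill into the entry VWAP.
            self.entry_price =
                (old.abs() * self.entry_price + signed_qty.abs() * price) / new.abs();
        } else {
            // Reducing, closing, or flipping: realize PnL on the closed part.
            let closed = signed_qty.abs().min(old.abs());
            self.realized_pnl += closed * (price - self.entry_price) * old.signum();
            if new == 0.0 {
                self.entry_price = 0.0; // flat: no open position
            } else if new.signum() != old.signum() {
                // Flip-through: the remainder opens fresh at the fill price.
                self.entry_price = price;
            }
        }
        self.size = new;
    }
}
```

With a position of 100 long at 0.40, selling 150 at 0.50 realizes 100 × (0.50 − 0.40) = 10 and leaves 50 short with an entry price of 0.50.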

Position limits enforce risk controls before execution:

pub struct PositionLimits {
    pub max_position_size: Decimal,      // Per-market
    pub max_open_positions: usize,        // Portfolio-wide
    pub max_notional_exposure: Decimal,   // Total $
}
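
A pre-trade check against these limits could look roughly like this. It is a sketch only: f64 stands in for Decimal to keep the example dependency-free, and check_limits, LimitViolation, and the (size, mark price) tuple layout are hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;

pub struct PositionLimits {
    pub max_position_size: f64,     // per-market, in contracts
    pub max_open_positions: usize,  // portfolio-wide
    pub max_notional_exposure: f64, // total dollars
}

#[derive(Debug, PartialEq)]
pub enum LimitViolation {
    PositionSize,
    OpenPositions,
    NotionalExposure,
}

/// Check a proposed fill before execution.
/// `positions` maps market id -> (signed size, mark price).
pub fn check_limits(
    limits: &PositionLimits,
    positions: &HashMap<String, (f64, f64)>,
    market: &str,
    signed_qty: f64,
    price: f64,
) -> Result<(), LimitViolation> {
    let current = positions.get(market).map(|p| p.0).unwrap_or(0.0);
    let projected = current + signed_qty;
    if projected.abs() > limits.max_position_size {
        return Err(LimitViolation::PositionSize);
    }
    // Count open positions and notional as they would stand after the trade.
    let mut open = if projected != 0.0 { 1 } else { 0 };
    let mut notional = projected.abs() * price;
    for (id, &(size, mark)) in positions {
        if id.as_str() == market || size == 0.0 {
            continue;
        }
        open += 1;
        notional += size.abs() * mark;
    }
    if open > limits.max_open_positions {
        return Err(LimitViolation::OpenPositions);
    }
    if notional > limits.max_notional_exposure {
        return Err(LimitViolation::NotionalExposure);
    }
    Ok(())
}
```

Evaluating the projected state, rather than the current one, is what lets the check reject an order before it reaches the exchange client.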

Historical Storage

SQLite-backed storage for trades and market data:

pub struct TradeStorage {
    conn: Mutex<Connection>,
}

Schema optimizations:

  • Indexes on timestamp and market_id for efficient range queries
  • Decimal as TEXT: Preserves precision without floating point issues
  • RFC3339 timestamps: Human-readable, sortable

Query patterns support backtesting needs:

pub fn query_market_data(
    &self,
    from: DateTime<Utc>,
    to: DateTime<Utc>,
    market_id: Option<&str>,
) -> Result<Vec<MarketDataRecord>, StorageError>

DataReplayer

Deterministic replay with event ordering:

pub struct DataReplayer {
    storage: Arc<TradeStorage>,
    clock: Arc<SimulatedClock>,
    market_data: Vec<MarketDataRecord>,  // Loaded upfront
    current_index: usize,
}

Key behaviors:

  • Upfront loading: All data loaded at construction for determinism
  • Clock advancement: Each next_event() sets clock to event timestamp
  • Seek/pause/reset: Full replay control

pub fn next_event(&mut self) -> Result<ReplayEvent, ReplayError> {
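    // Return an error rather than panic once the data is exhausted
    // (EndOfData is an assumed variant name, not confirmed by the source).
    let event = self.market_data.get(self.current_index).cloned()
        .ok_or(ReplayError::EndOfData)?;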
    self.clock.set(event.timestamp);  // Advance simulated time
    self.current_index += 1;
    Ok(ReplayEvent::MarketData(event))
}
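
The replay behaviors listed above can be sketched end to end with the standard library. This is an illustration under assumptions, not the actual DataReplayer: the clock is reduced to a shared AtomicI64 of nanoseconds, the storage handle is dropped, and the ReplayError variants, the seek signature, and the sort in new() are all hypothetical.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct MarketDataRecord {
    pub timestamp_nanos: i64,
    pub market_id: String,
    pub mid_price: f64,
}

#[derive(Debug, PartialEq)]
pub enum ReplayError {
    EndOfData,
    OutOfRange,
}

pub struct DataReplayer {
    clock: Arc<AtomicI64>, // stand-in for the simulated clock
    market_data: Vec<MarketDataRecord>,
    current_index: usize,
}

impl DataReplayer {
    pub fn new(clock: Arc<AtomicI64>, mut market_data: Vec<MarketDataRecord>) -> Self {
        // Stable sort: records with equal timestamps keep their load order,
        // so the same input always replays in the same sequence.
        market_data.sort_by_key(|r| r.timestamp_nanos);
        DataReplayer { clock: clock, market_data: market_data, current_index: 0 }
    }

    /// Emit the next record and set the clock to its timestamp.
    pub fn next_event(&mut self) -> Result<MarketDataRecord, ReplayError> {
        let event = self
            .market_data
            .get(self.current_index)
            .cloned()
            .ok_or(ReplayError::EndOfData)?;
        self.clock.store(event.timestamp_nanos, Ordering::SeqCst);
        self.current_index += 1;
        Ok(event)
    }

    /// Position the replay so that `index` is the next record emitted.
    pub fn seek(&mut self, index: usize) -> Result<(), ReplayError> {
        if index > self.market_data.len() {
            return Err(ReplayError::OutOfRange);
        }
        self.current_index = index;
        Ok(())
    }

    /// Rewind to the first record.
    pub fn reset(&mut self) {
        self.current_index = 0;
    }
}
```

Because the clock is set from each event's timestamp rather than advanced by wall-clock deltas, two runs over the same records observe identical times regardless of how fast the host executes them.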

PerformanceMetrics

Standard financial metrics using rust_decimal for precision:

Sharpe Ratio: Risk-adjusted return

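pub fn sharpe_ratio(&self) -> Result<Decimal, MetricsError> {
    // `returns` holds the per-period returns (its derivation is omitted here)
    let mean_return = self.mean(&returns);
    let std_dev = self.std_dev(&returns)?;
    // Convert the annual risk-free rate to a per-period rate
    let excess_return = mean_return - (self.risk_free_rate / self.periods_per_year);
    // Annualize by the square root of the number of periods per year
    Ok((excess_return / std_dev) * self.periods_per_year.sqrt())
}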
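
For a quick sanity check of the formula, here is a dependency-free f64 sketch. It is not the rust_decimal implementation above: it takes the returns as a slice, assumes the sample standard deviation (n − 1 denominator), and returns None instead of a MetricsError.

```rust
/// Annualized Sharpe ratio from per-period returns.
/// Returns None with fewer than two observations or zero volatility.
pub fn sharpe_ratio(
    returns: &[f64],
    annual_risk_free_rate: f64,
    periods_per_year: f64,
) -> Option<f64> {
    let n = returns.len();
    if n < 2 {
        return None;
    }
    let mean = returns.iter().sum::<f64>() / n as f64;
    // Sample variance (n - 1 denominator); this choice is an assumption.
    let variance = returns
        .iter()
        .map(|r| (r - mean).powi(2))
        .sum::<f64>()
        / (n - 1) as f64;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return None; // undefined when returns never vary
    }
    // Per-period excess return over the risk-free rate
    let excess = mean - annual_risk_free_rate / periods_per_year;
    // Annualize by sqrt(periods per year)
    Some(excess / std_dev * periods_per_year.sqrt())
}
```

As a worked example, daily returns of 1%, 2%, and 3% with a zero risk-free rate have a mean of 0.02 and a sample standard deviation of 0.01, giving 2 × √252 ≈ 31.75 when annualized over 252 periods.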

Max Drawdown: Largest peak-to-trough decline

pub fn max_drawdown(&self) -> Result<Decimal, MetricsError> {
    let mut cumulative = dec!(1);
    let mut peak = dec!(1);
    let mut max_dd = dec!(0);

    for trade in &self.trades {
        cumulative = cumulative * (dec!(1) + trade.return_pct);
        peak = peak.max(cumulative);
        let drawdown = (peak - cumulative) / peak;
        max_dd = max_dd.max(drawdown);
    }
    Ok(max_dd)
}
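
The same loop in plain f64 makes the calculation easy to trace by hand. This is a sketch for illustration only; it takes the per-trade returns as a slice instead of reading them from self.trades.

```rust
/// Largest peak-to-trough decline of the compounded equity curve,
/// expressed as a fraction of the peak (0.2 means a 20% drawdown).
pub fn max_drawdown(returns: &[f64]) -> f64 {
    let mut cumulative = 1.0_f64; // equity, starting at 1
    let mut peak = 1.0_f64;       // highest equity seen so far
    let mut max_dd = 0.0_f64;
    for r in returns {
        cumulative *= 1.0 + r;
        peak = peak.max(cumulative);
        max_dd = max_dd.max((peak - cumulative) / peak);
    }
    max_dd
}
```

For returns of +10%, −20%, +5%, equity runs 1.10, 0.88, 0.924. The peak stays at 1.10, so the drawdown reaches (1.10 − 0.88) / 1.10 = 20% and the later recovery does not reduce the recorded maximum.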

Trade Statistics: Win rate, profit factor, average P&L
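
These statistics are not shown in the post, so the following is only a sketch of the usual definitions: win rate as the fraction of trades with positive P&L, profit factor as gross profit divided by gross loss, and average P&L as the arithmetic mean. The TradeStats type and trade_stats function are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub struct TradeStats {
    pub win_rate: f64,
    /// None when there are no losing trades (the ratio is undefined).
    pub profit_factor: Option<f64>,
    pub average_pnl: f64,
}

/// Summarize per-trade P&L values; returns None for an empty slice.
pub fn trade_stats(pnls: &[f64]) -> Option<TradeStats> {
    if pnls.is_empty() {
        return None;
    }
    let n = pnls.len() as f64;
    let wins = pnls.iter().filter(|p| **p > 0.0).count() as f64;
    let gross_profit: f64 = pnls.iter().filter(|p| **p > 0.0).sum();
    let gross_loss: f64 = pnls.iter().filter(|p| **p < 0.0).map(|p| p.abs()).sum();
    let profit_factor = if gross_loss > 0.0 {
        Some(gross_profit / gross_loss)
    } else {
        None
    };
    Some(TradeStats {
        win_rate: wins / n,
        profit_factor: profit_factor,
        average_pnl: pnls.iter().sum::<f64>() / n,
    })
}
```

For P&L values of 10, −5, 20, and −10, the win rate is 0.5, the profit factor is 30 / 15 = 2, and the average P&L is 3.75.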

Module Structure

arbiter-engine/src/
├── clock/
│   ├── mod.rs           # Exports
│   └── clock.rs         # Clock trait, RealClock, SimulatedClock
├── simulation/
│   ├── mod.rs           # Exports
│   ├── client.rs        # SimulatedExchangeClient
│   ├── config.rs        # SimulationConfig, FidelityLevel
│   └── matching_engine.rs  # Fill simulation
├── position/
│   ├── mod.rs           # Exports
│   └── tracker.rs       # PositionTracker, PnL
├── history/
│   ├── mod.rs           # Exports
│   ├── storage.rs       # SQLite storage
│   └── replayer.rs      # DataReplayer
└── analytics/
    ├── mod.rs           # Exports
    └── metrics.rs       # PerformanceMetrics

Test Coverage

85 new tests covering:

Module                       Tests
clock                        11
simulation/client            11
simulation/config            7
simulation/matching_engine   11
position/tracker             11
history/storage              10
history/replayer             10
analytics/metrics            14

Integration with Kalshi Demo

For the safest testing experience, combine paper trading with Kalshi's demo environment:

cargo run -- --paper-trade --kalshi-demo --fidelity realistic

This provides real market data from Kalshi's demo environment (which mirrors production) while using simulated order execution. See ADR-015: Kalshi Demo Environment for details.

Future Work

  • Level 3 fidelity: Queue position modeling for HFT
  • Parquet export: Large-scale tick data analysis
  • Multi-strategy comparison: A/B testing infrastructure
  • Automated hyperparameter tuning: Grid search over strategy params

References