
ADR 014: Paper Trading and Backtesting Architecture

Status

Accepted (2026-01-21)

Context

Arbiter-Bot is a statistical arbitrage engine where strategy effectiveness must be validated before deploying real capital. The challenges are:

  1. Risk-free evaluation - Test strategies without financial exposure
  2. Historical validation - Backtest against recorded market data
  3. Performance comparison - Compare simulated vs actual execution
  4. Realistic simulation - Arbitrage opportunities are ephemeral; simplistic simulation leads to overfitting

Current state: The engine has a --dry-run mode that signs orders but doesn't submit them. This validates API integration but doesn't track paper positions, record fills, or calculate performance metrics.

This ADR covers simulation and backtesting infrastructure. Performance monitoring is covered in ADR-012 and low-latency optimizations in ADR-013.

Decision

Implement a multi-fidelity simulation framework with paper trading and backtesting capabilities.

Simulation Fidelity Levels

| Level | Name | Use Case | Fill Logic | Latency |
|-------|------|----------|------------|---------|
| 1 | Basic | Quick strategy validation | Instant fill at mid-price | None |
| 2 | Realistic | Paper trading | Order book crossing, partial fills | Configurable |
| 3 | HFT | Latency-sensitive analysis | Queue position, market impact | Measured |

Level 3 is out of scope for initial implementation but the architecture supports future extension.

1. Clock Abstraction

Decouple time from system clock to enable deterministic replay:

use chrono::{DateTime, Duration, Utc};
use std::sync::atomic::{AtomicI64, Ordering};

/// Unified time source for real-time and simulated execution
pub trait Clock: Send + Sync {
    /// Current timestamp
    fn now(&self) -> DateTime<Utc>;

    /// Advance time (no-op for RealClock). Takes a chrono::Duration, which is a signed, fixed-length span (not calendar-aware).
    fn advance(&self, duration: Duration);

    /// Whether this is simulated time
    fn is_simulated(&self) -> bool;
}

/// Production clock using system time
pub struct RealClock;

impl Clock for RealClock {
    fn now(&self) -> DateTime<Utc> {
        Utc::now()
    }

    fn advance(&self, _duration: Duration) {
        // No-op for real time
    }

    fn is_simulated(&self) -> bool {
        false
    }
}

/// Simulated clock for backtesting
pub struct SimulatedClock {
    current: AtomicI64,  // Milliseconds since epoch
}

impl SimulatedClock {
    pub fn new(start: DateTime<Utc>) -> Self {
        Self {
            current: AtomicI64::new(start.timestamp_millis()),
        }
    }
}

impl Clock for SimulatedClock {
    fn now(&self) -> DateTime<Utc> {
        let ms = self.current.load(Ordering::SeqCst);
        DateTime::from_timestamp_millis(ms).unwrap()
    }

    fn advance(&self, duration: Duration) {
        self.current.fetch_add(duration.num_milliseconds(), Ordering::SeqCst);
    }

    fn is_simulated(&self) -> bool {
        true
    }
}

Rationale:

  • Trait abstraction allows seamless switching between modes
  • AtomicI64 enables lock-free time updates during replay
  • advance() method supports event-driven time progression
  • Precision note: SimulatedClock stores milliseconds since epoch. For sub-millisecond timing scenarios (HFT Level 3), consider storing nanoseconds since epoch instead. An AtomicI64 of nanoseconds still covers roughly 292 years either side of 1970, so no wider atomic type is needed.
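The precision note above can be illustrated with a std-only sketch. It assumes a hypothetical NanoClock type and a replay_gaps helper, neither of which is part of this ADR, and it uses plain i64 nanoseconds in place of chrono types so that it stays self-contained.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical nanosecond-resolution simulated clock (illustrative only).
pub struct NanoClock {
    /// Nanoseconds since the Unix epoch.
    current_ns: AtomicI64,
}

impl NanoClock {
    pub fn new(start_ns: i64) -> Self {
        Self { current_ns: AtomicI64::new(start_ns) }
    }

    /// Current simulated time in nanoseconds since the epoch.
    pub fn now_ns(&self) -> i64 {
        self.current_ns.load(Ordering::SeqCst)
    }

    /// Advance by `delta_ns` and return the new time.
    /// Lock-free: the update is a single atomic add.
    pub fn advance_ns(&self, delta_ns: i64) -> i64 {
        self.current_ns.fetch_add(delta_ns, Ordering::SeqCst) + delta_ns
    }
}

/// Replays a list of inter-event gaps and returns the timestamp seen at each event.
pub fn replay_gaps(clock: &NanoClock, gaps_ns: &[i64]) -> Vec<i64> {
    gaps_ns.iter().map(|gap| clock.advance_ns(*gap)).collect()
}
```

Because time moves only when advance_ns is called, replaying the same gaps from the same start always yields the same timestamps, which is the property deterministic backtests rely on.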

2. SimulatedExchangeClient

Implements the existing ExchangeClient trait with simulated fills:

use std::sync::{Arc, Mutex};

use async_trait::async_trait;
use uuid::Uuid;

use crate::market::client::ExchangeClient;

pub struct SimulatedExchangeClient {
    /// Matching engine for fill simulation (behind a Mutex because match_order takes &mut self)
    matching_engine: Mutex<MatchingEngine>,
    /// Paper positions (PositionTracker guards its own state with RwLock)
    position_tracker: PositionTracker,
    /// Clock source
    clock: Arc<dyn Clock>,
    /// Configuration
    config: SimulationConfig,
}

#[async_trait]
impl ExchangeClient for SimulatedExchangeClient {
    async fn submit_order(&self, order: Order) -> Result<OrderResponse, ExchangeError> {
        // Simulate network latency if configured.
        // Note: this sleeps in wall-clock time and does not advance a SimulatedClock.
        if let Some(latency) = self.config.simulated_latency {
            tokio::time::sleep(latency).await;
        }

        // Match order against order book. The lock guard is a temporary, so it is
        // released at the end of this statement and never held across an .await.
        let fill = self
            .matching_engine
            .lock()
            .expect("matching engine lock poisoned")
            .match_order(&order)?;

        // Update paper positions
        self.position_tracker.record_fill(&fill);

        Ok(OrderResponse {
            order_id: Uuid::new_v4().to_string(),
            status: fill.status,
            filled_qty: fill.filled_quantity,
            avg_price: fill.average_price,
            timestamp: self.clock.now(),
        })
    }

    async fn cancel_order(&self, order_id: &str) -> Result<(), ExchangeError> {
        self.matching_engine
            .lock()
            .expect("matching engine lock poisoned")
            .cancel_order(order_id)
    }

    async fn get_positions(&self) -> Result<Vec<Position>, ExchangeError> {
        Ok(self.position_tracker.get_all_positions())
    }
}

Rationale:

  • Implements same trait as production clients for seamless substitution
  • Configurable latency injection for realistic simulation
  • Delegates to MatchingEngine for fill logic
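The substitution argument can be sketched without any async machinery. The Exchange trait, SimulatedExchange struct, and execute_legs function below are simplified, hypothetical stand-ins for the real ExchangeClient trait and its callers; they only show that code written against a trait runs unchanged whichever implementation is plugged in.

```rust
/// Simplified, synchronous stand-in for the ExchangeClient trait (illustrative only).
pub trait Exchange {
    /// Submits an order for `quantity` units and returns the filled quantity.
    fn submit(&mut self, quantity: u64) -> u64;
}

/// Stand-in for a simulated client that can fill at most `available` units in total.
pub struct SimulatedExchange {
    pub available: u64,
}

impl Exchange for SimulatedExchange {
    fn submit(&mut self, quantity: u64) -> u64 {
        let filled = quantity.min(self.available);
        self.available -= filled;
        filled
    }
}

/// Strategy code depends only on the trait, so the same function works with
/// a live implementation or a simulated one. Returns the total filled quantity.
pub fn execute_legs<E: Exchange>(exchange: &mut E, legs: &[u64]) -> u64 {
    legs.iter().map(|qty| exchange.submit(*qty)).sum()
}
```

A production implementation would satisfy the same trait by calling the venue API, and execute_legs would not need to change.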

3. Matching Engine

Simulates order matching with configurable fidelity:

pub struct MatchingEngine {
    /// Current order book state (from replay or live feed)
    orderbook: Arc<RwLock<OrderBook>>,
    /// Fidelity level
    fidelity: FidelityLevel,
    /// Pending orders (for partial fill simulation)
    pending_orders: HashMap<String, Order>,
}

#[derive(Clone, Copy)]
pub enum FidelityLevel {
    /// Instant fill at mid-price
    Basic,
    /// Cross order book, partial fills
    Realistic,
}

impl MatchingEngine {
    pub fn match_order(&mut self, order: &Order) -> Result<Fill, MatchError> {
        match self.fidelity {
            FidelityLevel::Basic => self.match_basic(order),
            FidelityLevel::Realistic => self.match_realistic(order),
        }
    }

    /// Level 1: Instant fill at mid-price
    fn match_basic(&self, order: &Order) -> Result<Fill, MatchError> {
        let book = self.orderbook.read();
        let mid_price = book.mid_price().ok_or(MatchError::NoLiquidity)?;

        Ok(Fill {
            order_id: order.id.clone(),
            filled_quantity: order.quantity,
            average_price: mid_price,
            status: OrderStatus::Filled,
            fills: vec![FillLeg {
                price: mid_price,
                quantity: order.quantity,
            }],
        })
    }

    /// Level 2: Cross order book with partial fills
    fn match_realistic(&mut self, order: &Order) -> Result<Fill, MatchError> {
        // A read lock is sufficient: the book is walked here, not mutated
        let book = self.orderbook.read();
        let mut remaining = order.quantity;
        let mut fills = Vec::new();
        let mut total_cost = Decimal::ZERO;

        // Walk the book
        let levels = match order.side {
            Side::Buy => book.asks_iter(),
            Side::Sell => book.bids_iter(),
        };

        for (price, available) in levels {
            if remaining.is_zero() {
                break;
            }

            // Check limit price (price is a reference here, so dereference before comparing)
            if let Some(limit) = order.limit_price {
                let crosses = match order.side {
                    Side::Buy => *price <= limit,
                    Side::Sell => *price >= limit,
                };
                if !crosses {
                    break;
                }
            }

            let fill_qty = remaining.min(*available);
            fills.push(FillLeg { price: *price, quantity: fill_qty });
            total_cost += price * fill_qty;
            remaining -= fill_qty;
        }

        let filled_qty = order.quantity - remaining;
        if filled_qty.is_zero() {
            return Err(MatchError::NoLiquidity);
        }

        let status = if remaining.is_zero() {
            OrderStatus::Filled
        } else {
            OrderStatus::PartiallyFilled
        };

        Ok(Fill {
            order_id: order.id.clone(),
            filled_quantity: filled_qty,
            average_price: total_cost / filled_qty,
            status,
            fills,
        })
    }
}

Rationale:

  • Level 1 (Basic) for quick iteration and sanity checks
  • Level 2 (Realistic) for meaningful paper trading with slippage
  • Partial fill support prevents overly optimistic backtests
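A small worked example shows how far the two fidelity levels can diverge on the same book. This is a sketch under simplifying assumptions: prices are integer cents, quantities are whole contracts, and the Level alias and the mid_price and walk_asks functions are hypothetical names used only for illustration.

```rust
/// One price level: (price in cents, quantity available).
pub type Level = (u64, u64);

/// Level 1 (Basic): mid-price between best bid and best ask, in cents.
pub fn mid_price(best_bid: u64, best_ask: u64) -> f64 {
    (best_bid + best_ask) as f64 / 2.0
}

/// Level 2 (Realistic): walk the ask levels (best price first) for a buy order.
/// Returns (filled quantity, total cost in cents). The walk stops at the limit
/// price or when the book is exhausted, either of which yields a partial fill.
pub fn walk_asks(asks: &[Level], quantity: u64, limit: Option<u64>) -> (u64, u64) {
    let mut remaining = quantity;
    let mut cost = 0;
    for &(price, available) in asks {
        if remaining == 0 || limit.map_or(false, |l| price > l) {
            break;
        }
        let fill = remaining.min(available);
        cost += price * fill;
        remaining -= fill;
    }
    (quantity - remaining, cost)
}
```

With a best bid of 50 and asks of 100 @ 52 and 200 @ 53, a 150-contract buy fills at 51.0 under Level 1 but costs 7850 cents under Level 2, an average of about 52.33. The same order with a limit of 52 fills only 100 contracts.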

4. Position Tracker

Manages paper positions with PnL calculation:

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

use chrono::{DateTime, Utc};
use rust_decimal::Decimal;

pub struct PositionTracker {
    /// Positions by market ID (RwLock for thread-safe concurrent access)
    positions: RwLock<HashMap<MarketId, Position>>,
    /// Trade history for PnL calculation
    trades: RwLock<Vec<Trade>>,
    /// Clock for timestamps
    clock: Arc<dyn Clock>,
}

#[derive(Clone)]
pub struct Position {
    pub market_id: MarketId,
    pub side: Side,
    pub quantity: Decimal,
    pub average_entry_price: Decimal,
    pub realized_pnl: Decimal,
    pub unrealized_pnl: Decimal,
    pub opened_at: DateTime<Utc>,
    pub last_updated: DateTime<Utc>,
}

impl PositionTracker {
    /// Takes &self: the RwLocks provide interior mutability, so the tracker can be shared.
    pub fn record_fill(&self, fill: &Fill) {
        let now = self.clock.now();
        // Panics only if another thread panicked while holding the lock
        let mut positions = self.positions.write().expect("positions lock poisoned");
        let position = positions
            .entry(fill.market_id.clone())
            .or_insert_with(|| Position::new(fill.market_id.clone()));

        // PnL realized by this fill alone; stays zero when opening or adding
        let mut fill_pnl = Decimal::ZERO;

        // Update position based on fill
        if fill.side == position.side || position.quantity.is_zero() {
            // Adding to position
            let total_cost = position.average_entry_price * position.quantity
                + fill.average_price * fill.filled_quantity;
            let total_qty = position.quantity + fill.filled_quantity;
            position.average_entry_price = total_cost / total_qty;
            position.quantity = total_qty;
            position.side = fill.side;
        } else {
            // Reducing/closing position
            let close_qty = fill.filled_quantity.min(position.quantity);
            fill_pnl = match position.side {
                Side::Buy => (fill.average_price - position.average_entry_price) * close_qty,
                Side::Sell => (position.average_entry_price - fill.average_price) * close_qty,
            };
            position.realized_pnl += fill_pnl;
            position.quantity -= close_qty;

            // If we crossed through, handle the remainder as new position
            if fill.filled_quantity > close_qty {
                let remainder = fill.filled_quantity - close_qty;
                position.side = fill.side;
                position.quantity = remainder;
                position.average_entry_price = fill.average_price;
            }
        }

        position.last_updated = now;

        // Record the trade with its own PnL (not the position's cumulative PnL),
        // so that summing trade.pnl downstream does not double count
        self.trades.write().expect("trades lock poisoned").push(Trade {
            timestamp: now,
            market_id: fill.market_id.clone(),
            side: fill.side,
            quantity: fill.filled_quantity,
            price: fill.average_price,
            pnl: fill_pnl,
        });
    }

    pub fn update_unrealized_pnl(&self, market_id: &MarketId, current_price: Decimal) {
        let mut positions = self.positions.write().expect("positions lock poisoned");
        if let Some(position) = positions.get_mut(market_id) {
            position.unrealized_pnl = match position.side {
                Side::Buy => (current_price - position.average_entry_price) * position.quantity,
                Side::Sell => (position.average_entry_price - current_price) * position.quantity,
            };
        }
    }
}

Rationale:

  • Tracks both sides of arbitrage trades
  • Calculates realized PnL on position close
  • Updates unrealized PnL from market data
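The add, reduce, and cross-through cases are easiest to check with concrete numbers. The sketch below is a simplified model, not the tracker itself: it assumes a hypothetical SignedPosition type that encodes side as the sign of the quantity and uses integer cents instead of Decimal, so average entry prices are truncated by integer division.

```rust
/// Simplified signed position: positive quantity = long, negative = short.
/// Prices and PnL are integer cents (illustrative only).
#[derive(Debug, Default, PartialEq)]
pub struct SignedPosition {
    pub quantity: i64,
    pub avg_entry: i64,
    pub realized_pnl: i64,
}

impl SignedPosition {
    /// Applies a fill (positive = buy, negative = sell) and returns the PnL
    /// realized by this fill alone.
    pub fn apply(&mut self, fill_qty: i64, price: i64) -> i64 {
        if fill_qty == 0 {
            return 0;
        }
        if self.quantity == 0 || self.quantity.signum() == fill_qty.signum() {
            // Same direction: blend the entry price, weighted by quantity.
            let total = self.quantity + fill_qty;
            self.avg_entry = (self.avg_entry * self.quantity + price * fill_qty) / total;
            self.quantity = total;
            return 0;
        }
        // Opposite direction: close as much of the existing position as the fill covers.
        let closed = fill_qty.abs().min(self.quantity.abs());
        // Longs gain when the exit price is above entry; shorts gain when it is below.
        let pnl = (price - self.avg_entry) * closed * self.quantity.signum();
        self.realized_pnl += pnl;
        self.quantity += fill_qty;
        if fill_qty.abs() > closed {
            // Crossed through zero: the remainder opens a new position at the fill price.
            self.avg_entry = price;
        }
        pnl
    }
}
```

Buying 10 at 40 and then selling 15 at 45 realizes 50 on the 10 closed units and leaves a short of 5 entered at 45. Buying those 5 back at 43 realizes a further 10.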

5. Historical Data Storage

SQLite for trade/position persistence, with optional Parquet for tick data:

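use std::path::Path;
use std::str::FromStr;

use chrono::{DateTime, Utc};
use rusqlite::{params, Connection};
use rust_decimal::Decimal;

pub struct TradeStorage {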
    conn: Connection,
}

impl TradeStorage {
    pub fn new(path: &Path) -> Result<Self, StorageError> {
        let conn = Connection::open(path)?;

        conn.execute_batch(r#"
            CREATE TABLE IF NOT EXISTS trades (
                id INTEGER PRIMARY KEY,
                timestamp TEXT NOT NULL,
                market_id TEXT NOT NULL,
                side TEXT NOT NULL,
                quantity TEXT NOT NULL,
                price TEXT NOT NULL,
                pnl TEXT NOT NULL
            );

            CREATE TABLE IF NOT EXISTS market_data (
                id INTEGER PRIMARY KEY,
                timestamp TEXT NOT NULL,
                market_id TEXT NOT NULL,
                bid_price TEXT,
                ask_price TEXT,
                bid_size TEXT,
                ask_size TEXT
            );

            CREATE INDEX IF NOT EXISTS idx_trades_timestamp ON trades(timestamp);
            CREATE INDEX IF NOT EXISTS idx_market_data_timestamp ON market_data(timestamp);
        "#)?;

        Ok(Self { conn })
    }

    pub fn record_trade(&self, trade: &Trade) -> Result<(), StorageError> {
        self.conn.execute(
            "INSERT INTO trades (timestamp, market_id, side, quantity, price, pnl) VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
            params![
                trade.timestamp.to_rfc3339(),
                trade.market_id.as_str(),
                trade.side.to_string(),
                trade.quantity.to_string(),
                trade.price.to_string(),
                trade.pnl.to_string(),
            ],
        )?;
        Ok(())
    }

    pub fn query_trades(&self, from: DateTime<Utc>, to: DateTime<Utc>) -> Result<Vec<Trade>, StorageError> {
        let mut stmt = self.conn.prepare(
            "SELECT timestamp, market_id, side, quantity, price, pnl FROM trades WHERE timestamp >= ?1 AND timestamp <= ?2 ORDER BY timestamp"
        )?;

        let trades = stmt.query_map(params![from.to_rfc3339(), to.to_rfc3339()], |row| {
            Ok(Trade {
                timestamp: DateTime::parse_from_rfc3339(&row.get::<_, String>(0)?)
                    .unwrap()
                    .with_timezone(&Utc),
                market_id: MarketId::new(row.get(1)?),
                side: Side::from_str(&row.get::<_, String>(2)?).unwrap(),
                quantity: Decimal::from_str(&row.get::<_, String>(3)?).unwrap(),
                price: Decimal::from_str(&row.get::<_, String>(4)?).unwrap(),
                pnl: Decimal::from_str(&row.get::<_, String>(5)?).unwrap(),
            })
        })?;

        trades.collect::<Result<Vec<_>, _>>().map_err(Into::into)
    }
}

Rationale:

  • SQLite for simplicity and portability (no external database)
  • RFC3339 timestamps for human readability and sorting (string order matches time order as long as every value is UTC and written in the same format)
  • Decimal stored as text to preserve precision
  • Indexed for time-range queries
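The sorting claim can be checked without a database. The sketch below uses a hypothetical range_query helper that mirrors the WHERE and ORDER BY clauses of query_trades with plain string comparison. It assumes every stored value is UTC with a +00:00 suffix and comes from the same formatter; mixed offsets would break the ordering.

```rust
/// Selects timestamps in [from, to] by plain string comparison and returns
/// them sorted, mirroring
/// `WHERE timestamp >= ?1 AND timestamp <= ?2 ORDER BY timestamp`.
pub fn range_query<'a>(rows: &[&'a str], from: &str, to: &str) -> Vec<&'a str> {
    let mut hits: Vec<&'a str> = rows
        .iter()
        .copied()
        .filter(|ts| *ts >= from && *ts <= to)
        .collect();
    // Lexicographic sort; for same-offset RFC3339 strings this is also time order.
    hits.sort();
    hits
}
```

The index on the timestamp column serves the same comparison, which is why a TEXT column is enough for time-range queries here.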

6. Data Replayer

Replay historical market data for backtesting:

pub struct DataReplayer {
    storage: Arc<TradeStorage>,
    clock: Arc<SimulatedClock>,
    speed: f64,  // 1.0 = real-time, 10.0 = 10x faster; 0.0 or infinity = no pacing
}

impl DataReplayer {
    pub async fn replay(
        &self,
        from: DateTime<Utc>,
        to: DateTime<Utc>,
        handler: impl Fn(MarketData) + Send,
    ) -> Result<ReplayStats, ReplayError> {
        // Capture the wall-clock start so actual_duration measures the whole replay
        let started = std::time::Instant::now();
        let events = self.storage.query_market_data(from, to)?;
        let mut last_timestamp = from;
        let mut event_count = 0;

        for event in events {
            // Advance simulated clock
            let elapsed = event.timestamp - last_timestamp;
            self.clock.advance(elapsed);

            // Apply speed factor for real-time pacing (optional)
            if self.speed > 0.0 && self.speed.is_finite() {
                let sleep_duration = elapsed.to_std().unwrap_or_default();
                let adjusted = sleep_duration.div_f64(self.speed);
                tokio::time::sleep(adjusted).await;
            }

            handler(event.clone());
            last_timestamp = event.timestamp;
            event_count += 1;
        }

        Ok(ReplayStats {
            events_replayed: event_count,
            time_range: to - from,
            actual_duration: started.elapsed(),
        })
    }
}

Rationale:

  • Configurable replay speed (instant for backtests, slower for visualization)
  • Clock advances event-by-event for determinism
  • Handler pattern allows flexible processing
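The pacing rule is a single division, shown here in isolation. The paced_delay and total_replay_time functions are hypothetical helpers, not part of the replayer. Note that std's Duration::div_f64 panics if the result overflows, so extremely small positive speeds would need extra handling.

```rust
use std::time::Duration;

/// Wall-clock delay for one inter-event gap at a given replay speed.
/// Returns None when no pacing should be applied (speed is zero, negative,
/// NaN, or infinite), which is the run-as-fast-as-possible backtest case.
pub fn paced_delay(gap: Duration, speed: f64) -> Option<Duration> {
    if speed.is_finite() && speed > 0.0 {
        Some(gap.div_f64(speed))
    } else {
        None
    }
}

/// Total wall-clock time a replay would spend sleeping for a sequence of gaps.
pub fn total_replay_time(gaps: &[Duration], speed: f64) -> Duration {
    gaps.iter().filter_map(|gap| paced_delay(*gap, speed)).sum()
}
```

At speed 4.0, two seconds of recorded market time take half a second to replay. The simulated clock still advances by the full recorded gaps, so pacing changes how long a replay takes but not what the strategy observes.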

7. Performance Metrics

Calculate strategy performance metrics:

pub struct PerformanceMetrics {
    trades: Vec<Trade>,
    equity_curve: Vec<(DateTime<Utc>, Decimal)>,
    risk_free_rate: Decimal,  // Annualized
}

impl PerformanceMetrics {
    pub fn from_trades(trades: Vec<Trade>, initial_capital: Decimal) -> Self {
        let mut equity = initial_capital;
        let mut equity_curve = vec![(trades.first().map(|t| t.timestamp).unwrap_or_else(Utc::now), equity)];

        for trade in &trades {
            equity += trade.pnl;
            equity_curve.push((trade.timestamp, equity));
        }

        Self {
            trades,
            equity_curve,
            risk_free_rate: Decimal::new(5, 2),  // 5% default
        }
    }

    /// Sharpe ratio (annualized)
    pub fn sharpe_ratio(&self) -> Option<Decimal> {
        let returns = self.daily_returns();
        if returns.len() < 2 {
            return None;
        }

        let mean = returns.iter().sum::<Decimal>() / Decimal::from(returns.len());
        let variance = returns.iter()
            .map(|r| (r - mean).powi(2))
            .sum::<Decimal>() / Decimal::from(returns.len() - 1);
        let std_dev = variance.sqrt()?;

        if std_dev.is_zero() {
            return None;
        }

        let daily_rf = self.risk_free_rate / Decimal::from(252);
        let excess_return = mean - daily_rf;
        let sharpe = (excess_return / std_dev) * Decimal::from(252).sqrt()?;

        Some(sharpe)
    }

    /// Maximum drawdown (percentage)
    pub fn max_drawdown(&self) -> Decimal {
        let mut peak = Decimal::ZERO;
        let mut max_dd = Decimal::ZERO;

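        for (_, equity) in &self.equity_curve {
            if *equity > peak {
                peak = *equity;
            }
            // No positive peak yet: skip, since dividing by a zero peak would panic
            if peak.is_zero() {
                continue;
            }
            let dd = (peak - equity) / peak;
            if dd > max_dd {
                max_dd = dd;
            }
        }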

        max_dd * Decimal::from(100)  // Return as percentage
    }

    /// Win rate (percentage of profitable trades)
    pub fn win_rate(&self) -> Decimal {
        if self.trades.is_empty() {
            return Decimal::ZERO;
        }

        let wins = self.trades.iter().filter(|t| t.pnl > Decimal::ZERO).count();
        Decimal::from(wins) / Decimal::from(self.trades.len()) * Decimal::from(100)
    }

    /// Profit factor (gross profit / gross loss)
    pub fn profit_factor(&self) -> Option<Decimal> {
        let gross_profit: Decimal = self.trades.iter()
            .filter(|t| t.pnl > Decimal::ZERO)
            .map(|t| t.pnl)
            .sum();
        let gross_loss: Decimal = self.trades.iter()
            .filter(|t| t.pnl < Decimal::ZERO)
            .map(|t| t.pnl.abs())
            .sum();

        if gross_loss.is_zero() {
            return None;
        }

        Some(gross_profit / gross_loss)
    }

    /// Total PnL
    pub fn total_pnl(&self) -> Decimal {
        self.trades.iter().map(|t| t.pnl).sum()
    }

    /// Trade count
    pub fn trade_count(&self) -> usize {
        self.trades.len()
    }

    fn daily_returns(&self) -> Vec<Decimal> {
        // Group equity by day and calculate daily returns
        let mut daily: HashMap<NaiveDate, Decimal> = HashMap::new();
        for (ts, equity) in &self.equity_curve {
            daily.insert(ts.date_naive(), *equity);
        }

        let mut dates: Vec<_> = daily.keys().cloned().collect();
        dates.sort();

        dates.windows(2)
            .filter_map(|w| {
                let prev = daily.get(&w[0])?;
                let curr = daily.get(&w[1])?;
                if prev.is_zero() {
                    None
                } else {
                    Some((curr - prev) / prev)
                }
            })
            .collect()
    }
}

Rationale:

  • Standard financial metrics for strategy evaluation
  • Annualized Sharpe ratio for comparability
  • Max drawdown for risk assessment
  • Profit factor for edge quantification
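For checking the arithmetic by hand, the two less obvious metrics can be written against plain f64 slices. This is a simplified sketch that swaps Decimal for f64 for brevity; the free functions below are illustrative and follow the same conventions as the methods above (sample standard deviation, 252 trading days, annual risk-free rate).

```rust
/// Maximum peak-to-trough decline as a fraction of the peak (0.10 = 10%).
pub fn max_drawdown(equity: &[f64]) -> f64 {
    let mut peak = f64::MIN;
    let mut max_dd = 0.0_f64;
    for &value in equity {
        peak = peak.max(value);
        // Only measure drawdown once a positive peak exists.
        if peak > 0.0 {
            max_dd = max_dd.max((peak - value) / peak);
        }
    }
    max_dd
}

/// Annualized Sharpe ratio from daily returns, using the sample standard
/// deviation (n - 1) and 252 trading days per year.
pub fn sharpe_ratio(daily_returns: &[f64], annual_risk_free: f64) -> Option<f64> {
    let n = daily_returns.len();
    if n < 2 {
        return None;
    }
    let mean = daily_returns.iter().sum::<f64>() / n as f64;
    let variance = daily_returns
        .iter()
        .map(|r| (r - mean).powi(2))
        .sum::<f64>()
        / (n - 1) as f64;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return None;
    }
    let excess = mean - annual_risk_free / 252.0;
    Some(excess / std_dev * 252.0_f64.sqrt())
}
```

For the equity curve 100, 110, 99, 120 the peak before the trough is 110, so the maximum drawdown is (110 - 99) / 110 = 10%, even though the curve ends at a new high.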

Architecture Diagram

┌────────────────────────────────────────────────────────────────────────────┐
│                              Mode Selection                                 │
│  ┌────────────┐    ┌────────────┐    ┌────────────┐                        │
│  │   Live     │    │   Paper    │    │  Backtest  │                        │
│  │  Trading   │    │  Trading   │    │            │                        │
│  └─────┬──────┘    └─────┬──────┘    └─────┬──────┘                        │
└────────┼─────────────────┼─────────────────┼───────────────────────────────┘
         │                 │                 │
         ▼                 ▼                 ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ ExchangeClient  │ │ Simulated       │ │ Simulated       │
│ (Polymarket/    │ │ ExchangeClient  │ │ ExchangeClient  │
│  Kalshi)        │ │ + RealClock     │ │ + SimulatedClock│
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
         │                   │                   │
         ▼                   ▼                   ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                           ExecutionActor                                    │
│  ┌──────────────────────────────────────────────────────────┐             │
│  │                    Saga State Machine                     │             │
│  │  Pending → Leg1Init → Leg1Filled → Leg2Init → Completed  │             │
│  └──────────────────────────────────────────────────────────┘             │
└────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────┐
│                           PositionTracker                                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                        │
│  │  Positions  │  │   Trades    │  │    PnL      │                        │
│  └─────────────┘  └─────────────┘  └─────────────┘                        │
└────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────┐
│                           TradeStorage (SQLite)                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                        │
│  │   trades    │  │ market_data │  │  positions  │                        │
│  └─────────────┘  └─────────────┘  └─────────────┘                        │
└────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────┐
│                         PerformanceMetrics                                  │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐                  │
│  │  Sharpe   │ │ Max DD    │ │ Win Rate  │ │  Profit   │                  │
│  │  Ratio    │ │           │ │           │ │  Factor   │                  │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘                  │
└────────────────────────────────────────────────────────────────────────────┘

Implementation Phases

| Phase | Scope | Deliverables | Dependencies |
|-------|-------|--------------|--------------|
| 1 | Core Simulation | Clock trait, SimulatedExchangeClient, MatchingEngine | None |
| 2 | Position Tracking | PositionTracker, PnL calculation | Phase 1 |
| 3 | Historical Data | TradeStorage, DataReplayer, DataRecorder | Phase 2 |
| 4 | Analytics | PerformanceMetrics, Report generation | Phase 3 |

Consequences

Positive

  • Risk-free strategy evaluation - Test before deploying capital
  • Historical validation - Backtest against real market conditions
  • Performance tracking - Quantify strategy edge with standard metrics
  • Seamless integration - SimulatedExchangeClient implements existing trait
  • Deterministic replay - Same data produces same results

Negative

  • Simulation bias - Even Level 2 can't capture all real-world effects
  • Storage overhead - Historical data requires disk space
  • Complexity - Multiple code paths for live/paper/backtest modes
  • Overfitting risk - Backtesting can lead to curve-fitted strategies

Risks

| Risk | Mitigation |
|------|------------|
| Overfit to historical data | Use walk-forward analysis; validate on out-of-sample data |
| Simulation diverges from reality | Compare paper vs live performance; alert on divergence |
| Storage grows unbounded | Implement data retention policies; archive old data |
| Clock drift in paper trading | Use monotonic clock; log clock skew warnings |
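The walk-forward mitigation can be sketched as a window generator. This is one common variant (a fixed-length training window that slides forward by the test length), not a procedure prescribed by this ADR; the walk_forward function and its parameters are hypothetical.

```rust
use std::ops::Range;

/// Rolling (train, test) index windows over `n` chronologically ordered samples.
/// Each test window immediately follows its training window, and the pair
/// slides forward by `test_len`, so test windows never overlap.
pub fn walk_forward(
    n: usize,
    train_len: usize,
    test_len: usize,
) -> Vec<(Range<usize>, Range<usize>)> {
    let mut windows = Vec::new();
    if train_len == 0 || test_len == 0 {
        return windows;
    }
    let mut start = 0;
    while start + train_len + test_len <= n {
        let split = start + train_len;
        windows.push((start..split, split..split + test_len));
        start += test_len;
    }
    windows
}
```

Parameters are fitted on each training range and scored only on the test range that follows it, so every reported result comes from data the fit never saw.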

Alternatives Considered

Alternative 1: External Backtesting Framework (Barter-rs)

  • Pro: Mature, feature-rich, well-tested
  • Con: Different architecture; requires adapting our actor model
  • Decision: Build internal solution that integrates with existing ExchangeClient trait

Alternative 2: Record-Replay Only (No Simulation)

  • Pro: Simpler; replays actual API responses
  • Con: Can't simulate new strategies; limited to historical execution
  • Decision: Support both simulation (for what-if) and replay (for debugging)

Alternative 3: Cloud-Based Backtesting Service

  • Pro: Scalable; offloads computation
  • Con: Latency; data privacy; vendor lock-in
  • Decision: Local execution for speed and privacy

Alternative 4: Always Level 3 Fidelity

  • Pro: Most accurate simulation
  • Con: Complex; requires queue position tracking; diminishing returns for arb strategies
  • Decision: Start with Level 1-2; architect for Level 3 extension
