
Market Discovery Phase 1: Foundation Types and Storage

This post covers Phase 1 of ADR-017 (Automated Market Discovery and Matching): establishing the data types and persistence layer for the discovery system.

The Problem

Manual market mapping is error-prone and doesn't scale. Polymarket and Kalshi list hundreds of markets, and an automated system for finding equivalent pairs needs:

  1. Persistent storage - Track discovered markets across restarts
  2. Status tracking - Pending → Approved/Rejected workflow
  3. Audit trail - Record all approval decisions for compliance
  4. Safety gates - Prevent automated trading without human review

Design Decisions

CandidateStatus State Machine

The core safety mechanism is a one-way state machine:

Pending ──┬──► Approved
          └──► Rejected

Once a candidate is approved or rejected, the status is immutable. This prevents accidental re-processing or status manipulation:

impl CandidateStatus {
    pub fn can_transition_to(&self, new_status: CandidateStatus) -> bool {
        match (self, new_status) {
            (CandidateStatus::Pending, CandidateStatus::Approved) => true,
            (CandidateStatus::Pending, CandidateStatus::Rejected) => true,
            // Once approved or rejected, status is final
            (CandidateStatus::Approved, _) => false,
            (CandidateStatus::Rejected, _) => false,
            // Remaining case: Pending -> Pending, which is not a transition
            _ => false,
        }
    }
}
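`can_transition_to` only answers the question; something still has to refuse the write. A minimal sketch of such a guard, assuming `CandidateStatus` is a plain `Copy` enum with the three variants above (the derives, `InvalidTransition`, and the `transition` helper are hypothetical, not taken from arbiter-engine):

```rust
use std::fmt;

// Assumed shape of the enum; the derives are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateStatus {
    Pending,
    Approved,
    Rejected,
}

impl CandidateStatus {
    /// Same rule as above: only Pending -> Approved and Pending -> Rejected.
    pub fn can_transition_to(&self, new_status: CandidateStatus) -> bool {
        matches!(
            (self, new_status),
            (CandidateStatus::Pending, CandidateStatus::Approved)
                | (CandidateStatus::Pending, CandidateStatus::Rejected)
        )
    }
}

/// Hypothetical error type for a refused transition.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: CandidateStatus,
    pub to: CandidateStatus,
}

impl fmt::Display for InvalidTransition {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot move candidate from {:?} to {:?}", self.from, self.to)
    }
}

impl std::error::Error for InvalidTransition {}

/// Hypothetical helper: changes `current` only if the state machine allows it;
/// on refusal, `current` is left untouched.
pub fn transition(
    current: &mut CandidateStatus,
    to: CandidateStatus,
) -> Result<(), InvalidTransition> {
    if current.can_transition_to(to) {
        *current = to;
        Ok(())
    } else {
        Err(InvalidTransition { from: *current, to })
    }
}
```

Starting from `Pending`, `transition(&mut status, CandidateStatus::Approved)` succeeds; a later `transition(&mut status, CandidateStatus::Rejected)` returns an `InvalidTransition` error and leaves `status` as `Approved`.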

Semantic Warnings

Markets that appear similar may have different settlement criteria. The CandidateMatch struct includes a semantic_warnings field that Phase 2's matcher will populate:

pub struct CandidateMatch {
    pub semantic_warnings: Vec<String>,  // e.g., "Settlement timing differs"
    // ...
}

Approval will require explicit acknowledgment of these warnings (FR-MD-003).
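FR-MD-003 is stated here as a requirement, not shown as code. One way an approval gate could enforce it, assuming acknowledgments arrive as the exact warning strings the reviewer confirmed (`can_approve` and this trimmed-down `CandidateMatch` are hypothetical):

```rust
use std::collections::HashSet;

/// Trimmed-down stand-in for CandidateMatch: only the field this sketch uses.
pub struct CandidateMatch {
    pub semantic_warnings: Vec<String>,
}

/// Hypothetical FR-MD-003 gate: approval is allowed only when every semantic
/// warning on the candidate appears in `acknowledged`. On failure, returns the
/// warnings that still lack an acknowledgment.
pub fn can_approve(candidate: &CandidateMatch, acknowledged: &[String]) -> Result<(), Vec<String>> {
    let acked: HashSet<&str> = acknowledged.iter().map(String::as_str).collect();
    let missing: Vec<String> = candidate
        .semantic_warnings
        .iter()
        .filter(|w| !acked.contains(w.as_str()))
        .cloned()
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Returning the unacknowledged warnings, rather than a bare `false`, lets a review UI show exactly what still needs sign-off.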

SQLite Storage

We chose SQLite over PostgreSQL for the discovery cache because:

  1. Single-tenant - Discovery runs locally per operator
  2. Portable - No external dependencies for development
  3. Atomic - Transactions prevent partial state

The schema keeps discovered markets, candidate pairs, and the audit log in separate tables:

-- Discovered markets (one per platform/id combination)
CREATE TABLE discovered_markets (
    id TEXT PRIMARY KEY,
    platform TEXT NOT NULL,
    platform_id TEXT NOT NULL,
    title TEXT NOT NULL,
    -- ...
    UNIQUE(platform, platform_id)
);

-- Candidate matches (references two markets)
CREATE TABLE candidates (
    id TEXT PRIMARY KEY,
    polymarket_id TEXT NOT NULL,
    kalshi_id TEXT NOT NULL,
    similarity_score REAL NOT NULL,
    status TEXT NOT NULL DEFAULT 'Pending',
    -- ...
);

-- Audit log for compliance
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    candidate_id TEXT NOT NULL,
    details TEXT NOT NULL  -- Full JSON context
);
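The design relies on SQLite transactions so that a status change and its audit row are written together. The same invariant, sketched with an in-memory stand-in instead of SQLite (all names here are hypothetical; the `AuditEntry` fields mirror the `audit_log` columns, and `details` would hold the JSON context):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateStatus {
    Pending,
    Approved,
    Rejected,
}

impl CandidateStatus {
    fn can_transition_to(&self, to: CandidateStatus) -> bool {
        matches!(
            (self, to),
            (CandidateStatus::Pending, CandidateStatus::Approved | CandidateStatus::Rejected)
        )
    }
}

/// Mirrors the audit_log columns (id, timestamp, action, candidate_id, details).
#[derive(Debug, Clone)]
pub struct AuditEntry {
    pub id: i64, // stands in for INTEGER PRIMARY KEY AUTOINCREMENT
    pub timestamp: String,
    pub action: String,
    pub candidate_id: String,
    pub details: String,
}

/// Hypothetical in-memory stand-in for the SQLite storage layer.
#[derive(Default)]
pub struct InMemoryStore {
    candidates: HashMap<String, CandidateStatus>,
    audit_log: Vec<AuditEntry>,
}

impl InMemoryStore {
    /// New candidates start Pending, like the column's DEFAULT 'Pending'.
    pub fn insert_candidate(&mut self, id: &str) {
        self.candidates.insert(id.to_string(), CandidateStatus::Pending);
    }

    /// Validates first, then writes status and audit row together, so a refused
    /// or unknown candidate leaves both the status map and the log untouched.
    pub fn update_status(
        &mut self,
        id: &str,
        to: CandidateStatus,
        timestamp: &str,
        details: &str,
    ) -> Result<(), String> {
        let current = *self
            .candidates
            .get(id)
            .ok_or_else(|| format!("unknown candidate {id}"))?;
        if !current.can_transition_to(to) {
            return Err(format!("{id}: {current:?} -> {to:?} not allowed"));
        }
        let next_id = self.audit_log.len() as i64 + 1;
        self.candidates.insert(id.to_string(), to);
        self.audit_log.push(AuditEntry {
            id: next_id,
            timestamp: timestamp.to_string(),
            action: format!("{to:?}"), // hypothetical action label
            candidate_id: id.to_string(),
            details: details.to_string(),
        });
        Ok(())
    }

    pub fn status(&self, id: &str) -> Option<CandidateStatus> {
        self.candidates.get(id).copied()
    }

    pub fn audit_log(&self) -> &[AuditEntry] {
        &self.audit_log
    }
}
```

In the SQLite version the two writes would sit inside one transaction; here, doing every check before the first write gives the same all-or-nothing result.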

Parameterized Queries

All SQL binds values through numbered placeholders (?1, ?2, ...) filled by the params![] macro, so values are never spliced into the SQL text:

conn.execute(
    "UPDATE candidates SET status = ?1, updated_at = ?2 WHERE id = ?3",
    params![status_str, now, id.to_string()],
)?;
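The `UPDATE` binds a `status_str`, and the schema's `DEFAULT 'Pending'` suggests the column stores variant names as text. The conversion itself isn't shown; a sketch assuming exactly those three strings (`as_str` and the `FromStr` impl are hypothetical):

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateStatus {
    Pending,
    Approved,
    Rejected,
}

impl CandidateStatus {
    /// Text written to candidates.status (assumed to be the variant name,
    /// consistent with the schema's DEFAULT 'Pending').
    pub fn as_str(self) -> &'static str {
        match self {
            CandidateStatus::Pending => "Pending",
            CandidateStatus::Approved => "Approved",
            CandidateStatus::Rejected => "Rejected",
        }
    }
}

impl fmt::Display for CandidateStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for CandidateStatus {
    type Err = String;

    /// Strict parse: anything other than the three stored names is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Pending" => Ok(CandidateStatus::Pending),
            "Approved" => Ok(CandidateStatus::Approved),
            "Rejected" => Ok(CandidateStatus::Rejected),
            other => Err(format!("unknown candidate status: {other:?}")),
        }
    }
}
```

Because `status_str` is bound through a placeholder, even a malformed value could not change the statement; the strict `FromStr` additionally keeps such a value from being read back as a valid status.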

Test Coverage

Phase 1 includes 12 tests covering:

Module         Tests   Focus
candidate.rs   5       Type creation, status transitions, serialization
storage.rs     7       CRUD operations, filtering, audit logging

Key safety test:

#[test]
fn test_candidate_status_transitions() {
    // Once approved, cannot transition to any other status
    assert!(!CandidateStatus::Approved.can_transition_to(CandidateStatus::Pending));
    assert!(!CandidateStatus::Approved.can_transition_to(CandidateStatus::Rejected));
}
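The excerpt above checks only transitions out of `Approved`. An exhaustive sketch walks all nine (from, to) pairs and confirms exactly two are allowed, so a stray arm added later (say, Rejected → Pending) would be caught. It assumes the enum has only these three variants; `ALL` and `allowed_transitions` are hypothetical helpers:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateStatus {
    Pending,
    Approved,
    Rejected,
}

impl CandidateStatus {
    /// Hypothetical list of every variant, used to enumerate all pairs.
    pub const ALL: [CandidateStatus; 3] = [
        CandidateStatus::Pending,
        CandidateStatus::Approved,
        CandidateStatus::Rejected,
    ];

    pub fn can_transition_to(&self, new_status: CandidateStatus) -> bool {
        matches!(
            (self, new_status),
            (CandidateStatus::Pending, CandidateStatus::Approved | CandidateStatus::Rejected)
        )
    }
}

/// Returns every (from, to) pair the state machine allows, in ALL order.
pub fn allowed_transitions() -> Vec<(CandidateStatus, CandidateStatus)> {
    let mut allowed = Vec::new();
    for from in CandidateStatus::ALL {
        for to in CandidateStatus::ALL {
            if from.can_transition_to(to) {
                allowed.push((from, to));
            }
        }
    }
    allowed
}
```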

What's Next

Phase 2 will implement the text matching engine:

  • TextNormalizer - Lowercase, remove punctuation, tokenize
  • SimilarityScorer - Jaccard (0.6 weight) + Levenshtein (0.4 weight)
  • Semantic warning detection for settlement differences
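Phase 2 is future work, and the plan gives only the component names and weights. A sketch of how those pieces could combine, with free functions standing in for `TextNormalizer` and `SimilarityScorer` (all function names are hypothetical). Two choices are assumptions, not from the plan: Jaccard is computed over normalized token sets, and Levenshtein distance becomes a 0..1 similarity by dividing by the longer string's character count. Semantic warning detection is not sketched.

```rust
use std::collections::HashSet;

/// Weights from the Phase 2 plan.
pub const JACCARD_WEIGHT: f64 = 0.6;
pub const LEVENSHTEIN_WEIGHT: f64 = 0.4;

/// Normalizer sketch: lowercase, remove punctuation/symbols, split on whitespace.
pub fn normalize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .split_whitespace()
        .map(str::to_owned)
        .collect()
}

/// Jaccard index over token sets: |A ∩ B| / |A ∪ B|.
pub fn jaccard(a: &[String], b: &[String]) -> f64 {
    let sa: HashSet<&str> = a.iter().map(String::as_str).collect();
    let sb: HashSet<&str> = b.iter().map(String::as_str).collect();
    let union = sa.union(&sb).count();
    if union == 0 {
        return 0.0; // no tokens on either side: treat as no evidence of a match
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Classic edit distance (insert/delete/substitute, cost 1) over chars,
/// using two rolling rows instead of the full DP table.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Assumed normalization: 1 - distance / longer length (two empty strings -> 1.0).
pub fn levenshtein_similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

/// Weighted score; a title that normalizes to nothing scores 0.0.
pub fn similarity(title_a: &str, title_b: &str) -> f64 {
    let ta = normalize(title_a);
    let tb = normalize(title_b);
    if ta.is_empty() || tb.is_empty() {
        return 0.0;
    }
    JACCARD_WEIGHT * jaccard(&ta, &tb)
        + LEVENSHTEIN_WEIGHT * levenshtein_similarity(&ta.join(" "), &tb.join(" "))
}
```

Under these definitions, titles that normalize identically score 1.0, titles sharing no tokens and no characters score 0.0, and everything else falls in between, weighted toward token overlap.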

Council Review

Phase 1 passed council verification with confidence 0.88. Key findings:

  • ✅ Human-in-the-loop enforced via CandidateStatus state machine
  • ✅ Audit logging captures all required fields
  • ✅ No SQL injection (all parameterized queries)
  • ✅ No unsafe code

Implementation: arbiter-engine/src/discovery/ | Issues: #41, #42 | ADR: 017