Automated Market Discovery and Matching

2026-01-22 | ADR-017 Implementation

Implementation of automated market discovery between Polymarket and Kalshi while preserving human-in-the-loop safety.

The Problem

Manual market discovery doesn't scale:

  1. Discovery burden: Operators research markets on both platforms independently
  2. Missed opportunities: New markets go undetected
  3. No persistence: Mappings exist only in memory
  4. Scale limitation: Cannot monitor thousands of markets

Industry context: Research documented $40M+ in arbitrage profits from Polymarket alone (Apr 2024 - Apr 2025). Existing bots watch 10,000+ markets.

The Solution

Text similarity matching with semantic warnings and mandatory human approval.

Architecture

API Clients → Scanner (hourly) → Matcher → Candidates (SQLite) → Human Review → MappingManager

Matching Algorithm

  1. Pre-filter: Category, expiration (±7 days), outcome count
  2. Similarity: 0.6 × Jaccard(tokens) + 0.4 × Levenshtein_normalized
  3. Threshold: Score ≥ 0.6 creates candidate for review
  4. Warnings: Flag settlement differences (announcement vs actual event)
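The scoring formula in step 2 can be sketched end-to-end. The production matcher uses the `strsim` crate; the hand-rolled helpers below are illustrative only, assuming whitespace tokenization:

```rust
use std::collections::HashSet;

// Jaccard similarity over whitespace tokens (illustrative; the real
// normalizer does more preprocessing than a plain split).
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    if ta.is_empty() && tb.is_empty() {
        return 1.0;
    }
    ta.intersection(&tb).count() as f64 / ta.union(&tb).count() as f64
}

// Classic dynamic-programming edit distance (two-row variant).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j + 1] + 1).min(cur[j] + 1).min(prev[j] + cost));
        }
        prev = cur;
    }
    prev[b.len()]
}

// 0.6 × Jaccard + 0.4 × normalized Levenshtein, as in step 2 above.
fn combined_score(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count()).max(1);
    let norm_lev = 1.0 - levenshtein(a, b) as f64 / max_len as f64;
    0.6 * jaccard(a, b) + 0.4 * norm_lev
}
```

Identical titles score 1.0; titles with no token overlap are capped at 0.4 by the Levenshtein term, which is why they can never clear the 0.6 threshold on edit distance alone.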

Safety Architecture (FR-MD-003)

The critical constraint: settlement semantics differ across platforms.

Example - 2024 Government Shutdown:

  - Polymarket: "OPM issues shutdown announcement"
  - Kalshi: "Actual shutdown exceeding 24 hours"

Same event, different resolution criteria, potentially different outcomes.

Safety Gates

pub fn approve(&self, id: Uuid, acknowledge_warnings: bool) -> Result<(), ApprovalError> {
    let candidate = self.storage.get_candidate(id)?;

    // Safety: Require warning acknowledgment
    if !candidate.semantic_warnings.is_empty() && !acknowledge_warnings {
        return Err(ApprovalError::WarningsNotAcknowledged);
    }

    // Use existing safety gate (FR-MD-003)
    let mut manager = self.mapping_manager.lock().unwrap();
    let mapping_id = manager.propose_mapping(/*...*/);
    manager.verify_mapping(mapping_id);

    // Audit log for compliance
    self.storage.log_decision(/*...*/)?;
    Ok(())
}

What This Guarantees

  1. Human-in-the-loop: Candidates require explicit approval
  2. FR-MD-003 enforced: Uses existing MappingManager.verify_mapping()
  3. Semantic warnings block quick approval: Must acknowledge settlement differences
  4. Audit trail: All approvals/rejections logged

Implementation Highlights

Feature Flag

Discovery is opt-in via Cargo feature:

[features]
discovery = ["dep:strsim"]

CLI Interface

# Discover and match markets
cargo run --features discovery -- --discover-markets

# Review candidates interactively
cargo run --features discovery -- --review-candidates

# List pending candidates
cargo run --features discovery -- --list-candidates --status pending

# Batch operations
cargo run --features discovery -- --approve-candidates --ids "uuid1,uuid2"
cargo run --features discovery -- --reject-candidates --ids "uuid1" --reason "Settlement differs"

Similarity Scorer

pub struct SimilarityScorer {
    jaccard_weight: f64,      // 0.6
    levenshtein_weight: f64,  // 0.4
    threshold: f64,           // 0.6
}

impl SimilarityScorer {
    pub fn find_matches(&self, market: &DiscoveredMarket, candidates: &[DiscoveredMarket])
        -> Vec<CandidateMatch>
    {
        candidates.iter()
            .filter(|c| self.pre_filter(market, c))
            .filter_map(|c| {
                let score = self.combined_score(&market.title, &c.title);
                if score >= self.threshold {
                    Some(CandidateMatch::new(market.clone(), c.clone(), score))
                } else {
                    None
                }
            })
            .collect()
    }
}
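The `pre_filter` call above can be sketched against a simplified market shape. The field names here (`category`, `expiration_day`, `outcome_count`) are assumptions for illustration; the real `DiscoveredMarket` carries platform-specific metadata:

```rust
// Simplified market shape for the sketch (assumed field names).
struct DiscoveredMarket {
    category: String,
    expiration_day: i64, // days since epoch, simplified for the sketch
    outcome_count: usize,
}

// Cheap gates from step 1 of the matching algorithm: category match,
// expirations within ±7 days, same number of outcomes.
fn pre_filter(a: &DiscoveredMarket, b: &DiscoveredMarket) -> bool {
    a.category == b.category
        && (a.expiration_day - b.expiration_day).abs() <= 7
        && a.outcome_count == b.outcome_count
}
```

Running the cheap filters first keeps the O(n × m) cross-platform comparison tractable, since most pairs are rejected before any string similarity is computed.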

Scanner Actor

impl Actor for DiscoveryScannerActor {
    type Message = ScannerMsg;

    async fn handle(&mut self, message: Self::Message) -> Result<(), ActorError> {
        match message {
            ScannerMsg::Scan => {
                let poly_markets = self.fetch_all_markets(&*self.polymarket_client).await?;
                let kalshi_markets = self.fetch_all_markets(&*self.kalshi_client).await?;

                // Store markets
                for market in &poly_markets {
                    self.storage.lock().await.upsert_market(market)?;
                }

                // Find candidates
                for poly_market in &poly_markets {
                    let matches = self.scorer.find_matches(poly_market, &kalshi_markets);
                    for candidate in matches {
                        if !self.is_duplicate_candidate(&candidate).await? {
                            self.storage.lock().await.insert_candidate(&candidate)?;
                        }
                    }
                }
            }
            // ...
        }
        Ok(())
    }
}
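The `is_duplicate_candidate` check above can be approximated with a seen-pair set. The actual implementation consults SQLite, so this is a sketch only:

```rust
use std::collections::HashSet;

// Returns true if this (Polymarket ID, Kalshi ID) pair was already seen.
// HashSet::insert returns false when the value was already present, so
// the negation gives us "is duplicate".
fn mark_seen(seen: &mut HashSet<(String, String)>, poly_id: &str, kalshi_id: &str) -> bool {
    !seen.insert((poly_id.to_string(), kalshi_id.to_string()))
}
```

Keying on the ID pair (rather than titles) means the same two markets never produce a second candidate, even if a title is edited between scans.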

Test Coverage

48 tests across 5 phases:

  Module               Tests
  candidate.rs         5
  storage.rs           7
  normalizer.rs        3
  matcher.rs           7
  polymarket_gamma.rs  4
  kalshi_markets.rs    4
  scanner.rs           5
  approval.rs          5
  CLI integration      8

Council Review

All 5 phases passed LLM Council review with confidence ≥ 0.87.

Final ADR Review:

  - Verdict: PASS
  - Confidence: 0.88
  - Weighted Score: 8.55/10

Safety gates (FR-MD-003) received "PASS (Strong)" verdict.

Why Text Similarity Over LLM/Embeddings

Options considered:

  Approach          Accuracy  Cost               Latency
  Text similarity   Moderate  Zero               Sub-ms
  LLM verification  High      $0.01-0.05/call    +200-500ms
  Embeddings        Highest   Storage + compute  Batch dependent

Text similarity was selected because:

  1. Sufficient for MVP: Catches majority of matches
  2. Zero dependencies: No external API costs
  3. Extensible: LLM verification can be added later
  4. Council compliant: "Suggestion engine only" per Design Review 1

Update: Post-Implementation Learnings (2026-01-23)

Post-implementation testing revealed a critical gap: text similarity is insufficient for production.

The Problem

Real market pairs score only 8-9% similarity despite semantic equivalence:

  Kalshi                                                     Polymarket                                        Jaccard
  "Will Trump buy Greenland?"                                "Will the US acquire part of Greenland in 2026?"  8.3%
  "Will Washington win the 2026 Pro Football Championship?"  "Super Bowl Champion 2026"                        9.1%

Root causes:

  - Different vocabulary: "Super Bowl" vs "Pro Football Championship"
  - Different framing: question vs statement
  - Different specificity: team name vs championship event

The Solution: 5-Phase Approach

We've extended ADR-017 with a progressive enhancement roadmap:

Phase 1: Text Similarity     ← Current (MVP, 8-9% similarity on hard pairs)
Phase 2: Fingerprint Matching ← Proposed (entity extraction, field-weighted scoring)
Phase 3: Embedding Matching   ← Proposed (semantic similarity via vectors)
Phase 4: LLM Verification     ← Proposed (human-level reasoning for uncertain cases)
Phase 5: Human Feedback Loop  ← Proposed (continuous improvement from decisions)

Phase 3: Embedding-Based Semantic Matching

Embeddings capture semantic similarity that text matching misses:

# "Super Bowl" and "Pro Football Championship" have zero word overlap
# but high embedding similarity
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

emb1 = model.encode("Super Bowl Champion 2026")
emb2 = model.encode("2026 Pro Football Championship winner")
similarity = util.cos_sim(emb1, emb2).item()  # ~0.85

New requirements: FR-MD-018 through FR-MD-023

Phase 4: LLM Verification

For uncertain matches (scores 0.60-0.85), invoke an LLM for human-level reasoning:

Candidate pair for verification:

Market A (Kalshi): "Will the US acquire part of Greenland in 2026?"
Market B (Polymarket): "Will Trump buy Greenland?"

Analyze: Are these the same underlying event?
Consider: Resolution criteria, timing, specificity

Cost optimization: Haiku screening ($0.001/call), Sonnet escalation ($0.01/call).
Budget: ~$50/day for 5,000 candidates.

New requirements: FR-MD-024 through FR-MD-027
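The score bands above imply a simple routing step. A minimal sketch (thresholds from the text; every surviving candidate still requires human approval under FR-MD-003, so the LLM only adds evidence for the uncertain middle band):

```rust
#[derive(Debug, PartialEq)]
enum Route {
    Discard,       // below the candidate threshold
    LlmThenHuman,  // uncertain band: LLM verification first, then review
    DirectToHuman, // high confidence: straight to the review queue
}

// Route a candidate by its combined similarity score.
fn route(score: f64) -> Route {
    if score < 0.60 {
        Route::Discard
    } else if score < 0.85 {
        Route::LlmThenHuman
    } else {
        Route::DirectToHuman
    }
}
```

Keeping the thresholds in one place makes them easy to tune later from the Phase 5 decision history.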

Phase 5: Learning from Human Feedback (Data Flywheel)

The key innovation: human approval decisions are training data.

┌─────────────────────────────────────────────────────────────┐
│         Data Flywheel: Human Decisions Train Models         │
├─────────────────────────────────────────────────────────────┤
│  Human Approval ──► Entity Alias Learning                   │
│                 ("Super Bowl" = "Pro Football Championship")│
│                                                             │
│  Human Approval ──► Embedding Fine-Tuning                   │
│                     (contrastive learning on approved pairs)│
│                                                             │
│  Human Approval ──► Weight Optimization                     │
│                    (logistic regression on decision history)│
└─────────────────────────────────────────────────────────────┘

Weekly improvement cycle:

  - Monday: Export new decisions, update golden set
  - Tuesday: Retrain embedding model, optimize weights
  - Wednesday: Validate on golden set
  - Thursday-Saturday: A/B test (10% traffic)
  - Sunday: Promote if improved, rollback if degraded

New requirements: FR-MD-028 through FR-MD-032
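Entity alias learning from the flywheel can be sketched as a lookup applied during normalization. The alias table itself would be learned from approved pairs; the `expand_aliases` helper here is hypothetical:

```rust
use std::collections::HashMap;

// Rewrite known aliases to a canonical form before token-based scoring,
// so "Super Bowl" and "Pro Football Championship" finally overlap.
// The alias table is assumed to be mined from human approval decisions.
fn expand_aliases(title: &str, aliases: &HashMap<&str, &str>) -> String {
    let mut out = title.to_lowercase();
    for (from, to) in aliases.iter() {
        out = out.replace(*from, *to);
    }
    out
}
```

Because the rewrite happens at normalization time, the Phase 1 Jaccard scorer benefits immediately, without retraining anything.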

Council Review

The Phase 3-5 extension passed council review:

  Dimension     Score
  Accuracy      8.5
  Completeness  9.0
  Clarity       8.5
  Conciseness   7.5
  Relevance     9.0

Verdict: PASS (confidence 0.87, weighted score 8.5)

What This Means

The safety architecture remains unchanged: human-in-the-loop is mandatory (FR-MD-003). But now each human decision improves future matching, creating a virtuous cycle where accuracy improves over time with minimal additional effort.

References