
Market Discovery

Automated discovery and matching of market pairs between Polymarket and Kalshi.

Overview

Market discovery automates the process of finding equivalent markets across both platforms using a five-phase matching pipeline:

| Phase | Technique | Purpose |
|-------|-----------|---------|
| 1 | Text Similarity | Fast initial filtering using Jaccard + Levenshtein |
| 2 | Fingerprint Matching | Structured field comparison (entity, date, threshold) |
| 3 | Embedding Matching | Semantic similarity via vector embeddings |
| 4 | LLM Verification | AI reasoning for uncertain cases |
| 5 | Feedback Learning | Continuous improvement from human decisions |
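Phase 1's text filter can be approximated with Jaccard similarity over word tokens. The sketch below is illustrative only (the pipeline also blends in Levenshtein distance, omitted here) and is not the engine's actual implementation:

```rust
use std::collections::HashSet;

/// Jaccard similarity over lowercased word tokens (Phase 1 sketch).
/// The real pipeline also incorporates Levenshtein distance.
fn jaccard(a: &str, b: &str) -> f64 {
    let (la, lb) = (a.to_lowercase(), b.to_lowercase());
    let ta: HashSet<&str> = la.split_whitespace().collect();
    let tb: HashSet<&str> = lb.split_whitespace().collect();
    let inter = ta.intersection(&tb).count() as f64;
    let union = ta.union(&tb).count() as f64;
    if union == 0.0 { 1.0 } else { inter / union }
}

fn main() {
    let s = jaccard(
        "Will BTC reach $100k by 2026?",
        "Bitcoin reaches $100,000 by 2026?",
    );
    // Low token overlap: this is why later phases (fingerprints,
    // embeddings) are needed to confirm matches like this one.
    println!("jaccard = {s:.2}");
}
```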

The system:

  1. Scans both platforms for active markets
  2. Generates structured fingerprints for each market
  3. Uses hybrid scoring (fingerprint + embedding + text) to find matches
  4. Escalates uncertain cases to LLM verification
  5. Presents candidates for human review and approval
  6. Learns from decisions to improve future matching

Human Approval Required

All discovered market pairs require human confirmation before use in trading. This is a safety-critical requirement (FR-MD-003) to prevent automated mapping errors like confusing "Trump" with "Trump Jr."

Prerequisites

Build with Discovery Feature

Market discovery is an optional feature. Build with the discovery feature flag:

cargo build --manifest-path arbiter-engine/Cargo.toml --features discovery

No Credentials Required

Discovery commands don't require exchange credentials. They use public market data APIs:

  • Polymarket: Gamma API (public)
  • Kalshi: /v2/markets endpoint (public)

Quick Start

1. Run a Discovery Scan

Scan both platforms for matching markets:

cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
  --discover-markets

This fetches markets from both platforms, runs similarity matching, and stores candidates in the discovery database (discovery.db by default).

2. Review Pending Candidates

List candidates awaiting review:

cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
  --list-candidates --status pending

Example output:

Found 3 candidate(s):

1. ID: 550e8400-e29b-41d4-a716-446655440000
   Status: Pending
   Polymarket: Will BTC reach $100k by 2026? (poly-btc-100k)
   Kalshi: Bitcoin reaches $100,000 by 2026? (KXBTC-100K-2026)
   Similarity: 85.2%

2. ID: 6ba7b810-9dad-11d1-80b4-00c04fd430c8
   Status: Pending
   Polymarket: Will there be a government shutdown in 2026? (poly-shutdown-2026)
   Kalshi: Government shutdown exceeding 24 hours in 2026? (KXGOV-SHUTDOWN-2026)
   Similarity: 72.1%
   Warnings:
      - Settlement criteria may differ: "shutdown" vs "shutdown exceeding 24 hours"

3. Approve or Reject Candidates

Approve (No Warnings)

cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
  --approve-candidate 550e8400-e29b-41d4-a716-446655440000

Approve with Warning Acknowledgment

If a candidate has semantic warnings, you must explicitly acknowledge them:

cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
  --approve-candidate 6ba7b810-9dad-11d1-80b4-00c04fd430c8 \
  --acknowledge-warnings

Reject with Reason

Rejections require a documented reason for the audit trail:

cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
  --reject-candidate 6ba7b810-9dad-11d1-80b4-00c04fd430c8 \
  --reason "Different settlement criteria - Polymarket uses announcement, Kalshi uses actual duration"

Workflow

graph TB
    A[Run Discovery Scan] --> B[Candidates Generated]
    B --> C{Review Candidate}
    C -->|No Warnings| D[Approve]
    C -->|Has Warnings| E{Acknowledge?}
    E -->|Yes| F[Approve with Acknowledgment]
    E -->|No| G[Reject with Reason]
    C -->|Invalid Match| G
    D --> H[Verified Mapping Created]
    F --> H
    G --> I[Logged to Audit Trail]
    H --> J[Available for Trading]

Understanding Similarity Scores

Hybrid Scoring Algorithm (Phases 2-3)

The matching algorithm uses a hybrid approach combining multiple signals:

| Component | Weight | Description |
|-----------|--------|-------------|
| Fingerprint Score | 50% | Structured field matching (entity, date, threshold, outcome) |
| Embedding Similarity | 40% | Semantic vector similarity |
| Text Similarity | 10% | Jaccard + Levenshtein fallback |

final_score = 0.50 × fingerprint + 0.40 × embedding + 0.10 × text
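As a minimal sketch (assuming each component score is already normalized to [0, 1], and using the documented default weights), the blend is a plain weighted sum:

```rust
/// Hybrid match score (Phases 2-3 sketch). Weights mirror the
/// documented defaults; component scores are assumed in [0, 1].
fn hybrid_score(fingerprint: f64, embedding: f64, text: f64) -> f64 {
    0.50 * fingerprint + 0.40 * embedding + 0.10 * text
}

fn main() {
    // Example: strong fingerprint, good embedding, weak text overlap.
    let score = hybrid_score(0.90, 0.85, 0.40);
    println!("final_score = {score:.3}");
}
```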

Fingerprint Field Weights (Phase 2)

| Field | Weight | Comparison Method |
|-------|--------|-------------------|
| Entity | 30% | Exact or alias match (e.g., "BTC" → "Bitcoin") |
| Date | 25% | Year/quarter/month overlap with ±7 day tolerance |
| Threshold | 20% | Numeric comparison with 5% tolerance |
| Outcome | 15% | Binary vs multi-choice structure |
| Source | 10% | Resolution source match |
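The field weights and the 5% threshold tolerance can be sketched as follows. This is illustrative, not the engine's code; the `FieldScores` struct and function names are invented for the example, and each per-field score is assumed in [0, 1]:

```rust
/// Per-field fingerprint comparison results (illustrative struct).
struct FieldScores {
    entity: f64,
    date: f64,
    threshold: f64,
    outcome: f64,
    source: f64,
}

/// Combine per-field scores using the documented default weights.
fn fingerprint_score(s: &FieldScores) -> f64 {
    0.30 * s.entity + 0.25 * s.date + 0.20 * s.threshold
        + 0.15 * s.outcome + 0.10 * s.source
}

/// Numeric thresholds match within the documented 5% relative tolerance.
fn thresholds_match(a: f64, b: f64) -> bool {
    (a - b).abs() <= 0.05 * a.abs().max(b.abs())
}

fn main() {
    // Everything matches except the resolution source.
    let s = FieldScores { entity: 1.0, date: 1.0, threshold: 1.0, outcome: 1.0, source: 0.0 };
    println!("fingerprint = {:.2}", fingerprint_score(&s));
    println!("$100k vs $99k within tolerance: {}", thresholds_match(100_000.0, 99_000.0));
}
```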

Decision Thresholds

| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | Auto-Approve | High confidence, minimal review |
| 0.70-0.85 | Escalate | LLM verification or human review |
| 0.60-0.70 | Review | Requires careful human inspection |
| < 0.60 | Auto-Reject | Below matching threshold |
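The decision bands above map to a simple branch on the final score. A minimal sketch (the `Decision` enum and boundaries follow the documented defaults; names are illustrative):

```rust
/// Decision bands for a hybrid match score (sketch of the
/// documented thresholds; not the engine's actual types).
#[derive(Debug, PartialEq)]
enum Decision {
    AutoApprove,
    Escalate,
    Review,
    AutoReject,
}

fn decide(score: f64) -> Decision {
    if score >= 0.85 {
        Decision::AutoApprove
    } else if score >= 0.70 {
        Decision::Escalate
    } else if score >= 0.60 {
        Decision::Review
    } else {
        Decision::AutoReject
    }
}

fn main() {
    // 85.2% similarity, as in the example candidate above.
    println!("{:?}", decide(0.852)); // AutoApprove
}
```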

Score Examples

| Score | Typical Matches |
|-------|-----------------|
| 90-100% | Identical entities, dates, and thresholds |
| 75-90% | Same event, minor differences (e.g., date tolerance) |
| 60-75% | Similar events, semantic warnings likely |
| <60% | Not matched (different events) |

Semantic Warnings

The system detects potential settlement differences:

| Warning Type | Example |
|--------------|---------|
| Conditional Language | "if X happens" vs "X will happen" |
| Time Thresholds | "shutdown" vs "shutdown exceeding 24 hours" |
| Resolution Source | "OPM announcement" vs "actual event" |
| Outcome Definitions | "reaches $100k" vs "closes above $100k" |

Always Review Warnings

Semantic warnings indicate potential differences in how markets resolve. Approving mismatched markets can result in:

  • One leg winning while the other loses
  • Significant financial loss
  • Unhedgeable positions

Phase 2: Fingerprint Matching

Fingerprint matching extracts structured fields from market titles and descriptions for precise comparison.

Market Fingerprint Structure

MarketFingerprint {
    entity: "Trump"              # Primary entity (person, crypto, event)
    event_type: Acquisition      # Type: PriceTarget, Election, Acquisition, etc.
    metric: { threshold: 100000 } # Numeric threshold if applicable
    resolution_window: 2026-01   # Resolution date/period
    outcome_type: Binary         # Binary or MultiOutcome
}

Entity Extraction

The system automatically extracts entities from market titles:

| Entity Type | Examples | Pattern |
|-------------|----------|---------|
| Person | Trump, Biden, Harris | Named individuals |
| Crypto | Bitcoin, BTC, ETH | Cryptocurrencies with alias resolution |
| PriceTarget | $100k, $50,000 | Numeric values with currency |
| Date | Q2 2026, June 2026 | Temporal references |
| Event | Super Bowl, Fed Meeting | Known event types |

Alias Resolution

Common aliases are automatically resolved:

| Alias | Canonical Name |
|-------|----------------|
| BTC | Bitcoin |
| ETH | Ethereum |
| 45 | Donald Trump |
| Fed | Federal Reserve |
| Pro Football Championship | Super Bowl |
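Alias resolution amounts to a canonicalizing lookup before entity comparison. A minimal sketch, assuming a simple exact-match table seeded with the examples above (function names are illustrative; the real store also tracks per-alias confidence):

```rust
use std::collections::HashMap;

/// Build a small alias table mirroring the documented examples.
fn default_aliases() -> HashMap<&'static str, &'static str> {
    [
        ("BTC", "Bitcoin"),
        ("ETH", "Ethereum"),
        ("45", "Donald Trump"),
        ("Fed", "Federal Reserve"),
        ("Pro Football Championship", "Super Bowl"),
    ]
    .into_iter()
    .collect()
}

/// Resolve an entity to its canonical name; unknown entities pass through.
fn canonical(aliases: &HashMap<&'static str, &'static str>, entity: &str) -> String {
    aliases
        .get(entity)
        .map(|s| s.to_string())
        .unwrap_or_else(|| entity.to_string())
}

fn main() {
    let aliases = default_aliases();
    println!("{}", canonical(&aliases, "BTC")); // Bitcoin
    println!("{}", canonical(&aliases, "Nvidia")); // Nvidia (no alias)
}
```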

View Fingerprints (Debug)

# Show fingerprint for a specific market
cargo run --features discovery -- --show-fingerprint --ticker "KXBTC-100K-2026"

# Output:
# Fingerprint for KXBTC-100K-2026:
#   Entity: Bitcoin
#   Event Type: PriceTarget
#   Threshold: $100,000 (direction: Above)
#   Resolution: 2026-12-31
#   Outcome: Binary

Phase 3: Embedding-Based Matching

Embeddings capture semantic similarity that fingerprints may miss. "Super Bowl" and "Pro Football Championship" have zero word overlap but high embedding similarity.

Embedding Pipeline

graph LR
    A[Market Title] --> B[Embedder]
    B --> C[Vector 256-dim]
    C --> D[VectorStore]
    D --> E[Nearest Neighbors]
    E --> F[Similar Markets]

Cosine Similarity

Embeddings are compared using cosine similarity:

similarity = dot(embedding_A, embedding_B) / (||A|| × ||B||)

Values range from -1 (opposite) to 1 (identical).
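A straightforward implementation of that formula, with a guard for zero-length vectors (this is a sketch of the standard computation, not the engine's code):

```rust
/// Cosine similarity between two equal-length embedding vectors
/// (Phase 3 sketch). Returns 0.0 if either vector has zero norm,
/// to avoid division by zero.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Identical direction -> 1.0; orthogonal -> 0.0; opposite -> -1.0.
    println!("{:.2}", cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]));
    println!("{:.2}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```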

Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| DISCOVERY_EMBEDDING_DIM | Embedding dimension | 256 |
| DISCOVERY_EMBEDDING_BATCH_SIZE | Batch size for generation | 100 |

Phase 4: LLM Verification

For uncertain cases (score 0.60-0.85), the system can invoke LLM verification for human-level reasoning.

Escalation Rules

Cases are escalated to LLM when:

| Trigger | Condition |
|---------|-----------|
| Uncertain Score | Fingerprint score between 0.60 and 0.85 |
| Warnings Present | Semantic warnings detected |
| High Value | Market volume > $10,000 |
| Conflicting Signals | High entity match but low date match |
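Put together, the escalation check reduces to an OR over the triggers. A minimal sketch using the documented defaults (the conflicting-signals trigger is folded into `has_warnings` here for brevity; names are illustrative):

```rust
/// Whether a candidate is escalated to LLM review (Phase 4 sketch).
/// Uses the documented defaults: uncertain band 0.60-0.85 and a
/// $10,000 volume trigger. Conflicting-signal detection is omitted.
fn should_escalate(score: f64, has_warnings: bool, volume_usd: f64) -> bool {
    let uncertain = score >= 0.60 && score < 0.85;
    uncertain || has_warnings || volume_usd > 10_000.0
}

fn main() {
    // Confident score, no warnings, small market: no escalation needed.
    println!("{}", should_escalate(0.90, false, 500.0)); // false
    // Uncertain score always escalates.
    println!("{}", should_escalate(0.72, false, 500.0)); // true
}
```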

Escalation Tiers

| Tier | Model | Cost | Use Case |
|------|-------|------|----------|
| None | - | $0 | Score ≥ 0.85, no warnings |
| Haiku | Claude Haiku | ~$0.001 | Initial screening |
| Sonnet | Claude Sonnet | ~$0.01 | Complex resolution analysis |
| Human | Manual | - | LLM uncertain or conflicts detected |

Cost Management

LLM verification has a configurable daily budget:

# Set daily budget (default: $50/day)
export DISCOVERY_LLM_BUDGET=50.00

LLM Response Format

{
  "equivalent": true,
  "confidence": 0.92,
  "reasoning": "Both markets resolve based on BTC/USD spot price reaching $100,000",
  "warnings": [],
  "resolution_differences": []
}

Phase 5: Learning from Human Feedback

Every approval/rejection decision improves future matching accuracy.

Decision Logging

All decisions are logged with full context:

{
  "candidate_id": "550e8400-e29b-41d4-a716-446655440000",
  "decision": "approved",
  "fingerprint_score": 0.82,
  "embedding_score": 0.88,
  "llm_confidence": null,
  "escalation_level": "None",
  "category": "crypto",
  "entity_corrections": null
}

Alias Learning

When you approve a match with entity differences, the system learns new aliases:

# If you approve a match where:
#   Kalshi: "45 wins election"
#   Polymarket: "Trump wins election"
#
# The system learns: "45" → "Donald Trump"

Aliases are stored with confidence scores that increase with each confirmation.

Weight Optimization

The system periodically optimizes fingerprint field weights based on approval patterns:

# Initial weights:
entity: 0.30, date: 0.25, threshold: 0.20, outcome: 0.15, source: 0.10

# After 100 decisions, optimized weights might become:
entity: 0.35, date: 0.28, threshold: 0.18, outcome: 0.12, source: 0.07

Training Data Export

Export decisions for model training:

# Export to JSONL format for training
cargo run --features discovery -- --export-training-data --output training.jsonl

Configuration

Environment Variables

Core Settings

| Variable | Description | Default |
|----------|-------------|---------|
| DISCOVERY_SCAN_INTERVAL_SECS | Auto-scan interval | 3600 |
| DISCOVERY_SIMILARITY_THRESHOLD | Minimum match score | 0.6 |
| DISCOVERY_DB_PATH | Database file path | discovery.db |

Phase 2-3: Scoring

| Variable | Description | Default |
|----------|-------------|---------|
| DISCOVERY_AUTO_APPROVE_THRESHOLD | Score for auto-approval | 0.85 |
| DISCOVERY_AUTO_REJECT_THRESHOLD | Score for auto-rejection | 0.40 |
| DISCOVERY_FINGERPRINT_WEIGHT | Weight for fingerprint score | 0.50 |
| DISCOVERY_EMBEDDING_WEIGHT | Weight for embedding score | 0.40 |
| DISCOVERY_TEXT_WEIGHT | Weight for text similarity | 0.10 |

Phase 4: LLM Verification

| Variable | Description | Default |
|----------|-------------|---------|
| DISCOVERY_LLM_ENABLED | Enable LLM verification | false |
| DISCOVERY_LLM_BUDGET | Daily budget in USD | 50.00 |
| DISCOVERY_ESCALATION_LOW | Lower escalation threshold | 0.60 |
| DISCOVERY_ESCALATION_HIGH | Upper escalation threshold | 0.85 |

CLI Options

Discovery Commands

| Flag | Description |
|------|-------------|
| --discover-markets | Run a discovery scan |
| --list-candidates | List match candidates |
| --discovery-db <path> | Custom database path |
| --status <filter> | Filter: pending, approved, rejected, all |

Approval Commands

| Flag | Description |
|------|-------------|
| --approve-candidate <uuid> | Approve a candidate |
| --reject-candidate <uuid> | Reject a candidate |
| --acknowledge-warnings | Required to approve candidates with warnings |
| --reason <text> | Required when rejecting |

Debug Commands (Phase 2-5)

| Flag | Description |
|------|-------------|
| --show-fingerprint | Display fingerprint for a market |
| --test-match --kalshi <ticker> --poly <id> | Test matching between two markets |
| --evaluate-matching | Run evaluation on golden set |
| --export-training-data | Export decisions for training |

Database

Discovery data is stored in SQLite:

discovery.db
├── discovered_markets    # Cached market data from both platforms
├── candidates            # Match candidates with status
├── match_decisions       # Decision logging with scores (Phase 5)
├── learned_aliases       # Entity aliases from corrections (Phase 5)
├── embeddings            # Market embeddings (Phase 3)
└── audit_log             # All approval/rejection decisions

Decision Logging Schema (Phase 5)

CREATE TABLE match_decisions (
    id TEXT PRIMARY KEY,
    candidate_id TEXT NOT NULL,
    decision TEXT NOT NULL,          -- 'approved', 'rejected'
    fingerprint_score REAL,
    embedding_score REAL,
    llm_confidence REAL,
    escalation_level TEXT,           -- 'None', 'Haiku', 'Sonnet', 'Human'
    category TEXT,
    rejection_reason TEXT,
    entity_corrections TEXT,         -- JSON: {"old": "BTC", "new": "Bitcoin"}
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

Custom Database Path

cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
  --list-candidates --discovery-db /path/to/custom.db

Audit Trail

All decisions are logged for compliance:

{
  "timestamp": "2026-01-22T15:30:00Z",
  "action": "approve",
  "candidate_id": "550e8400-e29b-41d4-a716-446655440000",
  "polymarket_id": "poly-btc-100k",
  "kalshi_id": "KXBTC-100K-2026",
  "similarity_score": 0.852,
  "semantic_warnings": [],
  "acknowledged_warnings": false,
  "session_id": "abc123"
}

Best Practices

Regular Discovery

Run discovery scans regularly to find new market opportunities:

# Cron job example: scan every hour
0 * * * * cd /path/to/arbiter-bot && cargo run --features discovery -- --discover-markets

Review Before Major Events

Before high-impact events (elections, economic announcements), review pending candidates to ensure mappings are accurate.

Document Rejections

Always provide a clear rejection reason. Documented reasons:

  • Help future reviewers understand why a pair was rejected
  • Improve the matching algorithm over time
  • Maintain the compliance audit trail

Use Demo Environment First

Test the discovery workflow with --kalshi-demo to verify your review process before using production mappings.

Troubleshooting

"Discovery commands require the 'discovery' feature"

Rebuild with the feature flag:

cargo build --manifest-path arbiter-engine/Cargo.toml --features discovery

"Cannot approve: candidate has semantic warnings"

You must explicitly acknowledge warnings:

cargo run --features discovery -- \
  --approve-candidate <uuid> --acknowledge-warnings

"Rejection requires a reason"

Provide a reason with --reason:

cargo run --features discovery -- \
  --reject-candidate <uuid> --reason "Your reason here"

"Candidate not found"

Verify the UUID is correct:

cargo run --features discovery -- --list-candidates --status all