Market Discovery¶
Automated discovery and matching of market pairs between Polymarket and Kalshi.
Overview¶
Market discovery automates the process of finding equivalent markets across both platforms using a five-phase matching pipeline:
| Phase | Technique | Purpose |
|---|---|---|
| 1 | Text Similarity | Fast initial filtering using Jaccard + Levenshtein |
| 2 | Fingerprint Matching | Structured field comparison (entity, date, threshold) |
| 3 | Embedding Matching | Semantic similarity via vector embeddings |
| 4 | LLM Verification | AI reasoning for uncertain cases |
| 5 | Feedback Learning | Continuous improvement from human decisions |
The system:
- Scans both platforms for active markets
- Generates structured fingerprints for each market
- Uses hybrid scoring (fingerprint + embedding + text) to find matches
- Escalates uncertain cases to LLM verification
- Presents candidates for human review and approval
- Learns from decisions to improve future matching
Human Approval Required
All discovered market pairs require human confirmation before use in trading. This is a safety-critical requirement (FR-MD-003) to prevent automated mapping errors like confusing "Trump" with "Trump Jr."
Prerequisites¶
Build with Discovery Feature¶
Market discovery is an optional feature. Build with the discovery feature flag:
cargo build --manifest-path arbiter-engine/Cargo.toml --features discovery
No Credentials Required¶
Discovery commands don't require exchange credentials. They use public market data APIs:
- Polymarket: Gamma API (public)
- Kalshi: /v2/markets endpoint (public)
Quick Start¶
1. Run a Discovery Scan¶
Scan both platforms for matching markets:
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--discover-markets
This fetches markets from both platforms, runs similarity matching, and stores candidates in the discovery database (discovery.db by default).
2. Review Pending Candidates¶
List candidates awaiting review:
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--list-candidates --status pending
Example output:
Found 3 candidate(s):
1. ID: 550e8400-e29b-41d4-a716-446655440000
Status: Pending
Polymarket: Will BTC reach $100k by 2026? (poly-btc-100k)
Kalshi: Bitcoin reaches $100,000 by 2026? (KXBTC-100K-2026)
Similarity: 85.2%
2. ID: 6ba7b810-9dad-11d1-80b4-00c04fd430c8
Status: Pending
Polymarket: Will there be a government shutdown in 2026? (poly-shutdown-2026)
Kalshi: Government shutdown exceeding 24 hours in 2026? (KXGOV-SHUTDOWN-2026)
Similarity: 72.1%
Warnings:
- Settlement criteria may differ: "shutdown" vs "shutdown exceeding 24 hours"
3. Approve or Reject Candidates¶
Approve (No Warnings)¶
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--approve-candidate 550e8400-e29b-41d4-a716-446655440000
Approve with Warning Acknowledgment¶
If a candidate has semantic warnings, you must explicitly acknowledge them:
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--approve-candidate 6ba7b810-9dad-11d1-80b4-00c04fd430c8 \
--acknowledge-warnings
Reject with Reason¶
Rejections require a documented reason for the audit trail:
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--reject-candidate 6ba7b810-9dad-11d1-80b4-00c04fd430c8 \
--reason "Different settlement criteria - Polymarket uses announcement, Kalshi uses actual duration"
Workflow¶
graph TB
A[Run Discovery Scan] --> B[Candidates Generated]
B --> C{Review Candidate}
C -->|No Warnings| D[Approve]
C -->|Has Warnings| E{Acknowledge?}
E -->|Yes| F[Approve with Acknowledgment]
E -->|No| G[Reject with Reason]
C -->|Invalid Match| G
D --> H[Verified Mapping Created]
F --> H
G --> I[Logged to Audit Trail]
H --> J[Available for Trading]
Understanding Similarity Scores¶
Hybrid Scoring Algorithm (Phases 2-3)¶
The matching algorithm uses a hybrid approach combining multiple signals:
| Component | Weight | Description |
|---|---|---|
| Fingerprint Score | 50% | Structured field matching (entity, date, threshold, outcome) |
| Embedding Similarity | 40% | Semantic vector similarity |
| Text Similarity | 10% | Jaccard + Levenshtein fallback |
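The weighting in the table above can be sketched in Rust. This is a minimal illustration, not the engine's actual code: the function names, the whitespace tokenisation, and the equal-weight average of Jaccard and Levenshtein in `text_similarity` are assumptions. Only the 50/40/10 weights come from the table.

```rust
use std::collections::HashSet;

/// Jaccard similarity over lowercase, whitespace-separated tokens.
fn jaccard(a: &str, b: &str) -> f64 {
    let a_low = a.to_lowercase();
    let b_low = b.to_lowercase();
    let ta: HashSet<&str> = a_low.split_whitespace().collect();
    let tb: HashSet<&str> = b_low.split_whitespace().collect();
    if ta.is_empty() && tb.is_empty() {
        return 1.0;
    }
    let inter = ta.intersection(&tb).count() as f64;
    let union = ta.union(&tb).count() as f64;
    inter / union
}

/// Levenshtein edit distance, normalised to a similarity in [0, 1].
fn levenshtein_similarity(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let max_len = a.len().max(b.len());
    if max_len == 0 {
        return 1.0;
    }
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + if ca == cb { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            let best = substitute.min(delete).min(insert);
            cur.push(best);
        }
        prev = cur;
    }
    1.0 - prev[b.len()] as f64 / max_len as f64
}

/// Assumption: the text signal is the plain average of the two measures.
fn text_similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (a.to_lowercase(), b.to_lowercase());
    (jaccard(&a, &b) + levenshtein_similarity(&a, &b)) / 2.0
}

/// Weighted combination using the default weights from the table.
fn hybrid_score(fingerprint: f64, embedding: f64, text: f64) -> f64 {
    0.50 * fingerprint + 0.40 * embedding + 0.10 * text
}
```

With these weights, a fingerprint score of 0.9, an embedding similarity of 0.8, and a text similarity of 0.5 combine to 0.45 + 0.32 + 0.05 = 0.82.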
Fingerprint Field Weights (Phase 2)¶
| Field | Weight | Comparison Method |
|---|---|---|
| Entity | 30% | Exact or alias match (e.g., "BTC" → "Bitcoin") |
| Date | 25% | Year/quarter/month overlap with ±7 day tolerance |
| Threshold | 20% | Numeric comparison with 5% tolerance |
| Outcome | 15% | Binary vs multi-choice structure |
| Source | 10% | Resolution source match |
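As a rough sketch of how the field weights and tolerances above might be applied, the following Rust code scores each field in [0, 1] and takes the weighted sum. The `FieldScores` type, the all-or-nothing scoring of the date and threshold fields, and the use of day counts instead of calendar dates are simplifying assumptions made for illustration.

```rust
/// Per-field match scores in [0, 1]. Hypothetical type for illustration.
struct FieldScores {
    entity: f64,
    date: f64,
    threshold: f64,
    outcome: f64,
    source: f64,
}

/// Weighted sum using the default field weights from the table.
fn fingerprint_score(s: &FieldScores) -> f64 {
    0.30 * s.entity + 0.25 * s.date + 0.20 * s.threshold + 0.15 * s.outcome + 0.10 * s.source
}

/// 1.0 when the thresholds differ by at most 5% of the larger one, else 0.0.
fn threshold_score(a: f64, b: f64) -> f64 {
    let scale = a.abs().max(b.abs());
    if scale == 0.0 {
        return 1.0; // both thresholds are zero
    }
    if (a - b).abs() / scale <= 0.05 {
        1.0
    } else {
        0.0
    }
}

/// 1.0 when the resolution dates are within 7 days of each other, else 0.0.
/// Dates are given as day counts from any shared reference day.
fn date_score(a_days: i64, b_days: i64) -> f64 {
    if (a_days - b_days).abs() <= 7 {
        1.0
    } else {
        0.0
    }
}
```

Under these assumptions, two markets that agree on everything except the resolution source score 0.30 + 0.25 + 0.20 + 0.15 = 0.90.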
Decision Thresholds¶
| Score Range | Decision | Action |
|---|---|---|
| ≥ 0.85 | Auto-Approve | High confidence, minimal review |
| 0.70-0.85 | Escalate | LLM verification or human review |
| 0.60-0.70 | Review | Requires careful human inspection |
| < 0.60 | Auto-Reject | Below matching threshold |
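The threshold table can be read as a simple classification function. The sketch below is an illustration under assumptions: the `Decision` enum and `classify` function are hypothetical names, and treating each lower bound as inclusive is a guess, since the table does not say which band owns the boundary values. The classification only routes a candidate; human confirmation is still required before a pair is used in trading.

```rust
/// Routing outcome for a scored candidate. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum Decision {
    AutoApprove,
    Escalate,
    Review,
    AutoReject,
}

/// Maps a hybrid score in [0, 1] to a decision band.
/// Assumption: the lower bound of each band is inclusive.
fn classify(score: f64) -> Decision {
    if score >= 0.85 {
        Decision::AutoApprove
    } else if score >= 0.70 {
        Decision::Escalate
    } else if score >= 0.60 {
        Decision::Review
    } else {
        Decision::AutoReject
    }
}
```

For the Quick Start examples, a score of 0.852 falls in the top band and 0.721 falls in the escalation band.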
Score Examples¶
| Score | Typical Matches |
|---|---|
| 90-100% | Identical entities, dates, and thresholds |
| 75-90% | Same event, minor differences (e.g., date tolerance) |
| 60-75% | Similar events, semantic warnings likely |
| <60% | Not matched (different events) |
Semantic Warnings¶
The system detects potential settlement differences:
| Warning Type | Example |
|---|---|
| Conditional Language | "if X happens" vs "X will happen" |
| Time Thresholds | "shutdown" vs "shutdown exceeding 24 hours" |
| Resolution Source | "OPM announcement" vs "actual event" |
| Outcome Definitions | "reaches $100k" vs "closes above $100k" |
Always Review Warnings
Semantic warnings indicate potential differences in how markets resolve. Approving mismatched markets can result in:
- One leg winning while the other loses
- Significant financial loss
- Unhedgeable positions
Phase 2: Fingerprint Matching¶
Fingerprint matching extracts structured fields from market titles and descriptions for precise comparison.
Market Fingerprint Structure¶
MarketFingerprint {
entity: "Trump" # Primary entity (person, crypto, event)
event_type: Acquisition # Type: PriceTarget, Election, Acquisition, etc.
metric: { threshold: 100000 } # Numeric threshold if applicable
resolution_window: 2026-01 # Resolution date/period
outcome_type: Binary # Binary or MultiOutcome
}
Entity Extraction¶
The system automatically extracts entities from market titles:
| Entity Type | Examples | Pattern |
|---|---|---|
| Person | Trump, Biden, Harris | Named individuals |
| Crypto | Bitcoin, BTC, ETH | Cryptocurrencies with alias resolution |
| PriceTarget | $100k, $50,000 | Numeric values with currency |
| Date | Q2 2026, June 2026 | Temporal references |
| Event | Super Bowl, Fed Meeting | Known event types |
Alias Resolution¶
Common aliases are automatically resolved:
| Alias | Canonical Name |
|---|---|
| BTC | Bitcoin |
| ETH | Ethereum |
| 45 | Donald Trump |
| Fed | Federal Reserve |
| Pro Football Championship | Super Bowl |
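One simple way to implement this kind of lookup is a case-insensitive map from alias to canonical name. The sketch below assumes exactly that. `default_aliases` and `canonical_entity` are hypothetical names, and the real system may match aliases differently (for example, inside longer titles).

```rust
use std::collections::HashMap;

/// Alias table seeded with the entries listed above (keys are lowercase).
fn default_aliases() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("btc", "Bitcoin"),
        ("eth", "Ethereum"),
        ("45", "Donald Trump"),
        ("fed", "Federal Reserve"),
        ("pro football championship", "Super Bowl"),
    ])
}

/// Returns the canonical name for a known alias, or the trimmed input unchanged.
fn canonical_entity(raw: &str, aliases: &HashMap<&'static str, &'static str>) -> String {
    let key = raw.trim().to_lowercase();
    match aliases.get(key.as_str()) {
        Some(canonical) => canonical.to_string(),
        None => raw.trim().to_string(),
    }
}
```

Resolving both sides to a canonical name before comparison is what lets "BTC" and "Bitcoin" count as an entity match.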
View Fingerprints (Debug)¶
# Show fingerprint for a specific market
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--show-fingerprint --ticker "KXBTC-100K-2026"
# Output:
# Fingerprint for KXBTC-100K-2026:
# Entity: Bitcoin
# Event Type: PriceTarget
# Threshold: $100,000 (direction: Above)
# Resolution: 2026-12-31
# Outcome: Binary
Phase 3: Embedding-Based Matching¶
Embeddings capture semantic similarity that fingerprints may miss. "Super Bowl" and "Pro Football Championship" have zero word overlap but high embedding similarity.
Embedding Pipeline¶
graph LR
A[Market Title] --> B[Embedder]
B --> C[Vector 256-dim]
C --> D[VectorStore]
D --> E[Nearest Neighbors]
E --> F[Similar Markets]
Cosine Similarity¶
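Embeddings are compared using cosine similarity, the dot product of the two vectors divided by the product of their lengths:
similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)
Values range from -1 (opposite) to 1 (identical).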
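A minimal Rust sketch of the comparison is shown below. The function name and the choice to return `None` for mismatched lengths or zero-length vectors are assumptions made for illustration. The document does not say how the engine handles those cases.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None if the lengths differ, the input is empty, or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Parallel vectors give 1, orthogonal vectors give 0, and opposite vectors give -1.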
Configuration¶
| Variable | Description | Default |
|---|---|---|
| `DISCOVERY_EMBEDDING_DIM` | Embedding dimension | 256 |
| `DISCOVERY_EMBEDDING_BATCH_SIZE` | Batch size for generation | 100 |
Phase 4: LLM Verification¶
For uncertain cases (score 0.60-0.85), the system can invoke an LLM to judge whether the two markets resolve on the same conditions.
Escalation Rules¶
Cases are escalated to LLM when:
| Trigger | Condition |
|---|---|
| Uncertain Score | Fingerprint score between 0.60 and 0.85 |
| Warnings Present | Semantic warnings detected |
| High Value | Market volume > $10,000 |
| Conflicting Signals | High entity match but low date match |
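The triggers above could be combined as in the following sketch. Several details are assumptions: the `Candidate` struct is hypothetical, the triggers are treated as alternatives (any one is enough), the upper score bound is exclusive, and the 0.8 and 0.5 cut-offs for "high entity match but low date match" are illustrative values that the table does not specify.

```rust
/// Inputs to the escalation check. Hypothetical type for illustration.
struct Candidate {
    fingerprint_score: f64,
    has_warnings: bool,
    volume_usd: f64,
    entity_score: f64,
    date_score: f64,
}

/// Returns true if any escalation trigger fires.
fn should_escalate(c: &Candidate) -> bool {
    // Uncertain score: 0.60 up to, but not including, 0.85 (assumed bounds).
    let uncertain = (0.60..0.85).contains(&c.fingerprint_score);
    // High value: market volume above $10,000.
    let high_value = c.volume_usd > 10_000.0;
    // Conflicting signals: cut-offs of 0.8 and 0.5 are illustrative assumptions.
    let conflicting = c.entity_score >= 0.8 && c.date_score < 0.5;
    uncertain || c.has_warnings || high_value || conflicting
}
```

A confident, low-volume candidate with no warnings skips the LLM under this reading, while the same candidate with volume above $10,000 is escalated.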
Escalation Tiers¶
| Tier | Model | Cost | Use Case |
|---|---|---|---|
| None | - | $0 | Score ≥ 0.85, no warnings |
| Haiku | Claude Haiku | ~$0.001 | Initial screening |
| Sonnet | Claude Sonnet | ~$0.01 | Complex resolution analysis |
| Human | Manual | - | LLM uncertain or conflicts detected |
Cost Management¶
LLM verification has a configurable daily budget, set with DISCOVERY_LLM_BUDGET (default: 50.00 USD).
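A budget guard of this kind might look like the sketch below. `DailyBudget` and its methods are hypothetical names. The sketch does not model the daily reset, and what the engine does once the budget is exhausted is not stated here, so the caller's fallback is left open.

```rust
use std::env;

/// Tracks LLM spend against a daily limit. Hypothetical helper for illustration.
struct DailyBudget {
    limit_usd: f64,
    spent_usd: f64,
}

impl DailyBudget {
    fn new(limit_usd: f64) -> Self {
        DailyBudget { limit_usd, spent_usd: 0.0 }
    }

    /// Reads DISCOVERY_LLM_BUDGET, falling back to the documented default of 50.00.
    fn from_env() -> Self {
        let limit = env::var("DISCOVERY_LLM_BUDGET")
            .ok()
            .and_then(|v| v.parse::<f64>().ok())
            .unwrap_or(50.0);
        Self::new(limit)
    }

    /// Records the cost and returns true if it fits in the remaining budget.
    /// Otherwise leaves the running total unchanged and returns false.
    fn try_spend(&mut self, cost_usd: f64) -> bool {
        if self.spent_usd + cost_usd <= self.limit_usd {
            self.spent_usd += cost_usd;
            true
        } else {
            false
        }
    }
}
```

Checking `try_spend` before each LLM call keeps the day's total at or below the limit. At roughly $0.001 per Haiku call, the default budget covers on the order of tens of thousands of screenings.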
LLM Response Format¶
{
"equivalent": true,
"confidence": 0.92,
"reasoning": "Both markets resolve based on BTC/USD spot price reaching $100,000",
"warnings": [],
"resolution_differences": []
}
Phase 5: Learning from Human Feedback¶
Every approval or rejection is logged and fed back into matching: decisions teach the system new aliases and drive periodic re-weighting of the fingerprint fields.
Decision Logging¶
All decisions are logged with full context:
{
"candidate_id": "550e8400-e29b-41d4-a716-446655440000",
"decision": "approved",
"fingerprint_score": 0.82,
"embedding_score": 0.88,
"llm_confidence": null,
"escalation_level": "None",
"category": "crypto",
"entity_corrections": null
}
Alias Learning¶
When you approve a match with entity differences, the system learns new aliases:
# If you approve a match where:
# Kalshi: "45 wins election"
# Polymarket: "Trump wins election"
#
# The system learns: "45" → "Donald Trump"
Aliases are stored with confidence scores that increase with each confirmation.
Weight Optimization¶
The system periodically optimizes fingerprint field weights based on approval patterns:
# Initial weights:
entity: 0.30, date: 0.25, threshold: 0.20, outcome: 0.15, source: 0.10
# After 100 decisions, optimized weights might become:
entity: 0.35, date: 0.28, threshold: 0.18, outcome: 0.12, source: 0.07
Training Data Export¶
Export decisions for model training:
# Export to JSONL format for training
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--export-training-data --output training.jsonl
Configuration¶
Environment Variables¶
Core Settings¶
| Variable | Description | Default |
|---|---|---|
| `DISCOVERY_SCAN_INTERVAL_SECS` | Auto-scan interval (seconds) | 3600 |
| `DISCOVERY_SIMILARITY_THRESHOLD` | Minimum match score | 0.6 |
| `DISCOVERY_DB_PATH` | Database file path | discovery.db |
Phase 2-3: Scoring¶
| Variable | Description | Default |
|---|---|---|
| `DISCOVERY_AUTO_APPROVE_THRESHOLD` | Score for auto-approval | 0.85 |
| `DISCOVERY_AUTO_REJECT_THRESHOLD` | Score for auto-rejection | 0.40 |
| `DISCOVERY_FINGERPRINT_WEIGHT` | Weight for fingerprint score | 0.50 |
| `DISCOVERY_EMBEDDING_WEIGHT` | Weight for embedding score | 0.40 |
| `DISCOVERY_TEXT_WEIGHT` | Weight for text similarity | 0.10 |
Phase 4: LLM Verification¶
| Variable | Description | Default |
|---|---|---|
| `DISCOVERY_LLM_ENABLED` | Enable LLM verification | false |
| `DISCOVERY_LLM_BUDGET` | Daily budget in USD | 50.00 |
| `DISCOVERY_ESCALATION_LOW` | Lower escalation threshold | 0.60 |
| `DISCOVERY_ESCALATION_HIGH` | Upper escalation threshold | 0.85 |
CLI Options¶
Discovery Commands¶
| Flag | Description |
|---|---|
| `--discover-markets` | Run a discovery scan |
| `--list-candidates` | List match candidates |
| `--discovery-db <path>` | Custom database path |
| `--status <filter>` | Filter: pending, approved, rejected, all |
Approval Commands¶
| Flag | Description |
|---|---|
| `--approve-candidate <uuid>` | Approve a candidate |
| `--reject-candidate <uuid>` | Reject a candidate |
| `--acknowledge-warnings` | Required to approve candidates with warnings |
| `--reason <text>` | Required when rejecting |
Debug Commands (Phase 2-5)¶
| Flag | Description |
|---|---|
| `--show-fingerprint` | Display fingerprint for a market |
| `--test-match --kalshi <ticker> --poly <id>` | Test matching between two markets |
| `--evaluate-matching` | Run evaluation on golden set |
| `--export-training-data` | Export decisions for training |
Database¶
Discovery data is stored in SQLite:
discovery.db
├── discovered_markets # Cached market data from both platforms
├── candidates # Match candidates with status
├── match_decisions # Decision logging with scores (Phase 5)
├── learned_aliases # Entity aliases from corrections (Phase 5)
├── embeddings # Market embeddings (Phase 3)
└── audit_log # All approval/rejection decisions
Decision Logging Schema (Phase 5)¶
CREATE TABLE match_decisions (
id TEXT PRIMARY KEY,
candidate_id TEXT NOT NULL,
decision TEXT NOT NULL, -- 'approved', 'rejected'
fingerprint_score REAL,
embedding_score REAL,
llm_confidence REAL,
escalation_level TEXT, -- 'None', 'Haiku', 'Sonnet', 'Human'
category TEXT,
rejection_reason TEXT,
entity_corrections TEXT, -- JSON: {"old": "BTC", "new": "Bitcoin"}
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
Custom Database Path¶
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--list-candidates --discovery-db /path/to/custom.db
Audit Trail¶
All decisions are logged for compliance:
{
"timestamp": "2026-01-22T15:30:00Z",
"action": "approve",
"candidate_id": "550e8400-e29b-41d4-a716-446655440000",
"polymarket_id": "poly-btc-100k",
"kalshi_id": "KXBTC-100K-2026",
"similarity_score": 0.852,
"semantic_warnings": [],
"acknowledged_warnings": false,
"session_id": "abc123"
}
Best Practices¶
Regular Discovery¶
Run discovery scans regularly to find new market opportunities:
# Cron job example: scan every hour
0 * * * * cd /path/to/arbiter-bot && cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- --discover-markets
Review Before Major Events¶
Before high-impact events (elections, economic announcements), review pending candidates to ensure mappings are accurate.
Document Rejections¶
Always provide clear rejection reasons. Documented reasons:
- Help future reviewers understand why pairs were rejected
- Help improve the matching algorithm over time
- Maintain the compliance audit trail
Use Demo Environment First¶
Test the discovery workflow with --kalshi-demo to verify your review process before using production mappings.
Troubleshooting¶
"Discovery commands require the 'discovery' feature"¶
Rebuild with the feature flag:
cargo build --manifest-path arbiter-engine/Cargo.toml --features discovery
"Cannot approve: candidate has semantic warnings"¶
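You must explicitly acknowledge warnings:
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--approve-candidate <uuid> --acknowledge-warnings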
"Rejection requires a reason"¶
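Provide a reason with --reason:
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--reject-candidate <uuid> --reason "Why the markets are not equivalent"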
"Candidate not found"¶
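Verify the UUID is correct by listing all candidates:
cargo run --manifest-path arbiter-engine/Cargo.toml --features discovery -- \
--list-candidates --status all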
Related Documentation¶
- ADR-017: Market Discovery - Architecture decision
- CLI Reference - Full command reference
- Environment Variables - Configuration options