Closing ADR Gaps: Nonce Management, Risk Controls, and Key Rotation

Completing the remaining implementation gaps across ADRs 004, 005, 007, and 009 with thread-safe nonce management, a risk manager actor, a compensation executor, and key rotation support.

The Gap Analysis

After the core architecture was implemented, a review revealed several gaps between the documented ADRs and the actual implementation:

ADR	Gap Identified	Resolution
004	No thread-safe nonce management for Polymarket	`NonceManager` with atomics
005	No risk management actor	`RiskManagerActor` with message protocol
007	No compensation executor	`CompensationExecutor` with retry strategies
009	No key rotation support	`KeyRotationManager` with zero-downtime rotation

Nonce Management (ADR-004)

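Polymarket orders require monotonically increasing nonces. In a concurrent environment, two tasks that read and update a shared counter at the same time can both obtain the same value, so nonce allocation has to be atomic.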

The Problem

```rust
// WRONG: race condition. The read and the write are separate steps, so two
// tasks can both read the same value and issue the same nonce.
let nonce = self.nonce + 1;
self.nonce = nonce;
```

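The Solution

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

use alloy_primitives::U256; // assumed source of U256; the post does not name the crate
use chrono::Utc;
use tokio::sync::RwLock;

pub struct NonceManager {
    // One atomic counter per lowercased address.
    nonces: RwLock<HashMap<String, Arc<AtomicU64>>>,
}

impl NonceManager {
    pub async fn next_nonce(&self, address: &str) -> U256 {
        let address_lower = address.to_lowercase();

        // Fast path: look up an existing counter under the shared read lock.
        let existing = self.nonces.read().await.get(&address_lower).cloned();

        let counter = match existing {
            Some(counter) => counter,
            None => {
                // Slow path: another task may have inserted a counter between
                // releasing the read lock and acquiring the write lock.
                // entry().or_insert_with() keeps that counter instead of
                // overwriting it with a second one that could repeat nonces.
                let mut nonces = self.nonces.write().await;
                nonces
                    .entry(address_lower)
                    .or_insert_with(|| {
                        Arc::new(AtomicU64::new(Utc::now().timestamp_millis() as u64))
                    })
                    .clone()
            }
        };

        // Atomic increment: every caller receives a distinct value.
        U256::from(counter.fetch_add(1, Ordering::SeqCst))
    }
}
```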

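Key properties:
- Atomic increment: `fetch_add` is an atomic read-modify-write, so no two callers can receive the same value, even when they race on the same address.
- Case-insensitive: Ethereum addresses are normalized to lowercase, so the checksummed and lowercase forms of one address share a single counter.
- Timestamp initialization: starting each counter at the current Unix time in milliseconds makes collisions after a restart unlikely, as long as the process issued fewer nonces per address than milliseconds elapsed since the counter was created (otherwise the counter runs ahead of the clock).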
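The same allocation scheme can be exercised without tokio, chrono, or a U256 type. This is a minimal sketch under stated assumptions: a synchronous `std::sync::RwLock` replaces the async lock and nonces are plain `u64`. Eight threads hammer one address with mixed casing, and the program checks that every nonce is unique and that each thread sees its own nonces strictly increase.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{SystemTime, UNIX_EPOCH};

// Synchronous stand-in (assumption): std RwLock instead of tokio's, u64 instead of U256.
#[derive(Default)]
pub struct NonceManager {
    nonces: RwLock<HashMap<String, Arc<AtomicU64>>>,
}

impl NonceManager {
    pub fn next_nonce(&self, address: &str) -> u64 {
        let key = address.to_lowercase();
        // Fast path: the read guard is dropped at the end of this statement.
        let existing = self.nonces.read().unwrap().get(&key).cloned();
        let counter = match existing {
            Some(counter) => counter,
            // Slow path: entry() re-checks under the write lock, so threads that
            // both missed above still end up sharing a single counter.
            None => self
                .nonces
                .write()
                .unwrap()
                .entry(key)
                .or_insert_with(|| Arc::new(AtomicU64::new(now_millis())))
                .clone(),
        };
        counter.fetch_add(1, Ordering::SeqCst)
    }
}

fn now_millis() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_millis() as u64
}

fn main() {
    let manager = Arc::new(NonceManager::default());
    let handles: Vec<_> = (0..8)
        .map(|i| {
            let m = Arc::clone(&manager);
            // Alternate casing to exercise address normalization.
            let addr = if i % 2 == 0 { "0xABCDEF" } else { "0xabcdef" };
            thread::spawn(move || (0..1_000).map(|_| m.next_nonce(addr)).collect::<Vec<u64>>())
        })
        .collect();

    let mut all = HashSet::new();
    for h in handles {
        let nonces = h.join().unwrap();
        // Within one thread, successive nonces strictly increase.
        assert!(nonces.windows(2).all(|w| w[0] < w[1]));
        for n in nonces {
            assert!(all.insert(n), "duplicate nonce {n}");
        }
    }
    assert_eq!(all.len(), 8_000);
    println!("8000 unique nonces");
}
```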

Risk Manager Actor (ADR-005)

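In the actor model, all state mutation happens through message passing. Risk checks are a natural fit: the actor owns each user's positions and limits and processes one message at a time, so a risk check and a fill update can never interleave.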

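Message Protocol

```rust
use tokio::sync::oneshot;

pub enum RiskMessage {
    // Request/response: the caller awaits the verdict on the oneshot receiver.
    CheckRisk {
        user_id: UserId,
        opportunity: Opportunity,
        respond_to: oneshot::Sender<Result<(), RiskViolation>>,
    },
    // Fire-and-forget: updates positions and P&L after a fill.
    RecordFill {
        user_id: UserId,
        fill: FillDetails,
    },
    // ... other messages
}
```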

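Actor Implementation

```rust
impl RiskManagerActor {
    pub async fn run(mut self) {
        // `receiver` is assumed to be a tokio mpsc::Receiver<RiskMessage>;
        // recv() returns None once every sender is dropped, ending the actor.
        while let Some(msg) = self.receiver.recv().await {
            match msg {
                RiskMessage::CheckRisk { user_id, opportunity, respond_to } => {
                    let result = self.check_risk(&user_id, &opportunity);
                    // Ignore the send error if the caller stopped waiting.
                    let _ = respond_to.send(result);
                }
                RiskMessage::RecordFill { user_id, fill } => {
                    self.record_fill(&user_id, &fill);
                }
                // ... handlers for the other messages
            }
        }
    }
}
```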
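The request/response flow can be sketched with the standard library alone. This is a minimal, hypothetical version: OS threads and `std::sync::mpsc` replace tokio, a per-request `mpsc` channel stands in for `oneshot`, and the risk state is reduced to one exposure number per user. `RiskHandle` and the simplified message fields are illustrative names, not the project's API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical simplified types; the real UserId, Opportunity and FillDetails are richer.
pub type UserId = String;

#[derive(Debug, PartialEq)]
pub enum RiskViolation {
    ExposureLimit { current: f64, requested: f64, limit: f64 },
}

pub enum RiskMessage {
    CheckRisk {
        user_id: UserId,
        notional: f64,
        // Single-use reply channel standing in for tokio::sync::oneshot.
        respond_to: Sender<Result<(), RiskViolation>>,
    },
    RecordFill { user_id: UserId, notional: f64 },
}

struct RiskManagerActor {
    receiver: Receiver<RiskMessage>,
    exposure: HashMap<UserId, f64>, // owned by the actor thread only
    max_exposure: f64,
}

impl RiskManagerActor {
    fn run(mut self) {
        // One message at a time: no lock is needed around `exposure`.
        // recv() returns Err once every Sender is dropped, ending the loop.
        while let Ok(msg) = self.receiver.recv() {
            match msg {
                RiskMessage::CheckRisk { user_id, notional, respond_to } => {
                    let current = self.exposure.get(&user_id).copied().unwrap_or(0.0);
                    let result = if current + notional > self.max_exposure {
                        Err(RiskViolation::ExposureLimit {
                            current,
                            requested: notional,
                            limit: self.max_exposure,
                        })
                    } else {
                        Ok(())
                    };
                    let _ = respond_to.send(result); // caller may have given up
                }
                RiskMessage::RecordFill { user_id, notional } => {
                    *self.exposure.entry(user_id).or_insert(0.0) += notional;
                }
            }
        }
    }
}

/// Cloneable client handle: callers never touch actor state directly.
#[derive(Clone)]
pub struct RiskHandle {
    sender: Sender<RiskMessage>,
}

impl RiskHandle {
    pub fn spawn(max_exposure: f64) -> (Self, thread::JoinHandle<()>) {
        let (sender, receiver) = channel();
        let actor = RiskManagerActor { receiver, exposure: HashMap::new(), max_exposure };
        (RiskHandle { sender }, thread::spawn(move || actor.run()))
    }

    pub fn check(&self, user_id: &str, notional: f64) -> Result<(), RiskViolation> {
        let (tx, rx) = channel();
        let msg = RiskMessage::CheckRisk { user_id: user_id.to_string(), notional, respond_to: tx };
        self.sender.send(msg).expect("risk actor stopped");
        rx.recv().expect("risk actor dropped the reply")
    }

    pub fn record_fill(&self, user_id: &str, notional: f64) {
        let msg = RiskMessage::RecordFill { user_id: user_id.to_string(), notional };
        let _ = self.sender.send(msg);
    }
}

fn main() {
    let (risk, actor) = RiskHandle::spawn(1_000.0);
    assert_eq!(risk.check("alice", 600.0), Ok(()));
    risk.record_fill("alice", 600.0);
    assert!(risk.check("alice", 600.0).is_err()); // 600 + 600 > 1000
    assert_eq!(risk.check("bob", 600.0), Ok(()));
    drop(risk); // last Sender gone: the actor loop exits
    actor.join().unwrap();
}
```

Because every request travels through one channel, a fill sent before a check by the same caller is applied before that check runs; that ordering, rather than a lock, is what keeps the actor's state consistent.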

Risk checks include:
- Open position limits (per-user, per-market)
- Exposure limits (max capital at risk)
- Daily loss limits with cooldown periods
- Order rate limiting
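The four checks can be expressed as one function over per-user state. In this sketch the field names, the 60-second rate window, and the cooldown policy are assumptions: hitting the daily loss limit starts a cooldown, and because the end-of-day P&L reset is not modelled, a user still over the limit when the cooldown ends enters a new one. Time is passed in as Unix seconds to keep the checks deterministic; per-market limits would follow the same pattern with state keyed by market.

```rust
use std::collections::VecDeque;

// Hypothetical limit and state shapes; the post names the checks, not the fields.
pub struct RiskLimits {
    pub max_open_positions: usize,
    pub max_exposure: f64,   // max capital at risk
    pub max_daily_loss: f64, // positive amount, e.g. 500.0
    pub cooldown_secs: u64,  // trading pause after the loss limit is hit
    pub max_orders_per_min: usize,
}

#[derive(Default)]
pub struct UserRiskState {
    pub open_positions: usize,
    pub exposure: f64,
    pub realized_pnl_today: f64,      // negative when losing
    pub cooldown_until: Option<u64>,  // unix seconds
    pub recent_orders: VecDeque<u64>, // accepted order times, oldest first
}

#[derive(Debug, PartialEq)]
pub enum RiskViolation {
    DailyLossCooldown { until: u64 },
    RateLimited,
    PositionLimit,
    ExposureLimit,
}

/// Checks one new order of `notional` at `now` (unix seconds); records it if accepted.
pub fn check_risk(
    limits: &RiskLimits,
    state: &mut UserRiskState,
    notional: f64,
    now: u64,
) -> Result<(), RiskViolation> {
    // Daily loss limit with cooldown.
    if let Some(until) = state.cooldown_until {
        if now < until {
            return Err(RiskViolation::DailyLossCooldown { until });
        }
    }
    if state.realized_pnl_today <= -limits.max_daily_loss {
        let until = now + limits.cooldown_secs;
        state.cooldown_until = Some(until);
        return Err(RiskViolation::DailyLossCooldown { until });
    }

    // Rate limit over a sliding 60-second window.
    while matches!(state.recent_orders.front(), Some(&t) if now.saturating_sub(t) >= 60) {
        state.recent_orders.pop_front();
    }
    if state.recent_orders.len() >= limits.max_orders_per_min {
        return Err(RiskViolation::RateLimited);
    }

    // Open position and exposure limits.
    if state.open_positions >= limits.max_open_positions {
        return Err(RiskViolation::PositionLimit);
    }
    if state.exposure + notional > limits.max_exposure {
        return Err(RiskViolation::ExposureLimit);
    }

    state.recent_orders.push_back(now);
    Ok(())
}

fn main() {
    let limits = RiskLimits {
        max_open_positions: 2,
        max_exposure: 1_000.0,
        max_daily_loss: 500.0,
        cooldown_secs: 300,
        max_orders_per_min: 3,
    };
    let mut s = UserRiskState::default();

    assert_eq!(check_risk(&limits, &mut s, 400.0, 0), Ok(()));
    assert_eq!(check_risk(&limits, &mut s, 1_200.0, 1), Err(RiskViolation::ExposureLimit));
    assert_eq!(check_risk(&limits, &mut s, 100.0, 2), Ok(()));
    assert_eq!(check_risk(&limits, &mut s, 100.0, 3), Ok(()));
    assert_eq!(check_risk(&limits, &mut s, 100.0, 4), Err(RiskViolation::RateLimited));
    assert_eq!(check_risk(&limits, &mut s, 100.0, 61), Ok(())); // t=0 left the window

    s.open_positions = 2;
    assert_eq!(check_risk(&limits, &mut s, 100.0, 200), Err(RiskViolation::PositionLimit));

    s.open_positions = 0;
    s.realized_pnl_today = -500.0;
    let blocked = Err(RiskViolation::DailyLossCooldown { until: 600 });
    assert_eq!(check_risk(&limits, &mut s, 100.0, 300), blocked);
    assert_eq!(check_risk(&limits, &mut s, 100.0, 400), blocked);
}
```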

Compensation Executor (ADR-007)

The saga pattern requires a compensating action when Leg 2 fails after Leg 1 has filled: the executor must either complete the hedge or unwind the open Leg 1 position.

Strategy Selection

```rust
pub enum HedgeStrategy {
    Hold(String),        // Hold position, manual intervention
    DumpLeg1,            // Market sell Leg 1 immediately
    RetryLeg2,           // Retry original Leg 2
    LimitChaseLeg2,      // Chase price with limit orders
}

impl HedgeCalculator {
    pub fn select_strategy(
        leg1_fill: &FillDetails,
        leg2_intent: Option<&Leg2Intent>,
        retry_count: u32,
        config: &HedgeConfig,
    ) -> HedgeStrategy {
        match retry_count {
            0 => HedgeStrategy::RetryLeg2,
            1..=2 => HedgeStrategy::LimitChaseLeg2,
            _ if config.allow_market_fallback => HedgeStrategy::DumpLeg1,
            _ => HedgeStrategy::Hold("Max retries exceeded".into()),
        }
    }
}
```

Execution with Retries

```rust
impl CompensationExecutor {
    pub async fn execute(&self, leg1_fill: &FillDetails, ...) -> CompensationResult {
        let mut retry_count = 0;

        loop {
            // Escalate: each failed attempt moves to the next strategy.
            let strategy = HedgeCalculator::select_strategy(..., retry_count, ...);
            let hedge_order = HedgeCalculator::calculate(&strategy, leg1_fill);

            match self.execute_hedge_order(&hedge_order).await {
                Ok(fill) => return CompensationResult::Success(fill),
                // Retry budget left: try again with the next strategy.
                Err(_) if retry_count < self.config.max_retries => {
                    retry_count += 1;
                    continue;
                }
                // Budget exhausted: report the failure.
                Err(e) => return CompensationResult::Failed { reason: e, ... },
            }
        }
    }
}
```
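The interaction between `select_strategy` and the retry loop can be exercised in isolation. This sketch keeps the post's escalation schedule but replaces the exchange call with a closure (`place`) and assumes two details the post leaves open: `max_retries` lives in the same config struct, and a `Hold` strategy returns immediately without sending an order.

```rust
// Mirrors the post's strategies; the order-placing closure is a hypothetical stand-in.
#[derive(Debug, Clone, PartialEq)]
pub enum HedgeStrategy {
    Hold(String),   // stop and escalate to manual intervention
    DumpLeg1,       // unwind: market-sell Leg 1
    RetryLeg2,      // resend the original Leg 2
    LimitChaseLeg2, // chase Leg 2 with limit orders
}

pub struct HedgeConfig {
    pub allow_market_fallback: bool,
    pub max_retries: u32,
}

#[derive(Debug, PartialEq)]
pub enum CompensationResult {
    Success { strategy: HedgeStrategy, attempts: u32 },
    Held(String),
    Failed { attempts: u32 },
}

// Same escalation schedule as HedgeCalculator::select_strategy in the post.
fn select_strategy(retry_count: u32, config: &HedgeConfig) -> HedgeStrategy {
    match retry_count {
        0 => HedgeStrategy::RetryLeg2,
        1..=2 => HedgeStrategy::LimitChaseLeg2,
        _ if config.allow_market_fallback => HedgeStrategy::DumpLeg1,
        _ => HedgeStrategy::Hold("Max retries exceeded".into()),
    }
}

/// `place` stands in for execute_hedge_order: Ok(()) means the hedge filled.
pub fn compensate<F>(config: &HedgeConfig, mut place: F) -> CompensationResult
where
    F: FnMut(&HedgeStrategy) -> Result<(), String>,
{
    let mut retry_count = 0;
    loop {
        let strategy = select_strategy(retry_count, config);
        // Assumption: Hold sends no order and hands control to a human.
        if let HedgeStrategy::Hold(reason) = &strategy {
            return CompensationResult::Held(reason.clone());
        }
        match place(&strategy) {
            Ok(()) => {
                return CompensationResult::Success { strategy, attempts: retry_count + 1 }
            }
            Err(_) if retry_count < config.max_retries => retry_count += 1,
            Err(_) => return CompensationResult::Failed { attempts: retry_count + 1 },
        }
    }
}

fn main() {
    let cfg = HedgeConfig { allow_market_fallback: true, max_retries: 3 };

    // Two failures, then the third attempt (a limit chase) fills.
    let mut calls = 0;
    let r = compensate(&cfg, |_| {
        calls += 1;
        if calls < 3 { Err("no fill".to_string()) } else { Ok(()) }
    });
    assert_eq!(r, CompensationResult::Success { strategy: HedgeStrategy::LimitChaseLeg2, attempts: 3 });

    // Nothing fills: the full escalation ends with the market fallback.
    let mut seen = Vec::new();
    let r = compensate(&cfg, |s| { seen.push(s.clone()); Err("no fill".to_string()) });
    assert_eq!(r, CompensationResult::Failed { attempts: 4 });
    assert_eq!(seen, vec![
        HedgeStrategy::RetryLeg2,
        HedgeStrategy::LimitChaseLeg2,
        HedgeStrategy::LimitChaseLeg2,
        HedgeStrategy::DumpLeg1,
    ]);

    // Without the market fallback, the executor holds instead of dumping.
    let no_fallback = HedgeConfig { allow_market_fallback: false, max_retries: 3 };
    let r = compensate(&no_fallback, |_| Err("no fill".to_string()));
    assert_eq!(r, CompensationResult::Held("Max retries exceeded".into()));
}
```

One consequence worth checking in configuration: `DumpLeg1` is only selected at `retry_count >= 3`, so with `max_retries` below 3 the executor gives up before the market fallback is ever reached.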

Key Rotation (ADR-009)

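Zero-downtime key rotation requires careful version management: every ciphertext records the key version that produced it, and old and new versions coexist while stored credentials are migrated.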

Rotation Workflow

1. Add new key version (v2)
2. Activate v2 for new encryptions
3. Old credentials still decrypt with v1
4. Re-encrypt all credentials to v2
5. Retire v1 (disable for decrypt) once no stored credential still references it
6. Remove v1
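The workflow's safety rules can be checked with bookkeeping alone, leaving the cryptography out. This hypothetical sketch tracks which key version each credential was encrypted with and refuses to retire a version that is still referenced or still active, so steps 5 and 6 cannot run before step 4 finishes. `Rotation`, `reencrypt_all`, and `active_for_encrypt` are illustrative names; only `active_for_decrypt` appears in the post's `KeyRotationManager`.

```rust
use std::collections::BTreeMap;

// Hypothetical bookkeeping for the six-step workflow; no cryptography involved.
#[derive(Debug, Clone, Copy)]
struct KeyVersionInfo {
    active_for_encrypt: bool,
    active_for_decrypt: bool,
}

struct Rotation {
    versions: BTreeMap<u32, KeyVersionInfo>,
    active_version: u32,
    credentials: BTreeMap<String, u32>, // credential id -> version that encrypted it
}

impl Rotation {
    // Step 1: a new version can decrypt but is not yet used for writes.
    fn add_version(&mut self, v: u32) {
        let info = KeyVersionInfo { active_for_encrypt: false, active_for_decrypt: true };
        self.versions.insert(v, info);
    }

    // Step 2: new encryptions use `v`; older versions keep decrypting (step 3).
    fn activate(&mut self, v: u32) {
        for (&ver, info) in self.versions.iter_mut() {
            info.active_for_encrypt = ver == v;
        }
        self.active_version = v;
    }

    // Step 4: a real system would decrypt with the old key and encrypt with the active one.
    fn reencrypt_all(&mut self) {
        let active = self.active_version;
        for version in self.credentials.values_mut() {
            *version = active;
        }
    }

    // Step 5: refuse while any credential still depends on `v`.
    fn retire(&mut self, v: u32) -> Result<(), String> {
        let in_use = self.credentials.values().filter(|&&ver| ver == v).count();
        if in_use > 0 {
            return Err(format!("{in_use} credential(s) still on v{v}"));
        }
        if v == self.active_version {
            return Err(format!("v{v} is the active version"));
        }
        match self.versions.get_mut(&v) {
            Some(info) => {
                info.active_for_decrypt = false;
                Ok(())
            }
            None => Err(format!("v{v} unknown")),
        }
    }

    // Step 6: only a retired version can be deleted.
    fn remove(&mut self, v: u32) -> Result<(), String> {
        match self.versions.get(&v).map(|info| info.active_for_decrypt) {
            Some(false) => {
                self.versions.remove(&v);
                Ok(())
            }
            Some(true) => Err(format!("v{v} must be retired first")),
            None => Err(format!("v{v} unknown")),
        }
    }
}

fn main() {
    let mut r = Rotation { versions: BTreeMap::new(), active_version: 0, credentials: BTreeMap::new() };
    r.add_version(1);
    r.activate(1);
    r.credentials.insert("alice/api".to_string(), r.active_version);

    r.add_version(2); // 1. add v2
    r.activate(2); // 2. new encryptions use v2
    r.credentials.insert("bob/api".to_string(), r.active_version);
    assert_eq!(r.credentials["alice/api"], 1); // 3. old credential still on v1
    assert!(r.retire(1).is_err()); // too early: alice still needs v1

    r.reencrypt_all(); // 4. migrate everything to v2
    assert!(r.credentials.values().all(|&v| v == 2));
    r.retire(1).unwrap(); // 5. disable v1 for decrypt
    r.remove(1).unwrap(); // 6. delete v1
    assert_eq!(r.versions.keys().copied().collect::<Vec<u32>>(), vec![2]);
    assert!(r.versions[&2u32].active_for_encrypt);
}
```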

Implementation

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

pub struct KeyRotationManager {
    // One credential store (key) per version number.
    stores: RwLock<HashMap<u32, Arc<CredentialStore>>>,
    // Lifecycle flags per version, e.g. active_for_decrypt.
    versions: RwLock<HashMap<u32, KeyVersionInfo>>,
    // Version used for all new encryptions.
    active_version: RwLock<u32>,
}

impl KeyRotationManager {
    pub fn encrypt(&self, user_id: &str, credential_id: &str, plaintext: &[u8])
        -> Result<VersionedCredential, KeyRotationError>
    {
        // New ciphertexts always use the active version...
        let version = *self.active_version.read().unwrap();
        let store = self.stores.read().unwrap()
            .get(&version).cloned()
            .ok_or(KeyRotationError::NoKeysAvailable)?;

        let encrypted = store.encrypt(user_id, plaintext)?;

        // ...and record it, so decryption knows which key to try first.
        Ok(VersionedCredential {
            key_version: version,
            encrypted,
            user_id: user_id.to_string(),
        })
    }

    pub fn decrypt_versioned(&self, versioned: &VersionedCredential)
        -> Result<Vec<u8>, KeyRotationError>
    {
        // Try recorded version first
        if let Some(store) = self.stores.read().unwrap().get(&versioned.key_version) {
            if let Ok(plaintext) = store.decrypt(&versioned.user_id, &versioned.encrypted) {
                return Ok(plaintext);
            }
        }

        // Try other versions still enabled for decryption (migration fallback)
        for (&version, info) in self.versions.read().unwrap().iter() {
            if version == versioned.key_version || !info.active_for_decrypt {
                continue;
            }
            // ... try decrypt with other versions
        }

        Err(KeyRotationError::NoKeysAvailable)
    }
}
```
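The post elides the body of the fallback loop. One plausible shape is sketched here under explicit assumptions: a toy XOR store stands in for `CredentialStore` (it is not encryption; its key-tag check merely imitates an AEAD authentication failure when the wrong key is used), and the function returns which version succeeded so a caller could repair stale `key_version` metadata. As in the post's code, the recorded version is tried without consulting its decrypt flag. The scenario covered is a credential re-encrypted with v2 whose record still says v1.

```rust
use std::collections::BTreeMap;

// Toy stand-in for CredentialStore: XOR with a one-byte key, prefixed by a key tag.
// NOT encryption. The tag check only imitates an AEAD failure on the wrong key.
struct ToyStore {
    key: u8,
}

impl ToyStore {
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8> {
        let mut out = vec![self.key];
        out.extend(plaintext.iter().map(|b| b ^ self.key));
        out
    }

    fn decrypt(&self, ciphertext: &[u8]) -> Option<Vec<u8>> {
        match ciphertext.split_first() {
            Some((&tag, body)) if tag == self.key => {
                Some(body.iter().map(|b| b ^ self.key).collect())
            }
            _ => None, // wrong key (or empty input)
        }
    }
}

struct VersionedCredential {
    key_version: u32,
    encrypted: Vec<u8>,
}

struct Manager {
    stores: BTreeMap<u32, ToyStore>,
    decrypt_enabled: BTreeMap<u32, bool>, // stands in for KeyVersionInfo::active_for_decrypt
}

impl Manager {
    /// Returns the plaintext and the version that actually decrypted it.
    fn decrypt_versioned(&self, c: &VersionedCredential) -> Option<(u32, Vec<u8>)> {
        // Recorded version first: the common case.
        if let Some(p) = self.stores.get(&c.key_version).and_then(|s| s.decrypt(&c.encrypted)) {
            return Some((c.key_version, p));
        }
        // Migration fallback: every other version still enabled for decryption.
        for (&version, &enabled) in &self.decrypt_enabled {
            if version == c.key_version || !enabled {
                continue;
            }
            if let Some(p) = self.stores.get(&version).and_then(|s| s.decrypt(&c.encrypted)) {
                return Some((version, p));
            }
        }
        None
    }
}

fn main() {
    let m = Manager {
        stores: BTreeMap::from([(1, ToyStore { key: 0x11 }), (2, ToyStore { key: 0x22 })]),
        decrypt_enabled: BTreeMap::from([(1, true), (2, true)]),
    };
    let secret = b"api-secret".to_vec();

    // Normal case: the recorded version decrypts.
    let current = VersionedCredential { key_version: 1, encrypted: m.stores[&1u32].encrypt(&secret) };
    assert_eq!(m.decrypt_versioned(&current), Some((1, secret.clone())));

    // Stale metadata: re-encrypted with v2 but still labelled v1.
    let stale = VersionedCredential { key_version: 1, encrypted: m.stores[&2u32].encrypt(&secret) };
    assert_eq!(m.decrypt_versioned(&stale), Some((2, secret.clone())));

    // With v2 disabled for decryption, the fallback skips it.
    let m = Manager { decrypt_enabled: BTreeMap::from([(1, true), (2, false)]), ..m };
    assert_eq!(m.decrypt_versioned(&stale), None);
}
```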

Security Scan Results

All new code passed security scanning:

Issue Type	Count	Status
Hardcoded secrets	0	Pass
SQL injection	0	Pass
Command injection	0	Pass
Unsafe unwrap in prod	3	Reviewed (RwLock acceptable)

The `unwrap()` calls on `RwLock` are acceptable because:
1. They only fail if a thread panicked while holding the write lock, which poisons it.
2. At that point the system is already in a bad state.
3. This is idiomatic Rust for lock acquisition.
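The first point can be demonstrated with `std::sync::RwLock`, whose `read()` and `write()` return `Err(PoisonError)` once a thread has panicked while holding a write guard. This sketch poisons a lock on purpose, then shows the alternative to `unwrap()` for callers that prefer to keep going. If the async lock in `NonceManager` is tokio's, it does not poison at all: tokio's `read()` and `write()` return guards directly.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let lock = Arc::new(RwLock::new(1u32));

    // A thread panics while holding the write guard, which poisons the lock.
    let l = Arc::clone(&lock);
    let result = thread::spawn(move || {
        let _guard = l.write().unwrap();
        panic!("simulated bug while holding the lock");
    })
    .join();
    assert!(result.is_err());

    // From now on, read() and write() return Err(PoisonError)...
    assert!(lock.is_poisoned());
    assert!(lock.read().is_err());

    // ...so `.unwrap()` would propagate the panic to this thread. A caller that
    // wants to continue can still reach the data through into_inner().
    let value = *lock.read().unwrap_or_else(|poisoned| poisoned.into_inner());
    assert_eq!(value, 1);
}
```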

Test Coverage

All implementations were developed test-first (TDD), with tests for each gap; an excerpt from the test run:

```text
test market::nonce::tests::test_concurrent_nonce_uniqueness ... ok
test actors::risk::tests::test_risk_check_within_limits ... ok
test execution::compensation::tests::test_compensation_retries ... ok
test security::key_rotation::tests::test_full_rotation_workflow ... ok

test result: ok. 198 passed; 0 failed
```

Conclusion

Closing these gaps ensures the architecture matches documentation:

ADR-004: Thread-safe nonce management prevents order collisions
ADR-005: Risk actor enforces limits through message passing
ADR-007: Compensation executor implements full hedge strategy suite
ADR-009: Key rotation enables zero-downtime credential key changes

All changes were tracked in GitHub issues #18-21 and verified by council review.