// WORK

Pantheon-Trades

LIVE

Eleven AI agents deliberate every prediction-market trade; refusals are recorded on-chain.

// Problem

Most trading bots are a black box: they optimise one number and ask you to trust the brochure. Pantheon inverts that. The council prompts are public, the math is documented, and the refusals — not the trades — are recorded on-chain, so the discipline claim is falsifiable. Either the contract has receipts or it's empty.

// Constraints

It costs $0 to try: everything runs on Arc Testnet, with gas dripped free from Circle's faucet. Live Polymarket execution is unavailable because the operator is geo-blocked, so every result is paper trading, and the site says so. The constitution caps single positions at 5% of NAV and categories at 2–5%, and pauses new positions for 30 days after a 50% drawdown. An expected-value gate refuses any trade whose net-EV t-stat is below 2.0 after fees, spread, slippage, and gas.
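
The constitutional checks can be sketched as a single gate function. This is a minimal sketch, not the project's code: the function names, the `days_since_drawdown_breach` input, and the way the t-stat is estimated (mean of net-EV samples divided by its standard error) are all assumptions, since the text only gives the thresholds. The category cap is left out because the text gives a range (2–5%) rather than a value per category.

```python
from math import sqrt
from statistics import mean, stdev

# Thresholds taken from the constitution as described above.
MAX_POSITION_FRACTION = 0.05   # single position <= 5% of NAV
PAUSE_DAYS = 30                # pause length after a 50% drawdown
MIN_T_STAT = 2.0               # minimum net-EV t-stat


def net_ev_t_stat(net_ev_samples):
    """t-stat of the mean net EV. Samples are assumed to be already net of
    fees, spread, slippage, and gas (how they are produced is not specified)."""
    n = len(net_ev_samples)
    if n < 2:
        return 0.0  # not enough evidence: treat as failing the gate
    spread = stdev(net_ev_samples)
    if spread == 0:
        return float("inf") if net_ev_samples[0] > 0 else 0.0
    return mean(net_ev_samples) / (spread / sqrt(n))


def check_trade(size, nav, days_since_drawdown_breach, net_ev_samples):
    """Return (accepted, reason). `days_since_drawdown_breach` is None if
    NAV has never fallen 50% from its peak (hypothetical input)."""
    if days_since_drawdown_breach is not None and days_since_drawdown_breach < PAUSE_DAYS:
        return False, "drawdown pause"
    if size > MAX_POSITION_FRACTION * nav:
        return False, "position cap"
    if net_ev_t_stat(net_ev_samples) < MIN_T_STAT:
        return False, "ev gate"
    return True, "accepted"


samples = [0.02, 0.03, 0.025, 0.035]  # illustrative numbers, not backtest data
print(check_trade(400, 10_000, None, samples))
print(check_trade(600, 10_000, None, samples))
```

With these illustrative inputs the first call is accepted and the second is refused by the position cap. Returning a reason string alongside the verdict mirrors the idea that a refusal is a record worth keeping, not just a silent `False`.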

// Architecture

Thirteen Python services sit behind a FastAPI gateway, each with its own uv environment. The council runs four rounds (openings, challenges, Athena's synthesis, blind vote), with Zeus and Solon holding unilateral vetoes and Eris forced to argue the minority side against groupthink. Areopagus sizes accepted trades at half-Kelly and writes refused ones to an immutable Solidity contract, which is covered by 12 Halmos symbolic checks. Agent weights drift with realised Brier scores; Platt scaling and isotonic regression recalibrate the council from outcomes. On the 200-market backtest the council's Brier score (lower is better) beats a single-shot LLM by a wide margin (0.149 vs 0.260) but does not beat the human consensus (0.126). That comparison stays on the site on purpose.
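
Half-Kelly sizing for a binary prediction-market contract can be sketched as follows. For a contract bought at `price` that pays 1 if the event happens, the full Kelly fraction is `(p_win - price) / (1 - price)`, and half-Kelly stakes half of that. The function name is hypothetical, fees are ignored, and applying the 5% position cap at this step is an assumption; the text does not say how Areopagus combines the two.

```python
def half_kelly_fraction(p_win, price, cap=0.05):
    """Fraction of NAV to stake on a binary contract bought at `price`
    (0 < price < 1) that pays 1 if the event happens, given an estimated
    win probability `p_win`. Fees are ignored in this sketch."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    full_kelly = (p_win - price) / (1.0 - price)
    if full_kelly <= 0:
        return 0.0  # no edge, so no position
    # Half-Kelly, then the assumed 5%-of-NAV single-position cap.
    return min(0.5 * full_kelly, cap)


# A small edge gives a small stake; a large edge is clipped by the cap.
print(half_kelly_fraction(0.52, 0.50))
print(half_kelly_fraction(0.60, 0.50))
```

Halving the Kelly stake gives up some expected growth in exchange for lower variance, which matters when `p_win` is itself an uncertain estimate.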

// WAR STORIES

The 11-provider fallback chain and the session-key/x402 wallet plumbing were both painful, but the hardest part was Arc itself: recording anything on-chain was a completely foreign concept going in. Integrating it meant learning the whole model from zero, mid-build.

Built solo for the Agora Hackathon: 200+ teams, virtual, $10K first prize. It didn't win; others built better projects, no excuse, though the repo finished among the most-starred of the event. The motivation predates the hackathon anyway: a personal risk-management system for trading. The design reduced to denying trades. Refusal was the unique alpha, and explicit constitutional rules handed to specific agents in the debate made it actually buildable.

I wouldn't unwind any of it; the lesson was the value. Starting over, I'd want better data, a live orchestrator running the full debate end-to-end, sharper per-agent system prompts, and cleaner scripts. The backtest already says where the ceiling is, and the room for improvement is on the record.

// RESULTS

Metric	Value
Council Brier score	0.149 (200-market backtest)
Human consensus Brier (same markets)	0.126
Single-shot LLM Brier (same markets)	0.260
Python tests	714
Solidity tests + Halmos symbolic checks	65 + 12
Agents	11
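
For reading the Brier rows above: the Brier score is the mean squared difference between forecast probabilities and 0/1 outcomes, so lower is better, and a forecaster that always answers 0.5 scores exactly 0.25. A minimal sketch with made-up forecasts (not the backtest data; the function name is hypothetical):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities in [0, 1] and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Illustrative values only.
print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1]))
```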

// STACK

Python
FastAPI
Solidity
Foundry
Next.js