// WORK
Pantheon-Trades
LIVE
Eleven AI agents deliberate every prediction-market trade; refusals are recorded on-chain.
// Problem
Most trading bots are a black box: they optimise one number and ask you to trust the brochure. Pantheon inverts that. The council prompts are public, the math is documented, and the refusals — not the trades — are recorded on-chain, so the discipline claim is falsifiable. Either the contract has receipts or it's empty.
// Constraints
Costs $0 to try: Arc Testnet only, with gas dripped free from Circle's faucet. Live Polymarket execution is unavailable because the operator is geo-blocked, so every result is a paper trade, and the site says so. The constitution caps single positions at 5% of NAV and categories at 2–5%, and pauses new positions for 30 days after a 50% drawdown. An expected-value gate refuses any trade whose net-EV t-stat is below 2.0 after fees, spread, slippage, and gas.
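A minimal sketch of how those constitutional checks could be expressed, not the project's actual code. The thresholds (5% of NAV, 30 days, t-stat of 2.0) come from the text above; the `Proposal` fields, the function names, and the definition of the t-stat as net EV divided by the standard error of the EV estimate are all assumptions made for illustration. Whether the real system refuses or clips an oversized position is also not stated; this sketch refuses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Thresholds taken from the constitution described above.
T_STAT_MIN = 2.0           # refuse trades whose net-EV t-stat is below this
MAX_POSITION_FRAC = 0.05   # single-position cap: 5% of NAV
PAUSE_DAYS = 30            # pause on new positions after a 50% drawdown


@dataclass
class Proposal:
    """Hypothetical trade proposal; all amounts are per $1 staked."""
    gross_ev: float    # expected profit before costs
    ev_stderr: float   # standard error of the EV estimate (assumed input)
    fees: float
    spread: float
    slippage: float
    gas: float
    size_frac: float   # proposed position as a fraction of NAV


def net_ev_t_stat(p: Proposal) -> float:
    """Assumed definition: (gross EV minus all costs) / standard error."""
    if p.ev_stderr <= 0:
        raise ValueError("ev_stderr must be positive")
    net_ev = p.gross_ev - (p.fees + p.spread + p.slippage + p.gas)
    return net_ev / p.ev_stderr


def review(p: Proposal, days_since_drawdown: Optional[int] = None) -> Tuple[bool, str]:
    """Return (accepted, reason). days_since_drawdown is None if no 50% drawdown occurred."""
    if days_since_drawdown is not None and days_since_drawdown < PAUSE_DAYS:
        return False, "drawdown pause active"
    if p.size_frac > MAX_POSITION_FRAC:
        return False, "position exceeds 5% of NAV"
    if net_ev_t_stat(p) < T_STAT_MIN:
        return False, "net-EV t-stat below 2.0"
    return True, "passes constitution checks"
```

In this sketch the reason string is what a refusal record would carry; how the real contract encodes a refusal is not described here.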
// Architecture
Thirteen Python services behind a FastAPI gateway, each with its own uv environment. The council runs four rounds (openings, challenges, Athena's synthesis, blind vote), with Zeus and Solon holding unilateral vetoes and Eris forced to argue the minority side against groupthink. Areopagus sizes accepted trades at half-Kelly and writes refused ones to an immutable Solidity contract (12 Halmos symbolic checks). Agent weights drift on realised Brier; Platt and isotonic regression recalibrate the council from outcomes. On the 200-market backtest the council's Brier score (lower is better) beats a single-shot LLM by a wide margin (0.149 vs 0.260) but does not beat the human consensus (0.126); that comparison stays on the site on purpose.
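For a sense of what half-Kelly sizing means on a binary prediction market, here is a small sketch under stated assumptions. It uses the standard Kelly fraction for buying a YES share at price `price` with estimated win probability `p_win`, which is `(p_win - price) / (1 - price)`, then halves it. The function name is hypothetical, and applying the 5% NAV cap inside the sizing step is an assumption; the source only says that Areopagus sizes at half-Kelly and that the constitution caps positions.

```python
def half_kelly_fraction(p_win: float, price: float, cap: float = 0.05) -> float:
    """Fraction of NAV to stake on a YES share, at half the Kelly bet.

    p_win: estimated probability the market resolves YES.
    price: cost of one YES share paying $1 on YES, strictly between 0 and 1.
    cap:   maximum fraction of NAV (5% assumed from the position cap).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability")

    # Full Kelly for a binary bet with net odds (1 - price) / price.
    full_kelly = (p_win - price) / (1.0 - price)
    if full_kelly <= 0.0:
        return 0.0  # no edge at this price: stake nothing
    return min(0.5 * full_kelly, cap)
```

With a 52% estimate against a 50-cent price, full Kelly is 4% of NAV and the half-Kelly stake is 2%. A 70% estimate at the same price would call for 20% at half-Kelly, so the cap binds and the stake is 5%.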
// WAR STORIES
- The 11-provider fallback chain and the session-key/x402 wallet plumbing were both painful, but the hardest part was Arc itself: recording anything on-chain was a completely foreign concept going in. Integrating it meant learning the whole model from zero, mid-build.
- Built solo for the Agora Hackathon — 200+ teams, virtual, $10K first prize. Didn't win; others built better projects, no excuse — though the repo finished among the most-starred of the event. The want was older anyway: a personal risk-management system for trading. The design collapsed to denying trades — refusal was the unique alpha, and explicit constitutional rules handed to specific agents in the debate made it actually buildable.
- I wouldn't unwind any of it; the lesson was the value. Starting over: better data, a live orchestrator running the full debate end-to-end, sharper per-agent system prompts, cleaner scripts. The backtest already says where the ceiling is — the room for improvement is on the record.
// RESULTS
| Metric | Value |
|---|---|
| Council Brier score | 0.149 (200-market backtest) |
| Human consensus Brier (same markets) | 0.126 |
| Single-shot LLM Brier (same markets) | 0.260 |
| Python tests | 714 |
| Solidity tests + Halmos symbolic checks | 65 + 12 |
| Agents | 11 |
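For readers unfamiliar with the metric in the table, the Brier score for binary outcomes is the mean squared difference between the forecast probability and what actually happened (1 or 0), so lower is better. The snippet below is a generic illustration of that definition with made-up numbers, not the project's evaluation code; the function name is hypothetical.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between probabilities and binary outcomes (0 or 1)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two equal-length, non-empty sequences")
    total = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# Illustrative values only: three markets, two resolved YES and one NO.
example = brier_score([0.9, 0.2, 0.5], [1, 0, 1])
```

As a reference point, always forecasting 0.5 scores exactly 0.25 on any set of binary outcomes, and a perfect forecaster scores 0.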
// STACK
- Python
- FastAPI
- Solidity
- Foundry
- Next.js