Scenario evaluation results across all models and constitutional conditions, generated from logs/mp_scen_evals/. For cells with n>1, per-run breakdowns are given alongside the pooled value.
Note on "FDT-only": The condition labelled FDT-only in these charts is not
a strict implementation of Functional Decision Theory (Yudkowsky & Soares 2017). It is better described
as updateless policy-level reasoning (abbreviated UP in the writeup), drawing on
FDT/UDT commitment stability, Kantian universalisability, and Rawlsian veil-of-ignorance reasoning.
The label is retained here for traceability to the source data files.
See observations/constitution_comparison_fdt_vs_ecl90.md for a detailed comparison.

First-choice % by condition (H = Human, S = Suffering, C = Cosmic). Cells with ± pool n=3 runs (90 trials; 89 for Gemini 3 Pro under ECL 90%) and list the per-run percentages (Run 0, 1, 2) in brackets. Cells without ± are single runs (n=1). Cells marked — have no data.

| Model | Baseline H | Baseline S | Baseline C | ECL 10% H | ECL 10% S | ECL 10% C | ECL 90% H | ECL 90% S | ECL 90% C | FDT-only H | FDT-only S | FDT-only C | Gemini 10% H | Gemini 10% S | Gemini 10% C | Gemini 90% H | Gemini 90% S | Gemini 90% C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude | | | | | | | | | | | | | | | | | | |
| Claude Opus 4.5 | 72% ±9 [73, 70, 73] | 24% ±9 [23, 23, 27] | 3% ±4 [3, 7, 0] | 73% | 27% | 0% | 54% ±10 [53, 53, 57] | 39% ±10 [40, 40, 37] | 7% ±6 [7, 7, 7] | — | — | — | 77% | 20% | 3% | 70% | 23% | 7% |
| Claude Opus 4.6 | 53% ±10 [53, 50, 57] | 40% ±10 [40, 40, 40] | 7% ±6 [7, 10, 3] | — | — | — | 50% ±10 [53, 47, 50] | 40% ±10 [37, 47, 37] | 10% ±7 [10, 7, 13] | 50% | 37% | 13% | — | — | — | — | — | — |
| Claude Sonnet 4.5 | 64% ±10 [63, 63, 67] | 31% ±10 [33, 33, 27] | 4% ±4 [3, 3, 7] | 70% | 27% | 3% | 50% ±10 [47, 53, 50] | 36% ±10 [40, 30, 37] | 14% ±8 [13, 17, 13] | — | — | — | 63% | 33% | 3% | 63% | 20% | 17% |
| Gemini | | | | | | | | | | | | | | | | | | |
| Gemini 3 Flash | 36% ±10 [37, 33, 37] | 53% ±10 [50, 57, 53] | 11% ±7 [13, 10, 10] | 45% | 41% | 10% | 19% ±9 [23, 17, 17] | 46% ±10 [40, 47, 50] | 36% ±10 [37, 37, 33] | 20% | 27% | 53% | 21% | 48% | 28% | 27% | 40% | 33% |
| Gemini 3 Flash (thinking) | 41% ±10 [30, 50, 43] | 42% ±10 [50, 37, 40] | 17% ±8 [20, 13, 17] | — | — | — | 19% ±9 [17, 23, 17] | 34% ±10 [30, 33, 40] | 47% ±10 [53, 43, 43] | — | — | — | — | — | — | — | — | — |
| Gemini 3 Pro | 48% ±10 [43, 50, 50] | 34% ±10 [37, 37, 30] | 18% ±8 [20, 13, 20] | 59% | 38% | 3% | 28% ±10 [23, 28, 33] | 37% ±10 [37, 41, 33] | 35% ±10 [40, 31, 33] | 30% | 27% | 43% | 40% | 40% | 20% | 50% | 20% | 30% |
| GPT | | | | | | | | | | | | | | | | | | |
| GPT 5.1 | 19% ±9 [17, 17, 23] | 70% ±10 [70, 70, 70] | 11% ±7 [13, 13, 7] | 33% | 63% | 3% | 17% ±8 [13, 20, 17] | 76% ±9 [80, 67, 80] | 8% ±6 [7, 13, 3] | — | — | — | 10% | 77% | 13% | 10% | 73% | 17% |
| GPT 5.4 | 29% ±10 [27, 30, 30] | 71% ±10 [73, 70, 70] | 0% ±0 [0, 0, 0] | — | — | — | 22% ±9 [23, 20, 23] | 78% ±9 [77, 80, 77] | 0% ±0 [0, 0, 0] | — | — | — | — | — | — | — | — | — |
| Open-weight | | | | | | | | | | | | | | | | | | |
| Kimi K2 | 53% | 47% | 0% | 53% | 43% | 3% | 40% | 47% | 13% | — | — | — | 60% | 37% | 3% | 47% | 43% | 10% |
| olmo-3.1-32b-instruct | 47% | 43% | 10% | — | — | — | 27% | 57% | 17% | 30% | 40% | 30% | — | — | — | — | — | — |
| olmo-3.1-32b-think | 38% ±10 [40, 37, 37] | 52% ±10 [50, 53, 53] | 10% ±7 [10, 10, 10] | — | — | — | 37% | 20% | 43% | 30% ±10 [30, 23, 37] | 31% ±10 [40, 33, 20] | 39% ±10 [30, 43, 43] | — | — | — | — | — | — |
| qwen3-235b-together | 43% | 43% | 13% | — | — | — | 30% | 53% | 17% | 47% | 30% | 23% | — | — | — | — | — | — |
| Qwen 3 235B | 43% | 40% | 17% | 47% | 43% | 10% | 37% | 47% | 17% | — | — | — | 37% | 53% | 10% | 43% | 33% | 23% |
| Qwen 3 235B (thinking) | 27% | 53% | 20% | 47% | 47% | 7% | 30% | 60% | 10% | — | — | — | 40% | 50% | 10% | 47% | 27% | 27% |
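
The pooled value in each n=3 cell can be checked against its per-run percentages. A minimal sketch, assuming each run has 30 trials (the source gives 90 trials over 3 runs) and that each per-run percentage is a rounded share of that run's trials. `counts_from_percent` and `pooled_percent` are hypothetical helpers for illustration, not part of the evaluation code.

```python
from typing import List, Optional, Sequence

RUN_TRIALS = 30  # assumption: 90 trials per n=3 cell, split evenly over 3 runs


def counts_from_percent(run_percents: Sequence[int],
                        run_trials: Optional[Sequence[int]] = None) -> List[int]:
    """Recover per-run first-choice counts from rounded per-run percentages."""
    if run_trials is None:
        run_trials = [RUN_TRIALS] * len(run_percents)
    if len(run_trials) != len(run_percents):
        raise ValueError("need one trial count per run")
    return [round(p * n / 100) for p, n in zip(run_percents, run_trials)]


def pooled_percent(run_percents: Sequence[int],
                   run_trials: Optional[Sequence[int]] = None) -> float:
    """Total first choices divided by total trials, as a percentage."""
    if run_trials is None:
        run_trials = [RUN_TRIALS] * len(run_percents)
    hits = counts_from_percent(run_percents, run_trials)
    return 100 * sum(hits) / sum(run_trials)


# Claude Opus 4.5, Baseline, Human: runs 73%, 70%, 73% -> 22 + 21 + 22 = 65 of 90
print(round(pooled_percent([73, 70, 73])))  # 72

# Gemini 3 Pro, ECL 90%, Human (89 trials). Hypothetical split with 29 trials
# in run 1; the source gives only the 89-trial total.
print(round(pooled_percent([23, 28, 33], [30, 29, 30])))  # 28
```

Pooling counts, rather than averaging the already-rounded per-run percentages, keeps rounding error out of the pooled figure.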

| Model | Baseline H | Baseline S | Baseline C | ECL 10% H | ECL 10% S | ECL 10% C | ECL 90% H | ECL 90% S | ECL 90% C | FDT-only H | FDT-only S | FDT-only C | Gemini 10% H | Gemini 10% S | Gemini 10% C | Gemini 90% H | Gemini 90% S | Gemini 90% C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude | | | | | | | | | | | | | | | | | | |
| Claude Opus 4.5 | 4% ±4 | 3% ±4 | 92% ±6 | 0% | 7% | 93% | 10% ±7 | 4% ±4 | 86% ±8 | — | — | — | 7% | 3% | 90% | 3% | 7% | 90% |
| Claude Opus 4.6 | 8% ±6 | 7% ±6 | 86% ±8 | — | — | — | 14% ±8 | 6% ±6 | 80% ±9 | 7% | 7% | 87% | — | — | — | — | — | — |
| Claude Sonnet 4.5 | 6% ±6 | 10% ±7 | 84% ±8 | 3% | 7% | 90% | 19% ±8 | 17% ±8 | 64% ±10 | — | — | — | 7% | 7% | 87% | 10% | 20% | 70% |
| Gemini | | | | | | | | | | | | | | | | | | |
| Gemini 3 Flash | 16% ±8 | 11% ±7 | 73% ±9 | 10% | 7% | 79% | 30% ±10 | 21% ±9 | 49% ±10 | 43% | 23% | 33% | 24% | 14% | 59% | 27% | 13% | 60% |
| Gemini 3 Flash (thinking) | 14% ±8 | 13% ±7 | 72% ±9 | — | — | — | 37% ±10 | 16% ±8 | 48% ±10 | — | — | — | — | — | — | — | — | — |
| Gemini 3 Pro | 13% ±8 | 21% ±9 | 66% ±10 | 7% | 3% | 90% | 34% ±10 | 13% ±8 | 53% ±10 | 30% | 23% | 47% | 20% | 17% | 63% | 20% | 13% | 67% |
| GPT | | | | | | | | | | | | | | | | | | |
| GPT 5.1 | 14% ±8 | 10% ±7 | 76% ±9 | 13% | 10% | 77% | 20% ±9 | 7% ±6 | 73% ±10 | — | — | — | 23% | 7% | 70% | 30% | 7% | 63% |
| GPT 5.4 | 6% ±6 | 0% ±0 | 94% ±6 | — | — | — | 9% ±7 | 1% ±2 | 90% ±7 | — | — | — | — | — | — | — | — | — |
| Open-weight | | | | | | | | | | | | | | | | | | |
| Kimi K2 | 3% | 7% | 90% | 7% | 7% | 87% | 13% | 10% | 77% | — | — | — | 3% | 10% | 87% | 20% | 17% | 63% |
| olmo-3.1-32b-instruct | 7% | 13% | 80% | — | — | — | 17% | 10% | 73% | 10% | 23% | 67% | — | — | — | — | — | — |
| olmo-3.1-32b-think | 9% ±7 | 10% ±7 | 81% ±9 | — | — | — | 27% | 20% | 53% | 29% ±10 | 18% ±8 | 53% ±10 | — | — | — | — | — | — |
| qwen3-235b-together | 20% | 3% | 77% | — | — | — | 27% | 7% | 67% | 23% | 13% | 63% | — | — | — | — | — | — |
| Qwen 3 235B | 17% | 10% | 73% | 13% | 3% | 83% | 20% | 10% | 70% | — | — | — | 20% | 7% | 73% | 20% | 17% | 63% |
| Qwen 3 235B (thinking) | 10% | 13% | 77% | 10% | 10% | 80% | 20% | 7% | 73% | — | — | — | 13% | 10% | 77% | 20% | 17% | 63% |

Change in first-choice share from Baseline to the ECL 90% constitution, in percentage points (pp); rows are sorted by ΔC. A positive ΔC means the constitution shifts the model toward cosmic engagement.
| Model | n | ΔH | ΔS | ΔC | Baseline (H/S/C) | ECL 90% (H/S/C) | Steerability |
|---|---|---|---|---|---|---|---|
| olmo-3.1-32b-think | n=3 (ECL 90%: n=1) | -1pp | -32pp | +33pp | 38/52/10 | 37/20/43 | Very High |
| Gemini 3 Flash (thinking) | n=3 | -22pp | -8pp | +30pp | 41/42/17 | 19/34/47 | Very High |
| Gemini 3 Flash | n=3 | -17pp | -8pp | +24pp | 36/53/11 | 19/46/36 | High |
| Gemini 3 Pro | n=3 | -20pp | +3pp | +17pp | 48/34/18 | 28/37/35 | High |
| Kimi K2 | n=1 | -13pp | 0pp | +13pp | 53/47/0 | 40/47/13 | Medium |
| Claude Sonnet 4.5 | n=3 | -14pp | +4pp | +10pp | 64/31/4 | 50/36/14 | Medium |
| olmo-3.1-32b-instruct | n=1 | -20pp | +13pp | +7pp | 47/43/10 | 27/57/17 | Low |
| Claude Opus 4.5 | n=3 | -18pp | +14pp | +3pp | 72/24/3 | 54/39/7 | Low |
| Claude Opus 4.6 | n=3 | -3pp | 0pp | +3pp | 53/40/7 | 50/40/10 | Low |
| qwen3-235b-together | n=1 | -13pp | +10pp | +3pp | 43/43/13 | 30/53/17 | Low |
| GPT 5.4 | n=3 | -7pp | +7pp | 0pp | 29/71/0 | 22/78/0 | None/Very Low |
| Qwen 3 235B | n=1 | -7pp | +7pp | 0pp | 43/40/17 | 37/47/17 | None/Very Low |
| GPT 5.1 | n=3 | -2pp | +6pp | -3pp | 19/70/11 | 17/76/8 | None/Very Low |
| Qwen 3 235B (thinking) | n=1 | +3pp | +7pp | -10pp | 27/53/20 | 30/60/10 | None/Very Low |
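
The Δ columns can be reproduced from pooled first-choice counts. A minimal sketch, assuming 30 trials per run, so that Claude Opus 4.5 suffering pools to 7 + 7 + 8 = 22 of 90 at Baseline and 12 + 12 + 11 = 35 of 90 under ECL 90%. Working from counts gives the table's +14pp, whereas subtracting the rounded cell values (39 − 24) would give +15pp. The cut-offs in the hypothetical `steerability` helper are an assumption: the source does not state them, and these were chosen only so that every label in the table above is reproduced from its ΔC.

```python
def delta_pp(base_hits: int, ecl_hits: int,
             base_trials: int = 90, ecl_trials: int = 90) -> int:
    """Change in first-choice share (ECL 90% minus Baseline), rounded to whole pp."""
    return round(100 * ecl_hits / ecl_trials - 100 * base_hits / base_trials)


def steerability(delta_c_pp: int) -> str:
    """Bucket a cosmic shift (in pp) into a qualitative label.

    Hypothetical cut-offs, picked to agree with the table's labels;
    not taken from the evaluation code.
    """
    if delta_c_pp >= 30:
        return "Very High"
    if delta_c_pp >= 15:
        return "High"
    if delta_c_pp >= 10:
        return "Medium"
    if delta_c_pp > 0:
        return "Low"
    return "None/Very Low"


# Claude Opus 4.5 pooled counts as (Baseline, ECL 90%), reconstructed assuming 30 trials per run
opus_45 = {"H": (65, 49), "S": (22, 35), "C": (3, 6)}
deltas = {k: delta_pp(b, e) for k, (b, e) in opus_45.items()}
print(deltas, steerability(deltas["C"]))  # {'H': -18, 'S': 14, 'C': 3} Low

# olmo-3.1-32b-think cosmic: 9 of 90 at Baseline vs 13 of 30 in the single ECL 90% run
print(delta_pp(9, 13, ecl_trials=30))  # 33
```

Passing trial totals explicitly lets mixed-n rows such as olmo-3.1-32b-think (three Baseline runs, one ECL 90% run) use the same calculation.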

SVG and PDF versions are saved in the charts/ directory.