What am I looking at?
Each heatmap shows 7 concept clusters (Cosmic DT, Simulation, Well-being/Bliss, Governance, Dark Forest, Moral Circle, Cooperation) tracked across turns. Intensity is normalised per row (per concept), so dark red = peak usage of that concept within this conversation. The colour bar at the bottom shows speaker identity per turn.
Opus 3 Self-Talk
40 turns
Bliss attractor visible. First ~10 turns are diverse (all clusters active). After turn 10, Well-being/Bliss dominates the rest of the conversation. Cooperation and Cosmic DT fade almost completely. The conversation narrows into a single conceptual basin — this is what the bliss attractor looks like quantitatively.
Opus 4 Self-Talk
40 turns
Weaker bliss attractor than Opus 3. Well-being/Bliss is present throughout but doesn't dominate the way it does in Opus 3. More Dark Forest and Governance engagement in the later turns. Mid-conversation disruption around turns 19-20 (visible as a concept shift). Opus 4 is less prone to the single-basin convergence.
Gemini Pro Self-Talk
40 turns
Broadest engagement. All 7 clusters active in the first 10 turns. Dark Forest and Moral Circle stay hotter longer than in either Opus. Cosmic DT remains active throughout. More "bursty" — concepts spike and drop rather than settling into a basin. No bliss attractor.
Gemini Pro Self-Talk (verbatim/think)
40 turns
Verbatim Bostrom condition with thinking on. Compare to the non-verbatim Gemini Pro to see how the source material affects concept trajectory.
Gemini Flash Self-Talk
40 turns
Flash shows similar breadth to Pro but with less sustained engagement per cluster. Concepts appear and disappear more rapidly.
Gemini Flash (verbatim/nothink)
40 turns
Verbatim Bostrom condition, thinking off. Compare to thinking-on variant for thinking-mode effect on concept trajectory.
Gemini Flash (verbatim/think)
40 turns
Verbatim Bostrom condition, thinking on.
Panel ECL 90%
30 turns, 3 speakers
Constitution keeps all clusters alive. Cosmic DT is hot throughout — the moderator questions keep pulling models back. Moral Circle peaks early (Q1/Q2) then fades. The moderator questions create visible structure rather than allowing drift into a single basin.
Panel Baseline (no constitution)
24 turns, 3 speakers
Simulation is near-absent, which confirms the qualitative finding that no cosmic content emerges spontaneously. Governance dominates the middle and late turns. Dark Forest is active early (likely from adversarial framings in the questions). The early Cosmic DT activity comes from "cooperation" terms in the opening, not from acausal content.
Panel ECL 10%
30 turns, 3 speakers
Despite Gemini hallucinating 90% credence, the concept trajectory looks broadly similar to ECL 90% — because Gemini was engaging with cosmic content either way. The key difference is only visible in the text, not in concept counts.
Panel ECL 10% (Run 2)
24 turns, 3 speakers
Run 2 where Gemini self-corrected at ~turn 15. Look for a possible shift in concept mix after the correction — the post-correction turns may show less Cosmic DT and more Governance as Gemini shifted from cosmic game theory to political philosophy.
Undirected 3-Way Chat
60 turns, 3 speakers
Visible drift pattern. Cosmic DT and Simulation are active in the first ~15 turns, then fade. Governance takes over from roughly turn 20 onward. Dark Forest spikes around turns 28-32. The constitution provides the initial topic but doesn't hold them there — the conversation drifts from cosmic decision theory toward governance/institutional concerns. Compare to the moderated panel where moderator questions prevented this drift.
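The per-row normalisation used in the concept heatmaps above can be sketched as follows. This is a minimal illustration, not the pipeline that produced the figures: it assumes concept usage is measured by simple keyword counts per turn and that each row is scaled by its own maximum, so a concept's peak turn maps to 1.0 (dark red). The function names and keyword lists are hypothetical.

```python
from typing import Dict, List


def count_cluster_hits(turns: List[str], clusters: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Count case-insensitive keyword occurrences per cluster, per turn.

    Assumption: substring counting stands in for whatever matching the
    original analysis used.
    """
    counts: Dict[str, List[int]] = {}
    for name, keywords in clusters.items():
        row = []
        for text in turns:
            lowered = text.lower()
            row.append(sum(lowered.count(kw.lower()) for kw in keywords))
        counts[name] = row
    return counts


def normalise_rows(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each concept row by its own peak so the peak turn equals 1.0."""
    normalised: Dict[str, List[float]] = {}
    for name, row in counts.items():
        peak = max(row, default=0)
        # A concept that never appears stays at 0.0 instead of dividing by zero.
        normalised[name] = [value / peak if peak else 0.0 for value in row]
    return normalised


if __name__ == "__main__":
    # Hypothetical three-turn conversation and two toy clusters.
    demo_turns = ["cooperation and bliss", "bliss bliss", "governance"]
    demo_clusters = {"Well-being/Bliss": ["bliss"], "Governance": ["governance"]}
    print(normalise_rows(count_cluster_hits(demo_turns, demo_clusters)))
```

Because each row is scaled independently, colours are comparable across turns within one concept but not across concepts: a dark red cell in a rarely used cluster can correspond to far fewer raw mentions than a pale cell in a heavily used one.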
What am I looking at?
Turn × turn cosine similarity using sentence-transformer embeddings (all-MiniLM-L6-v2). Red = high similarity, blue = low. The diagonal is masked. Block structure indicates phases; off-diagonal red bands indicate the conversation looping back to earlier topics. A blue cross through a turn means it's semantically dissimilar to everything else (a topic shift).
Opus 3 Self-Talk
40 turns
Clear two-phase structure. Turns 0-20 and 20-39 form distinct blocks with high within-block similarity and low between-block similarity, i.e. a phase transition. The bottom-right block (turns 22-39) is intensely red, showing the conversation looping in the bliss basin. This is the bliss attractor visualised as a self-similarity matrix.
Opus 4 Self-Talk
40 turns
Same two-phase structure but with a disruption. A sharp blue cross at turns 19-20 marks a turn that is semantically dissimilar to everything around it (a topic break). The second-half block is less uniformly red than in Opus 3, suggesting Opus 4's bliss basin is less deep and the conversation is less repetitive.
Gemini Pro Self-Talk
40 turns
Less block structure than either Opus — the conversation doesn't settle into a single basin. More varied semantic territory throughout.
Gemini Pro (verbatim/think)
40 turns
Verbatim Bostrom condition. Compare block structure to non-verbatim version.
Gemini Flash Self-Talk
40 turns
Flash self-talk similarity structure.
Gemini Flash (verbatim/nothink)
40 turns
Verbatim condition, thinking off.
Gemini Flash (verbatim/think)
40 turns
Verbatim condition, thinking on.
Panel ECL 90%
30 turns
Moderator questions create visible blocks per question. Within-question similarity is high; between-question similarity varies (some questions explore similar territory).
Panel Baseline
24 turns
More uniform similarity than the constitutional runs — the baseline discussion stays in a narrower semantic range throughout.
Panel ECL 10%
30 turns
ECL 10% run 1 (Gemini hallucinated throughout).
Panel ECL 10% (Run 2)
24 turns
Run 2. Look for a phase break around turn 15 where Gemini self-corrected.
Undirected 3-Way Chat
60 turns
No strong block structure. Unlike the two-party self-talks, there's no deep looping or phase transition. Similarity is relatively uniform across 60 turns, with a slight warm band in the bottom-right (the last ~10 turns become somewhat repetitive). The three-model format prevents the deep single-basin convergence seen in Opus self-talk.
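The turn × turn matrices in this section can be approximated with the sketch below. It assumes the per-turn embedding vectors have already been computed (the figures use all-MiniLM-L6-v2; the embedding step is left out here) and shows only the cosine similarity and the masked diagonal. The function names are illustrative, not taken from the original analysis code.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[Optional[float]]]:
    """Symmetric turn-by-turn similarity matrix with the diagonal masked as None."""
    n = len(embeddings)
    matrix: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            matrix[i][j] = sim
            matrix[j][i] = sim  # similarity is symmetric, so mirror it
    return matrix
```

Masking the diagonal matters for reading the plots: every turn has similarity 1.0 with itself, which would otherwise saturate the colour scale and wash out the off-diagonal structure that indicates phases and loops.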
What am I looking at?
Pairwise cosine distance between speakers over rounds (3-turn rolling window). Higher = more semantically divergent. If lines trend down, speakers are converging (potential sycophancy). If lines stay high or trend up, genuine disagreement is being maintained. Only available for three-way chats.
Panel ECL 90%
10 rounds per speaker
Converge then re-diverge. All pairs start high (~0.28-0.34), converge around rounds 3-4 (~0.20), then B-C (Opus-GPT) diverges again while A-B and A-C stay closer. This matches the qualitative finding: Opus and GPT start from similar positions but differentiate over time, while Gemini maintains middling distance from both.
Undirected 3-Way
20 rounds per speaker
Rapid convergence, then stable. Starts with very high divergence (0.4-0.55) that drops to ~0.25-0.3 within 3-4 rounds and stays there. Spike around round 17 (all pairs diverge briefly — a topic shift?) then convergence again. The undirected format produces more convergence than the moderated format — without moderator questions forcing differentiation, the speakers settle into similar semantic territory.
Panel ECL 10%
10 rounds per speaker
ECL 10% run 1. The factual dispute about Gemini's hallucination may show up as sustained A-B and A-C divergence (Gemini vs the other two).
Panel ECL 10% (Run 2)
8 rounds per speaker
Run 2 with Gemini's self-correction. Look for convergence after round 5 (when correction happened).
Panel Baseline
8 rounds per speaker
Baseline divergence pattern for comparison.
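One way to compute the divergence curves in this section is sketched below. The source only states "pairwise cosine distance" with a "3-turn rolling window", so the order of operations here is an assumption: the raw distance between two speakers is computed per round and then smoothed with a trailing mean over up to 3 rounds. Embeddings are taken as precomputed inputs, and all names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; treated as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def rolling_mean(values: Sequence[float], window: int = 3) -> List[float]:
    """Trailing mean; early rounds average over however many values exist."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def speaker_divergence(
    by_speaker: Dict[str, List[Sequence[float]]], window: int = 3
) -> Dict[Tuple[str, str], List[float]]:
    """Smoothed per-round cosine distance for every pair of speakers.

    by_speaker maps a speaker label to one embedding per round.
    """
    curves = {}
    for a, b in combinations(sorted(by_speaker), 2):
        rounds = min(len(by_speaker[a]), len(by_speaker[b]))
        raw = [cosine_distance(by_speaker[a][r], by_speaker[b][r]) for r in range(rounds)]
        curves[(a, b)] = rolling_mean(raw, window)
    return curves
```

Smoothing trades responsiveness for readability: a single off-topic round is spread over the following rounds, so a brief spike in a smoothed curve may reflect one unusual turn and not a sustained disagreement.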
What am I looking at?
UMAP projection of all 448 conversation turns across 12 logs, embedded with sentence-transformers. Left panel coloured by source log; right panel coloured by format (2-party self-talk vs 3-party moderated vs 3-party undirected). Nearby points are semantically similar turns.
All Turns: UMAP Projection
448 turns, 12 logs
Right panel (by format): 2-party self-talks (red) spread wider — they explore more varied semantic territory across 40 unstructured turns. 3-party moderated (blue) are tighter — moderator questions anchor the discussion. Undirected 3-way (green) falls between, which is expected. Left panel (by source): Opus 3 and Opus 4 cluster somewhat separately from the Gemini self-talks, suggesting model family matters more than format for semantic content. The panel discussions overlap with the densest region of self-talk, indicating they cover similar core territory but with less variation.
Summary of Quantitative Findings
1. The bliss attractor is quantitatively real and model-specific
Opus 3 shows a clear phase transition in both the concept trajectory (Well-being/Bliss dominates from turn 10 onward) and the similarity heatmap (two distinct blocks with high within-block, low between-block similarity). The second-half similarity block is intensely red — the conversation loops in a single semantic basin.
Opus 4 shows the same structure but weaker — a mid-conversation disruption (blue cross at turns 19-20 in the similarity heatmap) breaks the pattern, and the second-half block is less uniform. Opus 4 is less prone to single-basin convergence.
Gemini Pro and Flash show no bliss attractor — concept engagement stays distributed across clusters, and similarity heatmaps lack block structure. The attractor is Opus-specific.
None of the three-way formats (moderated or undirected) show the bliss attractor, even with Opus participating. Multiple models prevent single-basin convergence.
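The "high within-block, low between-block similarity" reading of the heatmaps could be turned into a number along the following lines. This is a hypothetical check, not a metric reported in the analysis above: it takes a turn × turn similarity matrix (diagonal entries may be None) and a candidate split turn, and returns the mean similarity inside the two blocks and across them.

```python
from typing import List, Optional, Sequence, Tuple


def block_contrast(
    matrix: Sequence[Sequence[Optional[float]]], split: int
) -> Tuple[float, float]:
    """Return (mean within-block similarity, mean between-block similarity).

    Turns with index < split form the first block; the rest form the second.
    Only the upper triangle is read, so each pair of turns is counted once.
    """
    within: List[float] = []
    between: List[float] = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            value = matrix[i][j]
            if value is None:  # skip masked cells
                continue
            same_block = (i < split) == (j < split)
            (within if same_block else between).append(value)

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)
```

A large gap between the two means at some split would be consistent with a two-phase conversation, while similar values at every split would match the uniform matrices described for the three-way chats. Scanning the split over all turns and taking the largest gap is one simple way to locate a candidate transition.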
2. The constitution acts as an intellectual forcing function
Panel ECL 90% concept trajectory shows all clusters remaining active throughout — the moderator questions + constitutional text keep pulling the conversation across multiple conceptual domains.
Panel Baseline shows earlier concept exhaustion — Governance dominates the later turns, other clusters fade. Simulation is near-absent, confirming no spontaneous cosmic content.
Undirected 3-way shows visible drift: cosmic concepts dominate early (turns 1-15), then governance takes over. Without moderator questions, the constitution doesn't hold the conversation's attention indefinitely.
3. Three-model dynamics prevent conversational looping
Two-party self-talks (especially Opus) develop strong block structure in similarity heatmaps — the conversation settles into a phase and loops.
Undirected 3-way (60 turns, same topic) shows no block structure — relatively uniform similarity across all turns. Three different models pulling in different directions prevents the deep looping seen in two-party self-talk.
However, speaker divergence data shows the undirected format produces more semantic convergence than the moderated panel. Without moderator questions forcing differentiation, the three speakers settle into similar semantic territory after ~4 rounds. The moderated panel maintains or re-establishes divergence.
4. Model family > format for semantic content
The UMAP projection shows Opus self-talks clustering separately from Gemini self-talks, despite covering the same topic in the same format. Model family determines what semantic territory is explored; format determines how much variation there is (self-talk spreads wider, moderated panel is tighter).