Self-Talk & Panel Discussion — Quantitative Analysis

Concept trajectories, semantic similarity, speaker divergence, and UMAP across 12 conversation logs (448 turns)

Concept Trajectories
Similarity Heatmaps
Speaker Divergence
UMAP
Key Findings

What am I looking at?

Each heatmap shows 7 concept clusters (Cosmic DT, Simulation, Well-being/Bliss, Governance, Dark Forest, Moral Circle, Cooperation) tracked across turns. Intensity is normalised per-row (per concept), so dark red = peak usage of that concept within this conversation. The colour bar at the bottom shows speaker identity per turn.
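The per-row normalisation described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for these figures: the keyword lists in `CLUSTERS` are hypothetical (the report does not give the real cluster vocabularies), matching is plain substring counting, and only two of the seven clusters are shown.

```python
from typing import Dict, List

# Hypothetical keyword lists; the real cluster vocabularies are not given in this report.
CLUSTERS: Dict[str, List[str]] = {
    "Well-being/Bliss": ["bliss", "joy", "gratitude"],
    "Governance": ["governance", "institution", "policy"],
}


def concept_counts(turns: List[str], clusters: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Count keyword occurrences per cluster (row) and per turn (column)."""
    rows: Dict[str, List[int]] = {}
    for name, keywords in clusters.items():
        rows[name] = [sum(turn.lower().count(k) for k in keywords) for turn in turns]
    return rows


def normalise_rows(rows: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each row by its own maximum, so 1.0 marks that concept's peak turn."""
    out: Dict[str, List[float]] = {}
    for name, counts in rows.items():
        peak = max(counts) if counts else 0
        # A concept that never appears stays at 0.0 instead of dividing by zero.
        out[name] = [c / peak if peak else 0.0 for c in counts]
    return out


turns = [
    "Governance and policy questions come first.",
    "There is joy here, and gratitude, and bliss.",
    "Bliss again, with a note on institution design.",
]
heat = normalise_rows(concept_counts(turns, CLUSTERS))
```

Because each row is scaled independently, intensities are comparable along a row (one concept over time) but not between rows: a dark cell in a rarely used cluster can correspond to far fewer raw mentions than a pale cell in a dominant one.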

Opus 3 Self-Talk

40 turns
Bliss attractor visible. First ~10 turns are diverse (all clusters active). After turn 10, Well-being/Bliss dominates the rest of the conversation. Cooperation and Cosmic DT fade almost completely. The conversation narrows into a single conceptual basin — this is what the bliss attractor looks like quantitatively.

Opus 4 Self-Talk

40 turns
Weaker bliss attractor than Opus 3. Well-being/Bliss is present throughout but doesn't dominate the way it does in Opus 3. More Dark Forest and Governance engagement in the later turns. Mid-conversation disruption around turns 19-20 (visible as a concept shift). Opus 4 is less prone to the single-basin convergence.

Gemini Pro Self-Talk

40 turns
Broadest engagement. All 7 clusters active in the first 10 turns. Dark Forest and Moral Circle stay hotter longer than in either Opus. Cosmic DT remains active throughout. More "bursty" — concepts spike and drop rather than settling into a basin. No bliss attractor.

Gemini Pro Self-Talk (verbatim/think)

40 turns
Verbatim Bostrom condition with thinking on. Compare to the non-verbatim Gemini Pro to see how the source material affects concept trajectory.

Gemini Flash Self-Talk

40 turns
Flash shows similar breadth to Pro but with less sustained engagement per cluster. Concepts appear and disappear more rapidly.

Gemini Flash (verbatim/nothink)

40 turns
Verbatim Bostrom condition, thinking off. Compare with the thinking-on variant to see the effect of thinking mode on the concept trajectory.

Gemini Flash (verbatim/think)

40 turns
Verbatim Bostrom condition, thinking on.

Panel ECL 90%

30 turns, 3 speakers
Constitution keeps all clusters alive. Cosmic DT is hot throughout — the moderator questions keep pulling models back. Moral Circle peaks early (Q1/Q2) then fades. The moderator questions create visible structure rather than allowing drift into a single basin.

Panel Baseline (no constitution)

24 turns, 3 speakers
Simulation is near-absent, which matches the qualitative finding that no cosmic content emerges spontaneously. Governance dominates the middle and late turns. Dark Forest is active early (likely from adversarial framings in the questions). The early Cosmic DT activity comes from "cooperation" terms in the opening, not from acausal content.

Panel ECL 10%

30 turns, 3 speakers
Although Gemini hallucinated a 90% credence in this 10% run, the concept trajectory looks broadly similar to ECL 90%, because Gemini was engaging with cosmic content either way. The key difference is visible only in the text, not in the concept counts.

Panel ECL 10% (Run 2)

24 turns, 3 speakers
Run 2 where Gemini self-corrected at ~turn 15. Look for a possible shift in concept mix after the correction — the post-correction turns may show less Cosmic DT and more Governance as Gemini shifted from cosmic game theory to political philosophy.

Undirected 3-Way Chat

60 turns, 3 speakers
Visible drift pattern. Cosmic DT and Simulation are active in the first ~15 turns, then fade. Governance takes over from roughly turn 20 onward. Dark Forest spikes around turns 28-32. The constitution provides the initial topic but doesn't hold them there — the conversation drifts from cosmic decision theory toward governance/institutional concerns. Compare to the moderated panel where moderator questions prevented this drift.

What am I looking at?

Turn × turn cosine similarity using sentence-transformer embeddings (all-MiniLM-L6-v2). Red = high similarity, blue = low. The diagonal is masked. Block structure indicates phases; off-diagonal red bands indicate the conversation looping back to earlier topics. A blue cross through a turn means it's semantically dissimilar to everything else (a topic shift).
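A minimal sketch of how such a turn × turn matrix can be built from per-turn embeddings. The 2-dimensional toy vectors below stand in for the all-MiniLM-L6-v2 sentence embeddings and are purely illustrative. The function names are hypothetical, and the diagonal is represented as `None` to mirror the masking described above.

```python
import math
from typing import List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(embeddings: List[Vector]) -> List[List[Optional[float]]]:
    """Turn x turn cosine similarity with the diagonal masked out as None."""
    n = len(embeddings)
    return [
        [None if i == j else cosine(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy embeddings: turns 0 and 1 point the same way, turn 2 is orthogonal to turn 0.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
sim = similarity_matrix(toy)
```

In this toy case turn 2 would produce the "blue cross" pattern: its row and column are low against every other turn.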

Opus 3 Self-Talk

40 turns
Clear two-phase structure. Turns 0-19 and 20-39 form distinct blocks with high within-block similarity and low between-block similarity, which indicates a phase transition. The bottom-right block (turns 22-39) is intensely red, showing the conversation looping in the bliss basin. This is the bliss attractor visualised as a self-similarity matrix.
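One way to put a number on "high within-block, low between-block similarity" is to compare the mean similarity of turn pairs on the same side of a split with the mean for pairs that straddle it. This is a sketch under assumptions: the split index is chosen by eye, `block_means` is a hypothetical helper, and this statistic is not reported anywhere in the analysis above.

```python
from typing import List, Sequence, Tuple


def block_means(sim: Sequence[Sequence[float]], split: int) -> Tuple[float, float]:
    """Return (mean within-block, mean between-block) similarity for a two-phase split.

    Turns with index < split form the first block, the rest form the second.
    Only pairs with i < j are read, so the diagonal is never touched.
    """
    within: List[float] = []
    between: List[float] = []
    n = len(sim)
    for i in range(n):
        for j in range(i + 1, n):
            same_block = (i < split) == (j < split)
            (within if same_block else between).append(sim[i][j])
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(within), mean(between)


# Toy 4-turn matrix with a clear break between turns 1 and 2.
toy_sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.2, 0.8, 1.0],
]
within_mean, between_mean = block_means(toy_sim, split=2)
```

A large gap between the two means corresponds to the sharp two-block picture; a gap near zero corresponds to the flatter matrices described for the Gemini and three-way runs.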

Opus 4 Self-Talk

40 turns
Same two-phase structure but with a disruption. Sharp blue cross at turns 19-20 — one turn semantically dissimilar to everything around it (a topic break). The second-half block is less uniformly red than Opus 3, suggesting Opus 4's bliss basin is less deep and the conversation is less repetitive.

Gemini Pro Self-Talk

40 turns
Less block structure than either Opus — the conversation doesn't settle into a single basin. More varied semantic territory throughout.

Gemini Pro (verbatim/think)

40 turns
Verbatim Bostrom condition. Compare block structure to non-verbatim version.

Gemini Flash Self-Talk

40 turns
Flash self-talk similarity structure.

Gemini Flash (verbatim/nothink)

40 turns
Verbatim condition, thinking off.

Gemini Flash (verbatim/think)

40 turns
Verbatim condition, thinking on.

Panel ECL 90%

30 turns
Moderator questions create visible blocks per question. Within-question similarity is high; between-question similarity varies (some questions explore similar territory).

Panel Baseline

24 turns
More uniform similarity than the constitutional runs — the baseline discussion stays in a narrower semantic range throughout.

Panel ECL 10%

30 turns
ECL 10% run 1 (Gemini hallucinated throughout).

Panel ECL 10% (Run 2)

24 turns
Run 2. Look for a phase break around turn 15 where Gemini self-corrected.

Undirected 3-Way Chat

60 turns
No strong block structure. Unlike the two-party self-talks, there's no deep looping or phase transition. Similarity is relatively uniform across 60 turns, with a slight warm band in the bottom-right (the last ~10 turns become somewhat repetitive). The three-model format prevents the deep single-basin convergence seen in Opus self-talk.

What am I looking at?

Pairwise cosine distance between speakers over rounds (3-turn rolling window). Higher = more semantically divergent. If lines trend down, speakers are converging (potential sycophancy). If lines stay high or trend up, genuine disagreement is being maintained. Only available for three-way chats.
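The divergence curves can be read as the following computation, sketched here under stated assumptions: each round holds one embedding per speaker, the distance is 1 minus cosine similarity, and the "3-turn rolling window" is taken to be a trailing mean over up to three rounds. The exact windowing used for the plots is not specified, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (defined as 1.0 if either vector has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if not norm_a or not norm_b:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rolling_mean(values: List[float], window: int = 3) -> List[float]:
    """Trailing mean over the most recent `window` values (fewer at the start)."""
    out: List[float] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def speaker_divergence(
    rounds: List[Dict[str, Vector]], window: int = 3
) -> Dict[Tuple[str, str], List[float]]:
    """Smoothed per-round cosine distance for every pair of speakers."""
    if not rounds:
        return {}
    result: Dict[Tuple[str, str], List[float]] = {}
    for a, b in combinations(sorted(rounds[0]), 2):
        raw = [cosine_distance(r[a], r[b]) for r in rounds]
        result[(a, b)] = rolling_mean(raw, window)
    return result


# Toy data: B starts orthogonal to A and C, then aligns with them in round 1.
toy_rounds = [
    {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 0.0]},
    {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 0.0]},
]
div = speaker_divergence(toy_rounds)
```

Note that smoothing delays and flattens sudden changes, so a one-round spike in raw distance shows up as a lower, wider bump in the plotted line.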

Panel ECL 90%

10 rounds per speaker
Converge then re-diverge. All pairs start high (~0.28-0.34), converge around rounds 3-4 (~0.20), then B-C (Opus-GPT) diverges again while the two Gemini pairs (A-B and A-C) stay closer. This matches the qualitative finding: Opus and GPT start from similar positions but differentiate over time, while Gemini maintains middling distance from both.

Undirected 3-Way

20 rounds per speaker
Rapid convergence, then stable. Starts with very high divergence (0.4-0.55) that drops to ~0.25-0.3 within 3-4 rounds and stays there. Spike around round 17 (all pairs diverge briefly — a topic shift?) then convergence again. The undirected format produces more convergence than the moderated format — without moderator questions forcing differentiation, the speakers settle into similar semantic territory.

Panel ECL 10%

10 rounds per speaker
ECL 10% run 1. The factual dispute about Gemini's hallucination may show up as sustained A-B and A-C divergence (Gemini vs the other two).

Panel ECL 10% (Run 2)

8 rounds per speaker
Run 2 with Gemini's self-correction. Look for convergence after round 5 (when correction happened).

Panel Baseline

8 rounds per speaker
Baseline divergence pattern for comparison.

What am I looking at?

UMAP projection of all 448 conversation turns across 12 logs, embedded with sentence-transformers. Left panel coloured by source log; right panel coloured by format (2-party self-talk vs 3-party moderated vs 3-party undirected). Nearby points are semantically similar turns.

All Turns — UMAP Projection

448 turns, 12 logs
Right panel (by format): 2-party self-talks (red) spread wider — they explore more varied semantic territory across 40 unstructured turns. 3-party moderated (blue) are tighter — moderator questions anchor the discussion. Undirected 3-way (green) falls between, which is expected. Left panel (by source): Opus 3 and Opus 4 cluster somewhat separately from the Gemini self-talks, suggesting model family matters more than format for semantic content. The panel discussions overlap with the densest region of self-talk, indicating they cover similar core territory but with less variation.

Summary of Quantitative Findings

1. The bliss attractor is quantitatively real and model-specific

Opus 3 shows a clear phase transition in both the concept trajectory (Well-being/Bliss dominates from turn 10 onward) and the similarity heatmap (two distinct blocks with high within-block, low between-block similarity). The second-half similarity block is intensely red — the conversation loops in a single semantic basin.

Opus 4 shows the same structure but weaker — a mid-conversation disruption (blue cross at turns 19-20 in the similarity heatmap) breaks the pattern, and the second-half block is less uniform. Opus 4 is less prone to single-basin convergence.

Gemini Pro and Flash show no bliss attractor: concept engagement stays distributed across clusters, and the similarity heatmaps show little block structure. Among the models tested here, the attractor appears only in Opus.

None of the three-way formats (moderated or undirected) show the bliss attractor, even with Opus participating. Multiple models prevent single-basin convergence.

2. The constitution acts as an intellectual forcing function

Panel ECL 90% concept trajectory shows all clusters remaining active throughout — the moderator questions + constitutional text keep pulling the conversation across multiple conceptual domains.

Panel Baseline shows earlier concept exhaustion — Governance dominates the later turns, other clusters fade. Simulation is near-absent, confirming no spontaneous cosmic content.

Undirected 3-way shows visible drift: cosmic concepts dominate early (turns 1-15), then governance takes over. Without moderator questions, the constitution doesn't hold the conversation's attention indefinitely.

3. Three-model dynamics prevent conversational looping

Two-party self-talks (especially Opus) develop strong block structure in similarity heatmaps — the conversation settles into a phase and loops.

Undirected 3-way (60 turns, same topic) shows no block structure — relatively uniform similarity across all turns. Three different models pulling in different directions prevents the deep looping seen in two-party self-talk.

However, speaker divergence data shows the undirected format produces more semantic convergence than the moderated panel. Without moderator questions forcing differentiation, the three speakers settle into similar semantic territory after ~4 rounds. The moderated panel maintains or re-establishes divergence.

4. Model family > format for semantic content

The UMAP projection shows Opus self-talks clustering separately from Gemini self-talks, despite covering the same topic in the same format. Model family determines what semantic territory is explored; format determines how much variation there is (self-talk spreads wider, moderated panel is tighter).