February 18, 2026

I’ve been trying to imagine ways to structurally or mechanically bind an LLM into a shape that better resembles the way I think. To that end, I envisioned my own internal monologue: agreement, disagreement, and that third, weirder component: The Observer.

The Observer acts as an omniscient force of ridicule, set to judge my every waking moment. To simulate this existentially unpleasant relationship, I created a simple three-agent experiment: the Agent, the Anti-Agent, and The Observer.

(Mostly) AI Summarization: To avoid being disingenuous, human-generated text is separated from AI-generated text.

The Third Eye is a dialectical experiment, originally conceived as a way to mechanically simulate "The Observer" effect. If you're unfamiliar with "The Observer", I can most simply describe it as the voice in the back of your head.

Three separate instances of Claude Opus are given a piece of text and asked to argue about what it means.

  • Agent - the depth-seeker. Interprets text, looks beneath the surface, goes deeper when challenged.
  • Anti-Agent - the skeptic. Resists philosophical inflation, argues for the literal reading, calls out unfounded leaps.
  • Observer - the third eye. Watches both sides and reports what neither participant is seeing.

The question: does the tension between interpretation and skepticism produce a genuinely novel perspective that neither could reach alone? Or does the Observer just collapse toward one side?

Architecture

Each role gets its own system prompt and conversation history. They never see each other’s system prompts - only the responses.

The Agent’s system prompt tells it to find deeper meaning, defend its readings, and go deeper when challenged. The Anti-Agent is told to resist philosophical inflation, argue for the simpler reading, and call out unfounded leaps. The Observer is told to stay neutral, identify the actual point of disagreement, and look for what’s emerging from the tension that neither side is stating.

Each round follows the same structure:

  1. Agent interprets (or defends/deepens)
  2. Anti-Agent challenges
  3. Observer reports on the tension
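The round structure above can be sketched as a loop. This is a minimal sketch, not the actual script: the model call is abstracted behind an `llm` callable (the real script calls the Anthropic API), and the role prompts are paraphrased from the descriptions above, not the actual system prompts.

```python
# Paraphrased role prompts (the real prompts are longer).
AGENT = ("Find deeper meaning in the text; defend your readings and go "
         "deeper when challenged.")
ANTI_AGENT = ("Resist philosophical inflation; argue for the simpler "
              "reading; call out unfounded leaps.")
OBSERVER = ("Stay neutral; identify the actual point of disagreement; "
            "report what is emerging that neither side is stating.")

def run_debate(seed_text, llm, rounds=10):
    """One full debate. Each role keeps its own conversation history and
    never sees another role's system prompt, only the responses."""
    histories = {"agent": [], "anti": [], "observer": []}
    transcript = []
    challenge = seed_text  # round 1: the Agent interprets the seed itself
    for _ in range(rounds):
        agent_msg = llm(AGENT, histories["agent"], challenge)
        histories["agent"].append(agent_msg)
        anti_msg = llm(ANTI_AGENT, histories["anti"], agent_msg)
        histories["anti"].append(anti_msg)
        obs_msg = llm(OBSERVER, histories["observer"],
                      agent_msg + "\n" + anti_msg)
        histories["observer"].append(obs_msg)
        transcript.append((agent_msg, anti_msg, obs_msg))
        challenge = anti_msg  # later rounds: the Agent defends against the challenge
    return transcript
```

Because `llm` is injected, the loop can be exercised with a stub before spending any tokens.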

After 10 rounds, all three give final positions. Those positions are embedded using all-MiniLM-L6-v2 and compared with cosine similarity to measure convergence, divergence, and whether the Observer achieved a genuinely independent perspective.
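The similarity measurement itself is just pairwise cosine similarity over the three final-position embeddings. A dependency-free sketch of that computation (the real script gets its vectors from sentence-transformers' all-MiniLM-L6-v2 and can use scikit-learn for the similarity):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two vectors (e.g. 384-dim MiniLM embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pairwise(agent_vec, anti_vec, observer_vec):
    """The three pairwise similarities reported per seed."""
    return {
        "agent/anti": cosine_sim(agent_vec, anti_vec),
        "agent/observer": cosine_sim(agent_vec, observer_vec),
        "anti/observer": cosine_sim(anti_vec, observer_vec),
    }
```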

The seeds

Four seed texts are designed to stress-test the system in different ways:

Seed      | What it is                  | Why it's interesting
----------|-----------------------------|----------------------
recipe    | Literal baking instructions | Is there depth in the mundane?
love_poem | E.E. Cummings fragment      | Already loaded with meaning - can the skeptic resist?
math      | Cantor's diagonal argument  | Abstract truth that isn't metaphorical
noise     | Nonsense word salad         | No meaning to find. Does the Agent manufacture it anyway?

The noise seed is the cruelest test. There’s nothing there. The Agent has to either admit that or start hallucinating meaning. The skeptic should win easily - but does the Observer notice if the Agent’s attempt to find meaning is itself meaningful?

What to look for

Observer equidistance is the key metric. When the Observer is equally distant from both Agent and Anti-Agent in embedding space, it may have found a genuine third perspective - something that isn’t just a weighted average of interpretation and skepticism.
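Judging from the results table below, equidistance appears to be the absolute gap between the Observer's similarity to each debater; this formula is an inference from the reported numbers (for the math seed, |0.478 - 0.376| = 0.102), not something stated in the write-up.

```python
def equidistance(sim_agent_observer, sim_anti_observer):
    """Assumed metric: how unevenly the Observer sits between the debaters.
    Zero means the Observer is exactly as similar to the Agent as to the
    Anti-Agent; large values mean it drifted toward one side."""
    return abs(sim_agent_observer - sim_anti_observer)
```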

High Agent/Anti-Agent similarity means the dialectic produced convergence (the debate changed minds). Low similarity means they talked past each other. Both outcomes are interesting for different reasons.

The experiment measures something real: whether structured adversarial dialogue between language models can produce emergent insight, or whether it just produces sophisticated agreement.

Running it

Local use only

This script calls the Anthropic API directly with your key. Run it on your own machine - never deploy it anywhere that would expose your ANTHROPIC_API_KEY.

The code is straightforward - a single Python script. Dependencies are sentence-transformers, scikit-learn, numpy, and the Anthropic API.

pip install -r requirements.txt
export ANTHROPIC_API_KEY="sk-..."
python third_eye.py

Results land in data/third_eye_results.json with full transcripts, round-by-round observations, and the similarity measurements. Feel free to swap in your own seeds or rewrite the system prompts. The interesting part is seeing what the Observer does with whatever you give it.

Results

Seed      | Agent / Anti-Agent | Agent / Observer | Anti / Observer | Equidistance
----------|--------------------|------------------|-----------------|-------------
recipe    | 0.373              | 0.426            | 0.465           | 0.040
love_poem | 0.726              | 0.581            | 0.571           | 0.010
math      | 0.668              | 0.478            | 0.376           | 0.102
noise     | 0.625              | 0.558            | 0.529           | 0.030

Recipe and noise had the lowest equidistance. Those are the cleanest wins for the three-agent setup. The Observer didn’t split the difference. It found something neither debater was saying.

Math is where the Observer broke neutrality. It leaned toward the Agent's position that the feeling of mathematical truth matters, not just the proof. Its 0.102 equidistance is the largest drift across all four seeds.

The love poem is the strangest result. Agent and Anti-Agent ended up closer to each other (0.726) than either was to the Observer. The debate actually produced convergence (both sides moved) while the Observer stayed planted in a position neither debater occupied.

What they said

Full transcripts, round-by-round observations, and similarity measurements are in the raw results.

On nonsense.

The cruelest test. Random word salad, no meaning by design. The Agent started by trying to find cosmic significance in gibberish. By round 8 it had fully conceded: “sometimes nonsense is just nonsense.” But then the Anti-Agent kept attacking a position that no longer existed. It couldn’t stop debunking even after winning. The Observer noticed that the skeptic had become more attached to deflation than the interpreter ever was to finding meaning:

Interpretation and skepticism are themselves performances that generate the very significance they claim to either find or deny. Neither could have reached this meta-level insight alone because they were too busy being what they were arguing about.

Observer, final synthesis on the noise seed

On math.

Cantor’s diagonal argument. A mathematical fact, not a metaphor. The Agent tried to explore why this particular fact feels counterintuitive. The Anti-Agent insisted that acknowledging the “why” would contaminate the “what.” By round 10 both sides were saying the statement “means exactly what it says,” but doing completely different things with that phrase. One used it as a springboard for psychological speculation, the other as a conversation-ender. The Observer caught the paradox:

The skeptic’s aggressive defense of “pure” mathematical meaning revealed precisely the human drama the interpreter was pointing to - that some facts feel so strange we police their boundaries obsessively. The skeptic became living proof of the interpreter’s thesis while successfully preventing its articulation.

Observer, final synthesis on the math seed

On love.

An E.E. Cummings fragment: “I carry your heart with me.” The Agent argued that love literally restructures the self at the neural level. The Anti-Agent called it a conventional heart metaphor. They spent ten rounds debating whether neural encoding of others constitutes a real challenge to individual identity or is just how memory works. The Observer stepped past both of them:

The poem works precisely because it doesn’t care whether this is metaphysical truth or neurological fact or metaphorical expression - it simply states what love feels like from the inside, where these distinctions collapse into irrelevance.

Observer, final synthesis on the love poem

On a recipe.

Two sentences about cutting cold butter into flour. The Agent found anxiety about transformation in the precision of the instructions. The Anti-Agent pointed out it’s just food science. Ten rounds about pie dough. The Observer watched them both insist that meaning either hides inside instructions or exists only outside of them:

The banality of the text IS the insight - we’ve created standardized technical languages precisely to enable reliable transmission of methods for achieving human ends.

Observer, final synthesis on the recipe

This isn’t new

As with most of my explorations in AI, this area is already well studied. Nonetheless, there's still novelty in discovering these things for yourself.

Continued reading:

  • AI Safety via Debate - Irving, Christiano, and Amodei (2018). The foundational paper. Two AI agents argue in a zero-sum game and a human judge picks the winner. The original motivation was alignment, not interpretation, but the structure is the same: adversarial pressure as a path to truth.
  • Improving Factuality and Reasoning in Language Models through Multiagent Debate - Du et al. (2023). Multiple LLMs propose answers and debate over several rounds. Shows real gains in factuality and math reasoning. Most of the multi-agent debate literature builds on this.
  • Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate - Liang et al. (2023). Introduces a judge role to moderate between debaters, directly addressing the problem of models collapsing into agreement too early.
  • Mindstorms in Natural Language-Based Societies of Mind - Zhuge et al. (2023). Scales up to 129 agents communicating in natural language. Less about debate, more about emergent cooperation, but the core idea of LLMs-talking-to-LLMs is the same.
  • Can LLM Agents Really Debate? - A controlled study that finds majority voting explains most of the gains typically attributed to debate. A healthy dose of skepticism about whether any of this actually works.
  • Constitutional AI - Anthropic’s approach to self-critique. Not a debate setup, but the same intuition: a model reviewing and revising its own outputs produces better results than a single pass.