A failed universal language explains why you keep picking the wrong AI output

The case for working with the differences between AI models instead of hunting for the single best output, told through two old stories about language.

Jun 24, 2026

I read an article in Harper’s this month about the history of Esperanto, a language built in the hope of ending all language problems.

Ludwik Zamenhof built it in a Warsaw apartment in the 1880s and was convinced that collaboration, peace, and progress would inevitably follow if everyone spoke the same tongue.

The end of miscommunication, essentially.

Illustrations by Daniel Barreto. From “Love Language: The Undying Dream of Esperanto” by Katie Thornton, Harper’s: https://harpers.org/archive/2026/06/love-language-katie-thornton-esperanto/

Here we are, and it obviously didn’t work. Like many of us, he was certain that different languages were the obstacle, that the friction between them was what kept humanity from building together.

The Tower of Babel ran on the same assumption: one language, one tower, one shared ambition, then the scattering and the fragmenting of languages. The standard reading treats the diversity of tongues as punishment, the cost of reaching too high.

Before Babel, everyone thought in the same structures, the same assumptions about what really mattered. After Babel, languages diverged and, with them, everything else.

Different grammars encoded different logics, different cultures built entirely different ways of seeing, and the scattering ended up being the engine of human civilization.

Somewhere between that Harper’s piece and rereading the story of Babel, I found myself thinking about how we work with AI, and about treating different models as different languages.

Doing exactly that, and working with the differences between models instead of picking the best output, has changed how I create and build almost everything.

─── ⋆⋅☆⋅⋆ ───

Hi, I’m Mia. Welcome to ROBOTS ATE MY HOMEWORK. On this side of the world, we use AI with a brain and zero circus tricks.

Zamenhof thought one language would end all problems. Turns out the scattering was the point.

The Esperanto instinct and why it costs you your best ideas

We all have a go-to AI model, but sometimes we try a second one to compare. We read both outputs, pick whichever sounds more appealing, more like what we wanted, and then close the tab.

This is the Esperanto instinct: the assumption that the job is finding which model gets closest to the one correct version of our original idea.

Languages don’t really work that way, though. They don’t just give you different words for the same thought; they give you different paths of least resistance for organizing thought. Some connections form easily, and others you have to push against the grammar to make.

Two people describing the same event in different languages will emphasize different details, foregrounding different elements of what happened. They CAN say the same thing, but the paths of least resistance sometimes pull them toward other versions of the story.

AI models work the same way.

I refuse to be the person who tells you Claude is “the careful one” and GPT is “the confident one” and Grok is whatever label the internet is using this week. I hate labeling models for the same reason I hate labeling people: it’s reductive, and it stops you from paying attention.

You’ve probably worked with enough models yourself to know what I mean. You know how one reaches for structure and another for narrative, or how one frames a brief around risk and another around opportunity.

Those are different “grammars” for thinking about the same problem, your problem. Every time you pick the output that sounds most like what you already had in mind, you’re choosing one grammar and discarding the rest.

The versions you’re discarding are creative material.

Each model that interpreted your input differently was showing you a version of your idea you hadn’t considered, from an angle you wouldn’t have found by iterating inside the model you already trust.

Those varied interpretations are new thinking, and working with them deliberately (instead of evaluating them competitively) changes what you can build.

Thinking in more than one language at once

Case in point: I was working on a piece about how people talk about AI in cultural commentary, analyzing different magazines and discourses. I ran my notes (2,300 words of them) through three models: Claude Opus 4.8, GPT 5.5 and Qwen 3.7 Plus.

Claude assumed I was doing heavy media criticism and gave me a summary of talking points: who was saying what, which publications were pushing which narrative. It had the shape of a PR report and was the most verbose and fluffy of the three.

GPT assumed I was doing cultural analysis, so it made connections to broader patterns about how we talk about new, emerging technology and what anxieties show up when the dynamic changes.

Qwen assumed I was building a unique position and kept asking me what I was really arguing for (full disclosure, I use Qwen mainly through my Hermes agent, so it has the most context about my work, hence the endless questions about my real goals).

To be honest, I wasn’t sure myself. I was in “exploration” mode.

I’ve since run this test on a number of projects, and what I keep noticing is that the models don’t disagree on the facts. They disagree on what the facts are about.

When you run the same idea through different AI models or custom setups, you’ll sometimes get different interpretations of what your idea is about. That’s because each LLM reads your input through its own patterns. The differences between those readings uncover certain angles (some more surprising than others) of your problem.

“Use multiple models” has been standard advice for years, and it usually means “compare outputs and pick the best one.”

Could we benefit from reframing this, from picking the best one to working with all of them?

Yes, we can. Everything interesting about multiplicity happens when you start reading the outputs closely rather than ranking them against each other.

If you do this, you’ll start noticing something in the differences between them.

Each LLM makes different assumptions about what your idea is about. One model might assume you’re making a strategic argument, another that you’re telling a really nice story, a third might assume that what you want is to build a workflow.

I built a framework for different AI thinking modes in an earlier post, “Three plays, three AI personalities, and I finally got surprised again” (Apr 22).

Your input said NOTHING about any of that. Each model filled in the blanks with its own “grammar,” and the differences between them show you the different shapes your idea could take.

Here’s how to run this yourself.

Quick exercise.

Pick a project you’re working on right now, something you’d normally just run through your go-to model and iterate until it feels polished enough to ship and use.

Write a prompt that describes what you want and use the same input with every model, word for word. Don’t adjust for each model’s strengths, or what you think those strengths are. The point is that identical input leads to different interpretations.

Run it through three different models and read the outputs together.

Here’s what to look for:

What assumption did each model make about what your idea is really about? Were any of those assumptions closer to what you meant than your own original framing?
Who was each model talking to? Did the audience differences surprise you?
What did one model foreground that the others treated as invisible? Was that emphasis useful?
Did any version show you a way into your own idea that you preferred over what you started with?

Just this once, don’t pick the best output. Read what the differences between them tell you about the idea you’re building.

What to do with the differences:

Use them as decision points. If one model assumed you’re writing for experts and another for beginners, you haven’t really decided who your audience is. That’s important information.

Use them as a diagnostic. If one model framed your idea around risk and another around opportunity, your input was ambiguous. That’s how you know your thinking is still unresolved and you still have to decide which direction to go in.

Use them as a map of the problem space. Each model drew a different map: strategy, narrative, framework, critique, whatever. The differences between these territories show you the full shape of your idea. You might’ve been working inside one territory this whole time, and now you can see the others and decide whether to stay or move.

Use them to find load-bearing questions. Sometimes each model grabs a different version of what you’re trying to do, and you realize you haven’t decided what you want to do yet. That’s your discovery phase in plain sight, and it’s time to figure out what you’re really asking.

Use them to write a better prompt. Once you see what each of the models assumed, you can go back and get sharper about what you really meant. Not “pick the best output” but “now I know what I was trying to say”.

The FOUR WINDS agent in RobotsOS does exactly this: it pressure-tests your work from four different angles and shows you where they collide.

Full disclosure: I’m not claiming, and will never agree with the discourse, that AI gives you the answer. It doesn’t. What it can sometimes do, pretty effectively, is show you what the question was from the start.

The Tower of Babel was abandoned, languages split, and from that scattering humanity got more ways to think: more grammars for organizing reality, more ways to describe the same thing and mean something entirely different by it.

Our AI models are those languages, and they think differently from each other. Those differences are worth working with, rather than resolving away.

Tomorrow, take something you’re working on right now and run it through the test above.

Notice how each version assumes a different audience, a different priority, a different version of what your idea is. Read what the differences between them tell you about the thing you’re creating.

Learn to think in more than one language at a time.

What’s one idea you’re working on right now that would benefit from running it through different AI languages?
