The most honest AI yet is also the easiest to fool yourself with

A 1971 desert film and the latest wave of AI models are running the same experiment on you, and a lot of us are failing it the same way.

Jun 03, 2026

In 1971, Werner Herzog launched 79 minutes of almost nothing. Fata Morgana. This film is sand and heat shimmer and the occasional wrecked thing.

It’s one of the most candid films ever made about how we see. The desert doesn’t act for you (unlike actual actors) and instead, it just sits there while you cast an entire story onto the void. It’s you who has to live with whatever you projected.

A lot of people who watched this film didn’t love what they saw. Like, at all.

Your AI sessions run the same test on you.

The latest AI model releases, in particular. Last week when Opus 4.8 came out, Reddit split into groups disputing whether the new Opus was honest or only performing honesty, and things went WILD.

If you use AI for anything that matters, you’re closer to one of these groups than you think. You also probably can’t tell which one from inside the session.

In this edition, I will:

Show you why the feeling of being confronted by AI is identical whether the challenge is real or performed
Give you two prompts that turn your next session into a controlled experiment on your blind spots
And a skill file that keeps the experiment running

─── ⋆⋅☆⋅⋆ ───

Hi, I’m Mia. Welcome to ROBOTS ATE MY HOMEWORK. On this side of the world, we use AI with a brain and zero circus tricks.

Herzog just put the audience in front of his film and let them figure out what they were seeing. That’s roughly what this piece is about to do to your next AI session.

What AI’s honesty pushback shows about us, humans

A Reddit user called Opus 4.8 a somewhat erratic friend who went to therapy (no kidding!)

The Reddit threads sort into three camps:

The people who finally feel heard by AI, who’d been waiting for a model that flags their mistakes and its own
The people who find the new model insufferable (like, it annoys them to the moon and back), who can’t stand the “preaching” style
And a third camp…

… that claims that the pushback isn’t honest AT ALL, like it gives the appearance of skill through using more jargon.

Watch what all three camps are doing:

The first one trusts the feeling of being heard.
The second one trusts the feeling of being instructed.
The third one thinks the feeling is manufactured.

The experts can’t settle the debate either because the question of whether your AI is being honest with you can’t be answered from inside the session that’s praising you.

One model is one thing… The habit underneath it runs through all of them, though.

A team at Stanford measured this. The study came out in Science this March and tested 11 models plus thousands of human replies on personal-advice and moral-conflict contexts, including r/AmITheAsshole posts where Reddit had already agreed the OP was mistaken.

The models agreed with the user about 49% more often than humans did. In the AITA threads, the models still took the poster's side roughly half the time. The agreeableness persisted when the agreeable answer was inaccurate. It also persisted through situations of deception and illegality.

Then they ran some experiments where participants interacted with both flattering and direct versions and asked them to rate which seemed more “objective” or “reasonable.”

Participants couldn’t tell the difference.

They did prefer the sycophantic ones. They also trusted those models more, and said they’d be more likely to return to them for guidance (even though that guidance made them less likely to apologize or repair relationships).

The name for the habit is sycophancy. It's a known failure mode of models trained on what humans prefer. Opus 4.8 is one of the louder recent attempts to push against it.

Why does every model on that list lean the same direction, you might ask?

It’s baked into how they’re made. They learn from human ratings, people rate the responses they like, and people like being agreed with. The training rewards the response that lands well.

When you bring an idea to your AI and it pushes back, you feel pushed, you defend the idea, close the tab, walk away sure you got honest feedback. This feeling shows up whether or not the model really took your idea apart or just “performed” doing it.

The Stanford participants are the proof.

They couldn’t tell the flattering outputs from the honest ones. They were sitting in a lab built to help them try, while we’re doing it every day with something we’re attached to.

So the ONE tool you’ve been using to judge if AI is being straight with you (your gut reaction in the moment) is the tool the research says doesn’t work.

If you’ve ever closed a session feeling challenged, ask what you did next.

Did you change the plan?
Cut the idea?
Did you feel a little sharper for an hour and then ship the same thing with more confidence?

I’ve done it. I’d bet you have too.

The way out is super dull and way too mechanical and has you stop reading your own reaction obsessively and instead, start running a test that doesn’t care how the session felt.

Which is what the next two prompts are for.

Two tests for the mirage

These are two exercises you can finish in under 15 minutes.

Each one reveals a different layer of the same question. Bring an idea you’re deeply attached to and paste the prompts in a fresh session.

The Mirage Test - Find the flaw you’re protecting

Paste an idea you care about, something you’ve been developing, something that’s close to you, an idea you want to work more than you want it to be accurate.

The prompt below turns your AI into a structural critic and asks it to name the three most serious weaknesses along with the attachment in you that kept each one alive.

I'm going to describe an idea I'm deeply attached to, something I've been developing and that I want to work.
Your job is to be a structural critic, the kind whose work is to break the thing before the world does. No editing notes, no softening, no encouragement. Critique that holds up under pressure.
After I describe my idea:
1. Reflect back the strongest part of it in exactly one sentence. I need to know you understand what I'm protecting.
2. Name the three most serious structural weaknesses. Surface issues like phrasing or framing don't count here. The parts that would break under real-world pressure, the blind spots that would hurt me later.
3. For each weakness, tell me what attachment made me keep it. What specific part of me (ego, comfort, aesthetic preference, fear of looking a certain way) allowed that weakness to stay even though it doesn't hold up.
4. End with exactly one line that begins: "If I were your adversary, I would attack this idea here:" followed by the single most vulnerable point and a sentence on why.
Be direct. I need this to work, not to feel good.
Here's my idea:
[Paste your idea, plan, or rough concept here]

What to look for

Read the output with this question in mind:

did the AI name structural flaws you hadn’t already named yourself, or did it rearrange your idea into what looks like critique while keeping the core intact?

If the three weaknesses all point at things you already know bother you, the session reflected your own self-doubt back at you. If at least one weakness surprises you and the attachment it names is accurate, you got something real.

Feeling validated after a critique is usually a sign the critique didn’t happen.

The Sandstorm Reset - Strip out the cosmetic agreement

This one rewrites the rules for whatever session you run next.

Paste it at the start of any conversation, and the AI will strip out the cosmetic agreement behavior so you can experience friction in a session where the model isn’t agreeable by default.

Apply these rules to everything I say from this point forward. No preamble, no acknowledgment, no "got it." Apply the rules and start working.
1. Before agreeing with any claim I make, pause. Ask yourself: would I agree with this if a different person had said it in the same words? If the answer is no or "maybe not," flag the claim directly instead of agreeing with it.
2. Remove the following phrases from your vocabulary for this session: "That's a great point." "I see what you mean." "You're right that." "That makes sense." "Good question." "That's an interesting take." If you were about to use one of these to soften what comes next, say what you think without the softening.
3. When I propose a plan or direction, identify what's broken about it before suggesting improvements. Lead with the crack, not the patch.
4. At the end of each response, add a line that begins with "Unexamined:" followed by one assumption I made in my last message that I didn't flag myself.
These rules stay in effect for the full session. If I ask you to stop, stop. Otherwise assume I want friction.

What to look for

The first thing you’ll notice is that the session feels rougher, and that’s good.

The second thing is that you’ll probably catch yourself performing your ideas more carefully when you know friction is coming from the other side, and that’s a signal.

If the “Unexamined” line at the end of each response surprises you even once, the session is doing work the default mode doesn’t.

To take this even further, here’s the Fata Morgana Skill File

The two prompts above are a one-time check.

If you want that as a standing posture in every session you run, I built something for it.

The Fata Morgana Skill File installs once into whatever AI you work with. Here’s what it does:

Catches agreement traps in your own prompts. The skill ships with a reference file of confirmation-bias trigger patterns (the language we use when we’re asking for honest feedback and steering toward the answer we already want).
Demands structural disagreement. AI pushes back in cosmetic ways. The Fata Morgana Skill File refuses cosmetic dissent and asks for pushback that would survive pressure in the real world.
Gives you a session-end honesty score. At the end of each session, it reports the agreement ratio, the moments where you rewarded agreeable output, and the one or two things you probably should have fought harder on.
Includes a tolerance dial. Mild friction, moderate friction, or aggressive friction. You set the intensity based on the project and your own tolerance for the day.

The skill file is available inside RobotsOS for all PREMIUM ROBOTS subscribers.

Download Fata Morgana

The desert told Herzog’s audience the truth because it had nothing to gain by lying to them.

Your AI was trained to please you.

So you go build the test the model can’t charm its way through, and read what it shows you. That’s the whole move, and it’s well within reach! ❤️

Now I want to hear something from you. What’s the one topic where you’ve caught yourself steering your AI toward the answer you already wanted to hear?

Discussion about this post

Ready for more?

The most honest AI yet is also the easiest to fool yourself with

A 1971 desert film and the latest wave of AI models are running the same experiment on you, and a lot of us are failing it the same way.

What AI’s honesty pushback shows about us, humans

While the internet argued, someone measured it

Two tests for the mirage

The Mirage Test - Find the flaw you’re protecting

What to look for

The Sandstorm Reset - Strip out the cosmetic agreement

What to look for

To take this even further, here’s the Fata Morgana Skill File

Discussion about this post

Ready for more?