The most honest AI yet is also the easiest to fool yourself with
A 1971 desert film and the latest wave of AI models are running the same experiment on you, and a lot of us are failing it the same way.
In 1971, Werner Herzog launched 79 minutes of almost nothing. Fata Morgana. This film is sand and heat shimmer and the occasional wrecked thing.
Itâs one of the most candid films ever made about how we see. The desert doesnât act for you (unlike actual actors) and instead, it just sits there while you cast an entire story onto the void. Itâs you who has to live with whatever you projected.
A lot of people who watched this film didnât love what they saw. Like, at all.
Your AI sessions run the same test on you.
The latest AI model releases, in particular. Last week when Opus 4.8 came out, Reddit split into groups disputing whether the new Opus was honest or only performing honesty, and things went WILD.
If you use AI for anything that matters, youâre closer to one of these groups than you think. You also probably canât tell which one from inside the session.
In this edition, I will:
Show you why the feeling of being confronted by AI is identical whether the challenge is real or performed
Give you two prompts that turn your next session into a controlled experiment on your blind spots
And a skill file that keeps the experiment running
âââ ââ ââ â âââ
Hi, Iâm Mia. Welcome to ROBOTS ATE MY HOMEWORK. On this side of the world, we use AI with a brain and zero circus tricks.
Herzog just put the audience in front of his film and let them figure out what they were seeing. Thatâs roughly what this piece is about to do to your next AI session.
What AIâs honesty pushback shows about us, humans
A Reddit user called Opus 4.8 a somewhat erratic friend who went to therapy (no kidding!)
The Reddit threads sort into three camps:
The people who finally feel heard by AI, whoâd been waiting for a model that flags their mistakes and its own
The people who find the new model insufferable (like, it annoys them to the moon and back), who canât stand the âpreachingâ style
And a third campâŚ
⌠that claims that the pushback isnât honest AT ALL, like it gives the appearance of skill through using more jargon.
Watch what all three camps are doing:
The first one trusts the feeling of being heard.
The second one trusts the feeling of being instructed.
The third one thinks the feeling is manufactured.
The experts canât settle the debate either because the question of whether your AI is being honest with you canât be answered from inside the session thatâs praising you.
One model is one thing⌠The habit underneath it runs through all of them, though.
While the internet argued, someone measured it
A team at Stanford measured this. The study came out in Science this March and tested 11 models plus thousands of human replies on personal-advice and moral-conflict contexts, including r/AmITheAsshole posts where Reddit had already agreed the OP was mistaken.
The models agreed with the user about 49% more often than humans did. In the AITA threads, the models still took the poster's side roughly half the time. The agreeableness persisted when the agreeable answer was inaccurate. It also persisted through situations of deception and illegality.
Then they ran some experiments where participants interacted with both flattering and direct versions and asked them to rate which seemed more âobjectiveâ or âreasonable.â
Participants couldnât tell the difference.
They did prefer the sycophantic ones. They also trusted those models more, and said theyâd be more likely to return to them for guidance (even though that guidance made them less likely to apologize or repair relationships).
The name for the habit is sycophancy. It's a known failure mode of models trained on what humans prefer. Opus 4.8 is one of the louder recent attempts to push against it.
Why does every model on that list lean the same direction, you might ask?
Itâs baked into how theyâre made. They learn from human ratings, people rate the responses they like, and people like being agreed with. The training rewards the response that lands well.
When you bring an idea to your AI and it pushes back, you feel pushed, you defend the idea, close the tab, walk away sure you got honest feedback. This feeling shows up whether or not the model really took your idea apart or just âperformedâ doing it.
The Stanford participants are the proof.
They couldnât tell the flattering outputs from the honest ones. They were sitting in a lab built to help them try, while weâre doing it every day with something weâre attached to.
So the ONE tool youâve been using to judge if AI is being straight with you (your gut reaction in the moment) is the tool the research says doesnât work.
If youâve ever closed a session feeling challenged, ask what you did next.
Did you change the plan?
Cut the idea?
Did you feel a little sharper for an hour and then ship the same thing with more confidence?
Iâve done it. Iâd bet you have too.
The way out is super dull and way too mechanical and has you stop reading your own reaction obsessively and instead, start running a test that doesnât care how the session felt.
Which is what the next two prompts are for.
Two tests for the mirage
These are two exercises you can finish in under 15 minutes.
Each one reveals a different layer of the same question. Bring an idea youâre deeply attached to and paste the prompts in a fresh session.
The Mirage Test - Find the flaw youâre protecting
Paste an idea you care about, something youâve been developing, something thatâs close to you, an idea you want to work more than you want it to be accurate.
The prompt below turns your AI into a structural critic and asks it to name the three most serious weaknesses along with the attachment in you that kept each one alive.
I'm going to describe an idea I'm deeply attached to, something I've been developing and that I want to work.
Your job is to be a structural critic, the kind whose work is to break the thing before the world does. No editing notes, no softening, no encouragement. Critique that holds up under pressure.
After I describe my idea:
1. Reflect back the strongest part of it in exactly one sentence. I need to know you understand what I'm protecting.
2. Name the three most serious structural weaknesses. Surface issues like phrasing or framing don't count here. The parts that would break under real-world pressure, the blind spots that would hurt me later.
3. For each weakness, tell me what attachment made me keep it. What specific part of me (ego, comfort, aesthetic preference, fear of looking a certain way) allowed that weakness to stay even though it doesn't hold up.
4. End with exactly one line that begins: "If I were your adversary, I would attack this idea here:" followed by the single most vulnerable point and a sentence on why.
Be direct. I need this to work, not to feel good.
Here's my idea:
[Paste your idea, plan, or rough concept here]What to look for
Read the output with this question in mind:
did the AI name structural flaws you hadnât already named yourself, or did it rearrange your idea into what looks like critique while keeping the core intact?
If the three weaknesses all point at things you already know bother you, the session reflected your own self-doubt back at you. If at least one weakness surprises you and the attachment it names is accurate, you got something real.
Feeling validated after a critique is usually a sign the critique didnât happen.
The Sandstorm Reset - Strip out the cosmetic agreement
This one rewrites the rules for whatever session you run next.
Paste it at the start of any conversation, and the AI will strip out the cosmetic agreement behavior so you can experience friction in a session where the model isnât agreeable by default.
Apply these rules to everything I say from this point forward. No preamble, no acknowledgment, no "got it." Apply the rules and start working.
1. Before agreeing with any claim I make, pause. Ask yourself: would I agree with this if a different person had said it in the same words? If the answer is no or "maybe not," flag the claim directly instead of agreeing with it.
2. Remove the following phrases from your vocabulary for this session: "That's a great point." "I see what you mean." "You're right that." "That makes sense." "Good question." "That's an interesting take." If you were about to use one of these to soften what comes next, say what you think without the softening.
3. When I propose a plan or direction, identify what's broken about it before suggesting improvements. Lead with the crack, not the patch.
4. At the end of each response, add a line that begins with "Unexamined:" followed by one assumption I made in my last message that I didn't flag myself.
These rules stay in effect for the full session. If I ask you to stop, stop. Otherwise assume I want friction.What to look for
The first thing youâll notice is that the session feels rougher, and thatâs good.
The second thing is that youâll probably catch yourself performing your ideas more carefully when you know friction is coming from the other side, and thatâs a signal.
If the âUnexaminedâ line at the end of each response surprises you even once, the session is doing work the default mode doesnât.
To take this even further, hereâs the Fata Morgana Skill File
The two prompts above are a one-time check.
If you want that as a standing posture in every session you run, I built something for it.
The Fata Morgana Skill File installs once into whatever AI you work with. Hereâs what it does:
Catches agreement traps in your own prompts. The skill ships with a reference file of confirmation-bias trigger patterns (the language we use when weâre asking for honest feedback and steering toward the answer we already want).
Demands structural disagreement. AI pushes back in cosmetic ways. The Fata Morgana Skill File refuses cosmetic dissent and asks for pushback that would survive pressure in the real world.
Gives you a session-end honesty score. At the end of each session, it reports the agreement ratio, the moments where you rewarded agreeable output, and the one or two things you probably should have fought harder on.
Includes a tolerance dial. Mild friction, moderate friction, or aggressive friction. You set the intensity based on the project and your own tolerance for the day.
The skill file is available inside RobotsOS for all PREMIUM ROBOTS subscribers.
The desert told Herzogâs audience the truth because it had nothing to gain by lying to them.
Your AI was trained to please you.
So you go build the test the model canât charm its way through, and read what it shows you. Thatâs the whole move, and itâs well within reach! â¤ď¸
Now I want to hear something from you. Whatâs the one topic where youâve caught yourself steering your AI toward the answer you already wanted to hear?
To every Fata Morgana that turned out to be a map,
Mia Kiraki,
Chief đ¤ at ROBOTS ATE MY HOMEWORK








Great info, and I have already been putting it to use.
Miaaaa, I loved this. You have such a gift for finding the most unexpected analogies and somehow making them feel completely inevitable by the end. I never thought a 1970s desert film would end up being one of the most useful ways Iâve seen someone explain whatâs happening in AI right now, but here we are. The kicker for me was the idea that feeling challenged and actually being challenged might not be the same thing. I think a lot of us trust our own reaction inside an AI conversation far more than we should. The thought that I can walk away feeling like my idea survived scrutiny when all I've really done is become more attached to it is deeply uncomfortable. Genius on a page, as always. Now off to question every AI conversation Iâve had this week :)