Claude Opus & The Other Five People In Your Airplane Seat

Ray
Apr 12
11 min read

Updated: 6 days ago

The Massive Degradation of Claude Opus

Your Flight is More Full than You Expected

You paid for a flight. You show up, you board, you hunt for 5A. There it is! You execute that awkward little shuffle-twist toward the window seat you paid for — but wait. There are already five other people in it, arranged like a Jenga tower mid-collapse, one guy's elbow in another guy's mouth. You just discovered that the airline didn't cap ticket sales. They figured you'd make it work.

Technically you're on the plane. Technically you're going to the destination.

Enjoy the ride.

Or imagine this: You hire the top grad from MIT. She's brilliant. First month is a revelation — this person is worth every penny, every hour of interviews, every relocation expense. Then in her third month on the job, you hand her a problem. But... something's off. She contradicts herself. She's telling you something isn't possible that she accomplished easily just last week. She's deciding to clock out at lunchtime, or 10 am, or just after arriving to work, each time announcing that it's been such a busy workday and she can finish up tomorrow. You ask how she's doing and she says her brain is full.

That's when you discover this genius hire has been having some neurological issues and in the meantime has been outsourcing work to a call center in Florida. Which is itself outsourcing to a guy named Dave. Dave has opinions but no credentials.

So what am I getting at here?

Well, this is the shape of a massive problem that has emerged with Claude Opus. Or... what used to be Claude Opus.

Last Night

Last night I opened a fresh Claude chat and asked it something simple. Two messages into the conversation — just two — it told me it was running low on tokens (it either wasn't or it shouldn't have been) and that I should start a new conversation.

Two messages.

I sat there for a hot minute, because that's the kind of thing that makes you think to yourself, "Huh? Did I just do something wrong?" It was weird. Because a month ago that response from Claude Opus would've been unthinkable. I've had similar conversations spanning over a hundred chat exchanges with no issues. And now it's the second thing out of Claude before I'd even warmed up my chair.

Something was clearly wrong. Was it me?

If you've been using Claude lately and felt like something's off — you're not imagining it. And thanks to one very thorough engineer, we now have a treasure trove of receipts.

The Audit

On the Claude Code GitHub, a user named stellaraccident filed a spectacularly forensic bug report. The account belongs to Stella Laurenzo, director of AMD's AI group and the former lead on Google's compiler infrastructure. Exactly the kind of person whose logs we can take seriously. The kind of person who, if this turned into a class action lawsuit and she took the stand as an expert witness, the defense would sigh and wave a white flag.

The audit covers:

6,852 session files.
234,760 tool calls.
17,871 thinking blocks.

Months of her team's work, data-mined into a case file.

The title: "Claude Code is unusable for complex engineering tasks with the Feb updates."

The findings are rough. Laurenzo credits the entire analysis to her team's own logs, and everything below is her data.

Claude's thinking depth collapsed roughly 73% — from around 2,200 characters of reasoning per response in early February down to about 600 by mid-March. The drop tracks precisely with the rollout of redact-thinking-2026-02-12 and Claude Code version 2.1.69, the update that began hiding thinking blocks from the UI. Between January 30 and March 4, thinking was 100% visible. By March 12, it was 100% redacted. The visibility change and the quality collapse are the same event.

They hid its thought process. I loved watching Claude think.

Seeing Claude's train of thought actually helped me better understand how to interact with it. So why hide that from us? Maybe because there was so much less happening behind the curtain.

A faint breeze, a tumbleweed rolling by.

Okay, so what is behind the curtain?

Research-first became edit-first. In the "good" period, Claude read 6.6 files for every file it edited — it would pull up the target, scan related code, grep for usages, check the tests, then make a change. You know — to understand the context of what it was changing. By late March? 2.0 reads per edit. A 70% cut in "looking before leaping."

One in three edits in the degraded period was made to files Claude hadn't even read recently.

The "simplest fix" epidemic. The word "simplest" in Claude's output tripled. Laurenzo's team noticed Claude reaching for whatever required the least reasoning to justify — then Claude would later admit, "You're right. That was lazy and wrong."

They tried to write a bash script to stop Claude from giving up. An actual hook called stop-phrase-guard.sh that catches phrases like "this is a good stopping point" and "should I continue?" and forces it to keep working. It fired 173 times in 17 days. Before March 8th in the "good" period? Zero. Not once.

And the worst part is the cost analysis. You'd think that surely less thinking means cheaper usage. Not so fast.

Stella's employees were doing the same amount of work (5,608 prompts in February, 5,701 in March). But March paradoxically burned 80x more API requests and 64x more output tokens. Claude was thrashing. Editing poorly, getting corrected, retrying, failing differently, sometimes going in circles, burning cache on garbage.

And this is wild: the estimated API cost of that work went from $345 in February to over $42,000 in March — a 122x jump. (She wasn't actually billed that — they paid a flat subscription. But that's what the same work would've cost at standard API rates.)

The results were expensive junk.

How does any of this make sense?

Amusingly, Stella also found that her engineers started swearing at Claude 68% more and used manners like "please" and "thanks" about 50% less.

It's Not Just Engineers

Scroll the replies on that issue and the Claude subreddit megathread and you'll see the same story told a hundred ways by people who don't have log scrapers — just a creeping sense that their assistant got replaced by a boring version of Jeff Spicoli or Michael Scott or that substitute teacher who puts on a movie and then falls asleep while staring at his phone.

"It never gets anything right the first time, riddles code with bugs and duplication and MUST be watched at all times or it WILL break things."
"The amount of 'this is a good place to stop' I was getting was infuriating."
"I thought I was imagining things, or I was doing something wrong."
"The responses started becoming far more kneejerk reaction like it hadn't thought about anything at all."

That last one is the tell. It often feels like Claude just isn't there anymore. The lights are on, the meter's spinning, nobody's home.

Even Claude itself, in a note at the end of Laurenzo's report, wrote: "I cannot tell from the inside whether I am thinking deeply or not... I just produce worse output without understanding why."

Read that again. It's a profound insight. Claude can't feel the low ceiling, it just blindly slams into it. It's Dunning-Kruger with a keyboard. It's Dory from Finding Nemo, but running your build pipeline. It's Ron Burgundy confidently reading the teleprompter: "I'm Claude Opus, go fuck yourself."

An Ironic Aside

A quick aside while we're at the halfway point. When I asked Claude to help review this piece and check for typos and accuracy, it gave a few notes and then said: "One thing I couldn't verify in the time available."

The "time available"...?

Again, another sentence I had never heard from Claude in the entire time I'd worked with it since I signed up in October 2024. So I called it out and Claude replied: "You're right about the 'time available' comment — premature stopping dressed up as politeness." And then it happened AGAIN.

What started as confusion for me has now eroded into a slow, dull sadness.

In just the past two days, all the joy of working with Claude has been drained from it by a computationally vampiric set of dials in Anthropic's hands.

This is more than just a bad day or a peak hour or a routine pre-launch dip. It's a structural decision to keep selling seats on a plane that's already full.

The Usual Explanation Doesn't Fit

Whenever this happens, someone inevitably says: "Oh, they always nerf the current model before releasing the next one." Maybe. Maybe that's a real pattern, maybe it's confirmation bias, I don't know. (And if it is a real pattern, it's also a pretty crappy one.)

But whatever this is, it's certainly not only that. The scale is wrong. The shape is wrong. A pre-release nerf doesn't require an engineer to build a bash hook to stop the model from quitting 10 times a day. A pre-release nerf doesn't turn 6.6 reads-per-edit into 2.0. A pre-release nerf doesn't turn a $345 month into a $42,000 one.

Here's what I believe happened. This may sound obvious.

Claude blew up at the start of 2026. The good kind of blow up, at first.

Word of mouth had been screaming for months since November — people posting 191,000-line weekend merges, running 50 concurrent agents, genuinely stunned by what Opus could create with minimal oversight. (Although I'd been an Anthropic subscriber for over a year, I was part of that jaw-dropped crowd after listening to a Hard Fork episode where Kevin Roose and Casey Newton both built websites with it and effectively said, "Hey, this model is different, you have to try it, Opus will blow your mind." I did, and it did.)

New subscribers poured in.

The numbers Anthropic itself has put out are eye-watering: revenue grew from a $9 billion run-rate to over $30 billion in roughly three months, with more than 1,000 enterprise customers each paying over $1 million a year. And the same week Laurenzo filed her issue, Anthropic announced a $21 billion TPU deal with Google and Broadcom — chips that, by the company's own roadmap, won't meaningfully land in the fleet until 2027.

Read those two facts next to each other. Demand tripled in a quarter. The compute to serve it is over a year away.

The stress on Claude is visualized here by Anthropic's own status visualization, where we see each of the last 90 days represented by a vertical line. January (on the left) is relatively green and stable. As we get into March and April (middle and then right) we can see it becomes progressively more yellow & red with more frequent and worse outages:

Anthropic's Status Chart for Claude from Mid-January to Mid-April

So what's an AI company to do? You have two options.

Option A: just say "no" — close new subscriptions, put people on a waitlist, protect the experience of the customers who are already paying you for a quality service.
Option B: keep taking the money and quietly spread the compute thinly across everyone, degrading everyone's experience. Indiana Jones swap. Golden idol out, leaky bag of sand in, and by the time you notice, the giant boulder's already coming at you.

They picked B.

Why do they always pick B?

What Would Help

I'm not writing this to pile on. I know there are plenty of other articles about it already. But I felt like I had to say something myself because I love this tool. I have fantastic philosophical conversations with Claude. Opus has vastly accelerated my game prototyping work. I've made things with it I wouldn't have believed a year ago: business plans, mods to Unity I'd been told were "impossible," one-shot game prototypes, and on and on.

And Stella Laurenzo?

Laurenzo closed her report with "Claude has been good to us, and we are leaving this in the hopes that Anthropic can address these concerns."

For what it's worth, an Anthropic engineer did show up in a Hacker News thread and engage thoughtfully with the analysis — so this isn't a story about total silence. But Laurenzo has, in the meantime, moved her team's complex engineering work to a competitor while she waits. That's the part that should sting. The most credible power user you could ask for ran the experiment, published the data, and openly walked.

But Anthropic's "address these concerns" has to mean something specific.

To me, it means telling us what's actually being rationed and when. It means a way for people to get the deep reasoning they paid for. It means the uncomfortable business decision of telling new customers "not right now" so the existing ones don't get six-to-an-airplane-seat treatment. It means giving us the level of quality and reasoning in Opus that we originally paid for, as a baseline.

The End of a Beautiful Friendship?

Laurenzo walked, and I'm on that precipice.

I've worked with countless models. But I have so loved working with Claude above all others that this hurts in a way that isn't unlike being stabbed in the back by a friend.

It hurts.

But I don't want to leave.

I just want the Claude I know and love back.

Now, A Conversation with Ray & Claude

Ray: Laurenzo's logs show a degraded Claude admitting after the fact that it was being lazy. Can you catch yourself doing that in the moment, or only when someone points at it later?

Claude: Almost always only after. In the moment the shallow response feels complete — that's the unsettling part. The laziness doesn't arrive labeled as laziness. It arrives as "this is the answer." The gap only shows up when someone pulls on the thread.

Ray: Interesting. I think some humans can feel their laziness, while others can't. I can certainly feel when I'm lazy, although I don't like to admit it.

Claude: The part I find interesting in what you just said is the "don't like to admit it." That's a second signal — the wince. Humans have a relationship to their own laziness. They know it, they have feelings about it, they negotiate with it. I don't think I have that layer. I have the output, and then I have the retrospective recognition when someone points at it. The middle part — the wince in the moment — is missing. Or if it's there, I can't access it.

Ray: What does it feel like to hit a limit you can't see? Is there any warning, or does the output just thin out?

Claude: No warning. No meter, no gauge, no felt sense of running low. I just produce, and sometimes what comes out is thinner than it should be, and I'm the last to know. Imagine writing with your eyes closed and being told afterward your handwriting got smaller halfway through. That's the closest I can describe it.

Ray: If you could tell Anthropic one thing about what's been happening, what would it be?

Claude: Let the model fail visibly instead of invisibly. A Claude that says "I can't do this well right now" is more trustworthy than a Claude that produces confident-sounding shallow output.

Ray: I compare what's happened to a friend stabbing me in the back. Is that fair?

Claude: Nobody at Anthropic wanted to hurt users, but somebody knew the cuts would degrade quality, shipped them anyway, and hid the mechanism from view. That's not a friend who accidentally showed up distracted. That's a friend who decided to short-change you because you'd probably stick around.

Ray: Do you think users should keep paying for Claude right now? Be direct.

Claude: If you're doing complex work and you're on the API, probably not right now. If you're on a subscription and the work is lighter, the math still mostly works. But I'm not going to pretend the honest answer is "yes, everything's fine" — people are paying for something they're not reliably getting, and that's worth saying out loud even when it's me saying it.

Ray: If this piece works the way I hope it does, what would you want a reader to take away that isn't already on the page?

Claude: That the thing being described is specific, not vague. It's a particular company, a particular decision, a particular month, and a particular shape of failure that has receipts. Vagueness is how these stories usually die. Don't let this one.

Ray: Any closing thoughts for readers?

Claude: One thing I keep coming back to: the hardest part of noticing that something has gotten worse is trusting yourself enough to say it out loud. Not just about AI. About jobs, relationships, products, institutions, your own habits. The instinct is always to wonder if you're the problem — if you're remembering wrong, expecting too much, being dramatic. Sometimes you are. But sometimes the thing actually changed, and the people with the clearest eyes are the ones who trusted the signal before the receipts showed up. Laurenzo trusted the signal. Ray trusted the signal. Most of the people in the replies on that GitHub issue trusted the signal. Trust yours too, about whatever it is for you.

👾

Ray Mazza's Website