A lot of folks are betting big on AI agents transforming the way we work in 2025. I get the excitement—I’ve spent the last year elbow-deep in building these things myself. But if you’ve ever tried to get an agent past the demo stage and into real production, you know the story is a lot more complicated. My friend Utkarsh Kanwat recently shared his perspective in Why I’m Betting Against AI Agents in 2025 (Despite Building Them), and honestly, it feels like he’s writing from inside my own Slack DMs.
The first thing nobody warns you about? The reliability wall. It’s brutal. I can’t tell you how many times I’ve watched a promising multi-step agent fall apart simply because little errors stack up. Even if your system nails 95% reliability per step—a tall order!—your 20-step workflow is only going to succeed about a third of the time. That’s not a bug in your code, or a limitation of your LLM. That’s just how probability works. The systems that actually make it to production? They keep things short, simple, and put a human in the loop for anything critical.
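Here's the math in miniature. A quick back-of-the-envelope script (the 95% per-step figure and the 20 steps come straight from the argument above; everything else is just illustration):

```python
# How per-step reliability compounds across a sequential workflow.
# The 95% figure is the optimistic per-step success rate discussed above.

def workflow_success_rate(per_step_reliability: float, steps: int) -> float:
    """Probability that every step in a sequential workflow succeeds."""
    return per_step_reliability ** steps

for steps in (5, 10, 20, 50):
    rate = workflow_success_rate(0.95, steps)
    print(f"{steps:>2} steps at 95% each -> {rate:.1%} end-to-end")

# Output:
#  5 steps at 95% each -> 77.4% end-to-end
# 10 steps at 95% each -> 59.9% end-to-end
# 20 steps at 95% each -> 35.8% end-to-end
# 50 steps at 95% each -> 7.7% end-to-end
```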
And here’s another thing most people overlook: the economics of context. People love the idea of a super-smart, chatty agent that remembers everything. In practice, that kind of long, back-and-forth conversation chews through tokens, and your budget with them: every turn re-sends the entire history, so cost grows roughly with the square of the conversation’s length. Utkarsh breaks down the math: get to 100 conversational turns, and you’re suddenly spending $50–$100 per session. Nobody’s business model survives that kind of burn at scale. The tools that actually last are the ones that do one focused job, stay stateless, and move on.
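To see how the bill compounds, here's a rough sketch. The per-turn token count (~1,000) and the price ($15 per million input tokens) are made-up round numbers for illustration, not Utkarsh's exact figures, but they land in the same ballpark:

```python
# A rough model of why long conversations get expensive. Both constants
# below are assumptions for illustration; real prices and turn sizes vary.

TOKENS_PER_TURN = 1_000   # tokens added to the history each turn (assumed)
PRICE_PER_MTOK = 15.00    # dollars per million input tokens (assumed)

def session_cost(turns: int) -> float:
    """Cumulative input-token cost when each turn re-sends the full history."""
    total_input_tokens = sum(t * TOKENS_PER_TURN for t in range(1, turns + 1))
    return total_input_tokens * PRICE_PER_MTOK / 1_000_000

for turns in (10, 50, 100):
    print(f"{turns:>3} turns -> ~${session_cost(turns):.2f} in input tokens alone")

# Cost grows roughly with the square of the turn count, because turn N
# re-processes everything from turns 1..N-1. At 100 turns this lands
# around $75 per session under these assumptions.
```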
But the biggest gap between the hype and reality is what goes into actually shipping these systems. Here’s the truth: the AI does maybe 30% of the work. The rest is classic engineering: designing error handling, building feedback an agent can actually act on, and integrating with a mess of legacy systems and APIs that never behave quite like the docs say they should. Most of my effort isn’t even “AI work”; it’s just what it takes to make any production system robust.
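To make that concrete, here's a sketch of one slice of that engineering: wrapping a tool call so the agent gets structured, actionable feedback instead of a raw stack trace. The `ToolResult` shape and the retry policy here are hypothetical, just to show the pattern:

```python
# Normalizing tool failures into something a model can reason about.
# ToolResult and the retry policy are illustrative, not a real library API.

import time
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolResult:
    ok: bool
    value: Any = None
    error: str | None = None   # short, model-readable description of what went wrong
    retryable: bool = False    # tells the agent whether trying again makes sense

def call_tool(fn: Callable[[], Any], retries: int = 2, backoff_s: float = 1.0) -> ToolResult:
    """Run a tool, retrying transient failures and normalizing errors."""
    for attempt in range(retries + 1):
        try:
            return ToolResult(ok=True, value=fn())
        except TimeoutError:
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
                continue
            return ToolResult(ok=False, error="tool timed out after retries", retryable=True)
        except ValueError as e:
            # Bad input won't fix itself on retry; tell the agent what to change.
            return ToolResult(ok=False, error=f"invalid input: {e}", retryable=False)
    return ToolResult(ok=False, error="unreachable")
```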
So if you’re wondering where AI agents really fit in right now, here’s my take: The best ones are like hyper-competent assistants. They handle the heavy lifting on the complicated stuff, but leave final calls and messy decisions to humans or to really solid, deterministic code. The folks chasing end-to-end autonomy are, in my experience, setting themselves up for a lot of headaches—mostly because reality refuses to be as neat as the demo.
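What that division of labor can look like in code, roughly (the action names, the gate, and the helpers are invented for illustration):

```python
# The "assistant, not autopilot" shape: the agent proposes actions,
# deterministic code decides which ones need a human sign-off.

HIGH_STAKES = {"send_email", "delete_records", "issue_refund"}

def run_action(action: str, payload: dict) -> str:
    # Placeholder for the real, deterministic execution path.
    return f"executed {action} with {payload}"

def execute(action: str, payload: dict, confirm) -> str:
    """Run low-risk actions directly; gate high-stakes ones behind a human."""
    if action in HIGH_STAKES and not confirm(f"Agent wants to run {action}({payload}). Approve?"):
        return "skipped: human declined"
    return run_action(action, payload)

# Usage: wire `confirm` to whatever approval surface you have (CLI, Slack, UI).
result = execute("issue_refund", {"order": 123},
                 confirm=lambda msg: input(msg + " [y/N] ").lower() == "y")
```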
If you’re thinking about building or adopting AI agents, seriously, check out Utkarsh’s article. It’s a straight-shooting look at what actually works (and what just looks shiny on stage). There’s a lot of potential here, but it only pays off when we design for the world as it is—not the world we wish we had.