Hello AI Fan!
Every AI company is now pitching "agents," software that thinks, plans, and acts on your behalf without you lifting a finger. It sounds like the future. Carnegie Mellon researchers put that promise to the test and found the best agent on the market fails at three out of four tasks. Before you hand anything over, here's what the data actually shows.
First time reading? Get your own free subscription here.
AI INSIGHT
AI Agents Are Being Sold as Your Next Assistant. They're Not Ready.
Barbara saw the ad somewhere around February. An AI agent, it promised, would handle her calendar, book her appointments, follow up on her emails, and order her groceries -- all without her lifting a finger. She downloaded the app. She spent forty-five minutes setting it up. By the end of the week, it had sent a meeting request to the wrong person, missed a follow-up entirely, and ordered the wrong brand of coffee.
Barbara is not alone. And she is not the problem.
AI agents are the biggest pitch in tech right now. Every company selling AI tools wants you to believe the future is "set it and forget it" -- software that thinks, plans, and acts on your behalf while you move on to other things. It sounds great. It just does not work yet.

Researchers at Carnegie Mellon University built a fake company and staffed it entirely with AI agents, using models from the four biggest names in the industry: OpenAI, Google, Anthropic, and Amazon. The agents were given real-world business tasks: manage schedules, analyze spreadsheets, write performance reviews, find the right colleague for a project. No human backup. Just the agents, operating on their own.
The results were not good. Not a single model completed more than 24% of its assigned tasks. Here is the breakdown:
Anthropic's Claude: 24% success rate
Google's Gemini: 11% success rate
OpenAI's GPT-4o: 8.6% success rate
Amazon's Nova: 1.7% success rate
The best-performing AI agent failed at three out of four tasks -- in a controlled environment designed specifically to let it succeed. Even basic assignments stumped them: tasks frequently took dozens of steps, the agents made simple logic errors along the way, and in one case an agent that could not find the right person simply renamed another colleague to get the outcome it wanted.
MIT Sloan Management Review's AI experts Thomas Davenport and Randy Bean put it plainly in their January 2026 outlook: agentic AI is an expensive early-stage experiment, and it will not be ready for mainstream use for a few years. "A few years" in tech usually means longer.
The complication is that the marketing has completely outrun the reality. If you've watched any AI-related commercials lately, you've seen the demos where an agent flawlessly books a trip, negotiates a deal, or clears an inbox in seconds. Those demos are real. What you don't see is the prep work, the handholding, and the retries -- or the moments when the whole thing quietly goes sideways and nobody notices until later.

Here is what this means for you at home: the AI that works is the AI you are in the driver's seat for. The chat interface -- you type, it responds, you review, you decide -- is not some primitive version of what AI will eventually be. Right now, it is the version that actually works. Agents that operate on your behalf with minimal oversight are not there yet, and the salespeople pushing them have a financial reason to tell you otherwise.
One practical habit worth building now: any time AI gives you a result you're about to act on, verify it first. A quick fact-check takes about five minutes and catches most problems before they become yours.
Your skepticism is well-placed. Use the tool. Stay in control.
Want to try this yourself?
Open ChatGPT or Claude and paste this in:
"I want to use AI to help me [name one task you'd hand to an agent, like managing email or scheduling appointments]. Walk me through how you'd approach each step -- and pause after each one so I can review and approve before moving forward."
This gives you the agent experience while you stay in the loop. It's slower. It's also the version that doesn't book dinner with the wrong person.
The agents will get better. They always do. But right now, the human in the loop is not a limitation. It's the feature.
WHERE TO GO NEXT
More on this topic, from sources worth your time:
Five Trends in AI and Data Science for 2026 -- MIT Sloan Management Review's full annual outlook, with a clear-eyed section on why agentic AI is still years from reliable use.
The Fake Startup That Exposed AI's Real Limits -- A readable breakdown of what the Carnegie Mellon study actually found, written for a general audience.
AI Agents Wrong ~70% of the Time -- The Register's straightforward coverage of the research, including what happens when you let agents run without supervision.
Advertising Disclosure: We evaluate all recommendations of products and services independently. Clicking on links provided on this page may result in AI for Daily Living earning compensation, which supports independent publishers like us.
