OpenClaw and AI Browsers Are "Try, Oh My, Goodbye"
AI agents controlling your existing screen will probably win battles but not the war.
In 2018 I went to Shenzhen, China on business and heard a pitch from the then-head of Virtual Reality at HTC about the HTC Vive and VR devices. VR was going to infiltrate every aspect of our lives. At work, service people would wear headsets to inspect large machines; you could contact customer support, show the agent the issue, and they'd help you fix it on the spot (assuming it was safe). In schools, why learn about Ancient Rome when you could be transported there? For entertainment, video games and movies would become totally immersive.
In 2017, a year earlier, I had tried a demo from a company called Dreamscape. It was Hollywood-meets-tech. The product was a VR experience unlike anything anyone had ever created. Participants wore an Oculus headset with a backpack and were transported into alternate worlds. An Indiana Jones-style world had me genuinely afraid that I was going to fall off a rock into a black void. When I threw a torch, actually a plastic bat retrofitted with sensors, I forgot that I was wearing VR goggles and hurling a real-life object. It was so high-fidelity and spatially accurate that it didn't matter whether I was in a conference room with beige walls or in virtual northern Norway. It was electric.
After the presentation, I bought an HTC Vive for $350 and brought it home, where I used it once or twice. I only threw it away after letting it take up valuable closet space in my apartment, moving it out of the apartment, and letting it sit in storage for two years while I nomaded.
I first heard the phrase “Try, Oh My, Goodbye” on the All-In podcast in reference to the same VR experience I had several years prior.
Dreamscape popped up in malls and movie theaters for several years afterward before shutting down most of its operations. I went once. I remember it being fantastic, but, instead of going back, I always had something better to do.
The parallels between the "Try, Oh My, Goodbye" VR experience and OpenClaw are close. OpenClaw is open-source software into which you plug a Large Language Model (LLM) like Claude or ChatGPT; it can then control your computer and do things on your behalf by controlling your screen. Security risks and the mistakes it makes aside, it took the world by storm a few weeks ago, racking up millions of downloads and being featured in Wired, Business Insider, and the Financial Times.
AI browsers have been around for six or eight months. I first tried one in September of last year. I tried to make one my daily driver in November, then gave up on it last month. The problem with agentic browsers is the same as the problem with OpenClaw: the AI only works sometimes, and its success depends largely on your knowing exactly what you want it to do and taking the time to communicate that properly.
Agentic browsers have a few (solvable) hiccups, but I think of solving the hiccups as winning battles, not as winning the war.
Productivity tools don’t seem to work well
Excel doesn't work well with AI browsers; they can't control the cells reliably. The same is true of Asana: the agent can't quite fill out the fields or navigate the app the way it needs to. With Excel, it usually outright refused to work. With Asana, I usually needed to fix its mistakes or finish its half-done work. In fact, I can't think of a time when the AI browser did a job half as fast, or even half as well, as I would have done it myself.
The second block of issues is solvable, but these issues severely limit adoption. First is timing and bandwidth, which is the nice way of saying that AI browsers are slow and lazy.
AI Browser Agents are Slow and Lazy
We have some pre-release software for which we needed documentation. I recorded a 15-minute video explaining how the software works and how to do everything users would need to do, and I gave that transcript to an AI to turn into a user manual. Separately, I set up the AI browser agent to click around and create the standard operating procedure (SOP) itself, giving it the same transcript to use as inspiration.
When the agent came back to me, I had about 100 words and 3 or 4 processes outlined. When the normal ChatGPT came back to me, I had 23 pages.
The same is true of finding flights. Use an AI browser to find flights and it's a 5- or 10-minute process as the agent fumbles through clicking around. Use Gemini and the same task takes about 15 seconds.
I don’t usually know what I want
Combine the slow, lazy browser agents with the fact that it usually takes me two or three tries to fully know what I want, and that's a recipe for irritation. Most times I want to do something, I have to try it at least once more, and that second attempt is often where I get frustrated and give up. This isn't an issue in the standard chat interface, because it's fast for me to speak or type, but it drives me crazy when I have to watch the browser think for five seconds before it clicks "submit."
I want to be clear: most of my grievances can be solved through innovation and the typical product development lifecycle. Excel can work faster and Asana can work better inside these browsers. Clicking a button on the screen (and understanding how the screen is laid out) can be done faster. But none of that changes my view that each of those fixes is a battle.
The war is that the AI’s work should be mostly invisible
What makes the standard chat interface so compelling is that the user writes in natural language, the prompt goes to a server, the server does its thinking, research, or actions, and then, magically, out pops the answer or solution. APIs have been doing tasks for us for over a decade now. A chat interface that does its work through APIs is vastly more efficient, and more "magical," than watching an AI agent struggle to identify which text field to submit.
If browsers are already the operating systems for the lion's share of our work, then OpenClaw faces the same issues: it's using AI to manipulate an operating system.
I think the solution could be compelling, but agentic browsers and OpenClaw are a local maximum, not a global one. At best, they're a niche tool for hobbyists who want to tinker.
In Henry Ford’s day, what people thought they wanted was “a faster horse.” Ultimately, the most compelling vision of the AI future probably isn’t an AI that manipulates the interfaces you’re currently using, it’s an AI that does all the work in the background and gives you the answer or does the task in seconds. That latter technology sounds an awful lot more like a car than a faster horse.

