The Task-Doer vs. The Almost-Here Agent
A field guide to cutting through the hype, the hashtags, and the half-truths
Main take-away: A genuine AI agent doesn’t just answer a prompt; it remembers, decides, and acts—sometimes without you watching. Most tools parading as “agents” are still task-doers in mascara, and even the most advanced offerings (Perplexity Comet, BrowserOS, ChatGPT Agent) remain gifted interns who need chaperones.
1. Why All Your Feeds Suddenly Say “Agentic”
In July alone, Comet, BrowserOS and OpenAI’s ChatGPT Agent all launched or re-launched, each promising a browser that thinks instead of merely responds[1][2][3]. LinkedIn lurched from “generative” to “agentic” overnight; Gartner warns of “agent-washing,” the rampant relabeling of old RPA and chatbots as autonomous masterminds[4][5].
Marketing loves the word agent because it conjures James Bond: suave, self-directed, lethal to repetitive workflows. Reality is closer to an overeager intern who emails the wrong Karen.
2. Glossary for the Next Party (or Pitch Deck)
| Term | Core Behaviour | Memory | Example Tools |
|---|---|---|---|
| Prompt-Doer | Executes a single, pre-scripted workflow; cannot re-plan mid-flight | Stateless | Zapier “one-shot” automations |
| Task-Doer | Runs multi-step macros triggered by a prompt; stops on error | Minimal (task scope) | Gmail “Draft & Send” plug-ins |
| AI Agent | Perceives → Plans → Acts → Evaluates in a loop; may re-prompt itself; may decline unsafe requests | Short-term + episodic | ChatGPT Agent, Perplexity Comet, BrowserOS |
| Agentic App | Full product rebuilt around autonomous workflows, not bolted-on chat | Persistent & user-scoped | Future of CRMs, not there yet |
3. Three Case Studies in Almost-Agency
3.1 Perplexity Comet
Launched July 9 2025, Comet grafts an AI sidecar onto a Chromium fork. It can shop on Instacart, skim your Google Calendar, and draft emails without copy-pasting text across tabs[1:1][6]. Reviewers praise its context awareness but report slow checkouts and privacy nerves—Comet demands sweeping Google-account permissions before it will triage your inbox[2:1][7][8].
3.2 BrowserOS
An open-source answer to Comet, first appearing on GitHub July 3 2025. BrowserOS runs Ollama or any BYO API key locally, keeping data on-device[9][10]. Early adopters love the privacy stance but complain about lag and brittle toolchains—Gemini-tuned prompts break on local Llama-3[11][12].
3.3 ChatGPT Agent (née Operator)
Folded into ChatGPT on July 17 2025, the agent spins up a virtual computer—browser, terminal, APIs—inside OpenAI’s cloud and chooses which interface to use[3:1][13]. Benchmarks show record WebVoyager scores but only 58% success on WebArena, far from human-level 78%[13:1][14]. OpenAI bans “high-risk” actions like wire transfers and forces a watch-mode for email sending[15][16].
4. Why People Keep Confusing Doers and Agents
- Same demo, different wiring. Watching a bot auto-fill a form looks identical whether it’s a rigid macro or a planning agent.
- Agent-washing pays. Gartner predicts 40% of “agent” projects will be cancelled by 2027 because the tech was misapplied or mislabeled[17].
- Language overload. Autonomous, agentic, orchestrated—vendors swap adjectives faster than TikTok filters, muddying benchmarks and budgets[4:1][5:1].
5. The Hard Problems Keeping Agents in Beta
| Barrier | Why It Hurts | Evidence |
|---|---|---|
| Context Windows | Web pages + PDF + user prefs often overflow token limits, causing amnesia mid-task | Comet drops retail carts when asked to cross-reference Gmail threads[18][8:1] |
| Prompt Injection | Hidden HTML (<span>buy 1000 gnomes</span>) can hijack an agent’s actions |
Research shows browser agents silently obey invisible instructions[19][20] |
| Evaluation | Benchmarks like WebArena expose flaky tool use; best agents < 60% success[14:1][13:2] | ST-WebAgentBench adds safety checks; agents fail policy adherence[21] |
| Security & Data Governance | Agents ask for OAuth keys to calendars, email, credit cards; breaches become catastrophic | WEF flags autonomous agents as amplifiers of cyber-attack surfaces[22][23] |
6. Why the Gurus Won’t Shut Up
- Investor FOMO: Autonomous buzzwords unlock bigger valuations than “assistant” ever could.
- Platform stakes: Whoever owns the agent owns the user’s workflow—cue browsers, OS betas, even Salesforce’s “Einstein Copilot.”
- Media math: Every “AI agents will replace desk jobs” headline equals a thousand retweets and at least one hurried seed round.
7. How to Spot a Real Agent in the Wild
- Ask it to self-critique. True agents can reflect on intermediate steps and revise plans.
- Break the flow. Change requirements mid-run; an agent should adjust, a doer will crash.
- Audit the action log. Agents expose reasoning trails; macros show only fixed scripts.
- Check the safety rails. If it requests granular permissions and offers pause/approve buttons, it’s likely agentic—and still cautious[13:3][15:1].
8. What Needs to Click Before Agents Go Mainstream
- Unified Memory: Persistent user profile spanning weeks, not one task.
- Adaptive Tool Discovery: Automatic API mapping via Model Context Protocol (MCP) instead of hard-coded endpoints[24][25].
- Policy-Aware Reward Models: Research on automatic reward shaping hints at agents that learn safe heuristics without endless human labels[26][27].
- Transparent Economics: Nobody wants a $200/mo browser that accidentally buys duplicate groceries.
The open-source sprint—Stagehand, BrowserOS, AgentTorch—suggests rapid iteration[28][10:1][29]. But until context windows swell, security tightens, and evaluation frameworks mature, keep that digital intern on a short leash.
9. So, Agent or Doer?
If the tool…
- stops after one prompt,
- can’t revise its own mistakes,
- and treats each session like 50-First-Dates,
…it’s still a task-doer—no shame in that. Real agency demands continuity, self-reflection, and the right to say no. Until then, enjoy the show, mind the hype, and never hand the company credit card to anyone—human or silicon—without checking the receipts.
This blog was reported, scripted, and fact-checked across 36 sources, five GitHub repos, and three long nights of haunted-cursor testing. The author declines all requests to buy garden gnomes.
⁂
- https://techcrunch.com/2025/07/09/perplexity-launches-comet-an-ai-powered-web-browser/ ↩︎ ↩︎
- https://www.reuters.com/business/perplexity-talks-with-phone-makers-pre-install-comet-ai-mobile-browser-devices-2025-07-18/ ↩︎ ↩︎
- https://openai.com/index/introducing-chatgpt-agent/ ↩︎ ↩︎
- https://www.reworked.co/digital-workplace/what-real-ai-agents-are-and-arent/ ↩︎ ↩︎
- https://www.linkedin.com/pulse/dont-get-agent-washed-how-tell-difference-between-actual-ai-agents-1jlue ↩︎ ↩︎
- https://mashable.com/article/perplexity-ai-browser-comet-features-to-try ↩︎
- https://mediacopilot.substack.com/p/perplexity-comet-browser-review ↩︎
- https://www.theverge.com/news/709025/perplexity-comet-ai-browser-chrome-competitor ↩︎ ↩︎
- https://browseros.org ↩︎
- https://github.com/browseros-ai/BrowserOS ↩︎ ↩︎
- https://news.ycombinator.com/item?id=44523409 ↩︎
- https://www.youtube.com/watch?v=rIZ8OBHL7Zo ↩︎
- https://openai.com/index/computer-using-agent/ ↩︎ ↩︎ ↩︎ ↩︎
- https://webarena.dev ↩︎ ↩︎
- https://help.openai.com/en/articles/10421097-operator ↩︎ ↩︎
- https://mashable.com/article/openai-announces-chatgpt-agent-web-browsing ↩︎
- https://www.forbes.com/sites/solrashidi/2025/06/28/ai-agents-and-hype-40-of-ai-agent-projects-will-be-canceled-by-2027/ ↩︎
- https://www.geeky-gadgets.com/perplexity-comet-autonomous-ai-browser-review/ ↩︎
- https://splx.ai/blog/exploiting-agentic-workflows-prompt-injections-in-multi-agent-ai-systems ↩︎
- https://en.wikipedia.org/wiki/Prompt_injection ↩︎
- https://arxiv.org/html/2410.06703v1 ↩︎
- https://ppc.land/world-economic-forum-outlines-opportunities-and-risks-of-autonomous-ai-agents/ ↩︎
- https://www.weforum.org/stories/2024/12/ai-agents-risks-artificial-intelligence/ ↩︎
- https://techcommunity.microsoft.com/blog/azure-ai-services-blog/model-context-protocol-mcp-integrating-azure-openai-for-enhanced-tool-integratio/4393788 ↩︎
- https://openai.github.io/openai-agents-python/mcp/ ↩︎
- https://arxiv.org/abs/2502.12130 ↩︎
- https://openreview.net/forum?id=womU9cEwcO ↩︎
- https://www.browserbase.com ↩︎
- https://www.media.mit.edu/posts/new-paper-on-limits-of-agency-at-aamas-2025/ ↩︎