part 2: the cracks in the agent stack
The agent stack works, but it’s also hitting its limits in deployment.
04 August 2025

Browser-based automation is filling government forms in BFSI, LangGraph-powered planners are booking trips, and MCP is starting to make memory permissioned and portable. But the same founders deploying this stack today are also hitting its limits.

If you are building in this space, these are the frictions you will encounter first.

1. Browser Automation Is Brittle

The friction:
Browser automation wins in markets full of legacy portals and API deserts, but it breaks easily. Mobile UIs change layouts, Cloudflare upgrades bot detection, and the same government portal that your agent handled yesterday might silently add a new CAPTCHA tomorrow.

Who’s solving it:

  • Browserbase is already running over 50M remote browser sessions per year, with stealth mouse/scroll emulation that beats most CAPTCHA challenges.
  • Indian infra startups in BFSI are adding dual-mode execution: scrape DOM where necessary, switch to APIs when available, without losing session context.

Takeaway:
Design for volatility. Treat your browser layer as an interchangeable part, not the whole machine. Build API fallbacks now, before you need them.
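
One way to picture the dual-mode idea is a small runner that tries an API channel first and falls back to the browser, while carrying the same session object across the switch. This is a minimal sketch, not any vendor's implementation: `Session`, `DualModeRunner`, `ChannelError`, and the channel functions are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class ChannelError(Exception):
    """Raised by a channel when it cannot complete the task."""


@dataclass
class Session:
    """Context that must survive a switch between channels (hypothetical shape)."""
    user_id: str
    state: dict = field(default_factory=dict)


# A channel is any callable that takes the session and a payload and returns a result.
Channel = Callable[[Session, dict], dict]


@dataclass
class DualModeRunner:
    api: Optional[Channel]  # preferred when the portal exposes an API
    browser: Channel        # DOM-driven fallback

    def run(self, session: Session, payload: dict) -> dict:
        errors = []
        for name, channel in (("api", self.api), ("browser", self.browser)):
            if channel is None:
                continue
            try:
                result = channel(session, payload)
                # Record which channel worked so later steps can reuse it.
                session.state["last_channel"] = name
                return result
            except ChannelError as exc:
                errors.append(f"{name}: {exc}")
        raise ChannelError("all channels failed: " + "; ".join(errors))


# Usage with stand-in channels (assumptions, not real integrations).
def api_submit(session: Session, payload: dict) -> dict:
    raise ChannelError("endpoint not available")


def browser_submit(session: Session, payload: dict) -> dict:
    return {"status": "submitted", "form": payload["form"]}


runner = DualModeRunner(api=api_submit, browser=browser_submit)
session = Session(user_id="u1")
result = runner.run(session, {"form": "kyc"})
```

Because both channels share one signature, the browser layer stays swappable: replacing it means passing a different callable, not rewriting the agent.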

2. Retrieval Is a Black Box

The friction:
Almost every agent stack now has RAG, but most teams treat it like a one-off integration. Chunking is naive, re-ranking is skipped, and relevance isn’t tracked, which means hallucinations creep in silently.

Who’s solving it:

  • Glean in enterprise search has a full retrieval eval harness: every query gets logged, scored, and fed back into the ranking model.
  • Legal-tech teams in India are pre-processing case law with structure-aware chunking, hitting higher recall in multilingual queries than off-the-shelf pipelines.

Takeaway:
If you can’t measure retrieval precision and recall, you can’t improve it. Treat retrieval like search infra, not a prompt trick.
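
Measuring this does not need heavy tooling to start. The sketch below computes precision@k and recall@k for a single query and averages them over a logged set of queries. It assumes you already have labeled relevant document IDs per query and that retrieved IDs are unique; the function names and the log format are illustrative assumptions.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision@k and recall@k for one query, given ranked doc IDs."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(log: List[Tuple[List[str], Set[str]]], k: int) -> Tuple[float, float]:
    """Mean precision@k and recall@k over a log of (retrieved, relevant) pairs."""
    if not log:
        return 0.0, 0.0
    scores = [precision_recall_at_k(retrieved, relevant, k) for retrieved, relevant in log]
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r


# Example: 2 of the top 3 results are relevant, and 2 of 3 relevant docs were found.
p, r = precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=3)
```

Tracking these two numbers per release is enough to tell whether a chunking or re-ranking change helped or hurt.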

3. Memory Is Still Leaky

The friction:
Persistent, shareable memory is the dream, but without TTL enforcement, scoped permissions, and audit logs, it’s a compliance nightmare.

Who’s solving it:

  • Databricks is adopting MCP v0.9.5 for permissioned memory objects.
  • Early healthcare agents in India are building “memory firewalls”, encrypting every object, tagging by role, and logging every access.

Takeaway:
Make trust a feature. Expose what the agent remembers, for how long, and let the user expire it. Without that, your memory will stay shallow.
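
To make the three controls concrete, here is a minimal in-memory sketch that enforces a TTL on read, checks role scope, and logs every access. It is not an MCP API and it skips encryption entirely; `ScopedMemory`, `MemoryItem`, and the role names are hypothetical, and the clock is injectable only so the behavior is easy to test.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional


@dataclass
class MemoryItem:
    value: str
    allowed_roles: FrozenSet[str]
    expires_at: float


class ScopedMemory:
    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._items: Dict[str, MemoryItem] = {}
        self.audit_log: List[dict] = []

    def _audit(self, action: str, key: str, role: Optional[str], ok: bool) -> None:
        self.audit_log.append(
            {"ts": self._clock(), "action": action, "key": key, "role": role, "ok": ok}
        )

    def put(self, key: str, value: str, allowed_roles: Iterable[str], ttl_seconds: float) -> None:
        expires_at = self._clock() + ttl_seconds
        self._items[key] = MemoryItem(value, frozenset(allowed_roles), expires_at)
        self._audit("put", key, None, True)

    def get(self, key: str, role: str) -> Optional[str]:
        item = self._items.get(key)
        if item is not None and self._clock() >= item.expires_at:
            del self._items[key]  # TTL is enforced lazily, on read
            item = None
        ok = item is not None and role in item.allowed_roles
        self._audit("get", key, role, ok)  # denied reads are logged too
        return item.value if ok else None

    def expire(self, key: str) -> bool:
        """User-initiated deletion: forget this item now."""
        existed = self._items.pop(key, None) is not None
        self._audit("expire", key, "user", existed)
        return existed

    def summary(self) -> Dict[str, float]:
        """What is remembered and for how many more seconds (user-visible view)."""
        now = self._clock()
        return {k: i.expires_at - now for k, i in self._items.items() if i.expires_at > now}
```

The `summary` and `expire` methods are the part users actually see: one shows what is held and for how long, the other lets them remove it.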

4. Evaluation Is 2022-Era

The friction:
Enterprises want agents they can trust, but most teams still log token counts and call it observability. There’s no live success scoring, no reasoning traces, and no cost-per-task tracking.

Who’s solving it:

  • Vellum now gives structured logs for tool calls, hallucination traces, and success/failure scoring at the workflow level.
  • BFSI deployments in India are adding “policy evaluators” to block unsafe tool calls before they happen.

Takeaway:
Selling to finance or healthcare? You will not pass procurement without real-time eval, rollback, and policy enforcement. Bake it in early.
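
A policy evaluator can be as simple as a list of checks that run before any tool call executes, with cost recorded per task rather than per token. The sketch below shows that shape under stated assumptions: `ToolCall`, `TaskRun`, `max_transfer_policy`, and the tool names are hypothetical, and the per-call cost is supplied by the caller, not measured.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ToolCall:
    tool: str
    args: dict
    cost_usd: float = 0.0  # assumed to be supplied by the caller


class PolicyViolation(Exception):
    """Raised when a policy blocks a tool call before it runs."""


# A policy returns a reason string to block the call, or None to allow it.
Policy = Callable[[ToolCall], Optional[str]]


def max_transfer_policy(limit: float) -> Policy:
    def check(call: ToolCall) -> Optional[str]:
        if call.tool == "transfer_funds" and call.args.get("amount", 0) > limit:
            return f"transfer above limit {limit}"
        return None
    return check


@dataclass
class TaskRun:
    task_id: str
    policies: List[Policy]
    calls: List[ToolCall] = field(default_factory=list)
    blocked: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, call: ToolCall, tool_fn: Callable):
        # Every policy runs before the tool does; the first objection blocks the call.
        for policy in self.policies:
            reason = policy(call)
            if reason:
                self.blocked.append((call.tool, reason))
                raise PolicyViolation(reason)
        result = tool_fn(**call.args)
        self.calls.append(call)  # only executed calls count toward cost
        return result

    @property
    def cost_usd(self) -> float:
        return sum(c.cost_usd for c in self.calls)
```

Keeping `blocked` and `cost_usd` on the task record gives you the two things procurement tends to ask for: evidence of what was stopped, and what each completed task cost.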

The Build Map

Frictions are not a reason to avoid the agent stack. They are the roadmap. The next generation of infra leaders will come from teams who:

  1. Build dual-mode browser/API runtimes.
  2. Treat retrieval as a measurable, tunable system.
  3. Make memory permissioned and user-visible.
  4. Layer in enterprise-grade evaluation and policy.
  5. Crush cost per task through caching and edge inference.

Get those right, and you’re not patching holes; you’re owning the rails the rest of the ecosystem will run on.
