Today they are running in production, executing real workflows, and in some cases replacing entire teams. The stack is layered, stable enough to generate revenue, and already shaping enterprise buying decisions. This is what is actually working now.
1. Dual-Mode Execution Layer: Browser-based agents as infrastructure
Problem: Many enterprise workflows still live in portals without stable APIs. To automate them, you need a surface that works today (browser) and can pivot to tomorrow (API).
Examples:
- Browserbase runs over 50 million remote browser sessions per year.
- BFSI and logistics companies in India pay ₹2–4 crore annually for browser agents that handle ONDC merchant onboarding, government forms, and invoice capture.
- CAPTCHA bypass success rates now exceed 90% (in benchmark tests) with trace-based mouse emulation and scroll modelling.
Why it works: Browser automation bypasses API gaps and integrates with the long tail of brittle UIs (still ~80% of enterprise workflows). Tooling now includes persistent cookies, DOM snapshotting, stealth re-launch, and token injection.
Analogy: Like building a detour road that bypasses the main highway until the highway is ready.
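The session-persistence and re-launch tooling above can be sketched in a few lines. This is not any vendor's API; it is a minimal, hypothetical illustration of two of the listed primitives: cookies that survive restarts, and a launch wrapper that retries with jittered backoff.

```python
import json
import random
import time
from pathlib import Path


class BrowserSession:
    """Sketch of browser-agent session plumbing (illustrative names):
    cookies persist to disk across restarts, and a failed launch is
    retried with jittered backoff, a crude stand-in for stealth re-launch."""

    def __init__(self, cookie_path: str):
        self.cookie_path = Path(cookie_path)
        self.cookies = {}
        if self.cookie_path.exists():
            self.cookies = json.loads(self.cookie_path.read_text())

    def set_cookie(self, name: str, value: str) -> None:
        self.cookies[name] = value
        # Persist immediately so a crash does not lose the session.
        self.cookie_path.write_text(json.dumps(self.cookies))

    def launch(self, start, max_retries: int = 3):
        """Call the driver hook `start(cookies)`, retrying on failure."""
        for attempt in range(max_retries):
            try:
                return start(self.cookies)
            except RuntimeError:
                time.sleep((2 ** attempt) * 0.1 + random.random() * 0.05)
        raise RuntimeError("browser failed to launch after retries")
```

In a real stack, `start` would be the browser driver (e.g. a remote session provider) and the cookie store would be encrypted; the control flow is the point here.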
2. Planning and tool execution have stabilised
Problem: Early failures weren’t about model IQ; they were about brittle control flow.
Examples:
- LangGraph and CrewAI now lead orchestration: plan generation, retries, tool chaining, and graceful error handling.
Why it works: Structured orchestration ensures reliability in multi-step workflows.
Analogy: Like air traffic control for tools: sequencing, monitoring, and rerouting to avoid collisions.
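The control-flow pattern these frameworks provide (tool chaining, per-step retries, graceful failure) can be reduced to a small sketch. This is not the LangGraph or CrewAI API, just the underlying pattern in plain Python.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """One planned tool call: a name, a callable, and a retry budget."""
    name: str
    tool: Callable[[Any], Any]
    max_retries: int = 2


def run_plan(steps: list[Step], payload: Any) -> dict:
    """Execute steps in order, chaining each output into the next tool.
    Each step is retried up to its budget; an exhausted step fails the
    plan gracefully with a structured trace instead of raising."""
    trace = []
    for step in steps:
        last_err = None
        for _ in range(step.max_retries + 1):
            try:
                payload = step.tool(payload)
                trace.append({"step": step.name, "status": "ok"})
                last_err = None
                break
            except Exception as exc:
                last_err = str(exc)
        if last_err is not None:
            trace.append({"step": step.name, "status": "error", "error": last_err})
            return {"ok": False, "trace": trace, "result": None}
    return {"ok": True, "trace": trace, "result": payload}
```

The trace is what makes this production-grade: a caller (or supervising agent) can inspect exactly which step failed and reroute, rather than receiving an opaque exception.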
3. Memory is becoming structured and auditable
For agents to persist, they need structured, rights-managed memory, not just stuffed context windows.
- MCP (Model Context Protocol) v0.9.5, adopted by Databricks and Anthropic, lets agents share and retrieve signed, permissioned memory objects.
- Google’s A2A spec is an agent-to-agent API mesh with richer semantics, now backed by a Linux Foundation working group.
- Time-to-Live (TTL), encryption, and context scoping are in early enterprise pilots.
Why it works: Context sharing between tool calls is mandatory for multi-agent systems. Enterprises need audit logs to comply with DPDP, HIPAA, and soon the EU AI Act. TTL and encryption close the gap between prototypes and production.
Analogy: Like a bank ledger: every entry is timestamped, permissioned, and reviewable.
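The three properties above (TTL, context scoping, and an audit trail) fit in a small sketch. This is illustrative only, not the MCP or A2A wire format: every object carries a scope and an expiry, and every read and write lands in a timestamped log.

```python
import time


class MemoryStore:
    """Sketch of rights-managed agent memory: objects carry a scope and
    a TTL, and every access is appended to an auditable log.
    (Illustrative; real systems would also sign and encrypt entries.)"""

    def __init__(self):
        self._items = {}      # key -> (value, scope, expires_at)
        self.audit_log = []   # (timestamp, agent, action, key, scope)

    def put(self, key, value, scope, ttl_s, agent):
        self._items[key] = (value, scope, time.time() + ttl_s)
        self.audit_log.append((time.time(), agent, "put", key, scope))

    def get(self, key, scope, agent):
        # Log the attempt even if it is denied — that is the audit point.
        self.audit_log.append((time.time(), agent, "get", key, scope))
        item = self._items.get(key)
        if item is None:
            return None
        value, item_scope, expires_at = item
        if time.time() > expires_at:   # TTL expired: evict and deny
            del self._items[key]
            return None
        if item_scope != scope:        # context scoping: wrong scope, deny
            return None
        return value
```

Note that denied reads are still logged; for DPDP- or HIPAA-style audits, the attempted access matters as much as the successful one.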
4. Retrieval has evolved into a stack
Retrieval-Augmented Generation is no longer a differentiator; it is the default. Quality comes from how it is implemented.
- Hybrid retrieval stacks reduce hallucination and citation drift by 35–50% over vector-only systems.
- Vector databases like Pinecone, Weaviate, and Qdrant are standard.
- Retrieval quality depends on chunking, scoring, and feedback, not just on which vector database you pick.
Why it works: Treat RAG like search infrastructure, with eval loops and continuous tuning of chunking, scoring, and feedback.
Analogy: Like upgrading from a flashlight to a GPS: you don’t just see more, you know exactly where to look.
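A common way to build the hybrid stacks mentioned above is reciprocal-rank fusion: merge the ranked lists from a vector index and a keyword index into one ranking. A minimal sketch (the constant k=60 is the widely used default damping factor):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal-rank fusion: combine several ranked doc-id lists
    (e.g. one from vector search, one from keyword search) into a
    single ranking. A document scores 1/(k + rank) in each list it
    appears in; agreement across retrievers floats it to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well by both retrievers beats one ranked first by only one of them, which is exactly the behaviour that damps hallucinated citations from a single noisy retriever.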
5. Evaluation is becoming an enterprise priority
Enterprises will not trust what they cannot measure.
- Vellum serves over 1,500 enterprise LLM development teams with observability dashboards.
- Mature stacks log latency, token cost, tool-call status, and user feedback. Some frontier teams run synthetic red-team simulators to probe for unsafe behaviour and generate auto-blocking policies.
Why it works: Evaluation is now a deal-breaker for finance, healthcare, and public sector deployments. Boards are asking for “agent observability” and requiring red-team testing before sign-off.
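The minimum the observability dashboards above need is per-call latency, status, and cost. A sketch of the wrapping pattern, with a deliberately crude placeholder for token counting (a real stack would read usage from the model's response):

```python
import time


def instrument(tool, log):
    """Wrap a tool call so every invocation records latency, status,
    and an estimated token cost into a shared log, the raw material
    for an agent-observability dashboard."""
    def wrapped(*args, **kwargs):
        t0 = time.perf_counter()
        entry = {"tool": tool.__name__, "status": "ok", "tokens": 0}
        try:
            result = tool(*args, **kwargs)
            # Placeholder heuristic: ~4 characters per token.
            entry["tokens"] = len(str(result)) // 4
            return result
        except Exception as exc:
            entry["status"] = "error"
            entry["error"] = str(exc)
            raise
        finally:
            entry["latency_ms"] = (time.perf_counter() - t0) * 1000
            log.append(entry)
    return wrapped
```

Because the log entry is written in a `finally` block, failed calls are recorded too; error-rate and latency percentiles then fall straight out of the log.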
Bottom line for founders
The agent stack is real, live, and producing revenue. The bar is rising quickly. The best teams are:
- Picking a vertical wedge with high pain and high willingness to pay.
- Controlling infra cost per task, down to browser minutes and token use.
- Shipping proper evaluation harnesses.
- Building compliance features like TTL and encryption into memory from day zero.
If you are building an agent company in 2025, you are not early anymore. You are on time.