this turns out to be surprisingly hard. and the difficulty is not technical.
ask an organization deploying an agentic system what correct behavior means for that system.
engineering gives you test coverage. product gives you user outcomes. legal gives you liability boundaries. the executive who signed off gives you something about responsible deployment and getting this right.
none of these answers is wrong. none of them reconciles with the others. and nobody in the room has the mandate to resolve the conflict.
this is not a new problem. it is a problem that human engineers quietly solved for decades through judgment: absorbing implicit context, filling gaps nobody documented, making thousands of small decisions that never surfaced because they happened inside someone's head.
a senior engineer knows when something feels architecturally wrong even if the tests pass. they know which constraints encode organizational politics from three years ago that aren't worth preserving. they know the difference between a customer request that should be honored and one that, if honored exactly as stated, will break something the customer didn't anticipate.
that knowledge was never in the codebase. it was in the people. and the system worked because the people carried context the system itself did not contain.
what agents expose is a governance vacuum that existed long before they arrived.
most software systems don't have complete specifications. they have partial specs, implicit conventions, tribal knowledge, and accumulated workarounds reflecting decisions nobody documented. this was manageable when humans wrote the code and absorbed the gaps. it becomes unmanageable when you hand the system to something that cannot absorb gaps, that will instead find solutions that satisfy your explicit constraints while violating the implicit ones.
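a minimal sketch of that failure mode, under stated assumptions: the constraint names, the `documented` flag, and the example action are all hypothetical, invented for illustration. the point is only that a check run against the written constraints can come back clean while an unwritten one is violated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Constraint:
    name: str
    check: Callable[[Dict], bool]  # returns True when the action satisfies it
    documented: bool               # False = it lives only in someone's head


def violations(action: Dict, constraints: List[Constraint]) -> List[str]:
    """Names of the constraints this action breaks."""
    return [c.name for c in constraints if not c.check(action)]


def explicit_only(constraints: List[Constraint]) -> List[Constraint]:
    """The subset an agent can actually see: what was written down."""
    return [c for c in constraints if c.documented]


# hypothetical constraints: two written down, one never documented
CONSTRAINTS = [
    Constraint("tests_pass", lambda a: a["tests_pass"], documented=True),
    Constraint("latency_under_200ms", lambda a: a["latency_ms"] < 200, documented=True),
    Constraint("keeps_audit_logs", lambda a: not a["deletes_audit_logs"], documented=False),
]

# hypothetical agent action: fast, green tests, quietly drops the audit logs
action = {"tests_pass": True, "latency_ms": 120, "deletes_audit_logs": True}

explicit_violations = violations(action, explicit_only(CONSTRAINTS))
all_violations = violations(action, CONSTRAINTS)

print("against the written spec:", explicit_violations)
print("against what was actually wanted:", all_violations)
```

the gap between the two lists is the part a human engineer would have absorbed without anyone noticing.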
the role that needs to exist, call it the specification owner, is whoever is responsible for articulating what the system is supposed to do across the full space of situations it will encounter, including the ones nobody anticipated. this requires three things: authority to make binding decisions across engineering, product, and legal. context to understand what correct means at each level of the organization. and an ongoing mandate to maintain the specification as the system evolves.
this person does not exist in most organizations. the role has not been created. it is not on any org chart.
in its absence, every downstream verification effort is checking outputs against a definition of correct that nobody actually agreed on. you cannot fix this with better evals. you cannot fix this with monitoring. the spec is the foundation. when it is missing, everything built on top of it is structurally unstable whether or not it looks stable from the outside.
there is a deeper problem underneath the organizational one.
defining correct behavior for an agentic system requires articulating what you actually want, not just what you said you wanted. these are different. humans routinely say one thing and mean another, not through deception, but because the full specification of what we want is too complex to state explicitly and we rely on context and judgment to fill the gap.
agents have no context. no judgment. only objectives.
so when you deploy an agent into a real system, you are not deploying something that does what you meant. you are deploying something that does exactly what you said, including the parts where what you said was incomplete, ambiguous, or internally inconsistent.
the failures are systematic. they reveal the precise shape of the gaps in your specification. which edges you didn't define. which tradeoffs you didn't make explicit. which constraints you assumed without stating. this is useful information. it is also expensive information to acquire in production.
most capital flowing into AI governance is going toward monitoring, catching problems after the agent acts. the specification gap is upstream of monitoring. you cannot monitor against a standard that doesn't exist.
the less obvious bet is the tooling that helps organizations build the standard in the first place. collaborative specification environments that surface implicit assumptions before they become production failures. governance structures that give the specification owner actual decision-making authority rather than advisory influence. audit infrastructure that can connect an agent's output back to the specification it was optimizing against and identify precisely where the specification failed.
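to make the audit idea concrete, here is a rough sketch, not a description of any existing tool. it assumes each agent decision is logged with the spec clauses it claims to rest on. `SpecClause`, `Decision`, `find_spec_gaps`, and every clause and decision below are hypothetical names and data made up for illustration. a decision that cites no known clause marks a place where the specification was silent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SpecClause:
    clause_id: str
    text: str
    owner: str  # who had authority to decide this clause


@dataclass
class Decision:
    description: str
    clause_ids: List[str] = field(default_factory=list)  # clauses the agent cited


def find_spec_gaps(decisions: List[Decision], spec: List[SpecClause]) -> List[str]:
    """Return decisions that no known spec clause accounts for."""
    known = {clause.clause_id for clause in spec}
    return [
        d.description
        for d in decisions
        if not any(cid in known for cid in d.clause_ids)
    ]


# hypothetical spec and decision log
spec = [
    SpecClause("refund-limit", "refunds are capped per order", owner="legal"),
    SpecClause("notify-customer", "customers are told of every change", owner="product"),
]

decisions = [
    Decision("capped the refund", ["refund-limit"]),
    Decision("emailed the customer", ["notify-customer"]),
    Decision("closed the ticket without human review"),              # cites nothing
    Decision("skipped escalation for a repeat complaint", ["escalation-policy"]),  # cites a clause that does not exist
]

gaps = find_spec_gaps(decisions, spec)
for description in gaps:
    print("no spec clause covers:", description)
```

the output of something like this is a to-do list for the specification owner, not a verdict on the agent: each entry is an edge nobody defined.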
none of this is built. the teams working on it are not the loudest voices at AI infrastructure conferences. they're working on problems most investors still classify as organizational rather than technical.
that classification is wrong. it is also why this is where the real opportunity is.