How Do I Document AI Agents the Way I Document Employees?
AI agents need job descriptions, not config files. Here's the JD format that actually works: seat name, KPIs, owns/doesn't-own list, escalation path, weekly review cadence, failure modes.
TL;DR
Document every AI agent with a one-page job description that includes seat name, primary KPIs, an explicit owns list, an explicit does-not-own list, escalation path, weekly review cadence, and known failure modes. Treat the document the same way you treat an employee JD: written before the agent is built, updated when scope changes, reviewed at performance time. The format prevents the most common failure: agents that everyone uses but nobody owns.
The most expensive failure mode in AI-augmented companies is not bad agents. It is unowned agents. An unowned agent is one that everyone in the organization uses, depends on, and complains about, but nobody is accountable for. It drifts. It quietly breaks. It produces confidently wrong answers that get pasted into client reports. And when something finally goes wrong badly enough that leadership asks who owns this, the answer is some version of "I don't know, we built it together."
The fix is to document AI agents the same way you document employees. A one-page job description, written before the agent goes live, owned by a named human, updated when scope changes, and reviewed at performance time. The agents at Sneeze It all have JDs in this format. The format prevents about 80% of the accountability problems that show up in AI-augmented orgs.
What goes on the page
A working AI agent job description has seven sections. None of them are optional.
Seat name and one-line purpose
The agent has a name. Not "the marketing AI" or "the content tool," a name. Pepper, Dirk, Crystal, Radar. The naming matters because it forces the seat to exist as a thing that can be referenced, reviewed, and held accountable. "Did Crystal flag that project at risk?" is a question. "Did the project management AI flag that project at risk?" is a complaint about software.
Under the name, one line on what the agent exists to do. Not what it can do, what it exists to do. "Dirk: autonomous pipeline operator. Owns daily pipeline scanning, stale deal flagging, and proposal status tracking." If you can't write the one-liner cleanly, the agent's scope isn't decided yet.
Primary KPIs
Two or three measurable outputs the agent is on the hook for. Not vanity metrics, not activity counts, real performance numbers. The KPIs answer "how do we know this agent is doing its job?"
Dirk's KPIs are number of qualified proposals in motion, win rate on managed deals, and revenue from reactivation campaigns. Crystal's KPIs are project status accuracy, deadline miss detection rate, and resource allocation flags raised. Each agent gets reviewed against its KPIs on the same cadence as a human IC.
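To make "reviewed against its KPIs" concrete: if you keep targets and actuals as structured data, the weekly check is a few lines. A minimal sketch in Python; the KPI names echo Dirk's, but the targets and numbers here are invented for illustration.

```python
# Hypothetical sketch: compare an agent's KPI actuals against targets.
# KPI names echo Dirk's seat; targets and actuals are made up.

KPI_TARGETS = {
    "qualified_proposals_in_motion": 12,
    "win_rate_on_managed_deals": 0.30,
    "reactivation_revenue_monthly": 15_000,
}

def review_kpis(actuals: dict[str, float]) -> list[str]:
    """Return the KPIs that came in under target this review period."""
    return [
        name for name, target in KPI_TARGETS.items()
        if actuals.get(name, 0) < target
    ]

misses = review_kpis({
    "qualified_proposals_in_motion": 9,
    "win_rate_on_managed_deals": 0.34,
    "reactivation_revenue_monthly": 11_000,
})
print(misses)  # ['qualified_proposals_in_motion', 'reactivation_revenue_monthly']
```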
Owns
An explicit list of what the agent owns. This is the work the agent is accountable for producing or maintaining. If something on this list breaks, the agent is the first thing checked.
The format we use is bullet points. "Owns: pipeline data freshness in GHL. Stale deal escalation. Proposal status accuracy. Weekly revenue forecast inputs." Four to eight bullets is typical. Fewer than four and the agent probably doesn't have a real seat. More than eight and the agent is doing two jobs.
Does NOT own
This is the section most companies skip, and it is the most important one. An explicit list of what the agent does not own, written specifically to prevent scope creep and accountability confusion.
Dirk does not own client delivery (that's Crystal). Dirk does not own ad performance (that's Dash). Dirk does not own email sending (that's Pepper). Dirk does not own calendar management (that's Radar). The does-not-own list is what prevents the agent from becoming a generalized helper that everyone uses for everything and nobody can hold accountable.
This list also prevents a different failure mode: the human owner expanding the agent's scope informally without updating the JD. If a teammate asks Dirk to also draft outbound emails and the JD says Dirk doesn't own that, the answer is no. Either update the JD deliberately, or the agent stays in its lane.
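The lane check can even be made mechanical if the does-not-own list is machine-readable. A toy sketch with deliberately naive string matching; the real scope decision stays with the human owner.

```python
# Toy sketch: refuse requests that touch a does-not-own lane.
# Matching is a naive substring check; a human still makes the final call.
DIRK_DOES_NOT_OWN = {"client delivery", "ad performance",
                     "email sending", "calendar management"}

def in_dirks_lane(request: str) -> bool:
    return not any(lane in request.lower() for lane in DIRK_DOES_NOT_OWN)

print(in_dirks_lane("flag stale deals in GHL"))            # True
print(in_dirks_lane("set up email sending for Q3 promo"))  # False: JD says no
```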
Escalation path
What happens when the agent flags something it can't handle. Who gets pinged, in what channel, with what priority. This is the equivalent of "report to" on a human JD, but for failure cases.
Dash escalates ad performance anomalies to David via the morning briefing. Critical alerts post to ntfy.sh at high priority. Pulse client risk escalations go through Pepper to David, never direct. The escalation path is written down so the agent's output doesn't disappear into a Slack channel nobody reads.
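For reference, an ntfy.sh escalation like Dash's critical alert is a single HTTP POST. A hedged sketch; the topic name and alert text are placeholders, not the real channel.

```python
# Minimal sketch of a high-priority ntfy.sh escalation.
# The topic name and message are placeholders, not a real channel.
import requests

requests.post(
    "https://ntfy.sh/acme-dash-critical",  # hypothetical topic
    data="Dash: ad spend anomaly on campaign X, CPA up 3x day-over-day",
    headers={
        "Title": "Dash critical alert",
        "Priority": "high",  # ntfy levels: min, low, default, high, max
    },
)
```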
Weekly review cadence
Same cadence you'd give a human IC. Weekly check on KPIs, monthly check on scope and drift, quarterly review on whether the seat still makes sense at all.
The weekly review is short. Five to ten minutes per agent, usually folded into the human owner's existing weekly review of their own work. It's where the human owner asks: what did this agent produce this week, what did it get wrong, which way is the correction rate trending. Skipping the weekly review is the single best predictor of an agent drifting into uselessness within six months.
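If you log the correction rate at each weekly review, "which way is it trending" becomes a mechanical check. A toy sketch; the window and threshold are arbitrary assumptions, not a recommendation.

```python
# Hypothetical sketch: flag an agent whose weekly correction rate is drifting up.
# A "correction" = any agent output the human owner had to fix.

def correction_rate_drifting(weekly_rates: list[float], window: int = 4,
                             threshold: float = 1.5) -> bool:
    """True if the latest rate is well above the recent average."""
    if len(weekly_rates) <= window:
        return False
    recent_avg = sum(weekly_rates[-window - 1:-1]) / window
    return weekly_rates[-1] > threshold * recent_avg

# Last six weekly reviews: corrections per 100 outputs.
print(correction_rate_drifting([2.0, 1.5, 2.5, 2.0, 2.0, 4.5]))  # True
```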
Failure modes
The known ways this agent fails. Hallucination patterns. Edge cases it handles badly. Things it tends to over-flag or under-flag. The failure modes section is written from real experience and updated every time the agent fails in a new way.
For Dirk, failure modes include: misjudging "stale" on accounts with long enterprise sales cycles, flagging procedural Proposify "won" events for existing clients as new revenue, missing meetings that happened in Fireflies but weren't logged back to GHL. Each one is documented because each one has happened and required correction.
The failure modes section is also the input to the agent's own training and prompt updates. When you patch a failure mode, you update both the prompt and the JD entry. The JD is the institutional memory of what this agent gets wrong.
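Putting the seven sections together: if the JD travels with the prompt in a repo (one of the options discussed under "Where the JD lives" below), the format maps onto a small data structure. A sketch using Dirk's seat, condensed from the prose above; the class itself is illustrative, and a one-page markdown doc works just as well. Dirk's escalation line is adapted from the Dash example, so treat it as a placeholder.

```python
# Sketch of the seven-section JD as structured data, using Dirk as the example.
# Field contents are condensed from the article; the class is illustrative,
# not a required format.
from dataclasses import dataclass, field

@dataclass
class AgentJD:
    seat_name: str
    purpose: str             # one line: what it exists to do
    kpis: list[str]          # two or three measurable outputs
    owns: list[str]          # four to eight bullets
    does_not_own: list[str]  # explicit, to prevent scope creep
    escalation_path: str     # who gets pinged, where, at what priority
    review_cadence: str      # weekly KPIs, monthly scope, quarterly seat
    failure_modes: list[str] = field(default_factory=list)  # updated as they happen

dirk = AgentJD(
    seat_name="Dirk",
    purpose="Autonomous pipeline operator: daily pipeline scanning, "
            "stale deal flagging, proposal status tracking.",
    kpis=["qualified proposals in motion", "win rate on managed deals",
          "revenue from reactivation campaigns"],
    owns=["pipeline data freshness in GHL", "stale deal escalation",
          "proposal status accuracy", "weekly revenue forecast inputs"],
    does_not_own=["client delivery (Crystal)", "ad performance (Dash)",
                  "email sending (Pepper)", "calendar management (Radar)"],
    # Escalation adapted from the Dash example; Dirk's real path may differ.
    escalation_path="Flags to the human owner via morning briefing; "
                    "critical alerts to ntfy.sh at high priority.",
    review_cadence="Weekly KPI check, monthly scope check, quarterly seat review.",
    failure_modes=[
        "misjudges 'stale' on long enterprise sales cycles",
        "flags procedural Proposify 'won' events for existing clients as new revenue",
        "misses Fireflies meetings not logged back to GHL",
    ],
)
```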
Why this prevents the worst failure mode
The "everybody uses, nobody owns" failure mode kills more AI initiatives than bad models do. Here is how it happens without explicit JDs.
Someone on the team builds an agent. It works. They share it. Three other teams start depending on it. The original builder moves on to other work. The agent quietly degrades because nobody is reviewing it. Someone in finance starts pasting its output into a board deck. The agent hallucinates a number. The board sees the wrong number. The post-mortem asks "who owns this thing?" and the answer is a long silence followed by everyone pointing at someone else.
A one-page JD prevents every step of that chain. The named owner makes accountability legible. The KPIs make degradation detectable. The weekly review surfaces drift before it hits a board deck. The does-not-own list keeps the scope from sprawling. The failure modes section creates institutional memory.
None of this is exotic. It is exactly how HR has documented human seats for decades. The mistake AI-augmented companies make is treating agents as software to be configured rather than seats to be staffed. The JD format closes the gap.
What the JD is not
A few things the JD deliberately does not include.
It is not the prompt. The prompt is the agent's wiring. The JD is the agent's role. These are different documents that change at different speeds. The JD changes when scope changes (rarely). The prompt changes when failure modes are patched (often).
It is not the technical spec. APIs, model versions, integrations, cost tier, deployment infrastructure: all of that belongs in an engineering doc. The JD is for the human owner and the leadership team, not for the engineer who wrote the agent.
It is not the user guide. How to interact with the agent, how to invoke it, what slash commands exist: separate doc. Some agents have a user guide, most don't need one. The JD is structural, not operational.
Keep these separate. Letting them merge produces a five-page document that nobody reads, which is worse than no document at all.
Where the JD lives
Wherever your human JDs live. Same shared drive, same wiki section, same template. The point is that an outsider looking at how this company is run can pull up the same folder and see "here are the seats, here are the JDs," without needing to know which seats are humans and which are agents.
A few teams put agent JDs in the same Notion or Confluence space as human JDs, with a tag distinguishing them. A few put them in a code repo because the prompts and JDs travel together. Both work. What doesn't work is putting agent JDs in a separate "AI playbook" doc that humans never look at. The whole point is integration.
What to do this quarter
Three steps will get the documentation discipline in place across your agents.
First, pick the three most-used agents in your company. Write one-page JDs for each in the format above. Have the human owners write them, not engineering. Time-box: one hour per JD, no more.
Second, run a thirty-minute review of each JD with the leadership team. The point is to surface the agents that don't have clean answers in the owns and does-not-own sections. Those are the ones that will fail next quarter.
Third, set the weekly review cadence. Put it on the human owners' calendars. The cadence is what keeps the JD alive. A JD that gets written once and never opened again is worse than nothing, because it creates the appearance of accountability without the substance.
Once you have three agents documented well, the format extends to the rest. The hard part is the first one.
Now map your AI-augmented org.
Drop in your team. Add the AI agents. See the whole picture. Free forever for your first chart.
Build your chart on Orger →