
How Do You Organize a Team That Includes AI Agents?

Pick one function. Define what work the agent owns. Assign a human owner. Set KPIs. Run a weekly review. The whole playbook in one post.

TL;DR

Pick one function. Define exactly what work the agent owns end-to-end. Assign a single human owner accountable for outcomes. Set two or three measurable KPIs. Run a weekly review. The common failure modes are shared ownership, vague scope, no KPIs, and a skipped review cadence. Avoid those four and the team works. Fall into any of them and the agent drifts within months.

The playbook for organizing a team that includes AI agents is shorter than people think. Pick one function. Define exactly what work the agent owns. Assign a single human owner. Set two or three measurable KPIs. Run a weekly review. That's the whole structure. Companies that follow it have teams that work. Companies that skip any of the five steps end up with agents that drift, owners who don't know they're owners, and incidents that surface late.

The complexity isn't in the steps. It's in the discipline to apply them every time. Most companies do steps one and two well, get sloppy on three and four, and skip step five entirely. The result looks like productivity for a few months, then degrades silently, then explodes into a problem nobody saw coming. The teams that get it right are the ones that treat each of the five steps as non-negotiable.

Step 1: Pick one function

The biggest mistake in agent organization is trying to do too many things at once. Companies announce an AI initiative, spin up five agents in different functions, and then can't make any of them work because no one is paying enough attention to any single one.

Pick one function. Just one. The function should be specific enough that you can describe its outputs in one sentence and measure its outcomes in two or three numbers. "Inbound lead handling" is a function. "Marketing" is not. "Daily ad performance reporting" is a function. "Analytics" is not. The narrower the function, the better the agent will be at owning it.

The function should also matter enough that you'll actually invest in getting it right. Pick something you care about. Something that breaks regularly. Something where the current cost (in human hours or quality issues) is real. Picking a low-stakes function for your first agent leads to a low-stakes commitment, which leads to a low-stakes result.

At Sneeze It, our first real agent was Radar, the Chief of Staff. The function was "produce a daily morning briefing that captures what David needs to know across Slack, Calendar, Todoist, pipeline, ad performance, and team status." That's one function. It's specific. It mattered. The cost of the human alternative (David doing the synthesis manually every morning) was real. So Radar got the focus, the iteration, and the ownership it needed to actually work. Every agent we've built since then followed the same pattern.

Step 2: Define what the agent owns end-to-end

Once you've picked the function, define exactly what the agent owns inside that function. The phrase that matters is "end-to-end." If the agent owns the first 80 percent and a human owns the last 20 percent, the agent doesn't actually own the function. It assists with it.

The end-to-end definition has three parts.

What the agent takes as input. The data sources, the trigger, the conditions under which the agent runs. Be specific. "Reads Slack channels A, B, C every morning at 6am" is end-to-end input definition. "Watches Slack" is not.

What the agent does internally. The synthesis, the analysis, the drafting, the routing logic. This doesn't need to be fully spec'd in advance, but the human owner needs to understand it well enough to evaluate the output.

What the agent produces as output. The artifact, the place it lands, the format, who sees it. "Writes a markdown briefing to Obsidian's daily note under the Daily Briefing heading" is end-to-end output definition. "Sends a summary somewhere" is not.

If you can't write a clear sentence for each of the three parts, you don't have an agent yet. You have a concept that needs more design work before anyone owns it.
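To make that test concrete, here's one way the three parts could be written down for a briefing agent like Radar. This is a hypothetical sketch, not a schema Orger prescribes; the field names, source list, and trigger time are illustrative.

# Hypothetical end-to-end spec for a daily-briefing agent (illustrative only)
agent_spec = {
    "function": "Produce a daily morning briefing for the CEO",
    "input": {
        "sources": ["Slack channels A, B, C", "Calendar", "Todoist",
                    "pipeline", "ad performance", "team status"],
        "trigger": "every weekday at 6am",
    },
    "process": "synthesize overnight activity into priorities, risks, and decisions needed",
    "output": {
        "artifact": "markdown briefing",
        "destination": "Obsidian daily note, under the Daily Briefing heading",
        "audience": "the CEO",
    },
}

If any field resists a one-line answer, that's the signal: more design work before anyone owns it.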

Step 3: Assign a single human owner

This is the step most companies fumble. Every agent needs exactly one human owner. Not a team, not a committee, not "the operations group." One person who is accountable for the agent's outcomes.

The owner does five things. They define and refine the agent's spec. They review the agent's outputs regularly. They handle escalations when the agent fails. They drive the calibration cadence. They report on the agent's performance to leadership.

The single-owner rule prevents the most common failure mode in AI-augmented organizations: agents that everybody uses and nobody owns. Those agents always degrade. They get patched in emergencies, then the patches don't get documented, then nobody knows the current state, and within nine months they cost more than they save.

A single owner can own multiple agents. That's fine. At Sneeze It, David personally owns most of the senior agents (Radar, Dash, Pepper, Crystal, Dirk, Pulse, Neil, Bassim). That works because the agents are at the top of the stack and the CEO is the right accountability line for them. In a larger company, ownership would distribute down. The CFO would own the finance monitoring agent. The CRO would own the pipeline agent. The head of CS would own the retention agent. But each individual agent still has one name attached.

The owner should not be the person who built the agent (unless they're also the right functional owner). Building an agent and owning an agent are different jobs. The builder gets it working. The owner runs it for the next two years.

Step 4: Set two or three measurable KPIs

KPIs are how you know whether the agent is actually doing its job. Without them, you'll think the agent is working long after it's stopped working.

Two or three KPIs per agent is the right number. One is too few (you'll only measure one dimension of quality). Five is too many (no one will track all of them). Two or three is enough to catch the main failure modes without overwhelming the review cadence.

The KPIs should be measurable and reviewable. "Better customer experience" is not a KPI. "Email response time under 2 hours during business hours" is a KPI. "More efficient operations" is not. "Anomaly detection rate above 80 percent and false positive rate below 10 percent" is a KPI.

For most agents, the two or three KPIs are some combination of: a volume metric (how much it produced), a quality metric (how often it was right), and a timeliness metric (how fast it ran or responded). Different functions need different specific KPIs, but the categories tend to be these three.

Once the KPIs are set, instrument them. Either the agent itself logs the data, or a separate measurement layer captures it. Without instrumentation, KPIs are aspirational and nobody believes them. With instrumentation, the weekly review becomes data-driven.
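What the instrumentation looks like depends on how the agent runs, but it can be as light as one structured record per run. A minimal sketch, assuming the agent is a Python process that can append to a local JSONL file; the file path and field names are illustrative, not part of any particular framework.

import json
import time
from datetime import datetime, timezone

KPI_LOG = "agent_kpi_log.jsonl"  # hypothetical location for the measurement layer

def log_run(agent: str, items_produced: int, passed_review: bool, started_at: float) -> None:
    """Append one record per run: a volume, quality, and timeliness data point."""
    record = {
        "agent": agent,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "items_produced": items_produced,                       # volume
        "passed_review": passed_review,                          # quality (owner's spot check)
        "runtime_seconds": round(time.time() - started_at, 1),   # timeliness
    }
    with open(KPI_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")

One call at the end of each run is enough to make the weekly review data-driven instead of anecdotal.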

Step 5: Run a weekly review

The weekly review is the step companies skip most often and regret most consistently. The agent is in production. It seems to be working. Nobody is looking at it carefully week to week. Drift accumulates. By month nine, the agent is producing subtly wrong output, the owner hasn't noticed, and the team has stopped trusting it.

The weekly review takes 30 to 60 minutes per agent. The owner reviews recent outputs (a sample, not all of them), checks the KPIs, identifies any patterns of failure, and decides what to calibrate. Calibration might mean updating the prompt, fixing a data source, adjusting a workflow, or flagging a deeper rebuild. Most weeks the answer is "no changes needed." That's fine. The point is that someone is paying attention.
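The numbers side of that review can come straight out of the instrumentation from Step 4. A sketch of a seven-day roll-up, assuming the JSONL records from the earlier example; the helper name and log path are illustrative.

import json
from datetime import datetime, timedelta, timezone

def weekly_kpis(log_path: str = "agent_kpi_log.jsonl") -> dict:
    """Summarize the last seven days of runs: volume, quality rate, average runtime."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    runs = []
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            if datetime.fromisoformat(record["timestamp"]) >= cutoff:
                runs.append(record)
    if not runs:
        return {"runs": 0}
    return {
        "runs": len(runs),
        "items_produced": sum(r["items_produced"] for r in runs),
        "quality_rate": sum(r["passed_review"] for r in runs) / len(runs),
        "avg_runtime_seconds": sum(r["runtime_seconds"] for r in runs) / len(runs),
    }

The roll-up covers the KPIs. Reading a sample of actual outputs is still a by-hand step, and it's the part that catches the subtle drift the numbers miss.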

The review should be on the owner's calendar as a recurring block. Not as an open task in a list. A calendar block. The attention has to be deliberate. Without the calendar commitment, the review gets squeezed out by other priorities, and the agent drifts.

At Sneeze It, we run agent reviews every Sunday afternoon. David goes through each production agent's outputs from the previous week, checks the KPIs, and flags anything that needs adjustment. The whole pass takes about two hours for a dozen agents. That two hours is the cheapest insurance we buy.

Common failure modes

Five mistakes recur across companies that get this wrong.

Shared ownership. Two or three people are "kind of responsible." No one actually is. The agent drifts. Fix: assign one name.

Vague scope. The agent's function is "marketing" or "operations" instead of a specific output. The agent doesn't know what it's supposed to produce, and the owner doesn't know what to evaluate. Fix: write the one-sentence function definition and stick to it.

No KPIs. The agent is in production but no one is measuring whether it's working. Quality degrades silently. Fix: set two or three measurable KPIs at launch and instrument them.

No review cadence. The agent runs unattended. Six months later it's making subtle errors no one caught. Fix: weekly review on the owner's calendar.

Treating the agent as set-and-forget. The owner assumes the agent works the same way at month twelve as it did at launch. Data sources changed, edge cases appeared, prompts went stale. Fix: assume the agent needs ongoing calibration the same way a junior employee would.

What to do this quarter

Three moves to start.

First, pick one function in your business where you want an agent to own the work. Use the criteria: specific scope, real cost of doing it manually today, important enough to invest in. Don't pick five. Pick one.

Second, write the four-part spec for that function. Function definition (one sentence). End-to-end input, process, and output. Human owner (one name). Two or three KPIs (measurable). A template sketch follows below.

Third, commit to the weekly review on the owner's calendar from day one. Don't launch the agent without that commitment in place. The review is what keeps the agent honest. Skip it and the rest of the work was for nothing.
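Written out, that second-move spec fits in one small artifact. A hypothetical template using the ad reporting example from Step 1 and extending the end-to-end sketch from Step 2 with the owner and KPI fields; the schedule, destination, and format are illustrative, and the thresholds echo the Step 4 examples.

# Hypothetical four-part launch spec (illustrative template, not a prescribed format)
launch_spec = {
    "function": "Daily ad performance reporting",             # one sentence
    "end_to_end": {
        "input": "ad platform exports, pulled every weekday at 7am",
        "process": "compare spend and results against targets, flag anomalies",
        "output": "summary posted to a dedicated Slack channel before the workday starts",
    },
    "owner": "one named person accountable for outcomes",     # one name, not a team
    "kpis": {
        "report_delivered_by": "8am local time",               # timeliness
        "anomaly_detection_rate": ">= 80 percent",             # quality
        "false_positive_rate": "<= 10 percent",                # quality
    },
}

If any field can't be filled in with a one-line answer, the function isn't ready to launch yet.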

The teams that follow this playbook end up with agents they trust. The teams that skip it end up with agents that quietly break the business. The difference between the two outcomes is five steps of discipline.

Now map your AI-augmented org.

Drop in your team. Add the AI agents. See the whole picture. Free forever for your first chart.

Build your chart on Orger →