What's the Right Span of Control With AI?

Bigger if you've built the review system. Same or smaller if you haven't. The 7-direct-reports rule changes when half the reports are agents.

TL;DR

Span of control with AI depends entirely on whether the manager has built the review and calibration system. With it, a manager can run 7 humans plus 8 to 12 agents. Without it, the same manager should run 5 to 7 humans and zero agents. Agents need calibration time rather than one-on-ones, but they still need it. The old 7-direct-reports rule changes when half the reports are agents, but not in the direction most people think.

The right span of control with AI depends almost entirely on one variable: whether the manager has built a working review system. With it, a manager can effectively oversee 7 humans plus 8 to 12 agents, which is meaningfully more leverage than the pre-AI norm. Without it, the same manager should run 5 to 7 humans and zero production agents, because every agent they add will quietly drift in ways they won't catch in time. The leverage AI promises is real. The condition under which you get it is more demanding than most companies admit.

The classic 7-direct-reports rule was built on the cognitive load of managing humans. Each direct report consumed some bandwidth: one-on-ones, performance management, context-switching, informal interaction. Past seven, the math broke down. With AI agents in the mix, the bandwidth math changes, but not in the simple "agents are lighter so you can have more of them" way that gets pitched. The lighter per-unit cost is real, but the failure modes are different, and the discipline required to actually capture the leverage is harder than running a team of humans.

What's actually different about managing agents

Humans and agents both need oversight, but the oversight looks completely different.

Humans need relational management. One-on-ones to understand what's going on under the surface. Career conversations. Mood checks. Coaching when stuck. Informal interactions that build trust and surface issues before they become problems. This is the work that takes 2 to 4 hours per direct report per week, and it doesn't scale because the relational bandwidth is finite.

Agents need spec management. Clear instructions. Reliable data sources. Output reviews. Calibration when they drift. KPI tracking. Incident response when they fail. This is the work that takes 30 to 60 minutes per agent per week, and it scales differently from human management. The work isn't relational; it's structural. The manager isn't building a relationship; they're maintaining a system.

The reason agents allow more direct reports is that the relational ceiling doesn't apply. The reason agents don't allow infinite direct reports is that the structural work has its own ceiling. A manager who can spend 6 hours a week on agent oversight can effectively run 8 to 12 agents. Past that, the reviews get cursory and the drift starts.

The math behind the new span

Let's do the actual math. A manager has roughly 40 hours a week. Of those, maybe 20 to 25 are available for direct people management (the rest going to meetings, planning, their own work, leadership obligations, and slack). Within that 20 to 25 hours, the manager allocates between humans and agents.

Pre-AI, the formula was simple. 7 humans times 2 to 4 hours each equals 14 to 28 hours. The high end pushes against the 20 to 25 hour budget, which is why the practical limit was around 7.

Post-AI, the formula has two terms. Humans times 2 to 4 hours each, plus agents times 30 to 60 minutes each. Plug in 7 humans and 8 agents: 7 times 3 (average) equals 21 hours, plus 8 times 0.75 hours equals 6 hours, total 27 hours. Slightly over budget but close to manageable.

Plug in 4 humans and 10 agents: 4 times 3 equals 12 hours, plus 10 times 0.75 equals 7.5 hours, total 19.5 hours. Significantly under the previous budget while still running 14 direct reports.
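
If it helps to see the budget math as code, here's a minimal sketch. The hour figures are this article's estimates, not measured constants, and the names are illustrative, not part of any tool.

```python
# Minimal sketch of the span-of-control budget math.
# The hour figures are this article's estimates, not measured constants.

HOURS_PER_HUMAN = 3.0     # midpoint of the 2-4 hour relational range
HOURS_PER_AGENT = 0.75    # midpoint of the 30-60 minute structural range
WEEKLY_BUDGET_HOURS = 25  # top of the 20-25 hours available for management

def management_load(humans: int, agents: int) -> float:
    """Weekly hours a given mix of direct reports consumes."""
    return humans * HOURS_PER_HUMAN + agents * HOURS_PER_AGENT

# The two mixes worked through above:
print(management_load(7, 8))   # 27.0 hours -> slightly over budget
print(management_load(4, 10))  # 19.5 hours -> under budget, 14 reports
```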

So the new span of control isn't "AI lets you manage more people." It's "AI lets you manage a different mix, where the total number of direct reports goes up but the time per report goes down because most of the new reports are structural rather than relational."

The condition: a working review system

The leverage only materializes if the manager has built a working review system. Without it, the math falls apart because the agents drift silently and the manager has the illusion of leverage without the reality.

A working review system has four pieces.

Cadence. Each agent gets reviewed weekly. The review is on the manager's calendar as a recurring block. Not an open task. A calendar commitment. Without the cadence, reviews slip, drift accumulates, problems surface late.

Instrumentation. The KPIs for each agent are measured automatically and visible to the manager at review time. The manager isn't manually calculating performance. The dashboard exists. Without instrumentation, reviews become subjective and inconsistent.

Output sampling. The manager actually reads recent outputs from the agent. Not all of them, but enough to spot patterns. Five to ten outputs per agent per week is usually enough. Without output sampling, the manager only sees what the KPIs reveal, which often isn't enough to catch subtle drift.

Calibration loop. When an issue is identified, the manager has a process to update the agent (prompt, data source, workflow) and verify the fix. Without the calibration loop, issues get logged but never resolved, and the same problems repeat.

The four pieces together take some setup. Once they exist, the per-agent review takes 30 to 60 minutes weekly. Without any of them, the manager either skips the review or does it superficially, and the leverage evaporates.
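
As a sketch of what "working" means in practice, the four pieces can be reduced to a per-agent checklist that tooling can enforce before an agent counts toward span. Everything below is illustrative; the field names are assumptions, not a real API.

```python
from dataclasses import dataclass

@dataclass
class ReviewSystem:
    """The four pieces, tracked per agent. Illustrative, not a real API."""
    weekly_review_scheduled: bool    # cadence: recurring calendar block
    kpis_instrumented: bool          # instrumentation: dashboard exists
    outputs_sampled_per_week: int    # sampling: target 5-10 outputs
    calibration_loop_defined: bool   # calibration: fix-and-verify process

    def is_working(self) -> bool:
        """An agent should only count toward span if all four exist."""
        return (self.weekly_review_scheduled
                and self.kpis_instrumented
                and self.outputs_sampled_per_week >= 5
                and self.calibration_loop_defined)
```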

What happens without the system

The pattern of an AI manager without a review system is predictable.

Month 1: agent launches. Output looks great. Manager is impressed.

Month 3: agent still working. Manager has expanded to a second agent. Reviews are ad hoc, mostly when something looks off.

Month 6: agents are producing some subtle errors. Manager hasn't noticed because the reviews stopped happening. Stakeholders downstream are doing some cleanup but haven't escalated.

Month 9: agent makes a visible mistake. Client incident or billing error or wrong report sent to leadership. Manager investigates and discovers the agent has been drifting for months.

Month 12: manager either rebuilds the review system properly or eliminates the agent. Either way, the leverage promise didn't materialize.
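
The silent slide in months 3 through 9 is exactly what basic instrumentation is supposed to catch. A minimal sketch, assuming the agent's KPIs were baselined at launch; the 10 percent tolerance and the scores are arbitrary illustrations.

```python
from statistics import mean

def drift_alarm(baseline: list[float], recent: list[float],
                tolerance: float = 0.10) -> bool:
    """Flag an agent whose recent KPI average has slipped more than
    `tolerance` below the baseline established at launch."""
    return mean(recent) < mean(baseline) * (1 - tolerance)

# Hypothetical accuracy scores: sampled at launch vs. this month
launch_scores = [0.96, 0.95, 0.97, 0.96]
recent_scores = [0.84, 0.86, 0.83, 0.85]
if drift_alarm(launch_scores, recent_scores):
    print("Drift detected: schedule a calibration pass")
```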

We've watched this exact pattern play out at companies that bragged about their AI leverage in Q1 and quietly pulled back in Q4. The leverage was never real. The illusion of leverage was real, but the underlying work wasn't being done.

The companies that get the leverage are the boring ones that built the review system before launching the second agent, and the third, and the fifth. They have less impressive demos. They have more consistent outcomes.

The Sneeze It example

At Sneeze It, David personally owns most of our senior agents (about a dozen named ones). On paper, that looks like an absurd span of control. Twelve direct reports for one human, plus the humans who actually report to David.

The reason it works is that the agents don't need traditional management. They need structural management. David spends about two hours every Sunday afternoon reviewing the previous week's agent outputs, checking KPIs, and flagging things to calibrate. During the week, agents post to ntfy.sh and Obsidian when they need attention. The escalations come to him as flags, not as scheduled conversations.
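
ntfy.sh takes a plain HTTP POST to a topic URL, so the escalation path can stay this small. A sketch, with a hypothetical topic name, agent name, and message:

```python
import requests

def escalate(agent_name: str, message: str) -> None:
    """Post a flag to an ntfy.sh topic so it lands in the manager's
    notification feed rather than a scheduled conversation."""
    requests.post(
        "https://ntfy.sh/agent-escalations-example",  # hypothetical topic
        data=message,
        headers={"Title": f"{agent_name} needs attention", "Priority": "high"},
        timeout=10,
    )

escalate("reporting-agent", "KPI below threshold two days running")
```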

If David tried to run 12 humans the same way, it would collapse within a month. Humans need relational management that agents don't. But because the agents need structural management that scales differently, the same person can run both layers with the right system in place.

If David tried to run 12 agents without the Sunday review block, it would also collapse. The agents would drift. By month nine he'd have multiple incidents in flight and no idea where they came from. The review system is what makes the span of control real.

When to shrink the span instead of grow it

Sometimes the right move with AI is fewer direct reports, not more. Specifically, when a manager is at the start of building their review system, they should reduce span temporarily. Add one agent, get the review system working for it, then add the second. Don't add five at once.

We've also seen good managers ask to reduce their human direct reports when they take on agents. The thinking is sound: I can run 5 humans and 8 agents better than I can run 7 humans and 3 agents. The total reports might be higher, but the mix is right for what the manager is good at.

Companies should support this. The traditional org design instinct is to push managers toward more direct reports as a marker of seniority. With AI, that instinct can backfire. The senior manager who runs the best agent infrastructure might have fewer humans on their team. That's not a demotion. It's a different shape.

What to do this quarter

Three moves matter most.

First, audit your current managers. For each one, count humans they manage and agents they manage. Then ask whether the manager has a working review system for the agents (cadence, instrumentation, output sampling, calibration loop). If not, the agent count is at risk and should be capped or reduced until the system is built.
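
A sketch of what that audit could look like. The manager records below are invented for illustration; in practice the four booleans come from whatever tracks the review system.

```python
# Hypothetical audit pass over managers. Records invented for illustration.
managers = [
    {"name": "manager_a", "humans": 6, "agents": 4,
     "cadence": True, "instrumentation": True,
     "sampling": True, "calibration": False},
    {"name": "manager_b", "humans": 5, "agents": 0,
     "cadence": False, "instrumentation": False,
     "sampling": False, "calibration": False},
]

for m in managers:
    system_ok = all(m[k] for k in
                    ("cadence", "instrumentation", "sampling", "calibration"))
    if m["agents"] > 0 and not system_ok:
        print(f"{m['name']}: {m['agents']} agents at risk; "
              "cap or reduce until the review system is built")
```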

Second, set realistic targets for span. A reasonable AI-augmented manager runs 4 to 7 humans plus 6 to 10 agents, depending on agent complexity. Total direct reports land in the 10 to 17 range, reached not by adding humans but by adding agents on top of a manageable human team.

Third, invest in the review system before adding agents. The cadence, the instrumentation, the sampling, the calibration loop. The companies that build the system first and add agents second get the leverage. The companies that add agents first and try to build the system later get the chaos.

The 7-direct-reports rule was a good approximation for one era of management. It needs an update for the era we're in. The new rule is closer to: span of control equals the hours your review system can support, divided by per-report time, with humans and agents carrying different per-report costs. That's a less catchy rule, but it's the one that actually predicts whether the leverage is real.

Now map your AI-augmented org.

Drop in your team. Add the AI agents. See the whole picture. Free forever for your first chart.

Build your chart on Orger →