What Is L8 (Bassim's 8 Levels of Agentic Engineering)?
Bassim Eledath's 8 Levels of Agentic Engineering measure how mature your AI work is, from tab-complete to autonomous agent teams. L8 is the top: agents coordinating with agents, no human in the loop. Here's the full framework and how to score your team.
TL;DR
L8 is the top of Bassim Eledath's 8 Levels of Agentic Engineering, a maturity framework for how AI is integrated into work. L1 is tab-complete suggestions. L8 is autonomous agent teams coordinating directly with each other, with no human in the loop for most operations. The framework has a hard rule: weaknesses at lower levels cap your overall score regardless of what you've built at the top.
L8 is the top of an eight-level maturity framework called the 8 Levels of Agentic Engineering, developed by Bassim Eledath. The framework measures how deeply AI has been integrated into how work actually gets done, from "AI suggests one keystroke at a time" all the way to "autonomous teams of agents coordinating with each other without human supervision." L8, the highest level, is the point where AI agents talk to other AI agents directly, hand work between themselves, and resolve most operational decisions without a human in the middle. It is the structural endpoint of where agentic work is heading.
Most companies are not at L8. Most companies are not even at L4. The framework's real value is not the destination, it's the diagnosis: it tells you exactly where the cracks are in your current AI work, why the things you've tried to build keep breaking, and what you have to fix before you can move up. The single most important rule of the framework is the hierarchy rule: weaknesses at lower levels cap your overall score regardless of what you've built at the top. A company with sophisticated L7 chatbots running on a broken L2 foundation is not at L7. It is at L2, with expensive L7 theater on top.
The eight levels in plain English
The full framework moves from the most basic form of AI assistance to the most autonomous. Each level is a real, distinct stage with its own challenges, and you have to be solid at each one before the next one works.
L1: Tab Complete. The AI suggests one keystroke or one short completion at a time. The human is doing all the thinking. The AI is shaving seconds. This is GitHub Copilot's original mode. Most knowledge workers have used L1 even if they don't call it that.
L2: Code Editor / Inline Edits. The AI edits inside a document or editor with explicit prompts. You select some code, hit a key, and say "refactor this." The AI does the local edit. The human is still driving every decision but the unit of work has grown from a keystroke to a chunk.
L3: Coding Agent. The AI completes a defined task end to end, with the human supervising closely. "Write a function that does X" becomes a single command and the AI produces the whole function, including tests. The human reviews and accepts. This is where most AI-assisted developers operate today.
L4: Coding Workflow Agent. The AI plans and executes a multi-step workflow. Not just "write this function" but "implement this feature across three files, update the migrations, write the tests, run the suite, fix what breaks." The human supervises the plan but not every step. This is where most early agent products land.
L5: Coding Agent on Rails. The AI runs in production with guardrails. It ships code autonomously, but inside a tightly defined safety envelope (defined scope, automated testing, rollback automation, human override). This is where reliability engineering becomes critical. Most companies trying to do "AI engineering automation" are aiming at L5.
L6: Multi-Agent Orchestration. One agent orchestrates other agents. The orchestrator decomposes a request, routes pieces to specialist agents, integrates the results, and ships the answer. This is a fundamentally different design than a single agent doing more work. It requires routing logic, agent specialization, and conflict resolution.
L7: Conversational Agent on Rails. A user-facing autonomous agent running in production. Think customer-facing chatbots that actually resolve issues, sales agents that qualify leads, support agents that handle real tickets. The guardrails here are different because the user is outside the company, the conversation is unpredictable, and the cost of a hallucination is real.
L8: Autonomous Agent Teams. Agent-to-agent coordination, no human in the loop for most operations. Multiple specialized agents send each other structured messages, hand work between themselves, and escalate to humans only when something falls outside their scope or below their confidence threshold. The human is reading the audit log, not driving the work.
The hierarchy rule (why this framework actually bites)
Most maturity frameworks let you cherry-pick. You can be a 7 at one capability and a 3 at another and still call yourself "advanced." Bassim's framework does not let you do this. The hierarchy rule says: your overall score is capped by your lowest level.
A team that is excellent at L7 (the customer-facing chatbot works) but weak at L2 (engineers can't reliably edit code with AI) is not an L7 team. They are an L2 team that got lucky on the chatbot. The reason this rule matters is that the higher-level capabilities depend on the lower-level ones. The chatbot is only as good as the team's ability to ship code changes to it. If the team is weak at L2, every chatbot iteration is slow, error-prone, and risky. The L7 layer becomes brittle.
The hierarchy rule produces uncomfortable scores. Most companies that think they're at L5 or L6 are actually at L3 with L5 ambitions. Most companies that think they're "doing AI agents" are at L4 with one L7 surface that gets all the press attention. The framework forces honesty.
Scoring your team is straightforward. Walk through each level. Rate it 1 to 10 based on how well your team actually executes it (not how much you've invested or how many tools you've bought, but how well it actually works). Your overall maturity score is the minimum of your level scores. If your weakest level is L3 at a 4, you are a 4, regardless of what you've built at L7.
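In code, the hierarchy rule is just a minimum over your per-level scores. Here's a minimal sketch; the level names come from the framework, but the function and the example scores are purely illustrative:

```python
# Hypothetical sketch of the hierarchy-rule scoring described above.
# The level names come from the framework; everything else is illustrative.
def maturity_score(level_scores: dict[str, int]) -> tuple[int, str]:
    """Overall maturity is the minimum per-level score (the hierarchy rule)."""
    weakest = min(level_scores, key=level_scores.get)
    return level_scores[weakest], weakest

scores = {
    "L1 Tab Complete": 9,
    "L2 Inline Edits": 8,
    "L3 Coding Agent": 4,   # the weak spot
    "L4 Workflow Agent": 7,
    "L5 Agent on Rails": 6,
    "L6 Orchestration": 5,
    "L7 Conversational": 8,
    "L8 Agent Teams": 5,
}
score, weakest = maturity_score(scores)
print(f"Overall maturity: {score} (capped by {weakest})")
# Overall maturity: 4 (capped by L3 Coding Agent)
```

No averaging, no weighting: one weak level sets the whole score. That is the entire point of the rule.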
What L8 actually looks like
L8 is not a vague aspiration. It has specific properties that distinguish it from L6 or L7.
In L6, one orchestrator agent makes decisions about how to route work to specialist agents. The orchestrator is the brain. The specialists are tools.
In L7, an agent talks to a human user autonomously inside a defined envelope. The conversation is between agent and human.
In L8, agents talk to other agents directly. There is no central orchestrator (or the orchestrator is one peer among many). The conversation is between agent and agent. The human is reading the trail, not in the trail.
At Sneeze It, the agent message bus is the practical implementation. Every agent has an inbox file at a known path. When the ad-pacing scanner agent (Jeff) needs the ad performance analyst agent (Dash) to weigh in on a budget anomaly, Jeff writes a structured message into Dash's inbox. Dash picks it up on its next run, responds, and the loop continues. The message types are standardized: REQUEST (asking for something), INFORM (sharing context), PROPOSAL (suggesting an action), RESPONSE (replying to a previous message), CHALLENGE (disagreeing with another agent's output).
This is L8. The human did not route the message. The human did not approve the handoff. The human reads the trail after the fact if they want to, or skips it entirely. The work flowed between agents. The accountability chart still names a human owner for each agent, but the operational coordination happened without the human.
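Here's a minimal sketch of what one hop of that exchange could look like, assuming file-based inboxes and a JSON message shape. The five message types come from the bus described above; the paths, field names, and helper function are hypothetical, not Sneeze It's actual implementation:

```python
# Hypothetical sketch of a file-based agent message bus like the one
# described above. Paths, field names, and this helper are illustrative;
# only the five message types come from the article.
import json
import time
import uuid
from pathlib import Path

INBOX_ROOT = Path("agents")  # assumed layout: agents/<name>/inbox/
MESSAGE_TYPES = {"REQUEST", "INFORM", "PROPOSAL", "RESPONSE", "CHALLENGE"}

def send(sender: str, recipient: str, msg_type: str, body: str,
         reply_to: str | None = None) -> str:
    """Append a structured message to the recipient's inbox directory."""
    assert msg_type in MESSAGE_TYPES, f"unknown message type: {msg_type}"
    msg_id = str(uuid.uuid4())
    message = {
        "id": msg_id,
        "from": sender,
        "to": recipient,
        "type": msg_type,
        "body": body,
        "reply_to": reply_to,   # links a RESPONSE back to its REQUEST
        "sent_at": time.time(),
    }
    inbox = INBOX_ROOT / recipient / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    (inbox / f"{msg_id}.json").write_text(json.dumps(message, indent=2))
    return msg_id

# Jeff flags a budget anomaly and asks Dash to weigh in.
req_id = send("jeff", "dash", "REQUEST",
              "Campaign spend is 3x its hourly pace. Anomaly or intended?")
# On Dash's next run, it reads its inbox and replies.
send("dash", "jeff", "RESPONSE",
     "Intended: the budget was doubled this morning per the launch plan.",
     reply_to=req_id)
```

The design choice that matters is the reply_to field: it's what lets an agent, or a human auditor, reconstruct a full REQUEST-to-RESPONSE thread from the trail after the fact.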
The reason this matters is leverage. L8 unlocks coordination that humans cannot do at scale. A human cannot review every agent-to-agent message in a company running fifty agents. A human can read summaries, set policies, audit the trail, and intervene on exceptions. The work itself has to happen without them.
Why most teams stall at L4 or L5
The most common stall point is L4 to L5. Teams can get an agent to do a multi-step workflow under supervision (L4) but cannot get the agent to run autonomously in production (L5). The reasons are always the same.
Guardrails are missing or weak. The agent works when watched. The minute it runs unsupervised, it does something unexpected on edge cases. The team doesn't trust it enough to take their hands off the wheel.
Observability is missing. The team can't tell what the agent did or why. When something goes wrong, they can't diagnose it. So they default to "always supervise" because the cost of an unsupervised mistake is too high.
Evals are missing. The team has no way to know if changes to the agent made it better or worse. So they treat every change as risky and stay conservative.
These three gaps (guardrails, observability, evals) are the L5 prerequisite stack. Without them, you cannot move past L4. With them, L5 becomes accessible, and L6 becomes a design choice rather than a wish.
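To make the guardrail and observability legs concrete, here's a minimal sketch of an L5 safety envelope with an audit trail, assuming a hypothetical action allowlist and confidence floor. None of these names are prescribed by the framework:

```python
# Hypothetical sketch of an L5 guardrail envelope: every autonomous action
# is scope-checked, logged, and escalatable. All names are illustrative.
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.audit")

ALLOWED_ACTIONS = {"open_pr", "merge_pr", "rollback"}  # defined scope
CONFIDENCE_FLOOR = 0.85                                # below this, escalate

def run_guarded(action: str, payload: dict, confidence: float) -> str:
    """Execute an agent action only inside the safety envelope."""
    record = {"action": action, "payload": payload,
              "confidence": confidence, "at": time.time()}
    if action not in ALLOWED_ACTIONS:
        log.warning("BLOCKED out-of-scope action: %s", json.dumps(record))
        return "blocked"
    if confidence < CONFIDENCE_FLOOR:
        log.info("ESCALATED to human: %s", json.dumps(record))
        return "escalated"
    log.info("EXECUTED: %s", json.dumps(record))  # the audit trail
    return "executed"

run_guarded("merge_pr", {"pr": 1423}, confidence=0.93)   # executed
run_guarded("merge_pr", {"pr": 1424}, confidence=0.60)   # escalated to human
run_guarded("drop_database", {}, confidence=0.99)        # blocked: out of scope
```

Evals are the third leg and live outside a snippet like this: they're the regression suite that tells you whether it's safe to raise the confidence floor or widen the allowlist.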
The L7-to-L8 transition has its own set of gaps, but they're downstream. You can't even attempt L8 if you haven't solved L5.
What to do this quarter
Three moves matter most regardless of where you are on the framework.
First, do an honest level audit. Score each of the eight levels 1 to 10 based on actual execution, not aspiration. Apply the hierarchy rule. Your overall score is your lowest level. The number will probably be lower than you expected. That is the point.
Second, fix the lowest level before doing anything at higher levels. If your L3 is broken, no amount of L7 polish will help. Most teams want to skip ahead because L7 is the shiny part. The framework says you can't. Fix L3 first.
Third, write down what L5 looks like for your specific work. L5 is the maturity inflection point. Below L5, AI is an assistant. At L5 and above, AI is an operator. Defining L5 for your business (what does "an agent in production with guardrails" mean here?) makes the path forward concrete. Most teams don't have this definition, which is why they wander.
L8 is the eventual destination for serious AI-augmented orgs, but it is not a quick one. The teams that get there are the ones who took the hierarchy rule seriously and built every level cleanly. The teams that skipped the foundation are still trying to figure out why their L7 chatbots keep saying things the company can't defend.
Now map your AI-augmented org.
Drop in your team. Add the AI agents. See the whole picture. Free forever for your first chart.
Build your chart on Orger →