When AI Agents Start Working Together, the Real Problem Isn’t the Agents

There’s a certain kind of chaos that only emerges when competent people stop communicating. Individual brilliance, no coordination. Everyone doing their part, nobody finishing the whole. Anyone who’s managed a cross-functional project on a deadline knows the feeling.

Agentic AI is discovering a version of this problem — and it’s more interesting than it sounds.


The Solo Agent Was Just the Warm-Up

The early framing of AI agents was appealingly simple: a model that could take a goal, break it into steps, use tools, and get things done without someone hand-holding every decision. And that framing largely held — for simple, contained tasks. Ask an agent to research a topic and draft a summary. Ask it to query a database and format a report. Single-threaded, single-purpose, manageable.

That framing started showing its limits the moment organisations tried to do something genuinely complex with agents — something that required multiple specialisations working in parallel, with dependencies between them and stakes attached to getting it right. An agent handling customer onboarding needs to coordinate with one checking compliance, which needs to coordinate with one updating the CRM, which needs to escalate to a human when it hits an exception it wasn’t designed for.

That’s not one agent. That’s a team. And teams need management.


Enter the Orchestration Layer

The orchestration layer is the part of agentic AI that rarely makes the announcement slide but ends up consuming most of the implementation effort. It’s the infrastructure that answers questions like: which agent runs next? What happens when one fails? How does context get passed between agents without getting corrupted? Who decides when to escalate to a human? How do you audit what the agent system actually did?
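Those questions map onto surprisingly little code. Here is a minimal sketch, not any particular framework's API, of a sequential pipeline that answers each one in the simplest possible way: agents run in a fixed order, failures halt downstream work, context is copied in and merged out, escalation is a callback, and every step lands in an audit log. All names here (`Context`, `run_pipeline`, the `escalate` hook) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Context:
    """Shared state passed between agents; every step is recorded."""
    data: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

def run_pipeline(steps: list[tuple[str, Callable]], ctx: Context,
                 escalate: Callable) -> Context:
    """Run agents in order. On failure: log it, hand off to a human, stop."""
    for name, agent in steps:
        try:
            # Agents see a copy of the context and return only their updates,
            # so one agent can't silently corrupt another's state.
            ctx.data.update(agent(dict(ctx.data)))
            ctx.audit_log.append((name, "ok"))
        except Exception as exc:
            ctx.audit_log.append((name, f"failed: {exc}"))
            escalate(name, ctx)  # human-in-the-loop fallback
            break                # don't run downstream agents on bad state
    return ctx
```

Real orchestration layers replace the fixed `steps` list with conditional routing and parallelism, but the four concerns — ordering, failure handling, context integrity, and auditability — stay the same.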

LangGraph — built on LangChain’s foundations — became a notable framework for exactly this kind of structured multi-agent workflow, with explicit state management and the ability to define conditional logic between steps. CrewAI took a different angle, letting developers define “crews” of agents with specific roles and letting them collaborate toward shared objectives. Microsoft’s AutoGen framework offered yet another model: agents as conversational participants that negotiate task execution through structured dialogue.
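The graph-style model that LangGraph popularised — explicit state, conditional edges between steps — can be illustrated without the framework itself. This is a toy interpreter under my own naming, not LangGraph's actual API: nodes transform state, and each node's edge function inspects the state to pick the next node (or stop), which is what makes loops like draft-review-redraft expressible.

```python
def run_graph(nodes, edges, state, start, max_steps=20):
    """nodes: name -> fn(state) -> new state.
    edges: name -> fn(state) -> next node name, or None to stop."""
    current = start
    for _ in range(max_steps):
        state = nodes[current](state)      # run the current agent/step
        current = edges[current](state)    # route conditionally on the result
        if current is None:
            return state
    raise RuntimeError("graph did not terminate")  # guard against routing loops

# A draft/review loop: review sends work back to draft until it approves.
nodes = {
    "draft":  lambda s: {**s, "revisions": s.get("revisions", 0) + 1},
    "review": lambda s: {**s, "approved": s["revisions"] >= 2},
}
edges = {
    "draft":  lambda s: "review",
    "review": lambda s: None if s["approved"] else "draft",
}
```

CrewAI's role-based crews and AutoGen's conversational negotiation would express the same loop very differently — which is exactly the divergence in mental models the three frameworks represent.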

Three frameworks, three different mental models for what coordination means. That diversity isn’t confusion — it’s a sign that the problem space is genuinely hard and nobody has converged on the definitive answer yet. Which is, in a perverse way, the interesting part.


The Governance Problem Nobody Wants to Talk About

Here’s the tension that enterprise deployments keep running into: the more autonomous the agent system, the more carefully the boundaries need to be defined before it runs.

A single agent operating within tight constraints is relatively auditable. You can trace its steps, understand its decisions, and identify where things went wrong. A multi-agent system — where agents are delegating to each other, spawning sub-tasks, and operating in parallel — creates a very different audit challenge. The interaction effects between agents can produce outputs that no individual agent would have produced alone, which is part of what makes multi-agent systems powerful. It’s also part of what makes them difficult to govern.

This connects directly to the thread explored in the AI ethics policy vs practice post — the gap between having principles on paper and building systems that actually honour them at runtime. In an agentic context, governance needs to be baked into the architecture: which agents have authority to do what, what triggers a mandatory human review, what gets logged, and what gets blocked. Policies need to be operational constraints, not just documents.
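"Policies as operational constraints" has a concrete shape: an authorisation gate that every agent action passes through before execution. The sketch below is a deliberately simplified illustration — the agent names, action strings, and three-way verdict are assumptions, not any product's design — but it shows all four levers from the paragraph above: per-agent authority, mandatory human review triggers, logging, and blocking.

```python
# Per-agent authority: which actions each agent may take at all.
ALLOWED = {
    "onboarding_agent": {"crm.read", "email.send"},
    "crm_agent":        {"crm.read", "crm.update"},
}
# Actions that are permitted but always require a human sign-off first.
REVIEW_REQUIRED = {"email.send", "crm.update"}

def authorize(agent: str, action: str, log: list) -> str:
    """Gate an action before execution; every decision is logged."""
    if action not in ALLOWED.get(agent, set()):
        verdict = "blocked"            # outside this agent's authority
    elif action in REVIEW_REQUIRED:
        verdict = "pending_review"     # mandatory human-in-the-loop
    else:
        verdict = "allowed"
    log.append((agent, action, verdict))
    return verdict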

The enterprises making the most progress on agentic deployments share something in common: they spent as much time on the governance design as on the agent capability itself. The authority question — how much can this agent do without asking permission? — turns out to be more consequential than the capability question.


Who’s Building the Runtime Layer

The competitive landscape at the orchestration level is worth watching, because it’s where platform battles are quietly forming.

Salesforce launched Agentforce in late 2024, framing it explicitly as an enterprise agentic platform with built-in guardrails and CRM integration. The bet is that orchestration is most valuable when it sits inside the systems of record that already define enterprise workflows — not as a separate layer sitting above them. Microsoft’s Copilot Studio expanded its agent capabilities around the same time, positioning Azure AI Foundry as the enterprise-grade runtime for organisations building custom agents on Microsoft infrastructure.

Google’s Vertex AI and Anthropic’s API tooling are providing the model-level primitives that many of the open-source frameworks run on top of. The emerging structure: platform players competing to own the orchestration runtime, model providers competing on reasoning quality and reliability, and a generation of specialised startups competing on specific orchestration patterns for specific industries.

The market hasn’t consolidated yet. But the shape of it is becoming visible — and it echoes the pattern from the agentic AI enterprise post from December: the sandboxes are real, the early lessons are accumulating, and the organisations paying attention now are building institutional knowledge that will be hard to replicate later.


The Coordination Tax

There’s one more observation worth naming, because it comes up consistently in early enterprise agentic deployments.

The coordination overhead of multi-agent systems is real. Each handoff between agents is a potential point of failure. Each dependency is a place where latency accumulates. Each escalation to a human is a speed bump in what was supposed to be an autonomous workflow. The promise of agentic AI is significant productivity gain — but that gain only materialises when the orchestration layer is robust enough that the system spends its time doing work, not recovering from coordination failures.
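The handoff-as-failure-point observation is easy to make concrete. A minimal sketch (my own illustrative wrapper, not a framework primitive): every retry a handoff absorbs is latency the end user pays, and exhausted retries become exactly the human escalation the paragraph describes.

```python
def handoff(agent, payload, retries=2, log=None):
    """Pass work to the next agent, retrying transient failures.
    Each retry trades latency for reliability; exhaustion escalates."""
    last_exc = None
    for attempt in range(retries + 1):
        try:
            return agent(payload)
        except Exception as exc:
            last_exc = exc
            if log is not None:
                log.append((attempt, str(exc)))  # coordination tax, itemised
    raise RuntimeError(
        f"handoff exhausted after {retries + 1} attempts; escalate to a human"
    ) from last_exc
```

Multiply that wrapper by every edge in a multi-agent workflow and the coordination tax stops being an abstraction: it is the sum of the retry logs.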

This is why the investment in orchestration infrastructure isn’t optional overhead. It’s the thing that determines whether the agent system reliably delivers on its promise or reliably disappoints. The analogy that keeps coming up: air traffic control doesn’t get credit when the planes land safely. But its absence is immediately and catastrophically obvious.


As explored in the build vs. buy post from earlier this month, enterprise AI teams are increasingly assembling rather than building from scratch. In the agentic layer, that same logic applies — except the integration and governance complexity is higher, not lower. Buying the agents is the easier part. Governing how they work together is where the real work lives.

What’s the governance question your organisation would need to answer before trusting an agent system to run autonomously — even partially?

Let’s keep learning — together.
