After six months of running multi-agent AI systems in production, the patterns that matter most are not the ones I expected.
Reliability Over Intelligence
The agents that deliver the most value are not the most clever — they’re the most reliable. Predictable outputs beat impressive but inconsistent ones every time. This became the central design principle behind both the Multi-Agent Orchestration System and NightShiftCrew v2 — a conviction I unpack further in Why Agent Reliability Beats Agent Intelligence.
Structured Outputs Are Non-Negotiable
Every agent must produce structured, schema-validated output. Free-form text between agents is a recipe for cascading failures.
```python
from typing import Literal, Optional

from pydantic import BaseModel


class AgentOutput(BaseModel):
    status: Literal["success", "failure", "partial"]
    data: dict
    confidence: float
    reasoning: str
    next_action: Optional[str]
```
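To see why the contract matters, consider what happens at the boundary between agents. A minimal sketch (the field values here are illustrative, assuming pydantic v2):

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class AgentOutput(BaseModel):
    status: Literal["success", "failure", "partial"]
    data: dict
    confidence: float
    reasoning: str
    next_action: Optional[str]


# A well-formed handoff passes validation and flows to the next agent.
ok = AgentOutput(
    status="success",
    data={"files_changed": 3},
    confidence=0.9,
    reasoning="All checks passed on the first run.",
    next_action=None,
)

# A malformed payload is rejected at the boundary instead of
# silently corrupting the next agent's input.
rejected = False
try:
    AgentOutput(
        status="done",  # not in the Literal contract
        data={},
        confidence=0.5,
        reasoning="...",
        next_action=None,
    )
except ValidationError:
    rejected = True
```

The failure surfaces immediately at the schema boundary, not three agents downstream.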
This schema contract between agents is what makes the whole system reliable. In NightShiftCrew v2, the quality validator doesn’t use LLM judgment at all — it checks whether the code runs to exit 0. That binary test is worth more than any subjective scoring system. The same structural honesty applies to the Semantic Document Processor, where hybrid retrieval (BM25 + cosine similarity) outperformed pure neural search precisely because the simpler signal was more reliable.
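The exit-0 check is simple enough to sketch. A minimal version (the function name and timeout policy are illustrative, not NightShiftCrew's actual API):

```python
import subprocess
import sys


def passes_quality_gate(script_path: str, timeout: int = 60) -> bool:
    """Run the candidate script in a subprocess; pass iff it exits 0."""
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang is a failure, not a judgment call.
        return False
    return result.returncode == 0
```

There is nothing for the validator to be confidently wrong about: the code either ran or it didn't.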
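The hybrid-retrieval point follows the same logic: rather than trusting one neural score, fuse the lexical and semantic signals. A minimal sketch of the score fusion (the normalization and weighting are illustrative assumptions, not the Semantic Document Processor's exact implementation):

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    bm25_scores: list[float],
    cosine_scores: list[float],
    alpha: float = 0.5,
) -> list[float]:
    """Weighted fusion of lexical (BM25) and semantic (cosine) scores.

    alpha controls the lexical/semantic balance; each input list holds
    one score per candidate document, in the same order.
    """
    lexical = min_max(bm25_scores)
    semantic = min_max(cosine_scores)
    return [
        alpha * lex + (1 - alpha) * sem
        for lex, sem in zip(lexical, semantic)
    ]
```

The BM25 term keeps exact keyword matches from being drowned out by plausible-but-wrong neural neighbors, which is exactly the reliability-over-cleverness trade described above.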
The philosophical implications of this constraint — that AI operates within statistical distributions of its training data and cannot interrogate the assumptions that structure that space — extend well beyond engineering. I explore this at length in Doing Academic Philosophy in the Age of AI.
Related
Projects: Multi-Agent Orchestration System · NightShiftCrew v2 · Semantic Document Processor
Writing: Why Agent Reliability Beats Agent Intelligence · On Architecture as a Design Discipline · Doing Academic Philosophy in the Age of AI