After six months of running multi-agent AI systems in production, the patterns that matter most are not the ones I expected.
Reliability Over Intelligence
The agents that deliver the most value are not the most clever — they’re the most reliable. Predictable outputs beat impressive but inconsistent ones every time. This became the central design principle behind both the Multi-Agent Orchestration System and NightShiftCrew v2 — a conviction I unpack further in Why Agent Reliability Beats Agent Intelligence.
Structured Outputs Are Non-Negotiable
Every agent must produce structured, schema-validated output. Free-form text between agents is a recipe for cascading failures.
```python
from typing import Literal, Optional

from pydantic import BaseModel


class AgentOutput(BaseModel):
    status: Literal["success", "failure", "partial"]
    data: dict
    confidence: float
    reasoning: str
    next_action: Optional[str]
```
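To see why the contract matters, consider what happens at the boundary between agents. A minimal sketch (the field values here are illustrative, assuming pydantic v2):

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class AgentOutput(BaseModel):
    status: Literal["success", "failure", "partial"]
    data: dict
    confidence: float
    reasoning: str
    next_action: Optional[str]


# A well-formed handoff passes validation and flows to the next agent.
ok = AgentOutput(
    status="success",
    data={"files_changed": 3},
    confidence=0.9,
    reasoning="All checks passed on the first run.",
    next_action=None,
)

# A malformed payload is rejected at the boundary instead of
# silently corrupting the next agent's input.
rejected = False
try:
    AgentOutput(
        status="done",  # not in the Literal contract
        data={},
        confidence=0.5,
        reasoning="...",
        next_action=None,
    )
except ValidationError:
    rejected = True
```

The failure surfaces immediately at the schema boundary, not three agents downstream.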
This schema contract between agents is what makes the whole system reliable. In NightShiftCrew v2, the quality validator doesn’t use LLM judgment at all — it checks whether the code runs to exit 0. That binary test is worth more than any subjective scoring system. The same structural honesty applies to the Semantic Document Processor, where hybrid retrieval (BM25 + cosine similarity) outperformed pure neural search precisely because the simpler signal was more reliable.
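The exit-0 check is simple enough to sketch. A minimal version (the function name and timeout policy are illustrative, not NightShiftCrew's actual API):

```python
import subprocess
import sys


def passes_quality_gate(script_path: str, timeout: int = 60) -> bool:
    """Run the candidate script in a subprocess; pass iff it exits 0."""
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang is a failure, not a judgment call.
        return False
    return result.returncode == 0
```

There is nothing for the validator to be confidently wrong about: the code either ran or it didn't.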
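The hybrid-retrieval point follows the same logic: rather than trusting one neural score, fuse the lexical and semantic signals. A minimal sketch of the score fusion (the normalization and weighting are illustrative assumptions, not the Semantic Document Processor's exact implementation):

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    bm25_scores: list[float],
    cosine_scores: list[float],
    alpha: float = 0.5,
) -> list[float]:
    """Weighted fusion of lexical (BM25) and semantic (cosine) scores.

    alpha controls the lexical/semantic balance; each input list holds
    one score per candidate document, in the same order.
    """
    lexical = min_max(bm25_scores)
    semantic = min_max(cosine_scores)
    return [
        alpha * lex + (1 - alpha) * sem
        for lex, sem in zip(lexical, semantic)
    ]
```

The BM25 term keeps exact keyword matches from being drowned out by plausible-but-wrong neural neighbors, which is exactly the reliability-over-cleverness trade described above.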
The philosophical implications of this constraint — that AI operates within statistical distributions of its training data and cannot interrogate the assumptions that structure that space — extend well beyond engineering. I explore this at length in Doing Academic Philosophy in the Age of AI.
Related
Projects: Multi-Agent Orchestration System · NightShiftCrew v2 · Semantic Document Processor
Writing: Why Agent Reliability Beats Agent Intelligence · On Architecture as a Design Discipline · Doing Academic Philosophy in the Age of AI