Via Negativa in Data Architecture: Remove More, Build Less

Applying via negativa, the principle that improvement comes through removal rather than addition, I reduced a 34-component data platform to 19 components. The simplified architecture processed the same 2.8 million daily records with 41% fewer failure points, 28% lower monthly cost, and a mean-time-to-recovery that dropped from 47 minutes to 12 minutes.

What is via negativa, and why does it apply to data architecture?

Via negativa is the principle that robust systems are improved more by removing harmful elements than by adding beneficial ones, and data architectures, which accumulate components through years of additive decisions, are prime candidates for subtraction.

Via negativa, drawn from negative theology and popularized in systems thinking by Nassim Taleb, is the practice of improving a system by removing what is harmful or unnecessary rather than adding what seems helpful. In data architecture, it means defining platform quality by what has been deliberately excluded.

Every data platform I have inherited was built by addition. A new source required a new ingestion tool. A new use case required a new transformation layer. A performance problem required a caching layer. A governance requirement required a metadata catalog. Each addition solved a local problem. None considered whether the aggregate system was becoming ungovernable.

The human bias toward addition is well documented. A 2021 study by Adams et al. in Nature found that when asked to improve a system, people defaulted to additive solutions 78% of the time, even when subtractive solutions were simpler and more effective. Data engineering inherits this bias. Conference talks celebrate newly adopted tools. Nobody gives a talk titled “I removed 15 components and everything got better.” But that is frequently the more valuable engineering act.

What does a platform audit through via negativa reveal?

A via negativa audit reveals that in most data platforms, 30% to 40% of components serve no current purpose, duplicate existing capabilities, or exist to compensate for problems created by other unnecessary components.

I conducted a via negativa audit on a data platform at a mid-size financial services company. The platform had 34 distinct components: 4 ingestion tools (Fivetran, Airbyte, custom Python scripts, a legacy SSIS package), 3 storage layers (Snowflake, S3 raw zone, a PostgreSQL staging database), 2 transformation frameworks (dbt and stored procedures), 3 orchestration systems (Airflow, a cron-based scheduler for legacy jobs, and dbt Cloud’s built-in scheduler), and 22 additional components including metadata catalogs, quality tools, caching layers, and monitoring systems.

For each component, I asked 3 questions. First: “If I removed this tomorrow, what would break?” Second: “Is there another component that could absorb this function?” Third: “Does this component exist to solve a problem created by another component?”
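The first question can be partly mechanized if component dependencies are recorded somewhere. The sketch below is a minimal illustration, assuming a hypothetical `DEPENDS_ON` map in which each component lists the upstream components it reads from or is triggered by. The component names and edges are illustrative, not the audited platform's real topology. It inverts the map and walks it breadth-first to find everything that would break, directly or transitively.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical dependency map: each component lists the upstream components
# it reads from or is triggered by. Names and edges are illustrative only.
DEPENDS_ON: Dict[str, Set[str]] = {
    "legacy_pipeline": {"postgres_staging"},
    "postgres_staging": {"s3_raw"},
    "snowflake": {"fivetran", "s3_raw"},
    "dbt": {"snowflake"},
    "dashboard_cache": {"snowflake"},
    "dashboards": {"dashboard_cache"},
}


def blast_radius(component: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return every component that depends on `component`, directly or transitively.

    Answers the first audit question, "If I removed this tomorrow, what would
    break?", but only for dependencies that have been declared in `depends_on`.
    """
    # Invert the map so we can walk from a component to its consumers.
    consumers: Dict[str, Set[str]] = {}
    for consumer, upstreams in depends_on.items():
        for upstream in upstreams:
            consumers.setdefault(upstream, set()).add(consumer)

    # Breadth-first search over the inverted graph.
    broken: Set[str] = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, set()):
            if consumer not in broken:
                broken.add(consumer)
                queue.append(consumer)
    return broken


print(sorted(blast_radius("postgres_staging", DEPENDS_ON)))  # ['legacy_pipeline']
print(sorted(blast_radius("s3_raw", DEPENDS_ON)))
```

An empty result only means no declared dependency exists. Undeclared consumers are the reason a component that looks orphaned should still be disabled in staging before it is deleted, and the second and third questions still require human judgment.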

The third question was the most revealing. The PostgreSQL staging database existed because one legacy pipeline couldn’t read from S3 directly. The caching layer existed because the dashboard tool couldn’t handle Snowflake query latency for 3 specific reports. The second orchestration system existed because 4 legacy jobs used stored procedures that Airflow couldn’t trigger without a custom operator that nobody had written. Each was a workaround for a fixable problem, a patch that became permanent.

How do you decide what to remove?

Remove components that exist solely to compensate for other components’ limitations, that duplicate capabilities available elsewhere in the platform, or that serve fewer than 3 active use cases.

Compensatory components: If component A exists only because component B has a limitation, either fix B or replace B. Do not maintain A as a permanent workaround. I removed the PostgreSQL staging database by adding S3 read capability to the legacy pipeline (12 lines of code).
Duplicate capabilities: If two tools perform the same function, choose one and migrate. I consolidated from 4 ingestion tools to 2 (Fivetran for SaaS sources, custom Python for API sources that Fivetran didn’t support). The migration took 3 weeks. The operational simplification was immediate.
Low-utilization components: Any component serving fewer than 3 active use cases should justify its operational overhead. The metadata catalog had 2 active users and consumed 4 hours of maintenance per month. I replaced it with a dbt docs site generated from existing model documentation, reducing incremental maintenance to zero hours.
Fear-preserved components: Components that nobody dares to remove because “something might depend on it.” I identified 6 such components. I disabled each one in staging for 2 weeks while monitoring for failures. Four of the 6 had zero downstream effects. They were dead infrastructure maintained by inertia.
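The four rules above can be expressed as a screening function over a component inventory. This is a minimal sketch, not a tool from the audit: the `Component` fields, the inventory entries, and their values are hypothetical, and the output is a list of candidates for human review rather than an automatic deletion list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_USE_CASES = 3  # threshold from the text: fewer than 3 must justify itself


@dataclass
class Component:
    """One inventory entry; every field here is a hypothetical audit attribute."""

    name: str
    # Component whose limitation this one works around, if any.
    compensates_for: Optional[str] = None
    # Other components that already provide the same capability.
    duplicated_by: List[str] = field(default_factory=list)
    active_use_cases: int = 0
    # Failures seen while the component was disabled in staging;
    # None means no staging trial has been run yet.
    staging_trial_failures: Optional[int] = None


def removal_reasons(c: Component) -> List[str]:
    """Return the removal rules a component trips; an empty list means keep."""
    reasons = []
    if c.compensates_for:
        reasons.append(f"compensatory: fix or replace {c.compensates_for} instead")
    if c.duplicated_by:
        reasons.append("duplicate: also provided by " + ", ".join(c.duplicated_by))
    if c.active_use_cases < MIN_USE_CASES:
        reasons.append(f"low utilization: {c.active_use_cases} active use case(s)")
    if c.staging_trial_failures == 0:
        reasons.append("fear-preserved: nothing failed when disabled in staging")
    return reasons


# Hypothetical inventory entries, for illustration only.
inventory = [
    Component("postgres_staging", compensates_for="legacy_pipeline", active_use_cases=1),
    Component("airbyte", duplicated_by=["fivetran", "custom_python"], active_use_cases=4),
    Component("metadata_catalog", active_use_cases=2),
    Component("legacy_export_job", staging_trial_failures=0),
    Component("snowflake", active_use_cases=40),
]

for c in inventory:
    print(c.name, removal_reasons(c))
```

A component can trip several rules at once. The staging-trial field stays `None` until someone has actually disabled the component, so an untested component is never flagged as fear-preserved by default.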

What is the relationship between simplicity and resilience?

Simpler architectures are more resilient because each component is a potential failure point, and reducing the component count reduces the combinatorial space of possible failures, making diagnosis faster and recovery more predictable.

The pre-audit platform had 34 components. Each could fail independently. Each interacted with at least 2 others. The theoretical failure space (considering pairwise interactions) was over 500 distinct failure modes. When something went wrong, the on-call engineer had to triage across 34 potential culprits and dozens of interaction effects. Mean-time-to-recovery was 47 minutes, most of which was diagnosis time.

The post-audit platform had 19 components. The theoretical failure space dropped to roughly 170 modes. More important, each remaining component had a clear, singular purpose that the team could explain in one sentence. When the dashboard caching layer was removed (because upgrading Snowflake’s result cache configuration solved the latency problem), one entire category of “cache staleness” incidents disappeared. You cannot have a cache coherence problem if you have no cache.
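The essay does not spell out how the failure-space figures were counted. One reading consistent with both numbers, assumed here, is that each unordered pair of components is one potential interaction failure, which gives n(n-1)/2 pairs; counting each component's standalone failure as well adds n. A short check of that arithmetic:

```python
from math import comb


def pairwise_failure_space(n_components: int) -> int:
    """Count unordered component pairs (assumed reading of 'pairwise interactions')."""
    return comb(n_components, 2)


def failure_space_with_singles(n_components: int) -> int:
    """Pairwise interactions plus each component failing on its own."""
    return n_components + comb(n_components, 2)


for n in (34, 19):
    print(n, pairwise_failure_space(n), failure_space_with_singles(n))
# 34 561 595
# 19 171 190
```

Under this reading, cutting the component count by 44% (34 to 19) cuts the pairwise failure space by about 70% (561 to 171), because the number of pairs grows quadratically with the number of components.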

Seneca wrote that “it is not that we have a short time to live, but that we waste a great deal of it.” The same applies to engineering attention. Data teams do not lack tools or capabilities. They lack the discipline to stop maintaining systems that no longer earn their complexity cost. Every component you remove is engineering attention returned to components that matter.

Why is subtraction harder than addition in organizations?

Subtraction is harder because addition has visible champions (the person who chose the tool, the vendor who sells it) while removal has no constituency, and the organizational incentive structure rewards building more than pruning.

When I proposed removing the metadata catalog, the engineer who had deployed it 2 years earlier objected. Not because the tool was providing value, but because removing it felt like an indictment of the original decision. Subtraction in organizations is entangled with identity. We become attached to the tools we champion, the architectures we design, the complexity we manage. The complexity becomes a measure of our importance. Removing it feels like diminishing ourselves.

The Stoic response is to separate your identity from your artifacts. Epictetus taught that attachment to external things, including the systems you build, is the source of suffering. A data platform is not a monument to its builders. It is a tool for its users. And tools are improved by removing what is unnecessary as much as by adding what is missing.

I now schedule a quarterly “subtraction review” for every platform I manage. The agenda has one question: “What can we remove?” The goal is to remove at least one component per quarter. In 4 quarters, this practice reduced one platform from 26 components to 21, each removal accompanied by measurable improvements in reliability, cost, or maintainability.

The best data architecture is not the one with the most sophisticated components. It is the one where every component earns its place, where nothing exists by default or inertia, and where the team can explain the purpose of each element in a single sentence. Michelangelo, likely apocryphally, said he sculpted David by removing everything that was not David. The same principle applies. The architecture you need is already inside the architecture you have. You just need to remove what is not serving it.
