dbt Changed Data Engineering. Here Is What It Got Wrong.

dbt adoption grew from approximately 5,000 organizations in 2021 to over 40,000 by 2025, transforming data engineering into a SQL-first discipline. But after implementing dbt across 6 projects, I found that its strengths (SQL accessibility, version control, testing) come with structural weaknesses: Python escape hatches that fragment codebases, testing that covers structure but misses semantics, and a complexity ceiling that pushes teams toward workarounds.

What did dbt get right about data engineering?

dbt got three things fundamentally right: it made data transformations version-controllable, it brought software engineering practices (testing, documentation, code review) to SQL, and it lowered the barrier for analysts to contribute to the transformation layer.

dbt (data build tool) is an open-source transformation framework that enables data teams to write SQL SELECT statements that dbt compiles into DDL/DML and executes against a data warehouse. It introduced software engineering conventions (version control, testing, documentation, modularity) to the data transformation layer.
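To make the "SELECT in, DDL out" idea concrete, here is a deliberately simplified sketch. It is not dbt's implementation (dbt renders models with Jinja and adapter-specific materializations). The function name `compile_model`, the regex, and the `analytics` schema are all assumptions made for illustration.

```python
import re

# Hypothetical toy compiler, NOT dbt's real one: it only handles
# single-quoted {{ ref('model_name') }} placeholders.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def compile_model(name: str, select_sql: str, schema: str = "analytics") -> str:
    """Resolve ref() placeholders, then wrap the SELECT in CREATE TABLE AS."""
    resolved = REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", select_sql)
    return f"create table {schema}.{name} as (\n{resolved.strip()}\n)"


# The analyst writes only the SELECT; the model name and schema are illustrative.
model_sql = "select order_id, amount from {{ ref('stg_orders') }} where amount > 0"

# The same placeholders double as dependency declarations for the DAG.
dependencies = REF_PATTERN.findall(model_sql)

compiled = compile_model("fct_orders", model_sql)
print(compiled)
```

The point of the sketch is that the author of the model never writes the CREATE statement, and the references between models are recoverable from the SQL itself, which is what makes the dependency graph possible.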

Before dbt, I managed transformation logic spread across stored procedures, Python scripts, SSIS packages, and Excel macros. Finding the logic that produced a specific table required archaeology. Testing was manual. Documentation was nonexistent. dbt consolidated all of that into a single, version-controlled, testable codebase. That consolidation alone justified its adoption.

The analyst empowerment was equally significant. Before dbt, analysts submitted transformation requests to data engineers, creating a bottleneck. With dbt, analysts who could write SQL could create their own transformation models, review them in pull requests, and deploy them through the same pipeline. I saw one team reduce their transformation request backlog from 6 weeks to 1 week after dbt adoption. That is a real productivity gain.

Where does dbt struggle as a transformation paradigm?

dbt struggles with complex business logic that does not fit SQL well, with Python models that break the SQL-first contract, with testing that validates structure but not business semantics, and with a DAG complexity ceiling that makes large projects difficult to navigate and maintain.

The Python escape hatch: dbt added Python model support to handle logic that SQL cannot express cleanly (ML feature engineering, complex statistical calculations, API calls). But Python models run, test, and deploy differently from SQL models. In one project, 15% of our models were Python, and they accounted for 60% of our pipeline failures. The two paradigms do not integrate cleanly, and the resulting codebase has two mental models that engineers must context-switch between.
Testing limitations: dbt’s built-in generic tests (not_null, unique, accepted_values, relationships) validate structure. They do not validate business semantics. A test can confirm that a revenue column has no nulls. It cannot confirm that the revenue calculation is correct. I have seen dbt projects with 200 passing tests that still produced wrong numbers because the tests covered structure, not logic. Semantic testing requires custom tests, and most teams do not write them.
DAG complexity: A mature dbt project can have 500 to 1,000 models. At that size the DAG becomes a spaghetti diagram that no one can comprehend holistically. I have worked on projects where a change to one staging model triggered rebuilds of 47 downstream models, and understanding the impact required tracing through 8 levels of references. dbt can show you the lineage, but it provides no built-in guardrails for keeping this complexity in check. According to dbt’s own documentation, project organization is left to the team, which means most teams discover their organizational problems only after the DAG is too complex to refactor easily.
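The impact tracing described in the DAG item can be sketched as a breadth-first walk over model references. This is an illustration under stated assumptions: the model names and the `parents` mapping are hypothetical, and a real project would derive the graph from its own metadata rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical model graph: each key is a model, each value lists the
# models it references (its upstream parents).
parents = {
    "stg_orders": [],
    "stg_customers": [],
    "int_orders_enriched": ["stg_orders", "stg_customers"],
    "fct_revenue": ["int_orders_enriched"],
    "dim_customers": ["stg_customers"],
    "rpt_weekly_revenue": ["fct_revenue"],
}


def downstream_impact(parents, changed):
    """Return {model: depth} for every model that transitively depends on `changed`."""
    # Invert the parent map so we can walk from a model to its dependents.
    children = {model: [] for model in parents}
    for model, deps in parents.items():
        for dep in deps:
            children[dep].append(model)

    depth = {}
    queue = deque([(changed, 0)])
    while queue:
        node, level = queue.popleft()
        for child in children[node]:
            if child not in depth:  # first visit in BFS is the shortest path
                depth[child] = level + 1
                queue.append((child, level + 1))
    return depth


impact = downstream_impact(parents, "stg_orders")
print(len(impact), "models affected, max depth", max(impact.values()))
```

Running a check like this before merging a change to a staging model turns "47 downstream models, 8 levels deep" from a surprise into a number you see in the pull request.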

What did dbt get wrong about the data engineering workflow?

dbt assumed that transformations are the center of data engineering. In practice, the hardest problems (ingestion reliability, orchestration, data quality, cost management) live outside dbt’s scope, which leaves an ecosystem gap: the tool that everyone uses covers only about 30% of the actual work.

I measured time allocation across 4 data engineering teams using dbt. On average, engineers spent 28% of their time writing and maintaining dbt models. The remaining 72% was spent on ingestion debugging, orchestration management, infrastructure maintenance, stakeholder communication, and data quality investigation. dbt optimized the 28% while leaving the 72% to a fragmented collection of other tools. The orchestration layer is where most complexity actually lives, and dbt treats orchestration as someone else’s problem.

What does the next evolution of data transformation look like?

The next evolution integrates transformation with orchestration and testing with semantics, and it treats SQL and Python as first-class equals rather than as a primary language and an escape hatch. The result is a unified development experience for the full data engineering workflow.

Tools like Dagster and SQLMesh are addressing some of these gaps. Dagster integrates orchestration and transformation in a single framework. SQLMesh adds incremental computation and virtual environments. Neither has dbt’s ecosystem maturity or community size, but both address structural weaknesses that dbt has been slow to resolve. The data contracts pattern addresses the inter-team boundaries that dbt’s model-to-model references cannot express.
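One way to picture "testing with semantics" alongside the data contracts pattern is a contract that carries both structural requirements and named business rules. The sketch below is hypothetical throughout: the `Contract` class, the column names, and the rule that revenue equals gross minus discount are assumptions chosen for illustration, not the API of any tool mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class Contract:
    """Hypothetical contract: required non-null columns plus named semantic rules."""
    required_columns: Set[str]
    semantic_rules: Dict[str, Callable[[Row], bool]]

    def violations(self, rows: List[Row]) -> List[str]:
        problems = []
        for i, row in enumerate(rows):
            present = {k for k, v in row.items() if v is not None}
            missing = self.required_columns - present
            if missing:
                # Structural failure: skip semantic rules for this row.
                problems.append(f"row {i}: missing {sorted(missing)}")
                continue
            for name, rule in self.semantic_rules.items():
                if not rule(row):
                    problems.append(f"row {i}: failed {name}")
        return problems


# Assumed business rule, for illustration only: revenue = gross - discount.
orders_contract = Contract(
    required_columns={"order_id", "gross", "discount", "revenue"},
    semantic_rules={
        "revenue_equals_gross_minus_discount":
            lambda r: abs(r["revenue"] - (r["gross"] - r["discount"])) < 0.005,
    },
)

rows = [
    {"order_id": 1, "gross": 100.0, "discount": 10.0, "revenue": 90.0},
    {"order_id": 2, "gross": 50.0, "discount": 5.0, "revenue": 50.0},  # no nulls, wrong number
    {"order_id": 3, "gross": 20.0, "discount": 0.0, "revenue": None},  # structural failure
]

problems = orders_contract.violations(rows)
print(problems)
```

The second row is the interesting one: it passes every structural check and still carries a wrong revenue figure, and only the semantic rule catches it.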

dbt changed data engineering for the better. It deserves credit for making transformation testable, versionable, and accessible. But canonizing any tool is dangerous. dbt’s limitations are real, and the teams that acknowledge them build better systems than the teams that treat dbt as the answer to every data engineering question. The next evolution will build on what dbt got right while addressing what it got wrong. That is how tools should mature. Not through loyalty, but through honest assessment.
