🌱 Seedling

The Lakehouse Convergence: End of Storage Ideology

Open table formats, principally Apache Iceberg and Delta Lake, dissolved the data warehouse vs. data lakehouse debate by making the storage layer interoperable. As of early 2026, 3 of the 4 major cloud data platforms support Iceberg natively, and the remaining strategic question is not “warehouse or lakehouse?” but “how do I avoid format lock-in while the interoperability standard settles?”

Why did the warehouse vs. lakehouse debate exist?

The warehouse vs. lakehouse debate existed because storage formats were proprietary. Choosing a platform meant choosing a format, and choosing a format meant accepting lock-in to whatever query and governance capabilities that vendor provided.

Snowflake stored data in a proprietary micro-partitioned format. You could load data in, but extracting it at scale required Snowflake compute. Databricks built Delta Lake as an open format, but early versions had interoperability limitations. The “lakehouse” concept (structured warehouse capabilities on open lake storage) was genuinely innovative, but it was also a competitive positioning move: Databricks needed a narrative that differentiated it from Snowflake’s warehouse dominance.

I spent 6 months in 2023 helping an organization choose between Snowflake and Databricks. The technical evaluation produced no clear winner. The real decision was about storage philosophy: proprietary-but-optimized or open-but-evolving. That was an architectural religion question, not an engineering one.

How did open table formats dissolve the debate?

Apache Iceberg’s emergence as a cross-platform standard eliminated the storage lock-in that drove the debate, because data stored in Iceberg can be queried by Snowflake, Databricks, BigQuery, Trino, and Spark without migration or reformatting.

The turning point was Snowflake’s announcement of native Iceberg table support, followed by BigQuery’s Iceberg integration and Databricks’ adoption of Iceberg alongside Delta Lake through UniForm. When every major platform reads the same format, the storage layer is no longer a differentiator. Competition moves to compute performance, developer experience, and governance tooling, all of which are substitutable without data migration.

I migrated a 14 TB analytical dataset from Snowflake’s proprietary format to Iceberg tables in early 2025. The migration took 3 days. The dataset is now queryable from Snowflake (for BI workloads), Databricks (for ML workloads), and Trino (for ad-hoc analysis). Three compute engines, one copy of the data, zero vendor lock-in at the storage layer. This configuration was impossible 2 years earlier.
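To make "several engines, one copy of the data" concrete, here is a minimal sketch of how open engines can be pointed at the same Iceberg REST catalog. It is not the configuration from my migration: the catalog name and URI are hypothetical, the helper names are mine, and I use Spark, Trino, and PyIceberg as examples because their settings are plain key-value properties. The property keys are the ones I understand each project to document, so check them against the versions you run. Snowflake and Databricks attach external catalogs through their own mechanisms, which I leave out here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SharedCatalog:
    """One Iceberg REST catalog that every engine points at (hypothetical values)."""
    name: str  # catalog name as seen by the engine, e.g. "lake"
    uri: str   # REST catalog endpoint, e.g. "https://catalog.example.com"


def spark_conf(cat: SharedCatalog) -> Dict[str, str]:
    """Spark session properties for an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{cat.name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": cat.uri,
    }


def trino_properties(cat: SharedCatalog) -> Dict[str, str]:
    """Contents of a Trino catalog properties file for the Iceberg connector."""
    return {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": cat.uri,
    }


def pyiceberg_properties(cat: SharedCatalog) -> Dict[str, str]:
    """Keyword properties one could pass to PyIceberg's load_catalog."""
    return {"type": "rest", "uri": cat.uri}


if __name__ == "__main__":
    shared = SharedCatalog(name="lake", uri="https://catalog.example.com")
    for engine, props in [
        ("spark", spark_conf(shared)),
        ("trino", trino_properties(shared)),
        ("pyiceberg", pyiceberg_properties(shared)),
    ]:
        print(engine, props)
```

The point of the sketch is that the only thing the engines share is the catalog endpoint. No engine owns the data files, which is what makes swapping or adding an engine a configuration change rather than a migration.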

What is the remaining question after format convergence?

The remaining question is whether interoperability standards (Iceberg REST catalog, credential vending, cross-engine transaction support) will mature fast enough to make multi-engine architectures practical, or whether operational complexity will push teams back to single-vendor simplicity.

Format compatibility is necessary but not sufficient. Querying an Iceberg table from Databricks while Snowflake is concurrently writing to it requires catalog-level coordination that is still evolving (Iceberg commits are optimistic, atomic swaps of a metadata pointer in the catalog, not long-held write locks). Permission models differ across engines. Metadata caching strategies conflict. The format is open. The operational layer above it is not yet standardized.
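The catalog-level coordination described above can be pictured with a toy model. Iceberg writers commit optimistically: each one prepares new table metadata based on the version it read, then asks the catalog to atomically swap the table's current metadata pointer, and the swap fails if another writer got there first. The sketch below is an in-memory illustration of that idea only. `ToyCatalog`, `CommitConflict`, and `commit_with_retry` are hypothetical names, not any real catalog's API, and real engines must also re-validate their changes against the new base before retrying.

```python
import threading
from typing import Callable


class CommitConflict(Exception):
    """Raised when the table changed between read and commit."""


class ToyCatalog:
    """Holds one table's current metadata pointer (a plain string here)."""

    def __init__(self, initial_metadata: str) -> None:
        # Internal mutex that makes the swap atomic; this is not a table-level
        # write lock held by an engine for the duration of a write.
        self._mutex = threading.Lock()
        self._current = initial_metadata

    def current(self) -> str:
        with self._mutex:
            return self._current

    def swap(self, expected: str, new: str) -> None:
        """Compare-and-swap: succeed only if nobody committed since `expected`."""
        with self._mutex:
            if self._current != expected:
                raise CommitConflict(
                    f"expected {expected!r}, found {self._current!r}"
                )
            self._current = new


def commit_with_retry(
    catalog: ToyCatalog,
    make_new_metadata: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Re-read the base and retry when another writer wins the race."""
    for _ in range(max_attempts):
        base = catalog.current()
        candidate = make_new_metadata(base)
        try:
            catalog.swap(base, candidate)
            return candidate
        except CommitConflict:
            continue  # someone else committed; rebuild on the new base
    raise CommitConflict(f"gave up after {max_attempts} attempts")


if __name__ == "__main__":
    catalog = ToyCatalog("v1")
    stale_base = catalog.current()           # engine B reads v1
    catalog.swap("v1", "v2-engineA")         # engine A commits first
    try:
        catalog.swap(stale_base, "v2-engineB")  # engine B's stale commit
    except CommitConflict as exc:
        print("conflict:", exc)
    print(commit_with_retry(catalog, lambda base: base + "+engineB"))
```

Readers are never blocked in this model: they see whichever metadata pointer is current when they plan the query. The hard multi-engine questions sit around the swap, namely which catalog arbitrates it and whether every engine honors the same one.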

I see two possible futures. In one, the Iceberg REST catalog becomes a universal coordination point, and multi-engine architectures become routine. In the other, the operational complexity of running multiple engines pushes most organizations toward a single primary engine, with Iceberg as an escape hatch rather than a daily interoperability layer. The data is the same in both futures. The operational model differs.

The warehouse vs. lakehouse debate is over. Open table formats ended it by making the question irrelevant. But debates in technology are never truly settled; they transform. The new debate, already forming in conference talks and vendor keynotes, is about catalog interoperability, governance portability, and compute-storage separation economics. The storage ideology died. The operational ideology is just beginning. The wise approach is the same as always: couple to the open standard (Iceberg), stay loosely coupled to the proprietary layer above it, and avoid treating any vendor’s current architecture as permanent.
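One way to read "couple to the open standard, stay loosely coupled to the proprietary layer" in code terms: workloads depend on a table identifier and a narrow engine interface, never on a specific vendor client. This is a sketch under my own assumptions, not a recommended framework. `QueryEngine`, `StubEngine`, `WorkloadRouter`, and the table name are hypothetical, and the stub engines only echo the SQL instead of calling any real service.

```python
from dataclasses import dataclass
from typing import Dict, Protocol

# The stable part: an Iceberg table identifier (hypothetical name).
TABLE = "analytics.events"


class QueryEngine(Protocol):
    """The narrow interface workloads are allowed to depend on."""
    name: str

    def run(self, sql: str) -> str: ...


@dataclass
class StubEngine:
    """Stand-in for a vendor adapter; a real one would wrap that vendor's client."""
    name: str

    def run(self, sql: str) -> str:
        return f"[{self.name}] {sql}"


class WorkloadRouter:
    """Maps workload names to engines; swapping an engine is a one-line change."""

    def __init__(self, routes: Dict[str, QueryEngine]) -> None:
        self._routes = dict(routes)

    def query(self, workload: str, sql: str) -> str:
        return self._routes[workload].run(sql)

    def reassign(self, workload: str, engine: QueryEngine) -> None:
        self._routes[workload] = engine


if __name__ == "__main__":
    router = WorkloadRouter({
        "bi": StubEngine("snowflake"),
        "ml": StubEngine("databricks"),
        "adhoc": StubEngine("trino"),
    })
    sql = f"SELECT count(*) FROM {TABLE}"
    print(router.query("bi", sql))
    # Moving BI to another engine touches the routing table, not the data.
    router.reassign("bi", StubEngine("trino"))
    print(router.query("bi", sql))
```

The single-primary-engine future and the multi-engine future both fit this shape. The difference is only how many entries in the routing table point at the same engine.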