Semantic Layers Are the Missing Piece in Most Data Architectures
What is the metric fragmentation problem?
The metric fragmentation problem occurs when the same business concept (revenue, churn, active users) has different definitions in different dashboards, reports, and queries, creating contradictory numbers that erode trust and waste time on reconciliation.
I audited a company’s metric definitions. “Monthly Active Users” had 5 different definitions across 5 dashboards: one counted unique logins, another counted users with at least one action, a third counted users active in the last 28 days (not 30), a fourth excluded internal users while the others included them, and the fifth used a different timezone for the month boundary. Five dashboards, five numbers, five stakeholders each believing their number was correct. The problem was not that anyone was wrong. Each definition was defensible. The problem was that there was no single authoritative definition.
How do semantic layers solve this?
Semantic layers solve metric fragmentation by centralizing metric definitions in a single, version-controlled layer that all consumption tools query through, ensuring that “revenue” computed in a Tableau dashboard matches “revenue” computed in a Python notebook matches “revenue” returned by an API.
I implemented a semantic layer using Cube (formerly Cube.js) for a 200-person company. The implementation involved:
- Metric definitions in code: Each metric was defined once in a YAML file with its SQL calculation, dimensions it could be sliced by, and documentation. “Monthly Active Users” got one definition: users with at least one qualifying action in a calendar month, excluding internal accounts, using UTC timezone. All 5 dashboard teams were consulted. The resulting definition was a negotiated agreement, not an imposed standard
- Consumption-agnostic access: The semantic layer exposes metrics through a SQL API, a REST API, and direct integrations with Tableau, Metabase, and Looker. Every tool queries the same metric definition. According to semantic layer architecture, this abstraction decouples metric definition from metric consumption, enabling consistency without limiting tool choice
- Governance enforcement: Dashboards must query through the semantic layer for any metric that has a canonical definition. Direct SQL queries that reimplement a defined metric are flagged during code review. This enforcement is social (code review norms) rather than technical (no hard block), but it works because the semantic layer is easier to use than reimplementing the calculation. The governance as code approach integrates with the semantic layer’s version-controlled definitions
Why do most organizations need a semantic layer even if they do not realize it?
Any organization with more than 3 dashboards, more than 1 analyst, or more than 1 BI tool will eventually develop metric fragmentation, and the cost of reconciling inconsistent numbers grows with every new dashboard and every new analyst.
The cost of not having a semantic layer is measured in reconciliation time. I tracked it: before the semantic layer, analysts spent an average of 4 hours per week reconciling metric discrepancies across dashboards. After implementation, that dropped to under 30 minutes. Across a 6-person analytics team, that is 21 hours per week reclaimed, roughly $100,000 per year in analyst time redirected from reconciliation to actual analysis.
The dashboard design discipline is incomplete without a semantic layer. You can design beautiful, focused dashboards, but if the metrics in those dashboards are calculated differently than the same metrics in a different dashboard, the beauty is misleading.
What are the practical considerations for implementation?
Implementation requires negotiating metric definitions across teams (the hardest part), choosing a semantic layer tool that integrates with your existing stack, and establishing governance norms that make the semantic layer the path of least resistance rather than an additional bureaucratic step.
The metric negotiation process took longer than the technical implementation. Each metric required agreement from every team that used it. “Revenue” took 3 meetings to define because finance, sales, and product each had legitimate reasons for their existing definitions. The final definition was a compromise with clearly documented variants: “Revenue (GAAP)” for finance, “Revenue (Bookings)” for sales, “Revenue (Product)” for product. Each is a different metric with a different definition, not the same metric with conflicting definitions. That distinction, multiple defined metrics versus one undefined metric, is the core value of a semantic layer.
Semantic layers are infrastructure, not luxury. Every organization that has experienced the “which number is right?” problem in a leadership meeting has experienced the absence of a semantic layer. The technology to solve it exists. The organizational will to negotiate shared definitions is the harder requirement. But the alternative, continued fragmentation, continued reconciliation, continued trust erosion, is more expensive than the negotiation, every time.