Goodhart’s Law in Your Dashboard: When Metrics Fail
What is Goodhart’s Law, and why does it haunt every dashboard?
Goodhart’s Law states that when a measure becomes a target, it ceases to be a good measure, because the act of optimizing for the metric creates incentives to improve the number rather than the underlying reality.
The law is deceptively simple and catastrophically common. I have watched it corrupt metrics in every organization I have worked with. The mechanism is always the same: a number is chosen to represent a complex reality, the number is made visible on a dashboard, someone ties the number to evaluation or funding, and then rational actors begin optimizing the number instead of the reality. The dashboard turns green. The underlying system rots.
Charles Goodhart articulated this in 1975 while studying British monetary policy, and the social scientist Donald T. Campbell identified the same pattern independently in the same period. Campbell’s version is more precise: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” Every data engineer should have this sentence printed and taped to their monitor.
How does metric distortion actually work in engineering organizations?
Metric distortion follows a predictable 3-phase pattern: adoption (the metric reflects reality), optimization (teams find efficient ways to improve the number), and corruption (the number and reality diverge permanently).
I documented this pattern in detail with engineering velocity tracking. An organization I consulted with in 2024 adopted story points as their velocity metric. Phase 1 (months 1 through 3): teams estimated honestly, velocity numbers reflected actual throughput, and the metric was useful for sprint planning. Phase 2 (months 4 through 7): teams noticed that higher velocity numbers correlated with positive performance reviews, so they began decomposing work into smaller stories (inflating point counts without changing output) and gravitating toward predictable tasks over uncertain ones. Phase 3 (months 8 through 12): velocity increased 47% while customer-facing feature delivery, measured by released capabilities, decreased 12%.
The velocity number was not wrong. It accurately counted what it measured: completed story points per sprint. But what it measured had decoupled from what it was supposed to represent: team productivity and delivery capacity. The map and the territory diverged, and the organization was navigating by the map.
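The Phase 3 figures can be combined into a single ratio that shows how far the unit itself inflated. This is a minimal sketch: the baseline values are arbitrary placeholders chosen for illustration, and only the two percentage changes come from the case described here.

```python
# Baseline values are arbitrary assumptions; they cancel out of the ratio.
baseline_points = 100.0     # story points per sprint before the drift
baseline_features = 10.0    # released capabilities over the same period

points_after = baseline_points * (1 + 0.47)       # velocity up 47%
features_after = baseline_features * (1 - 0.12)   # released capabilities down 12%

# Story points "spent" per released capability, relative to the baseline.
inflation = (points_after / features_after) / (baseline_points / baseline_features)

print(f"story points per released capability: x{inflation:.2f}")
```

Under these figures, each released capability absorbed roughly 67% more story points by the end of the year. The point count rose while the thing it was meant to stand for shrank.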
I found the same pattern in data pipeline SLA metrics. A team defined “pipeline health” as the percentage of DAGs completing within their SLA window. When this became a KPI, the team responded rationally: they widened SLA windows. A pipeline that originally had a 30-minute SLA was reclassified with a 4-hour window. Pipeline health went from 87% to 99.2%. The dashboard was pristine. Meanwhile, the actual latency of data delivery to consumers increased by 40 minutes on average, a change the metric could not see.
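The mechanics of that reclassification fit in a few lines. In this sketch the run durations are invented for illustration and `sla_compliance` is a hypothetical helper, not part of any scheduler’s API. It scores the same set of runs against a 30-minute window and a 4-hour window.

```python
from statistics import mean

# Hypothetical run durations in minutes for one DAG (illustrative data only).
run_minutes = [22, 28, 35, 41, 26, 55, 29, 31, 24, 90]

def sla_compliance(durations, sla_minutes):
    """Return the fraction of runs that finished within the SLA window."""
    within = sum(1 for d in durations if d <= sla_minutes)
    return within / len(durations)

original = sla_compliance(run_minutes, sla_minutes=30)    # 30-minute SLA
widened = sla_compliance(run_minutes, sla_minutes=240)    # 4-hour SLA

print(f"compliance at 30 min SLA:  {original:.0%}")
print(f"compliance at 240 min SLA: {widened:.0%}")
print(f"mean delivery time:        {mean(run_minutes):.1f} min in both cases")
```

The compliance figure changes only because the threshold moved. The durations, and therefore the consumer’s wait, are identical in both calculations.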
What does philosophy of measurement teach us about this problem?
Philosophy of measurement reveals that every metric embeds a theory about what matters, and when that theory is wrong or incomplete, the metric creates a systematic blind spot that optimization amplifies.
The philosopher Hasok Chang argues that measurement is not the passive recording of an objective quantity but an active intervention that shapes what we can perceive. To measure something is to commit to a theory about which aspect of reality is relevant. When you choose to measure engineering velocity in story points, you commit to a theory that story points are a meaningful proxy for productive work. That theory may be locally true and globally false.
The Stoics understood this problem in the context of virtue. Epictetus distinguished between things that are “up to us” (our judgments, choices, intentions) and things that are not (outcomes, reputations, external results). Metrics measure outcomes. Goodhart corruption sets in when we treat outcome metrics as if they measured intention and effort. A team’s velocity score is an outcome. The quality of their engineering judgment, the depth of their problem analysis, and the integrity of their technical decisions are not captured by any number on any dashboard.
This is not an argument against measurement. It is an argument for epistemological humility about what measurements can and cannot tell us. Every metric on every dashboard is a compression of reality, and every compression loses information. The question is whether the lost information matters.
How do you design metrics that resist Goodhart corruption?
Goodhart-resistant metrics combine leading indicators with lagging outcomes, use paired metrics that expose gaming, and rotate measurement approaches to prevent optimization lock-in.
- Paired metrics: For every efficiency metric, pair it with a quality metric that would degrade if the efficiency metric were gamed. I paired pipeline SLA compliance (efficiency) with downstream query freshness (quality). When someone widened an SLA window, the freshness metric exposed the degradation.
- Consumer-defined metrics: Let the data consumer, not the producer, decide whether the metric reflects their experience. When I asked analysts “Is data arriving when you need it?”, 4 of 7 said no, despite the SLA dashboard showing 99% compliance.
- Metric rotation: Change what you measure every 2 to 3 quarters. Not the underlying goals, but the specific metrics used to track them. This prevents optimization lock-in. One quarter, measure pipeline latency. Next quarter, measure data consumer satisfaction. The goal (reliable data delivery) stays constant while the measurement surface shifts.
- Qualitative checkpoints: Every quantitative dashboard review should include 15 minutes of unstructured conversation about what the numbers are not showing. I instituted “metric skepticism” sessions where the team’s explicit task was to identify what reality the dashboard might be hiding.
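The paired-metric idea from the first bullet can be expressed as a simple check. This is a sketch under stated assumptions: `PeriodMetrics`, `goodhart_warning`, the 5% tolerance, and the freshness values are all hypothetical names and numbers chosen for illustration. Only the 87% and 99.2% compliance figures echo the earlier example.

```python
from dataclasses import dataclass

@dataclass
class PeriodMetrics:
    sla_compliance: float      # efficiency: fraction of runs within SLA (0.0 to 1.0)
    freshness_minutes: float   # quality: typical age of data when consumers query it

def goodhart_warning(previous, current, tolerance=0.05):
    """Flag a period where efficiency improved while the paired quality
    metric got worse by more than the (assumed) relative tolerance."""
    efficiency_improved = current.sla_compliance > previous.sla_compliance
    freshness_worse = (
        current.freshness_minutes > previous.freshness_minutes * (1 + tolerance)
    )
    return efficiency_improved and freshness_worse

# Illustrative values: compliance rises while data reaches consumers later.
before = PeriodMetrics(sla_compliance=0.87, freshness_minutes=35.0)
after = PeriodMetrics(sla_compliance=0.992, freshness_minutes=75.0)

if goodhart_warning(before, after):
    print("SLA compliance improved but freshness degraded: investigate")
```

A warning from a check like this is a prompt for a conversation, not a verdict. The pairing works only as long as the quality metric is harder to game than the efficiency metric it guards.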
When should you distrust your own dashboard?
Distrust a dashboard when all metrics trend positive simultaneously, when metric improvement coincides with increased informal complaints, or when the team spends more time discussing the metrics than the underlying work.
I developed 3 diagnostic signals for Goodhart corruption. First, the “too green” signal: when every metric on a dashboard is in the healthy range for more than 8 consecutive weeks, something is being gamed or the thresholds are too permissive. Real systems experience variance. Sustained perfection is a statistical improbability that should trigger investigation, not celebration.
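The “too green” signal is easy to automate. The sketch below assumes a hypothetical data shape, one dict per week mapping a metric name to whether it was in its healthy range, and uses the 8-week threshold from the text. The function names are illustrative.

```python
def current_green_streak(weekly_status):
    """Count consecutive most-recent weeks in which every metric was healthy.

    weekly_status: list of dicts, oldest week first, each mapping
    metric name -> True if the metric was in its healthy range.
    """
    streak = 0
    for week in reversed(weekly_status):
        # An empty week carries no evidence of health, so it ends the streak.
        if week and all(week.values()):
            streak += 1
        else:
            break
    return streak

def too_green(weekly_status, max_weeks=8):
    """True when the all-green streak exceeds max_weeks consecutive weeks."""
    return current_green_streak(weekly_status) > max_weeks

# Illustrative history: nine straight weeks with every metric healthy.
history = [{"sla_compliance": True, "freshness": True} for _ in range(9)]

if too_green(history):
    print("All metrics green for more than 8 weeks: review thresholds and definitions")
```

The check says nothing about which metric is being gamed or which threshold is too loose. It only marks the moment when celebration should give way to investigation.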
Second, the “hallway test.” When dashboard metrics are positive but conversations in the hallway (or Slack channels) carry complaints about the same domain the metrics cover, the metrics are measuring the wrong thing. I discovered a data quality crisis at one organization not through any alerting system but through a Slack message that said, “Does anyone else’s revenue number look weird today?” The data quality dashboard had no alerts. The metric definitions had excluded the failure mode that was occurring.
Third, the “meeting proportion” signal. When more than 30% of a team’s meeting time is spent discussing metric performance rather than the actual work the metrics are supposed to represent, the metrics have become the product. The map has replaced the territory.
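The meeting proportion can be estimated from a rough log of meeting time. This sketch assumes someone records, per meeting, the total minutes and the minutes spent on metric performance. The function name and the sample numbers are hypothetical.

```python
def metric_talk_share(meetings):
    """Return the fraction of total meeting time spent discussing metrics.

    meetings: list of (total_minutes, minutes_on_metrics) tuples.
    """
    total = sum(total_minutes for total_minutes, _ in meetings)
    if total == 0:
        return 0.0
    on_metrics = sum(metric_minutes for _, metric_minutes in meetings)
    return on_metrics / total

# Illustrative week: three meetings with rough time estimates.
week = [(60, 30), (30, 15), (45, 10)]

share = metric_talk_share(week)
if share > 0.30:
    print(f"{share:.0%} of meeting time went to metric performance")
```

The estimates do not need to be precise. The signal is about order of magnitude, and a rough tally kept for a few weeks is enough to tell whether the team is talking about the work or about the scoreboard.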
Goodhart’s Law is not a flaw in specific metrics. It is a feature of the relationship between measurement and human behavior. Every dashboard is a lens, and every lens distorts. The discipline is not to build perfect metrics (those do not exist) but to maintain a continuous, skeptical awareness of the gap between what your numbers say and what your systems do. The Stoics called this prosoche, the practice of attention. In data engineering, it is the practice of never fully trusting your own instruments.