Philosophy

The Demarcation Problem in Data Science: When Is It Science?

· 4 min read · Updated Mar 11, 2026
Karl Popper’s demarcation criterion holds that a discipline qualifies as science only if its claims are falsifiable. Data science, a field employing an estimated 320,000 practitioners in the United States alone (Bureau of Labor Statistics, 2024), often operates without falsifiable hypotheses, replicable experiments, or theoretical grounding. When 68% of data science models deployed to production are never evaluated against their original predictions (2024 MLOps Community Survey), the field faces a fundamental question: is this science, or is it something else wearing science’s clothes?

When does data science qualify as actual science?

Data science qualifies as science when it formulates falsifiable hypotheses, tests them against data, and revises its models based on the results. When it skips these steps, which is most of the time, it is pattern recognition or engineering, not science.

The demarcation problem in the philosophy of science asks what distinguishes science from non-science. Popper’s answer was falsifiability: a claim is scientific if and only if it can, in principle, be shown to be false by observation. Claims that accommodate any possible observation are not scientific but metaphysical.

I worked on a customer churn prediction model. The model predicted that 18% of customers in a specific segment would churn in Q3. When the actual churn rate was 23%, the team explained the discrepancy by citing “external market factors.” When I asked what churn rate would have falsified the model, no one could answer. If no possible outcome would have caused us to reject the model, we were not doing science. We were doing storytelling with numbers.
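The churn anecdote can be made concrete. Had the team committed to a tolerance band before deployment, the question "what would falsify this model?" would have had a mechanical answer. A minimal sketch, using the numbers from the anecdote (the ±3-point tolerance is a hypothetical choice, not something the team actually defined):

```python
def is_falsified(predicted: float, observed: float, tolerance: float) -> bool:
    """Return True if the observed rate falls outside the pre-committed band."""
    return abs(observed - predicted) > tolerance

# Numbers from the anecdote: the model predicted 18% churn, actual was 23%.
predicted_churn = 0.18
observed_churn = 0.23

# With a pre-registered +/- 3 point tolerance, a 5-point miss falsifies
# the model -- no appeal to "external market factors" can rescue it.
result = is_falsified(predicted_churn, observed_churn, tolerance=0.03)  # True
```

The point is not the arithmetic; it is that the threshold exists before the outcome is known, so the model is exposed to the possibility of being wrong.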

This is Popper’s criterion applied with uncomfortable precision. A model that can explain any outcome predicts nothing. It is what Popper called “unfalsifiable” and what I call “decorative analytics.” It looks scientific. It uses statistical methods. It produces numbers. But it does not generate knowledge in the philosophical sense, because it does not expose itself to the possibility of being wrong.

What is most data science actually doing?

Most data science is inductive pattern recognition (finding patterns in historical data) presented as deductive science (making claims about the future). The confusion between the two is not just philosophical. It leads to overconfidence in predictions and underinvestment in validation.

I distinguish between three activities that all travel under the banner of “data science”:

  • Descriptive analytics: “What happened?” This is reporting, not science. It is valuable, but it makes no claims that can be tested.
  • Predictive modeling: “What will happen?” This can be science, if the predictions are falsifiable and the model is evaluated against them. In practice, I estimate that fewer than 30% of predictive models I encounter include pre-registered predictions against which the model is later evaluated.
  • Causal inference: “Why did it happen?” This is the closest to science, because causal claims are inherently falsifiable. But it requires experimental design (A/B testing, natural experiments, instrumental variables) that most data science teams do not practice.
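To illustrate why causal claims are inherently falsifiable, consider the simplest experimental design mentioned above, the A/B test. The claim "variant B converts better than variant A" can fail: if the observed difference is small enough, we fail to reject the null. A sketch using a standard two-proportion z-test with stdlib only (the conversion counts are invented for illustration):

```python
import math

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the null 'A and B convert at the same rate'."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))            # two-sided tail probability

# Hypothetical experiment: 12.0% vs 15.0% conversion over 1,000 users each.
p_value = two_proportion_p_value(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
```

A large p-value here would count against the causal claim. That built-in possibility of failure is exactly what Popper's criterion demands, and what a post-hoc correlation hunt lacks.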

The problem is not that descriptive analytics is worthless. It is that calling it “science” sets false expectations. A dashboard that describes what happened is useful. A dashboard presented as scientific understanding of why it happened is misleading.

How can data practitioners apply scientific rigor?

By pre-registering predictions, designing falsification criteria before analysis, and honestly categorizing their work as description, prediction, or causal inference rather than lumping everything under “data science.”

  • Pre-register predictions: Before deploying a model, write down what it predicts for the next quarter. Evaluate the model against those predictions. If it was wrong, understand why. If it was right, ask whether it was right for the right reasons.
  • Define falsification criteria: For every model, answer: “What outcome would cause us to abandon or fundamentally revise this model?” If you cannot answer, the model is unfalsifiable.
  • Separate description from prediction: Be explicit about whether your analysis describes the past or predicts the future. These are different activities with different epistemological standards.
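The first two practices above can be as lightweight as a file committed before the quarter begins. A minimal pre-registration sketch (the field names, model name, and numbers are illustrative, not a standard schema):

```python
import json
from datetime import date

# Before deployment: commit the prediction AND the falsification
# criterion to a record that cannot be quietly revised later.
record = {
    "model": "churn-v2",
    "registered_on": date.today().isoformat(),
    "prediction": {"metric": "q3_churn_rate", "point": 0.18, "tolerance": 0.03},
}
with open("prereg_churn_v2.json", "w") as f:
    json.dump(record, f, indent=2)

# After the quarter: evaluate against the pre-committed criterion only.
with open("prereg_churn_v2.json") as f:
    pre = json.load(f)["prediction"]

observed = 0.23
falsified = abs(observed - pre["point"]) > pre["tolerance"]
```

Checking the record into version control (rather than a mutable dashboard) is what makes the commitment honest: the prediction is timestamped before the outcome exists.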

As I explored in Popper’s falsifiability and A/B testing, the rigor is not complex. It is uncomfortable. It requires admitting that your model might be wrong before you know whether it is right.

Does it matter whether we call it science?

Yes. Because the label “science” carries epistemic authority. When data analysis is called science, its conclusions receive more trust than they have earned. The label does not just describe the work. It shapes how the work is received.

According to Popper’s philosophy of science, the boundary between science and non-science is not a judgment of value. Astrology is not bad because it fails the demarcation criterion. It is simply not science. Similarly, descriptive analytics is not bad because it is not science. It is simply not science, and should not be trusted as if it were.

The 320,000 data science practitioners in the U.S. are doing work that ranges from rigorous causal inference (science) to Excel reporting with a Python veneer (not science). The field would benefit from honest categorization. The engineer who knows the difference between data quality and data truth is better equipped to use data science responsibly than the one who trusts every output because it came from a “data scientist.”

“A theory that explains everything, explains nothing.” — Karl Popper

The demarcation question is not an academic exercise. It determines how much trust your organization places in its models, how much investment those models justify, and how much authority data practitioners wield. Data science has earned some of that trust. But it has claimed more than it has earned. Popper’s simple criterion, can this be proven wrong?, would restore the honest relationship between analysis and certainty that the field needs. Not every number is knowledge. Not every model is science. And knowing the difference is itself a form of scientific rigor.