Lakatos and the Research Program of Machine Learning
What does Lakatos’s framework reveal about the state of machine learning?
Lakatos’s framework shows that ML has a progressive core (genuine predictive capabilities) surrounded by an increasingly degenerative protective belt (scaling assumptions, benchmark gaming, and unfalsifiable claims about “emergent” capabilities). The question is whether the core remains progressive enough to justify the protective belt.
The hard core of the ML research program contains a powerful claim: that sufficiently complex functions, fitted to sufficiently large datasets, can approximate patterns in the world well enough to be useful for prediction and generation. This claim has been spectacularly confirmed. Language models generate coherent text. Image models produce photorealistic images. Recommendation systems predict behavior with measurable accuracy.
The protective belt, however, shows signs of degeneration. When a model fails to generalize, the response is “we need more data.” When a model hallucinates, the response is “we need more parameters.” When a model produces harmful output, the response is “we need more RLHF.” Each of these is a modification to the protective belt that generates no novel predictions. It explains away failures while preserving the core assumption that scaling solves problems. I have seen this pattern in LLM evaluation work: benchmark scores improve, but the gains do not always correspond to gains in the capability we actually care about.
How do you distinguish progressive from degenerative phases?
A progressive phase generates novel predictions that are subsequently confirmed. A degenerative phase generates only explanations of existing results. The test: has the field recently predicted something new that was then verified, or has it only explained why its latest failures do not matter?
Lakatos offered a precise criterion. I apply it quarterly to ML projects I evaluate. In the last year, I have seen 3 genuinely progressive results: a protein structure prediction that was subsequently verified experimentally, a materials science model that identified a compound later confirmed in the lab, and a code generation model that consistently produced correct implementations for a new class of problems it had not been specifically trained on. These are Lakatosian progress: the research program predicted something new, and the prediction was confirmed.
I have also seen 12 degenerative modifications: models that were defended with “the benchmark is wrong,” improvements explained only after they were observed, and capabilities claimed as “emergent” without a prior prediction that they would emerge. Each of these is a protective belt adjustment that preserves the hard core without generating new knowledge. According to Lakatos’s framework, a program that produces more degenerative than progressive modifications is in trouble, regardless of how much investment it attracts.
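The quarterly tally described above can be sketched as a simple classifier over logged results. This is a minimal sketch, not a claim about how the author actually records evaluations; the `Result` record and the `program_health` metric are hypothetical illustrations of Lakatos’s criterion as applied in the text:

```python
from dataclasses import dataclass

@dataclass
class Result:
    description: str
    predicted_in_advance: bool  # was the outcome predicted before it was observed?
    confirmed: bool             # was the prediction subsequently verified?

def is_progressive(r: Result) -> bool:
    # Lakatos's criterion: a result counts as progressive only if it was a
    # novel prediction made in advance and then confirmed. Explanations
    # offered only after the fact are degenerative belt adjustments.
    return r.predicted_in_advance and r.confirmed

def program_health(results: list[Result]) -> float:
    """Fraction of logged results that are progressive."""
    if not results:
        return 0.0
    return sum(is_progressive(r) for r in results) / len(results)
```

With the counts from the text, 3 progressive results against 12 degenerative modifications, `program_health` comes out at 0.2: far more belt adjustment than prediction, which is exactly the warning sign the framework is designed to surface.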
What does this mean for engineering decisions about ML adoption?
It means distinguishing between the progressive core of ML (genuine predictive capabilities for specific, well-defined tasks) and the degenerative protective belt (claims that scaling will solve problems it has not yet solved). Invest in the core. Be skeptical of the belt.
- Demand falsifiable claims: Before investing in an ML approach, ask: “What outcome would show that this approach has failed?” If no one can answer, the approach is unfalsifiable, and unfalsifiable approaches are, by Lakatos’s criterion, degenerative.
- Track novel predictions: Keep a log of what the model was predicted to do and what it actually did. If the model only succeeds at tasks it was designed for and fails at novel tasks, the protective belt is doing the work, not the hard core.
- Distinguish capability from scaling: A 10-billion-parameter model that solves a problem no 1-billion-parameter model could solve represents genuine progress. A model that solves the same problem 5% better with 10x the parameters represents scaling, not progress.
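The second and third heuristics above amount to bookkeeping, and the bookkeeping can be made concrete. The sketch below is one possible shape for such a log, assuming nothing beyond the text; `PredictionEntry` and both method names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PredictionEntry:
    task: str
    novel_task: bool         # outside the scope the model was designed for?
    observed_success: bool   # what the model actually did

class PredictionLog:
    def __init__(self) -> None:
        self.entries: list[PredictionEntry] = []

    def record(self, entry: PredictionEntry) -> None:
        self.entries.append(entry)

    def novel_task_success_rate(self) -> float:
        # If the model succeeds only at tasks it was designed for and fails
        # at novel ones, the protective belt is doing the work, not the core.
        novel = [e for e in self.entries if e.novel_task]
        if not novel:
            return 0.0
        return sum(e.observed_success for e in novel) / len(novel)

def scaling_verdict(solved_at_small: bool, solved_at_large: bool) -> str:
    # Capability: a qualitatively new result the smaller model could not
    # produce at all. Anything else achieved by adding parameters is scaling.
    if solved_at_large and not solved_at_small:
        return "capability"
    return "scaling"
```

A low `novel_task_success_rate` over time is the log’s way of saying the research program is explaining rather than predicting.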
This connects to the broader epistemological framework I use in evaluating autonomous agents. The question is not whether agents work. The question is whether we have principled criteria for determining when they work and when they do not.
Is machine learning science or engineering?
It is both, and this dual identity is the source of confusion. As science, ML is a research program that must be evaluated by Lakatos’s criteria. As engineering, it is a set of tools that must be evaluated by practical outcomes. The problem arises when engineering success (it works on this dataset) is mistaken for scientific progress (we understand why it works).
I have built ML systems that worked beautifully in production while I understood very little about why they worked. This is engineering success. It is not scientific understanding. Lakatos would say the engineering success belongs to the progressive core of the program. The lack of understanding suggests the theoretical framework is still degenerative. Both can be true simultaneously, and both should inform how we invest.
“The methodology of scientific research programs is the logic of scientific progress.” — Imre Lakatos
The $77 billion invested in ML in 2024 is a bet on a research program. Lakatos gives us the tools to evaluate whether that bet is justified. The progressive core of ML is real and valuable. The degenerative protective belt, the claim that more data and more parameters will eventually solve every problem, deserves more scrutiny than it receives. The engineer who can distinguish between the two will make better investment decisions than the one who treats the entire program as either salvation or hype.