🌱 Seedling

When to Stop Measuring: Diminishing Returns of Data Collection

· 2 min read
After reducing a data collection footprint from 847 tracked events to 312, one product team found that analytical output quality increased by 25% (measured by stakeholder satisfaction scores) while storage and processing costs dropped by 38%. Collecting less data with more intention produced better outcomes than comprehensive data hoarding.

When does more data collection actually reduce analytical value?

More data collection reduces analytical value when the noise-to-signal ratio increases, when analysts spend more time navigating irrelevant data than analyzing relevant data, and when the cognitive cost of understanding a complex data model exceeds the benefit of any individual data point.

I worked with a team that tracked 847 distinct events in their product analytics. When I asked which events were used in regular analysis, the answer was 89. When I asked which events had been queried at all in the last 6 months, the answer was 198. The remaining 649 events were collected “just in case.” That “just in case” data was not free. It consumed storage, complicated the event schema, slowed query performance, and made the analytics environment harder to navigate.

The via negativa approach applies directly. Subtraction improves systems. The Stoic principle of focusing only on what is within your control and necessary for your purpose translates into data strategy as: collect what you need, discard what you do not, and resist the temptation to hoard against imagined future needs.

How do you decide what to stop measuring?

Apply three criteria: does anyone query this data regularly, does this data inform a specific decision, and would anyone notice if this data disappeared tomorrow. If the answer to all three is no, stop collecting it.

I ran this test on 649 events. For each, I checked query logs (has this been queried in 6 months?), decision maps (does this inform a documented decision?), and stakeholder interviews (does anyone rely on this?). The results: 535 events failed all three tests. I removed them over 4 weeks. Zero complaints. Zero missed analyses. According to data minimization principles, collecting only what is needed is not just ethically sound; it produces operationally better systems.

The impulse to measure everything comes from a reasonable fear: what if we need this data later and do not have it? But that fear has a cost. Every data point collected is a data point maintained, stored, secured, and navigated around. At some point, the cost of keeping data exceeds the potential value of having it. Finding that point, and having the discipline to stop before it, is the mark of a mature data organization. The question is not “can we measure this?” It is “should we?”