Ethics of AI Cost Optimization: Cheaper Models, Worse Outcomes

Switching from GPT-4 to a smaller, cheaper model for a benefits eligibility system saved $14,000 per month in inference costs. It also degraded accuracy for non-English speakers by 11 to 14 percentage points, a disparity that went unflagged because the evaluation dataset underrepresented multilingual users by a factor of 8.

When does cost optimization become an ethical decision?

Model selection becomes an ethical decision when the performance lost to cost optimization falls disproportionately on already underserved populations, and when the people making the decision never measure that disparity because their evaluation data does not represent those populations.

The benefits eligibility system processed 15,000 applications per month. The switch to a cheaper model reduced per-query costs from $0.12 to $0.03. The overall accuracy difference was 2.1 percentage points, well within the team’s acceptable range. But when I disaggregated the results by language, the accuracy gap for Spanish-language applications was 11 percentage points. For Vietnamese-language applications, it was 14 percentage points.
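The cost figures in this paragraph can be related to the headline savings with a quick calculation. This is a minimal sketch, assuming the $14,000 comes only from the per-query price difference and that one application can trigger several model queries. The text does not state a query count, so the per-application figure below is an inference, not a reported number.

```python
# Work in integer cents to avoid floating-point noise in the price difference.
OLD_COST_CENTS = 12            # $0.12 per query (from the text)
NEW_COST_CENTS = 3             # $0.03 per query (from the text)
MONTHLY_SAVINGS_CENTS = 14_000 * 100   # $14,000 per month (from the text)
APPLICATIONS_PER_MONTH = 15_000        # from the text

saving_per_query_cents = OLD_COST_CENTS - NEW_COST_CENTS

# Assumption: savings come only from the per-query price difference.
implied_queries_per_month = MONTHLY_SAVINGS_CENTS / saving_per_query_cents

# Assumption: queries are spread evenly across applications.
implied_queries_per_application = implied_queries_per_month / APPLICATIONS_PER_MONTH

print(f"Implied queries per month: {implied_queries_per_month:,.0f}")
print(f"Implied queries per application: {implied_queries_per_application:.1f}")
```

Under those assumptions the savings correspond to roughly ten model queries per application, which would be consistent with a pipeline that calls the model several times per case.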

The team had not measured disaggregated performance because their evaluation dataset was 94% English. The cost optimization decision was made on aggregate metrics that masked a significant ethical disparity. This is not a hypothetical scenario: it happened in a system that determined whether real people received real benefits. The $14,000 monthly savings came at the expense of accuracy for the populations most likely to need the benefits. As I have written in the context of FinOps for AI systems, cost optimization without equity analysis is incomplete accounting.
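To see how an aggregate metric can hide gaps this large, it helps to treat the overall accuracy difference as a weighted average of per-group differences. The sketch below is illustrative only. The 94% English share and the Spanish and Vietnamese gaps come from the text. The English-only gap, the split of the remaining 6%, and the "representative" mix (which reads "a factor of 8" as a 48% non-English share) are assumptions chosen for the example.

```python
def aggregate_gap(shares, gaps_pp):
    """Weighted average of per-group accuracy gaps, in percentage points."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(shares[group] * gaps_pp[group] for group in shares)

# Per-group accuracy drop after the model switch, in percentage points.
# Spanish and Vietnamese are from the text; the English value is hypothetical.
gaps_pp = {"english": 1.5, "spanish": 11.0, "vietnamese": 14.0}

# Evaluation mix: 94% English (from the text); the 4%/2% split is hypothetical.
eval_shares = {"english": 0.94, "spanish": 0.04, "vietnamese": 0.02}

# Hypothetical representative mix: non-English share scaled by 8 (6% -> 48%).
representative_shares = {"english": 0.52, "spanish": 0.32, "vietnamese": 0.16}

eval_gap = aggregate_gap(eval_shares, gaps_pp)
representative_gap = aggregate_gap(representative_shares, gaps_pp)

print(f"Gap seen on the skewed evaluation set: {eval_gap:.2f} pp")
print(f"Gap on the representative mix:         {representative_gap:.2f} pp")
```

With these illustrative inputs, the skewed evaluation set shows a gap near the 2.1 points the team saw, while the same per-group gaps weighted by a representative mix produce a gap roughly three times larger.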

The ethical question is not whether cost optimization is acceptable. It is whether the people who bear the cost of cheaper models are the same people who benefit from the savings. In this case, they were not. The organization saved money. Non-English speakers received worse outcomes. That tradeoff deserved explicit acknowledgment and informed decision-making, not a spreadsheet comparison of average accuracy rates.

I do not have a formula for when cost optimization crosses an ethical line. I have a principle: before making model selection decisions, disaggregate performance metrics by every demographic dimension along which your system affects people. If the cheaper model degrades performance uniformly, the tradeoff is straightforward. If it degrades performance for specific populations, especially populations the system was built to serve, the decision requires ethical deliberation, not just financial analysis.
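One way to put that principle into practice is a small per-group comparison that runs before any model switch is approved. The following is a minimal sketch, not the tooling used in the case above. The record layout (`language`, `correct`), the function names, the 3-point threshold, and the minimum sample size are all hypothetical choices, and the data is synthetic.

```python
from collections import defaultdict

def accuracy_by_group(records, group_key):
    """Return {group: (accuracy, sample_count)} for labeled prediction records."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        hits[group] += int(rec["correct"])
    return {g: (hits[g] / totals[g], totals[g]) for g in totals}

def compare_models(baseline, candidate, group_key, max_drop_pp=3.0, min_n=30):
    """Per-group accuracy drop (baseline minus candidate) with review flags."""
    base = accuracy_by_group(baseline, group_key)
    cand = accuracy_by_group(candidate, group_key)
    report = {}
    for group in sorted(base.keys() & cand.keys()):
        base_acc, base_n = base[group]
        cand_acc, cand_n = cand[group]
        drop_pp = (base_acc - cand_acc) * 100
        n = min(base_n, cand_n)
        report[group] = {
            "drop_pp": drop_pp,
            "n": n,
            "flagged": drop_pp > max_drop_pp,  # drop exceeds the chosen threshold
            "low_sample": n < min_n,           # too few samples to trust the estimate
        }
    return report

def make_records(group, n, n_correct):
    """Synthetic records: the first n_correct predictions are marked correct."""
    return [{"language": group, "correct": i < n_correct} for i in range(n)]

# Synthetic evaluation results for the expensive and the cheaper model.
baseline = (make_records("english", 100, 95) + make_records("spanish", 40, 36)
            + make_records("vietnamese", 10, 9))
candidate = (make_records("english", 100, 93) + make_records("spanish", 40, 32)
             + make_records("vietnamese", 10, 7))

report = compare_models(baseline, candidate, "language")
for group, row in report.items():
    print(group, f"drop={row['drop_pp']:.1f}pp", f"n={row['n']}",
          "FLAG" if row["flagged"] else "ok",
          "(low sample)" if row["low_sample"] else "")
```

The `low_sample` flag matters as much as the threshold: a group with only a handful of evaluation examples is exactly the kind of group whose degradation an aggregate metric will hide, so a sparse group should trigger more data collection, not a pass.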