AI Tool Comparisons

The Year in LLMs: Simon Willison’s Annual Reviews of What Actually Changed


Simon Willison’s year-end retrospectives are the most thorough independent accounts of how the LLM landscape actually shifted. Two years of reviews reveal the trajectory.

What did we learn about LLMs in 2024?

TLDR: Willison documented that 18 different organizations shipped models that outperformed the original GPT-4 during 2024. Claude became his personal daily driver for writing and coding tasks. The year’s defining trend was the commoditization of what had been frontier capability just twelve months earlier.

Key Insight: The moat around GPT-4 level performance collapsed entirely in a single calendar year.

Read the full article →

What defined the LLM landscape in 2025?

TLDR: Willison’s 2025 review covered the rise of reasoning models, the maturation of coding agents, and the intensifying competition between frontier labs and open-source alternatives. The gap between the best proprietary and best open-source models continued to shrink, though frontier models maintained an edge on the hardest tasks.

Key Insight: 2025 was the year AI coding went from “interesting demo” to “daily production tool,” fundamentally changing how software gets built.

Read the full article →

What does this mean for your AI workflow?

Reading these two reviews back-to-back shows how fast the field moves and how quickly today’s frontier becomes tomorrow’s baseline. The practical takeaway is to re-evaluate your AI tool stack at least twice a year, because the model you dismissed six months ago may now be the best option for your specific work.