AI Philosophy

AI Safety, Interpretability, and Societal Risk: What We Do Not Yet Understand

AI systems are being deployed faster than we can understand or govern them. These three pieces map the gap between capability and accountability.

Mechanistic Interpretability: Breakthrough Technologies 2026

TLDR: Anthropic, OpenAI, and DeepMind are building tools to trace prompt-to-response paths inside models. The goal is to move from black-box testing to a genuine understanding of how models arrive at their outputs.

Key Insight: Mapping internal reasoning traces is the critical frontier for AI safety.

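The core idea of tracing a model's internal path can be made concrete with a deliberately tiny sketch. This is purely illustrative and is not any lab's actual tooling; the network, its weights, and every name here are made up for the example.

```python
# Toy illustration of "tracing" a forward pass: instead of treating the
# model as a black box, we record every intermediate activation so we can
# see which internal unit drove the output.

def relu(x):
    return max(0.0, x)

def forward_with_trace(x, w1, w2):
    """Run a tiny 2-layer network and return the output plus a trace
    of all intermediate activations."""
    hidden = [relu(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    output = sum(w * h for w, h in zip(w2, hidden))
    trace = {"input": x, "hidden": hidden, "output": output}
    return output, trace

# Hypothetical hand-picked weights, chosen only for the example.
w1 = [[1.0, -1.0], [0.5, 0.5]]
w2 = [2.0, -1.0]
out, trace = forward_with_trace([1.0, 0.0], w1, w2)

# Per-unit contributions to the output: this is the crudest possible
# version of asking "which internal path produced this answer?"
contributions = [w * h for w, h in zip(w2, trace["hidden"])]
```

Real interpretability work does this at the scale of billions of parameters, where the hard problem is not recording activations but assigning them meaning.
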
AI for Everything: Breakthrough Technologies 2024

TLDR: Generative AI reached consumers faster than almost any prior technology. This piece reflects on the societal implications of that speed — implications we have not yet fully reckoned with.

Key Insight: Speed of deployment without governance creates compounding problems.

AGI Will Not Happen in Your Lifetime. Or Will It?

TLDR: Gary Marcus and Grady Booch debate AGI timelines and conclude that large language models alone are insufficient for AGI. The core obstacle is architectural — integrating many individual capabilities into a coherent whole remains unsolved.

Key Insight: Evaluate AI on demonstrated capabilities, not promissory narratives about superintelligence.

What does this mean for how we think about AI?

The gap between deployment speed and interpretability is where real risk lives. Mechanistic interpretability offers a path toward accountability, but only if we resist the AGI hype cycle long enough to fund the slower, harder work of actually understanding these systems.