Sara Hooker’s essay “On the Slow Death of Scaling” (2026) is a thought-provoking piece that deserves a careful read. What follows is my interpretation of (or perhaps more accurately, a riff on) her arguments, where I’ll both steelman her position and push back where I think the evidence points elsewhere. Hooker,...
[Read More]
Active Learning vs. Data Filtering: Selection vs. Rejection
What is the difference between active learning (and active sampling)
and data filtering? And why do we treat data selection differently
during training versus before training?
[Read More]
The Paradox of Polarization: When More Facts Backfire
This note explores a seemingly simple yet surprisingly profound example of how rational agents can diverge in their beliefs even when exposed to identical evidence. While the mathematical model we’ll examine is highly simplified, its core mechanism offers a potential lens through which to understand the complex dynamics of real-world...
[Read More]
Why is the Bayesian Model Average the best choice?
Why is the Bayesian model average (BMA) often hailed as the optimal
choice for rational actors making predictions under uncertainty? Is this
claim justified, and if so, what’s the underlying logic?
[Read More]
Function-Space Variational Inference and Label Entropy Regularization (#2)
In the first part of this two-part series on Function-Space Variational Inference (FSVI), we looked at the Data Processing Inequality (DPI). In this second part, we finally look at the relationship between FSVI, a method focusing on the Bayesian predictive posterior rather than the parameter space, and the DPI. We...
[Read More]