Frontier LLM results, on device – Day 1

Your users don’t really need a frontier model.

In this talk, you’ll get the real-world story of replacing expensive Claude Sonnet calls with an open-weight model running on the user’s laptop. You’ll learn how to use capability evals to find your app’s SAGE (Small And Good Enough) model, and how to use prompt engineering to close the gap with frontier models.

Expect honest numbers, a methodology you can apply on Monday, a few counterintuitive findings about how small models actually behave, and patterns for shipping local AI without making users wait through a 2 GB download just to open your app.