Fable 5 as Advisor: Anthropic’s Two-Model Pattern for Smarter, Cheaper Agents

Added on July 20, 2026 by Jon Krohn.

Want near-frontier A.I. agent quality at a fraction of the cost? Anthropic recently productized the Advisor Strategy that pairs a cheap "executor" model with a brilliant "advisor" to give you the best of both worlds:

HOW IT WORKS
• A fast, cheap model (e.g., Claude Haiku or Sonnet) runs the entire agent loop: calling tools, writing code, drafting output.
• A frontier model (e.g., Claude Opus or Fable) sits on standby as a "tool" the executor can consult (like a junior worker phoning their supervisor when unsure).
• Everything happens inside one API call: Anthropic's servers hand the advisor the full conversation transcript and return just 400-700 tokens of advice, which keeps the pattern fast and inexpensive. It's also usually only a one-line code change, so it's easy to implement.
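
To make the control flow above concrete, here is a minimal local sketch of the pattern. It makes no real API calls: `advisor`, `executor_step`, `run_agent` and the action strings are hypothetical stand-ins I made up for illustration, not Anthropic's interface. The point is only that the cheap executor owns the loop and escalates on its own, while the advisor sees the full transcript and returns a short reply.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    """Shared conversation history that both models can read."""
    messages: List[str] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.messages.append(f"{role}: {text}")


def advisor(transcript: Transcript, max_words: int = 50) -> str:
    """Stub for the frontier model: reads everything, answers briefly."""
    advice = (
        f"Seen {len(transcript.messages)} messages. "
        "Write a failing test first, then patch."
    )
    return " ".join(advice.split()[:max_words])  # keep the advice short


def executor_step(step: int) -> str:
    """Stub for the cheap model: picks the next action, possibly escalating."""
    if step == 0:
        return "CONSULT_ADVISOR"  # unsure at the start, so ask first
    if step < 3:
        return f"TOOL_CALL run_tests attempt={step}"
    return "DONE"


def run_agent(task: str, max_steps: int = 10) -> Transcript:
    """The executor runs the whole loop; the advisor is just another tool."""
    transcript = Transcript()
    transcript.add("user", task)
    for step in range(max_steps):
        action = executor_step(step)
        if action == "CONSULT_ADVISOR":
            transcript.add("executor", "consulting advisor")
            transcript.add("advisor", advisor(transcript))
        elif action == "DONE":
            transcript.add("executor", "task complete")
            break
        else:
            transcript.add("executor", action)
    return transcript


if __name__ == "__main__":
    for line in run_agent("Fix the failing date parser").messages:
        print(line)
```

In a real deployment the two stubs would be model calls and the escalation decision would come from the executor model itself rather than a step counter.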

THE RESULTS
• Sonnet + Opus advisor beat Sonnet alone on the "SWE-bench Multilingual" benchmark by 2.7 percentage points while cutting cost per task by 11.9%. Better quality AND slightly lower cost.
• Unsurprisingly, the biggest gains come from pairing a very fast/cheap model with a much more capable advisor: For example, on BrowseComp (web research benchmark), Haiku alone scored 19.7%; Haiku + Opus advisor scored 41.2% (more than double!) at 85% less cost than Sonnet alone.
• Newest data, from last week: On "SWE-bench Pro", Sonnet 5 + a Fable 5 advisor captured ~92% of Fable's standalone performance at ~63% of its cost.
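
For anyone who wants to check the arithmetic behind these numbers, here is a quick sketch. The helper names are my own, and treating "share of performance divided by share of cost" as a value-for-money ratio is a simplifying assumption rather than anything Anthropic reported.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """How many times larger the improved score is than the baseline."""
    return improved / baseline


def performance_per_cost(perf_fraction: float, cost_fraction: float) -> float:
    """Performance retained per unit of cost, relative to the standalone model."""
    return perf_fraction / cost_fraction


# BrowseComp: Haiku alone (19.7%) vs. Haiku + Opus advisor (41.2%)
browsecomp_ratio = relative_gain(19.7, 41.2)
browsecomp_points = 41.2 - 19.7

# SWE-bench Pro: ~92% of Fable's performance at ~63% of its cost
swe_pro_value = performance_per_cost(0.92, 0.63)

print(f"BrowseComp score ratio: {browsecomp_ratio:.2f}x")        # 2.09x
print(f"BrowseComp gain: {browsecomp_points:.1f} points")        # 21.5 points
print(f"SWE-bench Pro performance per cost: {swe_pro_value:.2f}x")  # 1.46x
```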

WHY IT WORKS
• The advisor's output is tiny relative to the whole task, and a good plan delivered early prevents wasted attempts and misguided tool calls.
• Unlike OpenAI's router (which dispatches queries to a model up front), the cheap model runs the show and escalates itself mid-task with full shared context.

PRACTICAL LESSONS
• Skip it for single-turn Q&A; it shines on long-horizon agentic work (like coding, research, computer use).
• Executors under-call the advisor by default, so prompt them to consult it early (before committing to an approach) and late (before declaring the task done).
• Cap advisor output at ~2,000 tokens (~7x cost reduction, no quality loss) and enable prompt caching for long loops.
• The pattern is spreading: OpenRouter now offers a cross-provider version (e.g., a Google Gemini executor consulting Claude).
• Alternative design patterns, such as a powerful "orchestrator" model (shown below the advisor pattern in the chart I included in this post), might work even better for your use case, so it could be worth comparing them.
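
The "consult early and late" and "cap the output" lessons can be sketched as a small policy. Everything here is hypothetical scaffolding of my own (the constant names, the guidance wording, the function signatures); it is not a real API configuration. It just shows the two checkpoints and the ~2,000-token cap in code form.

```python
from typing import List

ADVISOR_MAX_OUTPUT_TOKENS = 2_000  # the ~2,000-token cap suggested above

# Hypothetical wording to add to the executor's system prompt.
EXECUTOR_GUIDANCE = (
    "Consult the advisor before committing to an approach, "
    "and again before declaring the task done."
)


def should_consult(
    has_committed_to_approach: bool,
    about_to_finish: bool,
    consulted_early: bool,
    consulted_late: bool,
) -> bool:
    """Return True at the two checkpoints: early (pre-commit) and late (pre-done)."""
    if not has_committed_to_approach and not consulted_early:
        return True
    if about_to_finish and not consulted_late:
        return True
    return False


def cap_advice(tokens: List[str], limit: int = ADVISOR_MAX_OUTPUT_TOKENS) -> List[str]:
    """Truncate advisor output to the cap (tokens modeled as a list of strings)."""
    return tokens[:limit]


if __name__ == "__main__":
    print(should_consult(False, False, False, False))  # True: ask before committing
    print(should_consult(True, True, True, False))     # True: ask before finishing
    print(len(cap_advice(["tok"] * 2_500)))            # 2000
```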

BOTTOM LINE
Frontier A.I. progress is no longer just bigger models... it's smarter economics in composing the models we already have.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

What’s Left to Build When Software Is Free, with Chip Huyen

Added on June 10, 2026 by Jon Krohn.

For today's landmark episode (#999!), I asked rockstar Chip Huyen to be my guest and she said "yes"! We discuss her book "AI Engineering" (the most popular O'Reilly book in 2025) and how the A.I. job landscape is shifting.

In case you haven't heard of her, more on Chip:
• Her most recent book is "AI Engineering", which was the most popular book on the O'Reilly platform last year.
• Previously wrote “Designing Machine Learning Systems”, which was also an O'Reilly mega-bestseller and was based on the Stanford University course she created and taught on the same topic.
• Is currently building a new stealth startup.
• Previously worked as VP of AI at Voltron Data, co-founder of Claypot AI, ML Engineer at Snorkel AI and Sr Deep Learning Engineer at NVIDIA.
• Holds a Master's in Computer Science from Stanford.
• Her invaluable posts have earned her over 300k followers on LinkedIn.

In this episode, Chip breaks down:
• What separates AI engineering from machine learning engineering.
• The case for a "start simple" workflow.
• The real costs of running LLMs in production.
• Physical AI.
• Robotics.
• World models.
• Why the durable problems worth solving are increasingly human ones.


End-to-End Foundation Models for the Energy Industry, with Jazmia Henry

Added on May 28, 2026 by Jon Krohn.

What does it take to build foundation LLMs from scratch today? The deeply impressive Jazmia Henry breaks down the four stages in today's episode. Enjoy!

Jazmia:
• Holds degrees from Tulane University and Columbia University... and is partway through a PhD at the University of Oxford.
• Held a technical fellowship at Stanford University.
• Previously worked as a data strategist at Morgan Stanley, head of ML at The Motley Fool and a Lead Applied AI engineer at Microsoft.
• Published a top paper at NeurIPS, the world's most prestigious academic AI conference.
• Currently works as "Member of Technical Staff for AI/ML" at collide., a Texas-based startup that’s building AI infrastructure (including all aspects of specialized foundation models) for the energy industry.

Key topics covered in this episode include:
• What foundation models are.
• The four distinct stages of her "full-stack" approach to building foundation models.
• How reinforcement learning (RL) models are "bursty" because they idle the GPU during reward calculation and then dump enormous loads on it all at once.
• Reward hacking by RL models.

Thanks to Mark Freeman II for recommending Jazmia as a guest.


How to Build AI-First Organizations, with Jacob Miller and Jeremy Mumford

Added on May 20, 2026 by Jon Krohn.

After today's fun episode with Jacob and Jeremy — authors of the brand-new book "Architected Intelligence" — you’ll have all the key info to build successful AI features, AI products and AI-first companies. Enjoy!

Jeremy Mumford and Jacob Miller serve as Lead AI Engineer and Vice President of Platform Intelligence, respectively, at Pattern, a giant Utah-based tech company that IPO’ed on the Nasdaq exchange about six months ago.

Jacob and Jeremy's brand-new "Architected Intelligence" book was published by Wiley and this episode focuses almost exclusively on this invaluable book.

Episode highlights include:
• The "User Agnosticism Tenet", which means designing products and processes so they can be executed equally well by a human, an AI agent, or any hybrid combo.
• The shift in the "define-build-feedback" loop today where "building" is no longer the bottleneck, which means "definition" and "feedback" are where teams win or lose.
• Why workflows are deterministic, predictable, and cheaper than agents, and why the natural progression is skills first, then workflows, and only then agents.
• Why data engineering is the bedrock of AI engineering.
• Why velocity is the only durable moat in a world where everyone has access to the same frontier models.

Thanks to podcast superfan Jonathan Bown for recommending Jeremy and Jacob as guests!


Tokenmaxxing vs AI Hardware Bottlenecks

Added on May 19, 2026 by Jon Krohn.

Humans, like reinforcement learning algorithms, can "reward hack". "Tokenmaxxing" is a perfect example: it took off after employers started using the number of tokens consumed as a proxy for developers' productivity.

Even if humans weren't engaging in this pointless time-, money- and energy-consuming behavior, however, demand for A.I. compute is so vast that everyone's scrambling to make more available. Alas, four tricky hardware bottlenecks face us:

1. GPUs:
• NVIDIA data-center GPU lead times now run 36–52 weeks, with Blackwell chips sold out through mid-2026.
• The real choke point isn't fabrication: It's TSMC's "CoWoS" advanced packaging, which is sold out through 2026. NVIDIA alone has locked up ~60% of CoWoS capacity through 2027.

2. High-Bandwidth Memory (HBM):
• Demand has quintupled since 2023, and only three companies (SK hynix, Samsung and Micron) make it.
• All three are sold out well into 2026 and new HBM factories take 18–24 months to come online.

3. CPUs:
• As workloads shift toward agentic AI, the CPU:GPU ratio jumps from ~1:12 (for GenAI-only chatbots) to 1:1.
• Intel's CFO says the server-CPU shortfall "starts with a B" — billions in unmet demand so server CPU prices are up 10–20% in just the past couple of months.

4. Electricity: Hyperscaler build-outs are now gated by grid interconnect (18–36 months) and transformer lead times.
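
To see why the CPU:GPU ratio shift in point 3 matters, here is a small worked calculation. The 12,000-GPU cluster size is a made-up number for illustration; only the ~1:12 and 1:1 ratios come from the post.

```python
import math


def cpus_needed(gpus: int, gpus_per_cpu: int) -> int:
    """CPUs required for a cluster, given how many GPUs each CPU serves."""
    return math.ceil(gpus / gpus_per_cpu)


cluster_gpus = 12_000  # hypothetical cluster size, not from the post

chatbot_cpus = cpus_needed(cluster_gpus, gpus_per_cpu=12)  # ~1:12 CPU:GPU
agentic_cpus = cpus_needed(cluster_gpus, gpus_per_cpu=1)   # 1:1 CPU:GPU

print(chatbot_cpus)                  # 1000
print(agentic_cpus)                  # 12000
print(agentic_cpus // chatbot_cpus)  # 12 (times more CPUs for the same GPUs)
```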

THE BIG MISMATCH
• The top 5 hyperscalers alone (Alphabet, Amazon, Meta, Microsoft and Oracle) are on track for ~$725B in combined 2026 capex.
• That's roughly 6x the hyperscalers' 2022 spend, with ~75% going to A.I. infrastructure.
• Hardware suppliers, however, have grown capex by only ~50%. A 6x increase in demand met by only a 50% increase in supply is a big mismatch!
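
Here is the back-of-the-envelope math behind the mismatch. The variable names are mine, and I'm assuming the ~50% supplier capex growth covers the same 2022-to-2026 window as the 6x hyperscaler figure, which the numbers above don't state explicitly.

```python
capex_2026_billion = 725.0   # combined 2026 capex of the top 5 hyperscalers
demand_multiple = 6.0        # ~6x the hyperscalers' 2022 spend
ai_share = 0.75              # ~75% going to A.I. infrastructure
supply_multiple = 1.5        # suppliers' capex up ~50% (period assumed to match)

implied_2022_billion = capex_2026_billion / demand_multiple
ai_capex_billion = capex_2026_billion * ai_share
mismatch = demand_multiple / supply_multiple

print(f"Implied 2022 hyperscaler capex: ~${implied_2022_billion:.0f}B")  # ~$121B
print(f"2026 A.I. infrastructure capex: ~${ai_capex_billion:.0f}B")      # ~$544B
print(f"Demand growth vs. supply growth: {mismatch:.1f}x")               # 4.0x
```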

REASONS FOR OPTIMISM
Demand will continue to be high but I'm optimistic we'll continue to squeeze more juice from every lemon because, e.g.:
• Algorithmic efficiency keeps improving: Google's TurboQuant briefly tanked memory stocks recently by promising to materially cut inference memory needs.
• LLM efficiency gains via mixture-of-experts and smarter inference scheduling continue to compound.
• The tokenmaxxing trend is a corporate farce that will fade.


The Four Types of Memory Every AI Agent Needs, with Richmond Alake

Added on April 22, 2026 by Jon Krohn.

To build an effective A.I. agent, getting its memory right is essential. In today's episode, our agent-memory guide is brilliant (and very funny!) machine-learning architect and engineer, Richmond Alake.

More on Richmond:
• Director of A.I. developer experience at Oracle.
• Previous roles include: staff developer advocate for AI/ML at MongoDB, ML architect at Slalom, writer for NVIDIA and computer-vision engineer at Loveshark.
• Holds a master's in ML and robotics from the University of Surrey.

In this episode, Richmond magnificently covers:
• How agent memory is the encapsulation of systems (embedding models, rerankers, databases, and LLMs) that allow AI agents to learn from and adapt to new information over time, rather than starting from scratch every session.
• The four types of agent memory (all drawn from human cognition).
• Memory-first agent harnesses.
• Predictions for a flattening of AI engineering roles, where the future developer will need end-to-end understanding of the full agent stack.
