What does it take to build foundation LLMs from scratch today? The deeply impressive Jazmia Henry breaks down the four stages in today's episode. Enjoy!
More on Jazmia:
• Holds degrees from Tulane University and Columbia University... and is partway through a PhD at the University of Oxford.
• Held a technical fellowship at Stanford University.
• Previously worked as a data strategist at Morgan Stanley, head of ML at The Motley Fool and lead applied AI engineer at Microsoft.
• Published a top paper at NeurIPS, the world's most prestigious academic AI conference.
• Currently works as "Member of Technical Staff for AI/ML" at collide., a Texas-based startup that’s building AI infrastructure (including all aspects of specialized foundation models) for the energy industry.
Key topics covered in this episode include:
• What foundation models are.
• The four distinct stages of her "full-stack" approach to building foundation models.
• How reinforcement learning (RL) models are "bursty" because they idle the GPU during reward calculation and then dump enormous loads on it all at once.
• Reward hacking by RL models.
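To make the "bursty" point concrete, here is a minimal Python sketch of the GPU timeline in a synchronous RL training loop. The phase names and durations are hypothetical placeholders, not figures from the episode, and the model assumes the GPU sits completely idle while rewards are computed elsewhere (e.g., on CPUs or by an external scorer).

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    seconds: float
    gpu_busy: bool  # True if the GPU is doing work during this phase

# Hypothetical durations for one synchronous RL step (illustrative only).
STEP = [
    Phase("generate rollouts", 4.0, gpu_busy=True),
    Phase("compute rewards", 6.0, gpu_busy=False),  # GPU waits on reward calc
    Phase("policy update", 2.0, gpu_busy=True),     # burst of GPU work
]

def gpu_timeline(step: list[Phase], n_steps: int) -> list[int]:
    """Return a 1-second-resolution timeline: 1 = GPU busy, 0 = GPU idle."""
    timeline: list[int] = []
    for _ in range(n_steps):
        for phase in step:
            timeline.extend([1 if phase.gpu_busy else 0] * int(phase.seconds))
    return timeline

def utilization(timeline: list[int]) -> float:
    """Fraction of wall-clock time the GPU spends doing work."""
    return sum(timeline) / len(timeline)

def longest_idle_gap(timeline: list[int]) -> int:
    """Longest run of consecutive idle seconds."""
    longest = current = 0
    for busy in timeline:
        current = 0 if busy else current + 1
        longest = max(longest, current)
    return longest

tl = gpu_timeline(STEP, n_steps=3)
print("".join("#" if b else "." for b in tl))  # '#' = busy, '.' = idle
print(f"GPU utilization: {utilization(tl):.0%}")
print(f"Longest idle gap: {longest_idle_gap(tl)} s")
```

With these made-up numbers the GPU does nothing for 6 seconds at a stretch and then runs flat out, which is the idle-then-spike pattern described above; overlapping reward computation with the next rollout is one common way to smooth it.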
Thanks to Mark Freeman II for recommending Jazmia as a guest.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Build AI-First Organizations, with Jacob Miller and Jeremy Mumford
After today's fun episode with Jacob and Jeremy — authors of the brand-new book "Architected Intelligence" — you’ll have all the key info to build successful AI features, AI products and AI-first companies. Enjoy!
Jeremy Mumford and Jacob Miller serve as Lead AI Engineer and Vice President of Platform Intelligence, respectively, at Pattern, a giant Utah-based tech company that IPO’ed on the Nasdaq exchange about six months ago.
Jacob and Jeremy's brand-new book, "Architected Intelligence", was published by Wiley, and this episode focuses almost exclusively on it.
Episode highlights include:
• The "User Agnosticism Tenet", which means designing products and processes so they can be executed equally well by a human, an AI agent, or any hybrid combo.
• How the "define-build-feedback" loop has shifted: "building" is no longer the bottleneck, so "definition" and "feedback" are now where teams win or lose.
• Why workflows are deterministic, predictable, and cheaper than agents, and why the natural progression is skills first, then workflows, and only then agents.
• Why data engineering is the bedrock of AI engineering.
• Why velocity is the only durable moat in a world where everyone has access to the same frontier models.
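The "skills first, then workflows, and only then agents" progression can be sketched in Python. Everything below is hypothetical and not taken from the book: skills are plain functions, a workflow chains them in a fixed order (so the same input always produces the same output), and an agent picks its next skill at run time. The `choose_next_skill` policy is a hard-coded stand-in for the LLM call a real agent would make, so the example runs offline.

```python
from typing import Callable, Optional

# Skills: small, single-purpose, individually testable functions (hypothetical).
def extract_order_id(text: str) -> str:
    return text.split("#")[1].split()[0] if "#" in text else ""

def lookup_status(order_id: str) -> str:
    fake_db = {"1042": "shipped", "2077": "processing"}  # stand-in data store
    return fake_db.get(order_id, "unknown")

def format_reply(status: str) -> str:
    return f"Your order status is: {status}."

Skill = Callable[[str], str]

# Workflow: a fixed, deterministic sequence of skills.
def run_workflow(text: str, steps: list[Skill]) -> str:
    value = text
    for step in steps:
        value = step(value)
    return value

# Agent: a policy chooses which skill to run next. This stub replaces the
# LLM call a real agent would make (hypothetical).
def choose_next_skill(history: list[str]) -> Optional[Skill]:
    plan = [extract_order_id, lookup_status, format_reply]
    return plan[len(history)] if len(history) < len(plan) else None

def run_agent(text: str, max_steps: int = 5) -> str:
    value = text
    history: list[str] = []
    for _ in range(max_steps):
        skill = choose_next_skill(history)
        if skill is None:
            break
        value = skill(value)
        history.append(skill.__name__)
    return value

msg = "Where is order #1042 please?"
wf = run_workflow(msg, [extract_order_id, lookup_status, format_reply])
ag = run_agent(msg)
print(wf)
print(ag)
```

Because the stubbed agent happens to follow the same plan, both paths return the same answer. The difference is that the workflow's order is fixed in code, while the agent consults a policy before every step, which is where a real LLM would add cost and variability.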
Thanks to podcast superfan Jonathan Bown for recommending Jeremy and Jacob as guests!
Tokenmaxxing vs AI Hardware Bottlenecks
Like reinforcement learning (RL) algorithms, humans can "reward hack". "Tokenmaxxing" is a perfect example: after employers started using the number of tokens consumed as a proxy for developer productivity, some developers began burning through tokens just to game that metric.
Even if humans weren't engaging in this pointless time-, money- and energy-consuming behavior, however, demand for A.I. compute is so vast that everyone's scrambling to make more available. Alas, we face four tricky hardware bottlenecks:
1. GPUs:
• NVIDIA data-center GPU lead times now run 36–52 weeks, with Blackwell chips sold out through mid-2026.
• The real choke point isn't fabrication; it's TSMC's CoWoS (Chip-on-Wafer-on-Substrate) advanced packaging, which is sold out through 2026. NVIDIA alone has locked up ~60% of CoWoS capacity through 2027.
2. High-Bandwidth Memory (HBM):
• Demand has quintupled since 2023, and only three companies (SK hynix, Samsung and Micron) make it.
• All three are sold out well into 2026 and new HBM factories take 18–24 months to come online.
3. CPUs:
• As workloads shift toward agentic AI, the CPU:GPU ratio jumps from ~1:12 (for GenAI-only chatbots) to 1:1.
• Intel's CFO says the server-CPU shortfall "starts with a B" (billions of dollars in unmet demand); as a result, server-CPU prices are up 10–20% in just the past couple of months.
4. Electricity:
• Hyperscaler build-outs are now gated by grid-interconnect timelines (18–36 months) and transformer lead times.
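A quick back-of-the-envelope calculation shows why the CPU:GPU ratio shift in bottleneck 3 matters so much for CPU demand. The 1,200-GPU cluster is a hypothetical example; only the ~1:12 and 1:1 ratios come from the figures above.

```python
import math

def cpus_needed(num_gpus: int, gpus_per_cpu: float) -> int:
    """CPUs required for num_gpus at a given GPU-per-CPU ratio (rounded up)."""
    return math.ceil(num_gpus / gpus_per_cpu)

GPUS = 1_200  # hypothetical cluster size

chatbot = cpus_needed(GPUS, 12)  # ~1 CPU : 12 GPUs for GenAI-only chatbots
agentic = cpus_needed(GPUS, 1)   # 1 CPU : 1 GPU for agentic workloads

print(f"GenAI chatbot serving: {chatbot} CPUs")
print(f"Agentic workloads:     {agentic} CPUs")
print(f"Increase: {agentic / chatbot:.1f}x")
```

For the same GPU fleet, CPU demand rises twelvefold, which helps explain how a server-CPU shortfall can reach into the billions.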
THE BIG MISMATCH
• The top 5 hyperscalers alone (Alphabet, Amazon, Meta, Microsoft and Oracle) are on track for ~$725B in combined 2026 capex.
• That's roughly 6x the hyperscalers' 2022 spend, with ~75% going to A.I. infrastructure.
• Hardware suppliers, however, have grown their capex by only ~50%. A ~6x increase in demand met by only a ~1.5x increase in supply-side investment is a big mismatch!
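The mismatch arithmetic can be checked in a few lines of Python. The inputs are the approximate figures quoted above (~$725B, 6x, ~75%, ~50%); treating capex growth as a proxy for demand and supply growth is a simplifying assumption.

```python
# Approximate figures quoted in the post.
CAPEX_2026_BILLIONS = 725.0   # top-5 hyperscalers, combined 2026 capex
HYPERSCALER_MULTIPLE = 6.0    # 2026 capex relative to 2022 capex
AI_SHARE = 0.75               # fraction of capex going to A.I. infrastructure
SUPPLIER_CAPEX_GROWTH = 0.50  # hardware suppliers' capex growth (+50%)

def implied_base_capex(capex_now: float, multiple: float) -> float:
    """Back out the base-year spend from current spend and its growth multiple."""
    return capex_now / multiple

def growth_gap(demand_multiple: float, supply_growth: float) -> float:
    """How many times faster demand-side spend grew than supply-side capex."""
    return demand_multiple / (1.0 + supply_growth)

base_2022 = implied_base_capex(CAPEX_2026_BILLIONS, HYPERSCALER_MULTIPLE)
ai_2026 = CAPEX_2026_BILLIONS * AI_SHARE
gap = growth_gap(HYPERSCALER_MULTIPLE, SUPPLIER_CAPEX_GROWTH)

print(f"Implied 2022 hyperscaler capex: ~${base_2022:.0f}B")
print(f"2026 capex going to A.I. infrastructure: ~${ai_2026:.0f}B")
print(f"Demand-side spend grew ~{gap:.0f}x faster than supply-side capex")
```

Note that a +50% increase is a 1.5x multiple, so the ratio of the two growth multiples is 6 / 1.5 = 4, not 6 / 0.5.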
REASONS FOR OPTIMISM
Demand will continue to be high but I'm optimistic we'll continue to squeeze more juice from every lemon because, e.g.:
• Algorithmic efficiency keeps improving: Google's TurboQuant recently sent memory stocks briefly tumbling by promising to materially cut inference memory needs.
• LLM efficiency gains via mixture-of-experts and smarter inference scheduling continue to compound.
• The tokenmaxxing trend is a corporate farce that will fade.
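To see why storing fewer bits per value translates directly into inference-memory savings, here is a generic KV-cache size calculation in Python. This is not TurboQuant's algorithm; it is the standard keys-plus-values size estimate evaluated for a hypothetical model and serving configuration, comparing 16-bit storage against lower-bit quantized storage.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bits_per_value: float) -> float:
    """Memory for keys + values across all layers, tokens and sequences."""
    values = 2 * layers * kv_heads * head_dim * seq_len * batch  # 2 = K and V
    return values * bits_per_value / 8

# Hypothetical model and serving configuration (illustrative only).
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768, batch=8)

GIB = 1024 ** 3
for bits in (16, 8, 4):
    size = kv_cache_bytes(**cfg, bits_per_value=bits) / GIB
    print(f"{bits:>2}-bit KV cache: {size:6.1f} GiB")
```

Halving the bits halves the cache (32, 16 and 8 GiB here). Real quantization schemes also store scales or other metadata, so actual savings are somewhat smaller than this idealized calculation suggests.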
The Four Types of Memory Every AI Agent Needs, with Richmond Alake
To build an effective A.I. agent, getting its memory right is essential. In today's episode, our agent-memory guide is brilliant (and very funny!) machine-learning architect and engineer, Richmond Alake.
More on Richmond:
• Director of A.I. developer experience at Oracle.
• Previous roles include: staff developer advocate for AI/ML at MongoDB, ML architect at Slalom, writer for NVIDIA and computer-vision engineer at Loveshark.
• Holds a master's in ML and robotics from the University of Surrey.
In this episode, Richmond magnificently covers:
• How agent memory is the encapsulation of systems (embedding models, rerankers, databases, and LLMs) that allow AI agents to learn and adapt with new information over time, rather than starting from scratch every session.
• The four types of agent memory (all drawn from human cognition).
• Memory-first agent harnesses.
• Predictions for a flattening of AI engineering roles, where the future developer will need end-to-end understanding of the full agent stack.
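As a rough illustration of memory that lets an agent pick up where it left off instead of starting from scratch every session, here is a toy Python sketch. It is not Richmond's design and does not implement his four memory types: a bag-of-words vector is a hypothetical stand-in for a real embedding model, a JSON file stands in for a database, and retrieval is plain cosine similarity with no reranker.

```python
import json
import math
import re
import tempfile
from collections import Counter
from pathlib import Path

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class AgentMemory:
    """Persists facts on disk so a later session can recall earlier ones."""

    def __init__(self, path: Path):
        self.path = path
        self.facts: list[str] = json.loads(path.read_text()) if path.exists() else []

    def remember(self, fact: str) -> None:
        self.facts.append(fact)
        self.path.write_text(json.dumps(self.facts))

    def recall(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.facts, key=lambda f: cosine(q, embed(f)), reverse=True)
        return ranked[:k]

store = Path(tempfile.mkdtemp()) / "memory.json"
session_1 = AgentMemory(store)
session_1.remember("The user prefers answers in metric units.")
session_1.remember("The user's project deadline is Friday.")

session_2 = AgentMemory(store)  # new session object, same backing store
print(session_2.recall("Which units does the user prefer?"))
```

The second session recalls the units preference even though it never saw the `remember` calls; swapping in a real embedding model, vector database and reranker is what turns this toy into the kind of memory system described above.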