Imagine being able to vibe-code full-blown video games... for free! My returning guest, Dr. Andrey Kurenkov, helped engineer Astrocade to do just that... and already 20 million people have played games through their platform.
More on Andrey:
• Founding A.I. Lead at Astrocade, a Bay Area-based startup that has raised $68m in venture capital to create the TikTok of video games, where creators create games for free and you play them for free.
• Co-host (alongside Jeremie Harris) of my favorite podcast, "Last Week in A.I.".
• Holds a PhD from Stanford University, where his research focused on machine vision and robotics.
In this episode, we discuss:
• The fascinating Astrocade journey, of course.
• The surprising pace of humanoid robotics.
• Why he's a skeptic on Artificial Super Intelligence.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
TrueFoundry’s Nikunj Bajaj on How to Get $100M Returns on AI Agent Deployments
Imagine being able to deploy an AI agent and get a return of over $100m from that single deployment. My guest today, Nikunj Bajaj, has facilitated that multiple times! Lots to learn from him, enjoy!
Nikunj:
• CEO and co-founder of TrueFoundry, a Bay Area-based startup that has raised over $20m to solve the thorniest problems that enterprises face when deploying agents.
• His clients include demanding organizations like NVIDIA and Siemens.
• Was previously ML tech lead at Facebook.
• Holds a master's in computer science from University of California, Berkeley.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
End-to-End Foundation Models for the Energy Industry, with Jazmia Henry
What does it take to build foundation LLMs from scratch today? Deeply impressive Jazmia Henry breaks down the four stages in today's episode, enjoy!
Jazmia:
• Holds degrees from Tulane University and Columbia University... and is partway through a PhD at the University of Oxford.
• Held a technical fellowship at Stanford University.
• Previously worked as a data strategist at Morgan Stanley, head of ML at The Motley Fool and lead applied AI engineer at Microsoft.
• Published a top paper at NeurIPS, the world's most prestigious academic AI conference.
• Currently works as "Member of Technical Staff for AI/ML" at collide., a Texas-based startup that’s building AI infrastructure (including all aspects of specialized foundation models) for the energy industry.
Key topics covered in this episode include:
• What foundation models are.
• The four distinct stages of her "full-stack" approach to building foundation models.
• How reinforcement learning (RL) models are "bursty" because they idle the GPU during reward calculation and then dump enormous loads on it all at once.
• Reward hacking by RL models.
Thanks to Mark Freeman II for recommending Jazmia as a guest.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI’s Putting Recent Grads Out of Work; Here’s How to Get Hired Anyway!
Computer science/engineering grads had an employment advantage (see chart) that, since ChatGPT's release, has disappeared. Is A.I. to blame? Here's what the data say and what new grads (or anyone!) can do about it:
THE EMPLOYMENT LANDSCAPE
• NY Fed: unemployment for recent computer-science grads (22-27) sits at 7.0%, and computer engineering at 7.8% (roughly on par with fine arts and anthropology grads!)
• Compare that to ~5.8% for recent grads overall and ~4% for the whole US workforce.
• Eighteen-year-olds are voting with their feet: US undergrad CS enrolment fell 11% in 2025; computer programming fell a stunning 26%.
• Demand is shrinking too: Handshake postings are down ~50% from their 2022 peak, and Revelio Labs data suggest entry-level software and data-analysis postings have dropped as much as 67%.
IS A.I. TO BLAME?
• "Yes" camp: A 2025 Stanford University study found employment for 22-25-year-olds in A.I.-exposed jobs dropped 13% since 2022, while older workers held steady. The Dallas Fed replicated it... and the decline comes from juniors never being hired, not layoffs.
• "Not so fast" camp: Google economists found posting declines were just as steep for senior workers and predate ChatGPT. A Fed study of 1M+ firms found "null effects." Their take: high interest rates and a post-pandemic hangover, with A.I. as a convenient scapegoat.
WHAT YOU CAN DO:
1. Stop competing on raw code. The human edge is now system design, architecture and deciding what to build in the first place.
2. Pick a domain. "A.I. engineer" is a common résumé; "A.I. engineer who worked alongside a hospital team for two summer internships" is a short list.
3. Build a public portfolio. Substantive GitHub repos and a Kaggle project beat CVs sent into the void.
4. Get fluent with agentic tooling, e.g., RAG, model evaluation, multi-agent orchestration. PwC found A.I.-skilled workers earn a 56% wage premium (!!!)
5. Lean on your network. Referrals and warm intros are crushing mass (often GenAI-produced) applications in this market.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Build AI-First Organizations, with Jacob Miller and Jeremy Mumford
After today's fun episode with Jacob and Jeremy — authors of the brand-new book "Architected Intelligence" — you’ll have all the key info to build successful AI features, AI products and AI-first companies. Enjoy!
Jeremy Mumford and Jacob Miller serve as Lead AI Engineer and Vice President of Platform Intelligence, respectively, at Pattern, a giant Utah-based tech company that IPO’ed on the Nasdaq exchange about six months ago.
Jacob and Jeremy's brand-new "Architected Intelligence" book was published by Wiley and this episode focuses almost exclusively on this invaluable book.
Episode highlights include:
• The "User Agnosticism Tenet", which means designing products and processes so they can be executed equally well by a human, an AI agent, or any hybrid combo.
• The shift in the "define-build-feedback" loop today where "building" is no longer the bottleneck, which means "definition" and "feedback" are where teams win or lose.
• Why workflows are deterministic, predictable, and cheaper than agents, and why the natural progression is skills first, then workflows, and only then agents.
• Why data engineering is the bedrock of AI engineering.
• Why velocity is the only durable moat in a world where everyone has access to the same frontier models.
Thanks to podcast superfan Jonathan Bown for recommending Jeremy and Jacob as guests!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Tokenmaxxing vs AI Hardware Bottlenecks
Humans (like reinforcement-learning algos) can "reward hack". "Tokenmaxxing" is a perfect example: it emerged after employers started using the number of tokens consumed as a proxy for developers' productivity.
Even if humans weren't engaging in this pointless time-, money- and energy-consuming behavior, however, demand for A.I. compute is so vast that everyone's scrambling to make more available. Alas, four tricky hardware bottlenecks face us:
1. GPUs:
• NVIDIA data-center GPU lead times now run 36–52 weeks, with Blackwell chips sold out through mid-2026.
• The real choke point isn't fabrication: It's TSMC's "CoWoS" advanced packaging, which is sold out through 2026. NVIDIA alone has locked up ~60% of CoWoS capacity through 2027.
2. High-Bandwidth Memory (HBM):
• Demand has quintupled since 2023, and only three companies (SK hynix, Samsung and Micron) make it.
• All three are sold out well into 2026 and new HBM factories take 18–24 months to come online.
3. CPUs:
• As workloads shift toward agentic AI, the CPU:GPU ratio jumps from ~1:12 (for GenAI-only chatbots) to 1:1.
• Intel's CFO says the server-CPU shortfall "starts with a B" (billions in unmet demand), so server-CPU prices are up 10–20% in just the past couple of months.
4. Electricity: Hyperscaler build-outs are now gated by grid interconnect (18–36 months) and transformer lead times.
THE BIG MISMATCH
• The top 5 hyperscalers alone (Alphabet, Amazon, Meta, Microsoft and Oracle) are on track for ~$725B in combined 2026 capex.
• That's roughly 6x the hyperscalers' 2022 spend, with ~75% going to A.I. infrastructure.
• Hardware suppliers, however, have grown capex by only ~50%... a 6x increase in demand met by only a 50% increase in supply is a big mismatch!
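To make the arithmetic behind that mismatch concrete, here's a quick back-of-the-envelope sketch in Python. It uses only the approximate figures quoted above; the variable names are mine, and treating hyperscaler capex as "demand" and supplier capex as "supply" is the same simplification the bullets make.

```python
# Back-of-the-envelope arithmetic using only the approximate figures quoted above.
capex_2026_usd_b = 725        # ~$725B combined 2026 capex (top 5 hyperscalers)
growth_multiple = 6           # ~6x the hyperscalers' 2022 spend
ai_share = 0.75               # ~75% going to A.I. infrastructure
supplier_capex_growth = 0.50  # hardware suppliers' capex up only ~50%

# What the 6x figure implies the 2022 baseline was
implied_2022_capex_usd_b = capex_2026_usd_b / growth_multiple

# Portion of 2026 capex earmarked for A.I. infrastructure
ai_infra_2026_usd_b = capex_2026_usd_b * ai_share

# Demand multiple (6x) relative to supply multiple (1.5x)
mismatch_ratio = growth_multiple / (1 + supplier_capex_growth)

print(f"Implied 2022 capex: ~${implied_2022_capex_usd_b:.0f}B")
print(f"2026 A.I. infrastructure spend: ~${ai_infra_2026_usd_b:.0f}B")
print(f"Demand multiple vs. supply multiple: ~{mismatch_ratio:.0f}x")
```

In other words, the quoted numbers imply a 2022 baseline of roughly $121B, about $544B of A.I. infrastructure spend in 2026, and demand growing about four times as much as supply.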
REASONS FOR OPTIMISM
Demand will continue to be high but I'm optimistic we'll continue to squeeze more juice from every lemon because, e.g.:
• Algorithmic efficiency keeps improving: Google's TurboQuant briefly tanked memory stocks recently by promising to materially cut inference memory needs.
• LLM efficiency gains via mixture-of-experts and smarter inference scheduling continue to compound.
• The tokenmaxxing trend is a corporate farce that will fade.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Pair Programming with AI in Your Python Notebook, with Dr. Trevor Manz
Exceptional technical episode today with Dr. Trevor Manz on "marimo Pair", an (actually!) game-changing pair-programming A.I.-agent companion that lifts heavy loads within your Python data-science notebook.
More on Trevor:
• 27-time NCAA Swimming All-American & National Champion.
• Master's in Computational Biology from University of Cambridge.
• PhD in Bioinformatics from Harvard University.
• Creator of the popular open-source "anywidget" project (amongst many others, particularly in visualizing bioinformatics data, e.g., genomics data).
• Now a founding engineer at marimo.io, where he is leading the charge on marimo Pair.
Seriously, marimo Pair is unreal. A complete reimagining of what's possible in a Jupyter notebook-style environment in the agentic A.I. era. You will hear (and see) my mind explode in this episode!
We also discuss:
• Agent skills.
• Recursive language models.
• A number of other open-source projects, largely in data viz/analysis.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Web Summit Vancouver 2026
"Collision" has grown and re-branded as "Web Summit Vancouver". I'm looking forward to experiencing the new brand for the first time next week! See you there? Here's where you can catch me:
• Tue May 12 at 11am: Mentor Hours on "scaling your startup"
• Wed May 13 at 1:30pm: Delivering my agentic A.I. talk ("Something Big is Happening") on the "A.I. Summit" stage.
• Wed May 13 at 1:50pm: Emceeing the "A.I. Summit" stage all afternoon.
More on Web Summit Vancouver:
• Taking place May 11–14 at the Vancouver Convention Centre.
• It's the second year in a row the conference, under this new brand, has taken place (the previous "Collision"-branded event was held annually in Toronto and the photo in this post is from a talk I gave there in 2024).
• Connects over 35,000 startup founders, investors and industry leaders to discuss A.I., entrepreneurship and tech trends.
Security for Mythos-Era Agentic Risks, with Rubrik’s Anneka Gupta and Cal Al-Dhubaib
Mythos finds security vulnerabilities at ~100X the rate of publicly available models, and comparable open-weight models are ~6 months away. Scary? Thankfully my guests today, Anneka and Cal, have solutions!
Anneka:
• Chief Product Officer at Rubrik.
• Lecturer in Product Management at Stanford University.
• Climbed the ladder from software engineer to President (!!) during an 11-year tenure at LiveRamp.
• Holds a degree in math and computational sciences from Stanford.
Cal:
• Principal Technologist at Rubrik.
• Formerly founder and CEO of Pandata, which was acquired by Further.
• Highly sought-after keynote speaker.
• Holds a degree in data science from Case Western Reserve University.
This is an exceptional episode with two brilliant, entertaining and highly knowledgeable guests. It can be enjoyed by anyone! In it, they cover:
• How Anthropic's Mythos model can be pointed at a code repository and autonomously surface every vulnerability inside it, and how Anthropic itself estimates Mythos-class capabilities will reach other labs within six to eighteen months, with open-weight versions likely to follow.
• How code-gen models make life easy for attackers, both by scaling up attackers' capabilities... and because vibe-coders are often unaware of the vulnerabilities in their own code!
• How Rubrik's Agent Cloud delivers three pillars of resilience: visibility into every agent in your environment, governance and runtime control through the SAGE small language model, and remediation through Agent Rewind.
• Why the next wave of knowledge work is inherently cross-functional, with A.I. attorneys, security pros, and data scientists all needing shared literacy in A.I. risk.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in April 2026
Whoa, it's May Day... and our podcast-production team was *on the ball* with getting our ICYMI-in-April episode together lickety-split. In case you missed it, these were the best bits of my on-air convos last month:
1. Oracle's Director of A.I. Developer Experience Richmond Alake defines the four types of memory A.I. agents can have... and the biological inspiration for each of them.
2. Matthew J. Glickman, co-founder/CEO of Genesis Computing, describes how A.I. agents allow data engineers to dramatically scale up their impact in an enterprise.
3. The A.I. infrastructure engineer Linda Haviv has amassed a following of over 250,000 folks on social media. In her clip from last month, she combines both worlds — detailing why A.I. infrastructure has now become everyone's problem while also discussing her work in lowering the barrier to access A.I. education.
4. Traci Walker Griffith, principal of The Eliot School in Boston, shares her novel perspective on what critical thinking is... in the context of how fifth-graders are leveraging A.I. to evaluate their work and prepare for tests.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI Infrastructure, Ray, and Why Nonlinear Careers Win, with Linda Haviv
For folks in A.I., software and data science, things are moving so fast that it's easy to be overwhelmed. Luckily, A.I. engineer Linda Haviv makes it a joy to stay up to date! Today, we discuss career tips as well as open-source A.I. tech like Ray.
More on Linda:
• Until recently, was Staff Developer Advocate at Anyscale, makers of Ray, an open-source framework for managing, executing and optimizing A.I. compute.
• Previously was A.I. Developer Advocate at Amazon Web Services (AWS).
• Before that, was a software developer at Fox Corporation.
• Was a professional singer in New York up until her second (of three!) children was born.
• Holds a degree in philosophy from Baruch College.
In this episode, Linda ebulliently covers:
• How "A.I. infrastructure" refers to the compute stack, tooling and frameworks purpose-built for A.I. and ML workloads.
• How Ray is a Python-native open-source distributed computing framework that lets engineers distribute training, data processing and model serving across GPUs without needing to become distributed systems experts.
• How building in public, creating content and contributing to open source are not just career insurance... they're how you find your community, attract unexpected opportunities and learn faster through teaching.
• And much more!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Building Hardware is Hard but AI Agents Help, with Kishore Subramanian
In software, when something goes wrong, you push a patch. In hardware? Oooph. You're dealing with big headaches and huge costs. Thankfully, my guest today — Kishore Subramanian — is using AI to transform the way physical products get built for the better.
Kishore:
• Is CTO of Propel Software, a Bay Area company that combines product data with agentic AI to make the production of physical hardware (including high tech and medtech devices) as seamless as possible.
• Prior to Propel, held senior engineering roles at Google, where he worked on Google Assistant, so he has particularly rich experience with agent development.
• Holds a degree in electronics, computers and process control… as well as a 200-hour yoga-teaching certificate!
In this episode, Kishore covers:
• How product lifecycle management (PLM) is the system that takes a physical product from concept all the way to the customer and beyond.
• How AI agents can review engineering change orders — the hardware equivalent of pull requests — to flag risks, compliance gaps, and downstream impacts before they become expensive problems.
• How Propel built their AI platform, Propel One, on top of Salesforce's Agentforce 360 Platform, which gave them security, governance, data infrastructure, and a reasoning engine out of the box, allowing them to ship in about six months.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Four Types of Memory Every AI Agent Needs, with Richmond Alake
To build an effective A.I. agent, getting its memory right is essential. In today's episode, our agent-memory guide is brilliant (and very funny!) machine-learning architect and engineer, Richmond Alake.
More on Richmond:
• Director of A.I. developer experience at Oracle.
• Previous roles include: staff developer advocate for AI/ML at MongoDB, ML architect at Slalom, writer for NVIDIA and computer-vision engineer at Loveshark.
• Holds a master's in ML and robotics from the University of Surrey.
In this episode, Richmond magnificently covers:
• How agent memory is the encapsulation of systems (embedding models, rerankers, databases, and LLMs) that allow AI agents to learn and adapt with new information over time, rather than starting from scratch every session.
• The four types of agent memory (all drawn from human cognition).
• Memory-first agent harnesses.
• Predictions for a flattening of AI engineering roles, where the future developer will need end-to-end understanding of the full agent stack.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Building AI Agents Where 99.9% Accuracy Isn't Good Enough, with Raju Malhotra
The headlines shout “SaaSpocalypse,” but I don’t buy it. Neither does my guest today, Raju Malhotra, who argues that, thanks to humans collaborating with agents on optimized workflows, the SaaS opportunity is now far bigger than ever before.
More on Raju:
• Chief Product & Technology Officer (CPTO) at Certinia, an Austin, Texas-based company whose Professional Services Automation software is used by over 1400 organizations around the world.
• Was previously CPTO at PAR Technology and Khoros.
• Earlier, spent 12 years at Microsoft working on cornerstone products like Visual Studio .NET.
• Holds an MBA from The Wharton School and an undergrad in computer engineering.
In this episode, we cover:
• Traditional SaaS isn't dead… instead, it's evolving into a hybrid of SaaS plus agentic capabilities, where humans and agents work together in optimized workflows.
• By removing the human-skills constraint from professional services delivery, the agentic revolution could expand the addressable market by 7-8X.
• The Agentforce 360 platform (by combining probabilistic AI with deterministic logic and guardrails) empowers innovators to turn their ideas into scalable software businesses, allowing businesses like Certinia to bring AI agents securely and reliably to their customers, even in sensitive industries where 0.1% error rates are unacceptable.
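Why might a 0.1% error rate be unacceptable? A rough sketch of the arithmetic, under assumptions that are mine rather than Raju's: errors are independent, the item volumes are hypothetical, and a multi-step workflow only counts as correct if every step is.

```python
def expected_errors(accuracy: float, n_items: int) -> float:
    """Expected number of wrong results if each item is independently
    correct with probability `accuracy`."""
    return (1.0 - accuracy) * n_items


def workflow_success_rate(step_accuracy: float, n_steps: int) -> float:
    """Probability that every step of an n-step workflow is correct,
    assuming the steps fail independently of one another."""
    return step_accuracy ** n_steps


# Hypothetical volumes, for illustration only
for n_items in (10_000, 1_000_000):
    errors = expected_errors(0.999, n_items)
    print(f"{n_items:>9,} items at 99.9% accuracy -> ~{errors:,.0f} expected errors")

# Per-step errors compound as workflows get longer
for n_steps in (1, 10, 100):
    rate = workflow_success_rate(0.999, n_steps)
    print(f"{n_steps:>3}-step workflow -> {rate:.1%} fully correct")
```

At a million items, 99.9% accuracy still means about a thousand mistakes, and a 100-step workflow at 99.9% per step comes out fully correct only about 90% of the time. That's the gap that deterministic logic and guardrails are meant to close.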
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith
Long overdue episode today on how A.I. can support children's education. Hard to imagine a better guest than Traci Walker Griffith, principal of a K-8 school that has used innovations like A.I. to become Boston's #1 school.
In this episode, we discuss:
• How Traci transformed The Eliot School from an underperforming school on the closure list into the highest-performing school in Boston.
• How kids as young as four at the Eliot work with robots and coding tools like Kibo and Scratch Junior, learning that the quality of their input determines the quality of their output ("garbage in, garbage out").
• How, for younger students in kindergarten through fourth grade, teachers use A.I. behind the scenes.
• How students in grades five through eight interact with A.I. directly, enabling them to build metacognition and critical-thinking skills.
• Her concrete guidance for schools (or parents!) considering incorporating A.I. into pedagogy.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in March 2026
It just keeps getting better and better... ICYMI, my on-air conversations with guests in March were extraordinary. Today's episode highlights the best bits from last month, specifically:
1. Zack Kass (who was head of go-to-market at OpenAI when ChatGPT was launched and who recently wrote the bestselling book "The Next RenAIssance") details why classrooms must change in the age of A.I.
2. Renowned New York University professor KyungHyun Cho explains why A.I. learning to explore the world like humans will unlock major progress in A.I. capability.
3. Three-time bestselling O'Reilly author Chris Fregly tells us why, if we're still writing code manually in 2026, we're behind the times.
4. Fireworks AI CEO Lin Qiao explains the difference between artificial general intelligence (AGI) and what she terms "autonomous intelligence".
5. Acceldata CEO Rohit Choudhary provides a clear vision for how job roles will be transformed by A.I.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman
Something big happened in February that changed the world forever. My guest today, Matthew J. Glickman, says code-generating models crossed an event horizon... and there's no turning back. Listen in for the implications.
More on Matt:
• Co-founder and CEO of Genesis Computing, a New York-based company building enterprise-ready data agents that automate everything from raw data to production applications, compressing projects that took months into hours while recovering massive hiring costs.
• Previously spent over two decades at Goldman Sachs leading analytics and data platform teams, then joined Snowflake as employee 81, where he led Product Management, launched the Snowflake Marketplace, and grew Financial Services into Snowflake’s largest industry vertical.
• Holds a degree in Computer Science and Math.
In this episode, which will be fascinating to anyone but especially to hands-on A.I. and data practitioners, we discuss:
• How February 2026 marked the moment the latest frontier models crossed a threshold where they could handle complex, multi-step data engineering workflows that previously required human expertise... and there's no going back.
• How finance and healthcare were late to adopt the cloud but are among the earliest and most aggressive adopters of A.I.
• How Genesis deploys its agentic platform directly inside a client's environment (more like onboarding a new employee than adopting a SaaS product) so that all accumulated knowledge remains the company's asset.
• How, rather than acting as a copilot that waits for human instructions step by step, Genesis inverts the model: Agents work autonomously on complex data engineering tasks and only escalate to humans when their confidence is low, memorializing every answer so they never ask the same question twice.
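The "escalate only when confidence is low, and never ask twice" pattern can be sketched in a few lines of Python. To be clear, this is my own illustrative toy and not Genesis's implementation: the `EscalatingAgent` class, the `propose` and `ask_human` callables, the 0.8 threshold and the exact-match lookup are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class EscalatingAgent:
    # Hypothetical model call: question -> (proposed answer, confidence in [0, 1])
    propose: Callable[[str], Tuple[str, float]]
    # Hypothetical human-in-the-loop channel: question -> authoritative answer
    ask_human: Callable[[str], str]
    threshold: float = 0.8  # illustrative cut-off, not a recommended value
    memory: Dict[str, str] = field(default_factory=dict)
    escalations: int = 0

    def resolve(self, question: str) -> str:
        # 1. Reuse a memorialized human answer if this question was asked before
        if question in self.memory:
            return self.memory[question]
        # 2. Otherwise let the agent try on its own
        answer, confidence = self.propose(question)
        # 3. Escalate only when confidence is low, then remember the answer
        if confidence < self.threshold:
            self.escalations += 1
            answer = self.ask_human(question)
            self.memory[question] = answer
        return answer


# Stub functions standing in for a real model and a real human
def fake_propose(question: str) -> Tuple[str, float]:
    if "timestamp" in question:
        return ("use column order_ts", 0.95)
    return ("unsure", 0.30)


def fake_human(question: str) -> str:
    return "fiscal year starts 1 Feb"


agent = EscalatingAgent(propose=fake_propose, ask_human=fake_human)
first = agent.resolve("When does the fiscal year start?")   # escalates to the human
second = agent.resolve("When does the fiscal year start?")  # served from memory
confident = agent.resolve("Which timestamp column?")        # no escalation needed
print(first, "|", second, "|", confident, "|", agent.escalations)
```

A real system would presumably match questions by meaning rather than by exact string, but the control flow (try, check confidence, escalate, memorialize) is the part worth noticing.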
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI Making Theoretical Physics Breakthroughs
A.I. is now directly advancing science. "SuperChat", a powerful internal OpenAI model, recently helped crack a particle physics problem that had stumped researchers for over a year. Here's what happened:
THE PROBLEM
• Four theoretical physicists (from Harvard, the Institute for Advanced Study, Cambridge and Vanderbilt) had been studying interactions involving gluons: the particles that "glue" quarks together inside protons and neutrons, essentially holding all matter together.
• For decades, textbooks said a specific type of gluon interaction (called "single-minus" configurations) had a "scattering amplitude" of zero (i.e., these interactions simply could not occur).
• The team suspected otherwise, and proved it for small numbers of gluons... but as they tried to generalize the formula, the expressions became dozens of terms long and unworkable. After about a year of grinding away by hand, they were stuck.
THE BREAKTHROUGH
• They fed their complicated formulae into GPT-5.2 Pro. The model simplified an expression with 32 variables down to a compact product fitting on a single line.
• Asked to generalize for any number of gluons, the model replied within minutes with what it called (I love this!) the "obvious" generalization.
• A more powerful internal OpenAI model (which the researchers called "SuperChat") then produced a formal proof after about 12 hours of autonomous reasoning. The physicists checked step by step and confirmed it was correct.
• The team then extended the approach to gravitons (hypothetical particles thought to carry the gravitational force), releasing the results in their second arXiv preprint a few weeks later.
CAVEATS
• These are preprints, not yet peer-reviewed papers.
• The results apply to a very specific mathematical regime at the simplest level of calculation ("tree level").
• Human physicists were essential for defining the problem, providing the initial data and verifying the output.
WHY IT MATTERS
• As one researcher put it: The hard part is no longer the physics itself; the hard part is now verifying the results and writing them up. AI compressed months of work into weeks.
• This may be a template for AI-assisted research more broadly: AI generates conjectures from patterns in the data, then human experts verify those conjectures through rigorous math and physical consistency checks.
• It's not autonomous AI science; it's augmented human science. And that model could scale across disciplines, from pure math to drug discovery to materials science.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary
Because of the vast tokens generated by agentic A.I. workflows, my guest today Rohit Choudhary sees enterprise data soon increasing at nearly 10x per YEAR. He's zen though... because he's built the platform to handle it...
More on Rohit:
• Founder and CEO of Acceldata, a Bay Area software company that has raised nearly $100m in venture capital to advance data observability and Agentic Data Management for the A.I. era.
• Previously, was Director of Engineering at Hortonworks, where he led large-scale distributed systems initiatives across open-source data platforms.
Some of the great topics covered in this episode:
• How Rohit coined the term "data observability" in 2018.
• Fixing bad data at the point of consumption can be roughly a thousand times more expensive than catching and fixing it as it flows through the pipeline.
• For your enterprise data to be AI-ready, they need to satisfy multiple dimensions, incl. technical accuracy and business-context compliance.
• Enterprise data grow 4-5x year-over-year now, accelerating to nearly 10x soon, driven largely by the explosion of A.I. agents generating queries and activity at a scale that dwarfs human users.
• The most valuable developers won't necessarily be the best programmers; they'll be the ones with the clearest thinking, the deepest domain expertise, and the curiosity to articulate precisely what outcomes they need.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)
A game millions of people solve over morning coffee is exposing a fundamental weakness in the Transformer-based LLMs that dominate A.I. today. Here's why Sudoku matters for the future of A.I.:
THE BENCHMARK
• Pathway tested its post-transformer architecture, BDH (Baby Dragon Hatchling 🐲), against "Sudoku Extreme," a collection of ~250,000 of the hardest Sudoku puzzles available.
• Leading LLMs (such as o3-mini, DeepSeek-R1, Claude 3.7 Sonnet) scored effectively zero percent.
• BDH, in stark contrast, solved them at 97.4% accuracy. That's not a marginal gap... it's a categorical one.
WHY SUDOKU IS A GREAT A.I. TEST
• Sudoku is a constraint-satisfaction problem: Every move must satisfy multiple rules simultaneously across rows, columns and boxes. It demands search, tracking and backtracking, well beyond pattern-matching.
• This makes it a clean proxy for real-world reasoning in medicine, law, operations, planning and tons of other fields, where you balance competing constraints under uncertainty.
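For a feel of what "search, tracking and backtracking" means in practice, here's a minimal classical backtracking solver in Python. It's a textbook sketch to illustrate the constraint-satisfaction framing only; it is not how BDH (or any LLM) approaches the puzzle, and the function names and sample grid are my own choices.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid, 0 marks an empty cell


def is_valid(grid: Grid, r: int, c: int, d: int) -> bool:
    """Digit d may go at (r, c) only if the row, column AND 3x3 box all allow it."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))


def find_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the first empty cell in reading order, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def solve(grid: Grid) -> bool:
    """Depth-first search: try a digit, recurse, and undo it if it leads nowhere."""
    cell = find_empty(grid)
    if cell is None:
        return True  # no empty cells left: solved
    r, c = cell
    for d in range(1, 10):
        if is_valid(grid, r, c, d):
            grid[r][c] = d       # tentatively commit
            if solve(grid):
                return True
            grid[r][c] = 0       # dead end: backtrack
    return False


puzzle: Grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
original = [row[:] for row in puzzle]  # keep the givens for checking
solved = solve(puzzle)                 # fills `puzzle` in place

for row in puzzle:
    print(" ".join(str(d) for d in row))
```

The key line is the one that resets a cell to 0: the solver can silently abandon a guess and try another, without having to "say" anything along the way.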
WHY TRANSFORMERS STRUGGLE
• LLMs turn every problem into text and solve it by predicting the next token. That works brilliantly for language tasks... but Sudoku doesn't live in language.
• A transformer's internal state is constrained to ~1,000 floating-point values per token, and each decision gets locked in as text is generated. It can't hold multiple candidate strategies in parallel or backtrack without verbalizing every step.
WHAT BDH DOES DIFFERENTLY
• BDH maintains a much larger internal "latent reasoning space" that isn't forced into text (think of a chess grandmaster playing 20 blindfold games without whispering moves to herself).
• It uses sparse positive activations (~5% of neurons firing at any time), far more biologically plausible than the dense activation in transformers.
• It's a state-based model (no standard attention mechanism) that continuously updates its internal state via a mechanism inspired by biological neuroscience (Hebbian learning: "neurons that fire together wire together").
• It achieves continual learning: BDH can pick up a new game's rules and reach advanced-beginner level in ~20 minutes, then improve through play... at roughly 10x lower cost than the Transformer-based LLMs incur to achieve their near-zero scores.
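Two of those ideas, sparse positive activations and Hebbian updates, are easy to illustrate with a toy. The sketch below is the generic textbook Hebbian rule in plain Python, not Pathway's actual BDH update; the function names, the learning rate and the way I enforce ~5% sparsity (keep the largest positive values, zero the rest) are all my own assumptions.

```python
import random
from typing import List


def sparse_positive(pre_activations: List[float],
                    active_fraction: float = 0.05) -> List[float]:
    """Keep only the largest positive values (about `active_fraction` of the
    units, ties aside) and zero out everything else."""
    n = len(pre_activations)
    k = max(1, int(n * active_fraction))
    positives = sorted((v for v in pre_activations if v > 0), reverse=True)
    if not positives:
        return [0.0] * n
    cutoff = positives[min(k, len(positives)) - 1]
    return [v if v > 0 and v >= cutoff else 0.0 for v in pre_activations]


def hebbian_update(weights: List[List[float]], pre: List[float],
                   post: List[float], lr: float = 0.1) -> None:
    """In-place Hebbian rule: weights[i][j] grows in proportion to
    post[i] * pre[j], so only co-active pairs are strengthened."""
    for i, y in enumerate(post):
        if y == 0.0:
            continue  # a silent post-synaptic neuron changes nothing
        for j, x in enumerate(pre):
            weights[i][j] += lr * y * x


random.seed(0)
n = 100
pre = sparse_positive([random.gauss(0, 1) for _ in range(n)])
post = sparse_positive([random.gauss(0, 1) for _ in range(n)])
weights = [[0.0] * n for _ in range(n)]
hebbian_update(weights, pre, post)

active_pre = sum(1 for v in pre if v > 0)
active_post = sum(1 for v in post if v > 0)
changed = sum(1 for row in weights for w in row if w != 0.0)
print(f"{active_pre} pre and {active_post} post neurons fired; "
      f"{changed} of {n * n} weights changed")
```

Because only a handful of neurons fire on each side, only the connections between co-active pairs get touched: "fire together, wire together" with a very small update footprint.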
CAVEAT
BDH is still early: It has been demonstrated at a ~1 billion parameter scale (comparable to GPT-2), not yet at frontier scale.
BOTTOM LINE
...but the data are clear: 0% vs. 97.4% is not incremental. It suggests the transformer's reasoning ceiling is real and alternative architectures can address it. Exciting to see alternatives to the dominant but limiting Transformer architecture emerge!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.