The release of Claude Opus 4.5 this week didn't knock Gemini 3 Pro off the top of the LMArena leaderboard... meaning today's episode of my podcast (I recorded it a week ago) about Google retaking the lead on AI is still relevant, woohoo! Here are the details...
Introducing the First Book in My A.I. Signature Series
I'm delighted to announce that the first book in my "Pearson A.I. Signature Series" is "Building Agentic AI" by the prolific author Sinan Ozdemir... and it will be published on Sunday!
It's available for pre-order now worldwide, wherever you buy your books! You can also read a digital version on the O'Reilly platform today if you have access to it.
The book is packed with hands-on examples in Python and enables you to master the complete agentic A.I. pipeline, including practical guidance and code on how to:
Design adaptive A.I. agents with memory, tool use, and collaborative reasoning capabilities.
Build robust RAG workflows using embeddings, vector databases and LangGraph state management.
Implement comprehensive evaluation frameworks beyond just "accuracy".
Deploy multimodal A.I. systems that seamlessly integrate text, vision, audio and code generation.
Optimize models for production through fine-tuning, quantization and speculative decoding techniques.
Navigate the bleeding edge of reasoning LLMs and computer-use capabilities.
Balance cost, speed, accuracy and privacy in real-world deployment scenarios.
Create hybrid architectures that combine multiple agents for complex enterprise applications.
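The bullets above span a lot of ground; the retrieval step of a RAG workflow, for instance, boils down to ranking stored embeddings by similarity to a query embedding. Here's a minimal, library-free sketch of that one step (the toy three-dimensional vectors stand in for real embedding-model output):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, k=1):
    # Rank (vector, text) pairs by similarity to the query vector
    # and return the top-k passages to stuff into the LLM prompt.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Toy "embeddings"; a real pipeline would embed these passages with a model.
docs = [
    ([0.9, 0.1, 0.0], "Agents can call external tools."),
    ([0.1, 0.9, 0.0], "Vector databases store embeddings."),
    ([0.0, 0.2, 0.9], "Quantization shrinks model weights."),
]

print(retrieve([0.2, 0.95, 0.1], docs, k=1))
```

In a production workflow a vector database does this ranking at scale, and something like LangGraph manages the state around it; the ranking logic itself is no more mysterious than this.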
Thanks to Debra Williams Cauley, Dayna Isley and many more at Pearson for bringing this series to life. The second book in the series will be available in December and I'll announce that shortly!
Dragon Hatchling: The Missing Link Between Transformers and the Brain, with Adrian Kosowski
What do dragons, macarons and potato latkes have in common? They've sparked a revolutionary model that's poised to replace the Transformer. Today, Adrian Kosowski reveals this big breakthrough.
Adrian:
• Chief Scientific Officer and co-founder of Pathway.
• Theoretical computer scientist, quantum physicist and mathematician.
• Earned his PhD at 20 years old and went on to serve as a tenured researcher at Inria at 23 and associate professor at École Polytechnique.
• Has authored more than 100 scientific papers spanning graph algorithms, distributed systems, quantum information and A.I.
In today's highly technical episode, Adrian demonstrates how the Pathway team devised the Baby Dragon Hatchling (BDH) architecture, which allows attention in LLMs to function more like the biological brain does. This is revolutionary because (relative to the today-ubiquitous Transformer architecture) BDH allows:
• Reasoning to generalize across more complex and extended reasoning patterns, approximating a more human-like approach to problem-solving.
• Time, compute and money to be saved through sparse activation at inference time.
• Greater interpretability.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The “Lethal Trifecta”: Can AI Agents Ever Be Safe?
The "Lethal Trifecta": There are three factors in agentic A.I. systems that may mean they will never be safe in production. I summarize them and provide potential solutions below.
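As the trifecta is commonly formulated in the security community, the three factors are: access to private data, exposure to untrusted content, and the ability to communicate externally. One blunt but effective mitigation is to refuse to deploy any agent configured with all three at once. A minimal sketch of that policy check (the capability names here are illustrative, not from any particular framework):

```python
# The three "lethal trifecta" capabilities, as commonly formulated:
# an agent holding all three can be tricked into exfiltrating secrets.
TRIFECTA = {"private_data_access", "untrusted_content", "external_comms"}

def is_deployable(capabilities):
    # Safe to deploy only if at least one leg of the trifecta is missing.
    return not TRIFECTA <= set(capabilities)

print(is_deployable({"private_data_access", "untrusted_content"}))  # True
print(is_deployable(TRIFECTA))                                      # False
```

Dropping any one leg (e.g. cutting off outbound network access) breaks the exfiltration chain, which is why this check gates on the full set rather than any single capability.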
8 Steps to Becoming an AI Engineer, with Kirill Eremenko
The #1 fastest-growing role in many countries (including the US) is A.I. Engineer. Want to become one? ...or add the skillset into your existing role? Today, Kirill Eremenko provides an 8-step roadmap!
Many of you will already know Kirill:
Founder and CEO of SuperDataScience.com, the eponymous e-learning platform.
Founded the SuperDataScience Podcast nine years ago and hosted the show until he passed me the reins five years ago.
With over 3 million students, he’s the most popular data science and A.I. instructor on Udemy.
He holds a Master’s from The University of Queensland in Australia and a Bachelor’s in Applied Physics and Mathematics from the Moscow Institute of Physics and Technology.
Today's episode will be of primary interest to hands-on practitioners like data scientists and software developers.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi
Today, extraordinary Michelle Yi details LLM jailbreaking (as well as data poisoning, prompt stealing and slop squatting!) and how to prevent it. Scary content but she makes it funny and entertaining, enjoy!
When I say "extraordinary", I'm not exaggerating. Michelle:
Finished her undergrad at the same age as most folks finish high school.
While working full-time as an engineering lead at IBM on Jeopardy-playing Watson, she was also a professional violinist in the New York Philharmonic!
In the past decade, has held an impressive list of A.I. leadership roles at Bay Area startups.
Is now helping (startlingly underrepresented) women in tech startups and venture capital by co-founding Generationship, serving as a venture partner at (the ironically named) The Tech Bros, and sitting on the board of Women In Data™️.
Today's episode skews a bit toward hands-on practitioners but Michelle does such a wonderful job of communicating complex concepts and making them relevant to modern global events that anyone might love this episode.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLM Pre-Training and Post-Training 101, with Julien Launay
How are cutting-edge LLMs trained? Find out in today's exceptional episode with Julien Launay, who digs into pre-training (supervised learning) and post-training (reinforcement learning) in eloquent detail.
Julien:
• CEO and co-founder of Adaptive ML, a remarkably fast-growing startup focused on enabling A.I. models to learn from experience.
• Previously led the extreme-scale research teams at Hugging Face and LightOn, where he helped develop state-of-the-art open-source models.
• Organizer of the "Efficient Systems for Foundation Models" workshop at ICML (the prestigious International Conference on Machine Learning).
Today's episode will appeal most to hands-on practitioners but other folks who are open to getting into the technical weeds on Large Language Model (LLM) training should also listen in.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Neuroscience, AI and the Limitations of LLMs, with Dr. Zohar Bronfman
I was blown away by today's guest, the brilliant dual-PhD Zohar Bronfman, as we discussed neuroscience, A.I., and why predictive models offer a better ROI than generative ones. Enjoy!
Dr. Bronfman:
• Is the co-founder and CEO of Pecan AI, a predictive analytics platform that has raised over $100m in venture capital.
• Holds two PhDs — one in computational neuroscience and another in philosophy — bringing a deep, multidisciplinary lens to the design and impact of A.I. systems.
• Focuses on the evolution of machine learning from statistical models to agentic systems that influence real-world outcomes.
Today’s episode will be fascinating for every listener.
In it, Zohar details:
• The trippy implications of the reality that your brain makes decisions hundreds of milliseconds before you're consciously aware of them.
• The intelligence feat that bumblebees can do that current A.I. cannot, with implications for the realization of human-like intelligence in machines.
• Why predictive models are more important than generative models for businesses, but how generative LLMs can nevertheless make building and deploying predictive models much easier and more accessible.
• The rollercoaster journey that led him to create a sensationally successful A.I. startup immediately upon finishing his academic degrees.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Why RAG Makes LLMs Less Safe (And How to Fix It), with Bloomberg’s Dr. Sebastian Gehrmann
In today's episode, A.I. researcher Dr. Sebastian Gehrmann details what RAG is and why it makes LLMs *less* safe... despite popular perception of the opposite.
Sebastian:
Is Head of Responsible A.I. at Bloomberg, the huge New York-based financial software, data and media company with some 20,000 employees.
Previously, as Head of NLP at Bloomberg, he directed the development and adoption of language technology to bring the best A.I.-enhanced products to the Bloomberg Terminal.
Prior to Bloomberg, he was a senior researcher at Google, where he worked on the development of large language models, including the groundbreaking BLOOM and PaLM models.
He holds a Ph.D. in computer science from Harvard University.
Today’s episode skews slightly toward our more technical listeners like data scientists, A.I. engineers and software developers, but anyone who’d like to be up to date on the latest A.I. research may want to give it a listen.
In today’s episode, Sebastian details:
The shocking discovery that retrieval augmented generation (RAG) actually makes LLMs LESS safe, despite the popular perception of the opposite.
Why the difference between 'helpful' and 'harmless' A.I. matters more than you may think.
The hidden “attack surfaces” that emerge when you combine RAG with enterprise data.
The problems that can happen when you push LLMs beyond their intended context window limits.
What you can do to ensure your LLMs are Helpful, Honest and Harmless for your particular use cases.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir
Sensational episode for you today with the illustrious A.I. author, educator and entrepreneur Sinan Ozdemir on how LLM benchmarks are lying to you... and what you can do about it.
Sinan:
Is Founder and CTO of LoopGenius, a generative A.I. startup.
Authored several excellent books, including, most recently, the bestselling "Quick Start Guide to Large Language Models".
Hosts the "Practically Intelligent" podcast.
Was previously adjunct faculty at The Johns Hopkins University and now teaches several times a month within the O'Reilly platform.
Serial A.I. entrepreneur, including founding a Y Combinator-backed generative A.I. startup way back in 2015 that was later acquired.
Holds a Master’s in Pure Math from Johns Hopkins.
Today’s episode skews slightly toward our more technical listeners but Sinan excels at explaining complex concepts in a clear way so today’s episode may appeal to any listener of this podcast.
In today’s episode, Sinan details:
Why the A.I. benchmarks everyone relies on might be lying to you.
How the leading A.I. labs are gaming the benchmark system.
Tricks to actually effectively evaluate LLMs’ capabilities for your use cases.
What the future of benchmarking will involve, including how to benchmark agentic and multimodal models.
How a simple question about watermelon seeds reveals the 40% failure rate of even today’s most advanced A.I. models.
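One concrete way to act on Sinan's advice is to score models on your own task data rather than a public leaderboard. An exact-match harness over a handful of use-case-specific prompts is a reasonable starting point; in this sketch, `stub_ask` is a hypothetical stand-in for whatever model call you actually use:

```python
def exact_match_accuracy(ask, eval_set):
    # ask: callable mapping a prompt string to the model's answer string.
    # eval_set: list of (prompt, expected_answer) pairs from YOUR use case.
    hits = sum(1 for prompt, expected in eval_set
               if ask(prompt).strip().lower() == expected.strip().lower())
    return hits / len(eval_set)

# Stub "model" for illustration only; swap in a real LLM API call.
def stub_ask(prompt):
    return {"2+2?": "4", "Capital of France?": "Paris"}.get(prompt, "unsure")

evals = [("2+2?", "4"), ("Capital of France?", "paris"), ("Largest ocean?", "Pacific")]
print(exact_match_accuracy(stub_ask, evals))  # 2 of 3 correct
```

Exact match is deliberately strict; for free-form answers you'd relax it (fuzzy match, LLM-as-judge), but even this crude harness on your own data tells you more than a gamed public benchmark.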
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Blackwell GPUs Are Now Available at Your Desk, with Sama Bali and Logan Lawler
Today's charming and complementary guests — Sama Bali from NVIDIA and Logan Lawler from Dell — make for an extra fun episode on the powerful new Blackwell GPUs... now available at your desk!
More on Sama:
Is an A.I. Solutions leader at NVIDIA who specializes in bringing A.I. products to market.
Prior to NVIDIA, held a Machine Learning Solutions role at Amazon Web Services (AWS).
Focused on educating data scientists and developers on A.I. innovations and implementing them effectively in enterprises.
Holds a Master's in Engineering Management from San José State University.
More on Logan:
Leads Dell Pro Max A.I. Solutions (if you haven’t heard of Pro Max before, we’ll cover that in this episode!)
Over his sixteen-year tenure at Dell Technologies, has held positions across merchandising, services, marketing and e-commerce.
Holds an MBA in management from Texas State University.
Today’s episode will be particularly appealing to hands-on data science, machine learning and A.I. practitioners but it isn’t especially technical and so can be enjoyed by anyone!
In today’s episode, Sama and Logan detail:
Why data scientists are camping out at 6AM to attend NVIDIA's GTC conference.
The killer specs of NVIDIA’s next-generation Blackwell GPUs.
How Dell and NVIDIA have joined forces to bring server-level A.I. power right to your desktop.
How microservices are revolutionizing A.I. development and deployment.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Beyond GPUs: The Power of Custom AI Accelerators, with Emily Webber
The mind-blowing A.I. capabilities of recent years are made possible by vast quantities of specialized A.I.-accelerator chips. Today, AWS's (brilliant, amusing and Zen!) Emily Webber explains how these chips work.
Emily:
• Is a Principal Solutions Architect in the elite Annapurna Labs ML service team that is part of Amazon Web Services (AWS).
• Works directly on the Trainium and Inferentia hardware accelerators (for, respectively, training and making inferences with A.I. models).
• Also works on the NKI (Neuron Kernel Interface) that acts as a bare-metal language and compiler for programming AWS instances that use Trainium and Inferentia chips.
• Wrote a book on pretraining foundation models.
• Spent six years developing distributed systems for customers on Amazon’s cloud-based ML platform SageMaker.
• Leads the Neuron Data Science community and the technical aspects of the "Build On Trainium" program, a $110m credit-investment program for academic researchers.
Today’s episode is on the technical side and will appeal to anyone who’s keen to understand the relationship between today’s gigantic A.I. models and the hardware they run on.
In today’s episode, Emily details:
• The little-known story of how Annapurna Labs revolutionized cloud computing.
• What it takes to design hardware that can efficiently train and deploy models with billions of parameters.
• How Trainium2 became the most powerful A.I. chip on AWS.
• Why AWS is investing $110 million worth of compute credits in academic AI research.
• How meditation and Buddhist practice can enhance your focus and problem-solving abilities in tech.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Neural Processing Units Bringing AI to PCs, with Shirish Gupta
In many situations, it's impractical (or even impossible!) to have A.I. executed in the cloud. In today's episode, Shirish Gupta details when to run A.I. locally and how Neural Processing Units (NPUs) make it practical.
Today's episode is about efficiently designing and deploying AI applications that run on the edge. Our guide on that journey is SuperDataScience Podcast fan, Shirish! Here's more on him:
• Has spent more than two decades working for the global technology juggernaut, Dell Technologies, in their Austin, Texas headquarters.
• Has held senior systems engineering, quality engineering and field engineering roles.
• For the past three years, has been Director of AI Product Management for Dell’s PC Group.
• Holds a Master’s in Mechanical Engineering from the University of Maryland.
Today’s episode should appeal to anyone who is involved with or interested in real-world A.I. applications.
In this episode, Shirish details:
• What Neural Processing Units (NPUs) are and why they're transforming A.I. on edge devices.
• Four clear, compelling reasons to consider moving AI workloads from the cloud to your local device.
• The "A.I. PC" revolution that's bringing A.I. acceleration to everyday laptops and workstations.
• What kinds of Large Language Models are best-suited to local inference on AI PCs.
• How Dell's Pro A.I. Studio toolkit will drastically reduce enterprise A.I. deployment time.
• Plenty of real-life A.I. PC examples, including how a healthcare provider achieved physician-level accuracy with a custom vision model.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
NoSQL Is Ideal for AI Applications, with MongoDB’s Richmond Alake
In today's episode (#871), I'm joined by the gifted writer, speaker and ML developer Richmond Alake, who details what NoSQL databases are and why they're ideally suited for A.I. applications.
Richmond:
Is Staff Developer Advocate for AI and Machine Learning at MongoDB, a huge publicly listed database company with over 5,000 employees and over a billion dollars in annual revenue.
With Andrew Ng, he co-developed the DeepLearning.AI course “Prompt Compression and Query Optimization” that has been undertaken by over 13,000 people since its release last year.
Has delivered his courses on Coursera, DataCamp, and O'Reilly.
Authored 200+ technical articles with over a million total views, including as a writer for NVIDIA.
Previously held roles as an ML Architect, Computer Vision Engineer and Web Developer at a range of London-based companies.
Holds a Master’s in computer vision, machine learning and robotics from The University of Surrey in the UK.
Today's episode (filmed in-person at MongoDB's London HQ!) will appeal most to hands-on practitioners like data scientists, ML engineers and software developers, but Richmond does a stellar job of introducing technical concepts so any interested listener should enjoy the episode.
In today’s episode, Richmond details:
How NoSQL databases like MongoDB differ from relational, SQL-style databases.
Why NoSQL databases like MongoDB are particularly well-suited for developing modern A.I. applications, including Agentic A.I. applications.
How MongoDB incorporates a native vector database, making it particularly well-suited to RAG (retrieval-augmented generation).
Why 2025 marks the beginning of the "multi-era" that will transform how we build A.I. systems.
His powerful framework for building winning A.I. strategies in today's hyper-competitive landscape.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI Should Make Humans Wiser (But It Isn’t), with Varun Godbole
Today's trippy, brain-stimulating episode features Varun Godbole, a former Google Gemini LLM researcher who’s turned his attention to the future implications of the crazy-fast-moving exponential moment we're in.
Varun:
Spent the past decade doing Deep Learning research at Google, across pure and applied research projects.
For example, he was co-first author of a Nature paper where a neural network beat expert radiologists at detecting tumors.
Also co-authored the Deep Learning Tuning Playbook (that has nearly 30,000 stars on GitHub!) and, more recently, the LLM Prompt Tuning Playbook.
He's worked on engineering LLMs so that they generate code and most recently spent a few years as a core member of the Gemini team at Google.
Holds a degree in Computer Science as well as in Electrical and Electronic Engineering from The University of Western Australia.
Varun mostly keeps today’s episode high-level so it should appeal to anyone who, like me, is trying to wrap their head around how vastly different society could be in a few years or decades as a result of abundant intelligence.
In today’s episode, Varun details:
How human relationship therapy has helped him master A.I. prompt engineering.
Why focusing on A.I. agents so much today might be the wrong approach — and what we should focus on instead.
How the commoditization of knowledge could make wisdom the key differentiator in tomorrow's economy.
Why the future may belong to "full-stack employees" rather than traditional specialized roles.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
DeepSeek R1: SOTA Reasoning at 1% of the Cost
In recent weeks, I’m sure you’ve noticed that there’s been a ton of excitement over DeepSeek, a Chinese A.I. company that was spun out of a Chinese hedge fund just two years ago.
BAML: The Programming Language for AI, with Vaibhav Gupta
Today's guest, Vaibhav Gupta, has developed BAML, the programming language for AI. If you are calling LLMs, you've gotta check BAML out for instant accuracy improvements and big (20-30%) cost savings.
More on charming and terrifically brilliant Vaibhav:
Founder & CEO of Boundary (YC W23), a Y Combinator-backed startup that has developed a new programming language (BAML) that makes working with LLMs easier and more efficient for developers.
Across his decade of experience as a software engineer, he built predictive pipelines and real-time computer vision solutions at Google, Microsoft and the renowned hedge fund The D. E. Shaw Group.
Holds a degree in Computer Science and Electrical Engineering from The University of Texas at Austin.
This is a relatively technical episode. The majority of it will appeal to folks who interact with LLMs or other model APIs hands-on with code.
In today’s information-dense episode, Vaibhav details:
How his company pivoted 13 times before settling upon developing a programming language for A.I.
Why creating a programming language was "really dumb" but why it’s turning out to be brilliant, including by BAML already saving companies 20-30% on their AI costs.
Fascinating parallels between today's A.I. tools and the early days of web development.
His unconventional hiring process (I’ve never heard of anything remotely close to it) and the psychology behind why it works.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI Engineering 101, with Ed Donner
My holiday gift to you is my Nebula.io co-founder Ed Donner, one of the most brilliant, articulate people I know. In today's episode, Ed introduces the exciting, in-demand "A.I. Engineer" career — what's involved and how to become one.
After working daily alongside this world-class mind and exceptional communicator for nearly a decade, it is at long last my great pleasure to have the extraordinary Ed as my podcast guest. Ed:
• Is co-founder and CTO of Nebula, a platform that leverages generative and encoding A.I. models to source, understand, engage and manage talent.
• Previously, was co-founder and CEO of an A.I. startup called untapt that was acquired in 2020.
• Prior to becoming a tech entrepreneur, Ed had a 15-year stint leading technology teams on Wall Street, at the end of which he was a Managing Director at JPMorganChase, leading a team of 300 software engineers.
• He holds a Master’s in Physics from the University of Oxford.
Today’s episode will appeal most to hands-on practitioners, particularly those interested in becoming an A.I. Engineer or leveling up their command of A.I. Engineering skills.
In today’s episode, Ed details:
• What an A.I. Engineer (also known as an LLM Engineer) is.
• How the data indicate A.I. Engineers are in as much demand today as Data Scientists.
• What an A.I. Engineer actually does, day to day.
• How A.I. Engineers decide which LLMs to work with for a given task, including considerations like open- vs closed-source, what model size to select and what leaderboards to follow.
• Tools for efficiently training and deploying LLMs.
• LLM-related techniques including RAG and Agentic A.I.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Andrew Ng on AI Vision, Agents and Business Value
My guest today is the inimitable Andrew Ng! In his trademark, clear-spoken style, Andrew gives us a glimpse of the Agentic A.I. future, particularly how the coming Vision Agent tsunami will change the world.
I suspect pretty much everyone knows Dr. Ng already, but just in case:
As director of Stanford University's AI Lab, he led a research group that played a key role in the development of deep learning (which led him to found the influential Google Brain team) and educated millions on machine learning (leading him to co-found Coursera).
Is Managing General Partner of AI Fund, a world-leading A.I. venture studio.
Was CEO (is now Executive Chairman) of LandingAI, a computer-vision platform that specializes in domain-specific Large Vision Models (analogous to LLMs for language).
Founded DeepLearning.AI, which provides excellent technical training on ML, deep learning (of course!), generative A.I. and many other associated subjects.
Was co-CEO (as well as co-founder and chairman) of Coursera, which brought online learning from 300 leading universities to over 100 million students.
This episode was recorded live at the ScaleUp:AI conference in New York a few weeks ago. Thanks to George Mathew and Jennifer Jordan for inviting me back to the conference to interview Andrew :)
In today’s episode, Andrew details:
Why a cheaper A.I. model with a smart agentic A.I. workflow might outperform more expensive, more advanced models.
The surprising truth about A.I. API costs that most businesses don't realize.
How Marvin Minsky's "Society of Mind" theory from the 1980s is making an unexpected comeback in modern A.I.
A groundbreaking new way to process visual data that goes beyond traditional computer vision.
Why unstructured data will be the key to A.I.'s next big revolution.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in April 2024
Other than excessive maleness and paleness*, April 2024 was an excellent month for the podcast, packed with outstanding guests. ICYMI, today's episode highlights the most fascinating moments of my convos with them.
Specifically, conversation highlights include:
1. Iconic open-source developer Dr. Hadley Wickham putting the "R vs Python" argument to bed.
2. Aleksa Gordić, creator of a digital A.I.-learning community of 160k+ people, on the movement from formal to self-directed education.
3. World-leading futurist Bernard Marr on how we can work with A.I. as opposed to it lording over us.
4. Educator of millions of data scientists, Kirill Eremenko, on why gradient boosting is so powerful for making informed business decisions.
5. Prof. Barrett Thomas on how drones could transform same-day delivery.
*Remedied in May!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.