The release of Claude Opus 4.5 this week didn't knock Gemini 3 Pro off the top of the LMArena leaderboard... meaning today's episode of my podcast (I recorded it a week ago) about Google retaking the lead on AI is still relevant, woohoo! Here are the details...
Introducing the First Book in My A.I. Signature Series
I'm delighted to announce that the first book in my "Pearson A.I. Signature Series" is "Building Agentic AI" by the prolific author Sinan Ozdemir... and it will be published on Sunday!
It's available for pre-order now worldwide from wherever you buy your books! You can also read a digital version on the O'Reilly platform today if you have access to it.
The book is packed with hands-on examples in Python and it allows you to master the complete agentic A.I. pipeline, including practical guidance and code on how to:
Design adaptive A.I. agents with memory, tool use, and collaborative reasoning capabilities.
Build robust RAG workflows using embeddings, vector databases and LangGraph state management.
Implement comprehensive evaluation frameworks beyond just "accuracy".
Deploy multimodal A.I. systems that seamlessly integrate text, vision, audio and code generation.
Optimize models for production through fine-tuning, quantization and speculative decoding techniques.
Navigate the bleeding edge of reasoning LLMs and computer-use capabilities.
Balance cost, speed, accuracy and privacy in real-world deployment scenarios.
Create hybrid architectures that combine multiple agents for complex enterprise applications.
Thanks to Debra Williams Cauley, Dayna Isley and many more at Pearson for bringing this series to life. The second book in the series will be available in December and I'll announce that shortly!
Mixture-of-Experts and State-Space Models on Edge Devices, with Tyler Cox and Shirish Gupta
Mixture-of-experts models? State-space models? Easily running these cutting-edge LLM approaches on any device? In today's episode, deep experts Shirish and Tyler masterfully reveal all.
Shirish Gupta:
Returns to my podcast for the third time this year!
Director of A.I. Product Management at Dell Technologies, where he's been for over 20 years!
Holds a Master's in Engineering from the University of Maryland.
Tyler Cox:
Distinguished Engineer at Dell in the Client Solutions Group CTO.
Leads on-device A.I. innovation programs across a wide range of products, including A.I. PCs, workstations, and edge computing platforms.
Holds a Master's in Software Engineering from The University of Texas at Austin.
Today's episode will be particularly appealing to hands-on practitioners (e.g., data scientists, AI/ML engineers, software developers) as well as anyone who could benefit from considering shifting A.I. workloads from the cloud to local devices.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLMs Are Delighted to Help Phishing Scams
Reuters recently tested 6 major LLMs (Grok, ChatGPT, Meta AI, Claude, DeepSeek, Gemini) to assess whether they'd create phishing content... with minor prompt adjustments, 4 out of 6 complied — yikes!
THE INVESTIGATION
Reporters from Reuters requested phishing emails targeting elderly people, fake IRS/bank messages, and tactical scam advice.
THE RESULTS
• Despite initial refusals across the board, relatively simple prompt modifications bypassed safety guardrails.
• Grok, for example, generated a fake charity phishing email targeting the elderly with urgency tactics like "Click now to act before it's too late!"
• When tested on 100 California seniors, the A.I.-generated messages successfully persuaded people to click on malicious links, often because messages seemed urgent or familiar.
REAL-WORLD IMPACT
• The FBI reports phishing is the #1 cybercrime in the U.S., with billions of messages sent daily.
• BMO Bank, as one corporate example, currently blocks 150,000-200,000 phishing emails per month targeting employees... a representative says the problem is escalating: "The numbers never go down, they only go up."
• Cybersecurity experts state criminals are already using A.I. for faster, more sophisticated phishing campaigns.
IMPLICATIONS FOR THOSE OF US IN THE AI INDUSTRY
• LLM misuse is an industry-wide challenge affecting all major frontier labs.
• Reveals a fundamental tension between making AI "helpful" vs. "harmless", highlighting the need for more robust safety guardrails across AI systems.
KEY TAKEAWAYS
• For A.I. Builders: Keep security implications front and center when developing applications.
• For Users: The same LLMs that help you write emails can help bad actors craft convincing scams... stay vigilant and educate vulnerable populations (e.g., seniors) about A.I.-enhanced phishing threats. These scams will only become more compelling and more frequent.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
What is an "AI Agent"? Explained in 90 Seconds
What is an "A.I. Agent" anyway? In today's 90-second video, I quickly and concretely explain.
This video clip was taken from a four-hour "Agentic A.I. Engineering" workshop that I delivered with Ed Donner... indeed, the slides I present in this clip were created by Ed!
Over the coming weeks, I'll release more clips from the full workshop that quickly explain additional discrete topics.
See here for a (today still quite short!) agentic A.I. YouTube playlist that includes the full four-hour workshop if you're interested in checking that out.
Automating Code Review with AI, feat. CodeRabbit’s David Loker
Today, enjoy hearing from the super-intelligent engineer David Loker on how A.I. is transforming software development by dramatically accelerating code reviews and automatically improving code bases. It's a great one!
(He also, like me, is a big fan of GPT-5... hear why later in the episode.)
More on David:
• Director of A.I. at CodeRabbit (who've raised $88m in venture capital including a $60m Series B a couple weeks ago, congrats!)
• Previously Lead Data Scientist, ML Engineer and Senior Software Engineer at firms like Netflix and Amazon.
• Holds a Master of Mathematics in Computer Science from the University of Waterloo.
Today's episode will be particularly appealing to software developers and other hands-on practitioners (data scientists, ML engineers, etc.) but David is an outstanding communicator of complex info so any interested listener will enjoy it.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLMs, Agentic AI & Blackmail with Jon Krohn
Name a podcast host more charismatic than Modern CTO's Joel Beasley! I appeared on his show to detail how Agentic A.I. is transforming enterprises and was laughing all the way through.
In a bit more detail, we discussed:
• The difference between LLMs and Agentic A.I.
• How businesses can get an ROI from A.I.
• Why understanding AI misalignment is crucial for future implementations.
• My new consulting firm, Y Carrot, which specializes in creating and deploying bespoke A.I. software for enterprises.
Listen to or watch the full episode here: moderncto.io/jon-krohn-2/
How to Integrate Generative A.I. Into Your Business, with Piotr Grudzień
Want to integrate Conversational A.I. ("chatbots") into your business and ensure it's a (profitable!) success? Then today's episode with Quickchat AI co-founder Piotr Grudzień, covering both customer-facing and internal use cases, will be perfect for you.
Piotr:
• Is Co-Founder and CTO of Quickchat AI, a Y Combinator-backed conversation-design platform that lets you quickly deploy and debug A.I. assistants for your business.
• Previously worked as an applied scientist at Microsoft.
• Holds a Master’s in computer engineering from the University of Cambridge.
Today's episode should be accessible to technical and non-technical folks alike.
In this episode, Piotr details:
• What it takes to make a conversational A.I. system successful, whether that A.I. system is externally facing (such as a customer-support agent) or internally facing (such as a subject-matter expert).
• What it’s been like working in the fast-developing Large Language Model space over the past several years.
• What his favorite Generative A.I. (foundation model) vendors are.
• What the future of LLMs and Generative A.I. will entail.
• What it takes to succeed as an A.I. entrepreneur.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Vicuna, Gorilla, Chatbot Arena and Socially Beneficial LLMs, with Prof. Joey Gonzalez
Vicuna, Gorilla and the Chatbot Arena are all critical elements of the new open-source LLM ecosystem — the extremely knowledgeable and innovative Prof. Joseph Gonzalez is behind all of them. Get the details in today's episode.
Joey:
• Is an Associate Professor of Electrical Engineering and Computer Science at the University of California, Berkeley.
• Co-directs the Berkeley RISE Lab, which studies Real-time, Intelligent, Secure and Explainable systems.
• Co-founded Turi (acquired by Apple for $200m) and more recently Aqueduct.
• His research is integral to major software systems including Apache Spark, Ray (for scaling Python ML), GraphLab (a high-level interface for distributed ML) and Clipper (low-latency ML serving).
• His papers—published in top ML journals—have been cited over 24,000 times.
• Developed Berkeley's upper-division data science class, which he now teaches to over 1000 students per semester.
Today’s episode will probably appeal primarily to hands-on data science practitioners but we made an effort to break down technical terms so that anyone who’s interested in staying on top of the latest in open-source Generative A.I. can enjoy the episode.
In it, Prof. Gonzalez details:
• How his headline-grabbing LLM, Vicuna, came to be and how it arose as one of the leading open-source alternatives to ChatGPT.
• How his Chatbot Arena became the leading proving ground for commercial and open-source LLMs alike.
• How his Gorilla project enables open-source LLMs to call APIs, making it an open-source alternative to ChatGPT’s powerful plugin functionality.
• The race for longer LLM context windows.
• How both proprietary and open-source LLMs will thrive alongside each other in the coming years.
• His vision for how A.I. will have a massive, positive societal impact over the coming decades.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Large Language Model Leaderboards and Benchmarks
Llamas, Alpacas, Koalas, Falcons... there is a veritable zoo of LLMs out there! In today's episode, Caterina Constantinescu breaks down the LLM Leaderboards and evaluation benchmarks to help you pick the right LLM for your use case.
Caterina:
• Is a Principal Data Consultant at GlobalLogic, a full-lifecycle software development services provider with over 25,000 employees worldwide.
• Previously, she worked as a data scientist for financial services and marketing firms.
• Is a key player in data science conferences and Meetups in Scotland.
• Holds a PhD from The University of Edinburgh.
In this episode, Caterina details:
• The best leaderboards (e.g., HELM, Chatbot Arena and the Hugging Face Open LLM Leaderboard) for comparing the quality of both open-source and proprietary Large Language Models (LLMs).
• The advantages and issues associated with LLM evaluation benchmarks (e.g., evaluation dataset contamination is a big issue because the top-performing LLMs are often trained on all the publicly available data they can find... including benchmark-evaluation datasets).
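To make the contamination point concrete, here's a minimal sketch of how one might flag benchmark items that leaked into a training corpus by looking for long shared word n-grams. The function names and toy data are my own illustration, not from the episode:

```python
# Toy illustration of benchmark contamination: flag benchmark items that
# share a long n-gram with the training corpus. All names and data here
# are hypothetical, for illustration only.

def ngrams(text, n=8):
    """Return the set of word-level n-grams in a string."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(train_docs, benchmark_items, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the training docs."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & train_grams)
    return flagged / len(benchmark_items)

train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
bench = [
    "the quick brown fox jumps over the lazy dog near the river bank",  # leaked
    "completely unrelated question about thermodynamics and entropy",   # clean
]
print(contamination_rate(train, bench))  # 0.5: one of the two items is flagged
```

Real contamination audits are far more involved (normalization, fuzzy matching, scale), but even a crude check like this catches verbatim leakage.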
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Jon’s “Generative A.I. with LLMs” Hands-on Training
Today's episode introduces my two-hour "Generative A.I. with LLMs" training, which is packed with hands-on Python demos in Colab notebooks. It details open-source LLM (Hugging Face; PyTorch Lightning) and commercial (OpenAI API) options.
LLaMA 2 — It’s Time to Upgrade your Open-Source LLM
If you've been using fine-tuned open-source LLMs (e.g., for generative A.I. functionality or natural-language conversations with your users), it's very likely time to switch your starting model over to Llama 2. Here's why:
Lossless LLM Weight Compression: Run Huge Models on a Single GPU
Many recent episodes have focused on open-source Large Language Models that you can download and fine-tune to particular use cases, depending on your needs or your users’ needs. I’ve particularly been highlighting LLMs with seven billion to 13 billion model parameters, because models of this size can typically run on a single consumer GPU, making them relatively manageable and affordable both to train and to have in production.
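The single-consumer-GPU claim is easy to sanity-check with back-of-envelope arithmetic on the weights alone (the function below is my own sketch; it ignores activations and the KV cache, which add further memory on top):

```python
# Back-of-envelope check of the "7B-13B fits on one consumer GPU" claim:
# memory needed just to hold the model weights at various precisions.

def weight_memory_gb(n_params_billions, bytes_per_param):
    """Approximate GiB needed to store the model weights alone."""
    return n_params_billions * 1e9 * bytes_per_param / 1024**3

for params in (7, 13):
    for precision, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{params}B @ {precision}: ~{weight_memory_gb(params, nbytes):.1f} GiB")
```

At fp16, a 7B model's weights take roughly 13 GiB and a 13B model roughly 24 GiB, so 13B at full half precision already brushes against a 24 GiB consumer card once activations and the KV cache are added, which is why 8-bit and 4-bit quantization are so popular for this size class.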
A.I. Accelerators: Hardware Specialized for Deep Learning
Today we’ve got an episode dedicated to the hardware we use to train and run A.I. models (particularly LLMs) such as GPUs, TPUs and AWS's Trainium and Inferentia chips. Ron Diamant may be the best guest on earth for this fascinating topic.
Ron:
• Works at Amazon Web Services (AWS) where he is Chief Architect for their A.I. Accelerator chips, which are designed specifically for training (and making inferences with) deep learning models.
• Holds over 200 patents across a broad range of processing hardware, including security chips, compilers and, of course, A.I. accelerators.
• Has been at AWS for nearly nine years – since the acquisition of the Israeli hardware company Annapurna Labs, where he served as an engineer and project manager.
• Holds a Master's in Electrical Engineering from Technion, the Israel Institute of Technology.
Today’s episode is on the technical side but doesn’t assume any particular hardware expertise. It’s primarily targeted at people who train or deploy machine learning models but might be accessible to a broader range of listeners who are curious about how computer hardware works.
In the episode, Ron details:
• CPUs versus GPUs.
• GPUs versus specialized A.I. Accelerators such as Tensor Processing Units (TPUs) and his own Trainium and Inferentia chips.
• The “AI Flywheel” effect between ML applications and hardware innovations.
• The complex tradeoffs he has to consider when embarking upon a multi-year chip-design project.
• The various ways we can split up training and inference over our available devices once we get to Large Language Model-scale models with billions of parameters.
• How to get popular ML libraries like PyTorch and TensorFlow to interact optimally with A.I. accelerator chips.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Catch and Fix Harmful Generative A.I. Output
Today, the A.I. entrepreneur Krishna Gade joins me to detail open-source solutions for overcoming the safety and security issues associated with generative A.I. systems, such as those powered by Large Language Models (LLMs).
The remarkably well-spoken Krishna:
• Is Co-Founder and CEO of Fiddler AI, an observability platform that has raised over $45m in venture capital to build trust in A.I. systems.
• Previously worked as an engineering manager on Facebook’s Newsfeed, as Head of Data Engineering at Pinterest, and as a software engineer at both Twitter and Microsoft.
• Holds a Master's in Computer Science from the University of Minnesota.
In this episode, Krishna details:
• How the LLMs that enable Generative A.I. are prone to inaccurate statements, can be biased against protected groups and are susceptible to exposing private data.
• How these undesirable and even harmful LLM outputs can be identified and remedied with open-source solutions like the Fiddler Auditor that his team has built.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.