Filtering by Tag: python

Introducing the First Book in My A.I. Signature Series

Added on November 20, 2025 by Jon Krohn.

I'm delighted to announce that the first book in my "Pearson A.I. Signature Series" is "Building Agentic AI" by the prolific author Sinan Ozdemir... and it will be published on Sunday!

It's available for pre-order now worldwide from wherever you buy your books! You can also read a digital version in the O'Reilly platform today if you have access to it.

The book is packed with hands-on examples in Python and it allows you to master the complete agentic A.I. pipeline, including practical guidance and code on how to:

Design adaptive A.I. agents with memory, tool use, and collaborative reasoning capabilities.
Build robust RAG workflows using embeddings, vector databases and LangGraph state management.
Implement comprehensive evaluation frameworks beyond just "accuracy"
Deploy multimodal A.I. systems that seamlessly integrate text, vision, audio and code generation.
Optimize models for production through fine-tuning, quantization and speculative decoding techniques.
Navigate the bleeding edge of reasoning LLMs and computer-use capabilities.
Balance cost, speed, accuracy and privacy in real-world deployment scenarios.
Create hybrid architectures that combine multiple agents for complex enterprise applications.

Thanks to Debra Williams Cauley, Dayna Isley and many more at Pearson for bringing this series to life. The second book in the series will be available in December and I'll announce that shortly!

Pre-order the book here!

The Future of Python Notebooks is Here, with Marimo’s Dr. Akshay Agrawal

Added on August 5, 2025 by Jon Krohn.

I love Jupyter Notebooks... but they have a lot of painful "features". Today's guest Akshay Agrawal has built marimo, which resolves these issues and adds in lots of clever new innovations.

Causal AI, with Dr. Robert Usazuwa Ness

Added on July 29, 2025 by Jon Krohn.

Today's guest, Dr. Robert Osazuwa Ness, wrote the popular new book "Causal A.I." so enjoy this episode on what Causal A.I. is and what advantages it has over "normal" (correlation-based) models.

Robert:

• Senior Researcher at "Microsoft Research A.I."

• His research focuses on statistical and causal inference techniques for controllable, human-aligned multimodal models.

• He is also founder of Altdeep.ai, where he teaches professionals advanced topics in machine learning.

• Holds a PhD in Statistics from Purdue University in Indiana.

Today’s episode will resonate most with hands-on practitioners like data scientists, statisticians and A.I. engineers.

In today’s episode, Robert details:

• The three-rung ladder of causation that determines what types of causal questions you can actually answer with your data.

• The surprising connections between Bayesian networks, graphical models and modern causal A.I.

• Why A.I. systems have been dominated by correlation-based learning and what's stopping them from adopting causal reasoning like humans and animals naturally do.

• How tools like PyTorch, Pyro, and DoWhy are revolutionizing causal inference by separating statistical complexity from causal assumptions.

• How large language models like GPT-4o can act as "causal knowledge bases" and outperform traditional causal methods in some scenarios.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

My Four-Hour Agentic AI Workshop is Live and 100% Free

Added on June 22, 2025 by Jon Krohn.

In case you missed my post last week, my four-hour Agentic A.I. workshop (with Ed Donner, pictured) is live. 8,000 people have already watched it! Here's what they're saying:

Agentic AI Hands-On in Python: MCP, CrewAI and OpenAI Agents SDK (by Jon Krohn and Ed Donner)

Added on June 13, 2025 by Jon Krohn.

Now live! Four hours long and 100% free, this hands-on workshop covers all the Agentic A.I. theory and tools you need to develop and deploy multi-agent teams with Python.

Beautifully shot by a professional film crew (led by the exceptional Lucie McCormick) at the Open Data Science Conference (ODSC) East in Boston a few weeks ago and then meticulously edited by SuperDataScience's inimitable Mario Pombo, this training (within the GenAI-forward Cursor IDE) features all of today's essential agent frameworks:

OpenAI Agents SDK
CrewAI
Anthropic's Model Context Protocol (MCP)

From design considerations through to practical implementation tips, by completing all four modules in this video, you will have all the knowledge and skills needed to create effective multi-agent systems. The four modules are:

Defining Agents
Designing Agents
Developing Agents
The Future of Agents

The coding elements are led by the wonderful Ed Donner, whom many of you will already know as one of the very best in the world at creating and teaching hands-on A.I. content.

We received rave reviews for the session at ODSC East and the lecture hall was standing-room only for the entire duration, so I anticipate that you'll love it too!

Watch the full training here: youtu.be/LSk5KaEGVk4

Celebrating 5 Years with ODSC: An Award, A Workshop, and What’s Ahead

Added on May 22, 2025 by Jon Krohn.

Last week in Boston, the Open Data Science Conference (ODSC) surprised me with their "Speaker Impact Award" to recognize the years of training I've been providing at ODSC conferences.

Thank you Sheamus McGovern (pictured) and the whole ODSC team (Alex, Alina, Anna, Deepti, Elen, Paula, Ruby) for the honor and for putting on such stellar technical conferences.

I first lectured at ODSC New York in June 2019, when I provided a half-day workshop that introduced Deep Learning. (By great chance, the now-legendary Serg Masís emceed my session!)

Since then, I've enjoyed both ODSC East (held each spring in Boston) and ODSC West (held each autumn in San Francisco) most years, delivering (typically full-day) workshops on:

Deep Learning

The mathematical foundations of Machine Learning (e.g., linear algebra, partial-derivative calculus)
Training and deploying Large Language Models (with Lightning AI and Hugging Face)

This year at ODSC East, Ed Donner and I delivered a full-day training on developing and deploying Agentic A.I. featuring the open-source tools CrewAI, OpenAI Agents SDK, and Anthropic's Model Context Protocol (MCP). The session was jam-packed for the entire day and received rave reviews.

If you couldn't make it to Boston last week, I have good news for you! I hired a film crew to capture our entire Agentic A.I. training and am currently having the footage professionally edited. In the coming weeks (as soon as possible!), we'll be publishing this on YouTube so that it's freely available to everyone worldwide. Watch this space :)

Python Polars: The Definitive Guide, with Jeroen Janssens and Thijs Nieuwdorp

Added on May 6, 2025 by Jon Krohn.

Today's episode on Polars is in equal parts hilarious and informative with Jeroen and Thijs, who co-authored the brand-new O'Reilly book "Python Polars: The Definitive Guide". Enjoy this one!

Polars: Past, Present and Future, with Polars Creator Ritchie Vink

Added on October 15, 2024 by Jon Krohn.

Because of it's stunningly fast speed, Polars is an extremely popular open-source library for DataFrame operations in Python. Kinda unreal to have Ritchie Vink, Polars' creator, as today's guest!

Ritchie:

• Is CEO and Co-Founder of Polars, Inc., a startup that has raised $4m in seed funding to support his Polars open-source project.

• Previously worked as an ML Engineer, Data Scientist and Data Engineer at companies like adidas and KLM Royal Dutch Airlines.

• Holds a Master’s in Structural Engineering and worked as a civil engineer prior to catching the data-science bug.

Today’s episode will appeal most to hands-on practitioners like data scientists and ML engineers. In it, Ritchie details:

• How Polars regularly achieves 5-20x (sometimes 100x!) speed improvements over Pandas for most DataFrame operations.

• The Eager and Lazy execution APIs Polars offers and when you should use one or the other.

• Ritchie's vision for scaling Polars to handle massive distributed datasets.

• How we can continue to make data-processing efficiency gains even as Moore's Law slows down.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

DataFrame Operations 100x Faster than Pandas, with Marco Gorelli

Added on September 3, 2024 by Jon Krohn.

Today's episode is all about Polars — the hot library for Python that offers up to 100x speedups for DataFrame operations relative to pandas. Marco Gorelli, a core Polars developer, is our gifted guide.

Marco is a tremendously talented communicator of complex technical topics, making him the perfect guest for this highly technical episode. He:

• Is a core developer of the popular Python libraries pandas and Polars.

• Is the creator of the Narwhals library.

• Has spoken at several major Python conferences (such as PyData), taught Polars professionally, and wrote the first complete Polars plugins tutorial.

• Currently works as Senior Software Engineer at Quansight Labs.

• Previously, worked as a data scientist and was one of the prize winners (from amongst >100,000 entrants!) of the M6 forecasting competition.

• Holds a Master’s in Mathematics and the Foundations of Computer Science from the University of Oxford.

Today’s episode will appeal primarily to hands-on technical folks like data scientists, ML engineers and software developers.

In today’s episode, Marco details:

• What the hot, fast-growing Polars library for working with DataFrames in Python is (it already has 65m downloads and 28k GitHub stars).

• How Polars offers up to 100x speed-ups relative to Pandas on DataFrame operations.

• How the lightweight, dependency-free Narwhals package he created allows for easy compatibility between different DataFrame libraries such as Polars and Pandas.

• How he got addicted to open-source development.

• The simple trick he used to be a prize-winner in super-popular forecasting competitions.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Bayesian Methods and Applications, with Alexandre Andorra

Added on June 18, 2024 by Jon Krohn.

Is he a man or a country? Find out in today's episode with Alexandre Andorra — developer of the leading Bayesian library for Python, implementer of commercial Bayesian models and leading Bayesian educator/podcaster!

In Case You Missed It in April 2024

Added on May 11, 2024 by Jon Krohn.

Other than excessive maleness and paleness*, April 2024 was an excellent month for the podcast, packed with outstanding guests. ICYMI, today's episode highlights the most fascinating moments of my convos with them.

Specifically, conversation highlights include:

1. Iconic open-source developer Dr. Hadley Wickham putting the "R vs Python" argument to bed.

2. Aleksa Gordić, creator of a digital A.I.-learning community of 160k+ people, on the movement from formal to self-directed education.

3. World-leading futurist Bernard Marr on how we can work with A.I. as opposed to it lording over of us.

4. Educator of millions of data scientists, Kirill Eremenko, on why gradient boosting is so powerful for making informed business decisions.

5. Prof. Barrett Thomas on how drones could transform same-day delivery.

*Remedied in May!

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

NumPy, SciPy and the Economics of Open-Source, with Dr. Travis Oliphant

Added on March 12, 2024 by Jon Krohn.

Huge episode today with iconic Dr. Travis Oliphant, creator of NumPy and SciPy, the standard libraries for numeric operations (downloaded 8 million and 3 million times PER DAY, respectively). Hear about the future of open-source, including the impact of GenAI.

Brewing Beer with A.I., with Beau Warren

Added on February 6, 2024 by Jon Krohn.

In today's episode, Beau Warren of the innovative "Species X" brewery, details how we collaborated together on an A.I. model to craft the perfect beer. Dubbed "Krohn&Borg" lager, you can join us in Columbus, Ohio on Thursday night to try it yourself! 🍻

Pandas for Data Analysis and Visualization

Added on May 2, 2023 by Jon Krohn.

Today's episode is jam-packed with practical tips on using the Pandas library in Python for data analysis and visualization. Super-sharp Stefanie Molin — a bestselling author and sought-after instructor on these topics — is our guide.

Stefanie:
• Is the author of the bestselling book "Hands-On Data Analysis with Pandas".
• Provides hands-on pandas and data viz tutorials at top industry conferences.
• Is a software engineer and data scientist at Bloomberg, the financial data giant, where she tackles problems revolving around data wrangling/visualization and building tools for gathering data.
• Holds a degree in operations research from Columbia University as well as a masters in computer science, with an ML specialization, from Georgia Tech.

Today’s episode is intended primarily for hands-on practitioners like data analysts, data scientists, and ML engineers — or anyone that would like to be in a technical data role like these in the future.

In this episode, Stefanie details:
• Her top tips for wrangling data in pandas.
• In what data viz circumstances you should use pandas, matplotlib, or Seaborn.
• Why everyone who codes, including data scientists, should develop expertise in Python package creation as well as contribute to open-source projects.
• The tech stack she uses in her role at Bloomberg.
• The productivity tips she honed by simultaneously working full-time, completing a masters degree and writing a bestselling book.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Simplifying Machine Learning

Added on December 27, 2022 by Jon Krohn.

Today, Mariya Sha — host of the wildly popular "Python Simplified" YouTube channel (140k subscribers!) — taps her breadth of A.I. expertise to provide a fun and fascinating finale to SuperDataScience guest episodes for 2022.

Mariya:
• Is the mind behind the "Python Simplified" YouTube channel that makes advanced concepts (e.g., ML, neural nets) simple to understand.
• Her videos cover Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and OOP.
• Is renowned for taking complex concepts such as gradient descent or unsupervised learning and explaining them in a straightforward manner that leverages hands-on, real-life examples.
• Is pursuing a bachelor's in Computer Science (with a specialization in A.I. and Machine Learning) from the University of London.

Today’s episode should appeal to anyone who’s interested in or involved with data science, machine learning, or A.I.

In this episode, Mariya details:
• How the incredible potential of ML in our lifetimes inspired her to shift her focus from web-development languages like JavaScript to Python.
• Why automation and web scraping are critical skills for data scientists.
• How to make learning any apparently complex data science concept straightforward to comprehend.
• Her favorite Python libraries and software tools.
• One rarely-mentioned topic that every data scientist would benefit from.
• The pros and cons of pursuing a 100% remote degree in computer science.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Software for Efficient Data Science

Added on November 22, 2022 by Jon Krohn.

In today's episode, Dr. Jodie Burchell details a broad range of tools for working efficiently with data, including data cleaning, reproducibility, visualization, and natural language processing.

Jodie:
• Is the Data Science Developer Advocate for JetBrains, the developer-tools company behind PyCharm (one of the most widely-used Python IDEs) and DataLore (their new cloud platform for collaborative data science).
• Previously was Data Scientist or Lead Data Scientist at several tech companies, developing specializations in search, recommender systems, and NLP.
• Co-authored two books on data visualization libraries: "The Hitchhiker's Guide to ggplot2" and "The Hitchhiker's Guide to Plotnine".
• Prior to entering industry, was a postdoctoral fellow in biostatistics at the University of Melbourne.
• Holds a PhD in Psychology from the Australian National University.

Today’s episode is primarily intended for a technical audience as it's packed with practical tips and software for data scientists.

In this episode, Jodie details:
• What a data science developer advocate is and why you might want to consider it as a career option.
• How to work effectively, efficiently, and confidently with real-world data.
• Her favorite Python libraries, such as ones for data viz and NLP.
• How to have reproducible data science workflows.
• The subject she would have majored in if she could go back in time.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Data Science Interviews with Nick Singh

Added on October 4, 2022 by Jon Krohn.

For an episode all about tips for crushing interviews for Data Scientist roles, our guest is Nick Singh — author of the bestselling "Ace the Data Science Interview" book and creator of the DataLemur SQL interview platform.

Nick:
• Co-authored “Ace the Data Science Interview”, an interview-question guide that has sold over 16,000 copies since it was released last year.
• Created the DataLemur platform for interactively practicing interview questions involving SQL queries.
• Worked as a software engineer at Facebook, Google, and Microsoft.
• Holds a BS in engineering from the University of Virginia.

Today's episode is ideal for folks who are looking to land a data science job for the first time, level-up into a more senior data science role, or perhaps land a data science gig at a new firm.

In this episode, Nick details:
• His top tips for success in data science interviews.
• Common misconceptions about data science interviews.
• How to become comfortable with self-promotion and increase your chances of landing your dream job.
• Strategies for when interviewers ask if you have any questions for them.
• The subject areas and skills you should master before heading into a data science interview.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Automating ML Model Deployment

Added on May 16, 2022 by Jon Krohn.

Relative to training a machine learning model, getting it into production typically takes multiple times as much time and effort. Dr Doris Xin, the brilliant co-founder/CEO of Linea, has a near-magical, two-line solution.

In the episode, Doris details:
• How Linea reduces ML model deployment to two lines of Python code.
• The surprising extent of wasted computation she discovered when she analyzed over 3000 production pipelines at Google.
• Her experimental evidence that the total automation of ML model development is neither realistic nor desirable.
• What it’s like being the CEO of an exciting, early-stage tech start-up.
• Where she sees the field of data science going in the coming years and how you can prepare for it.

Today’s episode is more on the technical side so will likely appeal primarily to practicing data scientists, especially those that need to — or are interested in — deploying ML models into production.

Doris:
• Is co-founder and CEO of Linea, an early start-up that dramatically simplifies the deployment of machine learning models into production.
• Her alpha users include the likes of Twitter, Lyft, and Pinterest.
• Her start-up’s mission was inspired by research she conducted as a PhD student in computer science at the University of California, Berkeley.
• Previously she worked in research and software engineering roles at Google, Microsoft, Databricks, and LinkedIn.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Probability & Information Theory — Subject 5 of Machine Learning Foundations

Added on March 28, 2022 by Jon Krohn.

Last Wednesday, we released the final video of my Calculus course, so today we begin my all-new YouTube course on Probability and Information Theory. This first video is an orientation to the course curriculum, enjoy!

We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

My Favorite Calculus Resources

Added on March 21, 2022 by Jon Krohn.

It's my birthday today! In celebration, I'm delighted to be releasing the final video of my "Calculus for Machine Learning" YouTube course. The first video came out in May and now, ten months later, we're done! 🎂

We published a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday since May 6th, 2021. So happy that it's now complete for you to enjoy. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Probability, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

Starting next Wednesday, we'll begin releasing videos for a new YouTube course of mine: "Probability for Machine Learning". Hope you're excited to get going on it :)