In case you missed my post last week, my four-hour Agentic A.I. workshop (with Ed Donner, pictured) is live. 8,000 people have already watched it! Here's what they're saying:
Agentic AI Hands-On in Python: MCP, CrewAI and OpenAI Agents SDK (by Jon Krohn and Ed Donner)
Now live! Four hours long and 100% free, this hands-on workshop covers all the Agentic A.I. theory and tools you need to develop and deploy multi-agent teams with Python.
Beautifully shot by a professional film crew (led by the exceptional Lucie McCormick) at the Open Data Science Conference (ODSC) East in Boston a few weeks ago and then meticulously edited by SuperDataScience's inimitable Mario Pombo, this training (within the GenAI-forward Cursor IDE) features all of today's essential agent frameworks:
OpenAI Agents SDK
CrewAI
Anthropic's Model Context Protocol (MCP)
By completing all four modules in this video, which span everything from design considerations to practical implementation tips, you will have all the knowledge and skills needed to create effective multi-agent systems. The four modules are:
Defining Agents
Designing Agents
Developing Agents
The Future of Agents
The coding elements are led by the wonderful Ed Donner, whom many of you will already know as one of the very best in the world at creating and teaching hands-on A.I. content.
We received rave reviews for the session at ODSC East and the lecture hall was standing-room only for the entire duration, so I anticipate that you'll love it too!
Watch the full training here: youtu.be/LSk5KaEGVk4
Celebrating 5 Years with ODSC: An Award, A Workshop, and What’s Ahead
Last week in Boston, the Open Data Science Conference (ODSC) surprised me with their "Speaker Impact Award" to recognize the years of training I've been providing at ODSC conferences.
Thank you Sheamus McGovern (pictured) and the whole ODSC team (Alex, Alina, Anna, Deepti, Elen, Paula, Ruby) for the honor and for putting on such stellar technical conferences.
I first lectured at ODSC New York in June 2019, when I provided a half-day workshop that introduced Deep Learning. (By great chance, the now-legendary Serg Masís emceed my session!)
Since then, I've enjoyed both ODSC East (held each spring in Boston) and ODSC West (held each autumn in San Francisco) most years, delivering (typically full-day) workshops on:
Deep Learning
The mathematical foundations of Machine Learning (e.g., linear algebra, partial-derivative calculus)
Training and deploying Large Language Models (with Lightning AI and Hugging Face)
This year at ODSC East, Ed Donner and I delivered a full-day training on developing and deploying Agentic A.I. featuring the open-source tools CrewAI, OpenAI Agents SDK, and Anthropic's Model Context Protocol (MCP). The session was jam-packed for the entire day and received rave reviews.
If you couldn't make it to Boston last week, I have good news for you! I hired a film crew to capture our entire Agentic A.I. training and am currently having the footage professionally edited. In the coming weeks (as soon as possible!), we'll be publishing this on YouTube so that it's freely available to everyone worldwide. Watch this space :)
Python Polars: The Definitive Guide, with Jeroen Janssens and Thijs Nieuwdorp
Today's episode on Polars is in equal parts hilarious and informative with Jeroen and Thijs, who co-authored the brand-new O'Reilly book "Python Polars: The Definitive Guide". Enjoy this one!
More on Dr. Jeroen Janssens:
• Senior Developer Relations Engineer at Posit PBC (iconic creators of RStudio and much more).
• Previously, was Senior Machine Learning Engineer at Xomnia.
• Wrote the invaluable O’Reilly book "Data Science at the Command Line".
• Holds a PhD in machine learning from Tilburg University.
...and on Thijs Nieuwdorp:
• Lead Data Scientist at Xomnia, the largest Dutch data and A.I. consulting company.
• Holds a degree in A.I. from Radboud University.
Today’s episode will be particularly appealing to hands-on data science, machine learning and A.I. practitioners, but Jeroen and Thijs are tremendous storytellers and frankly very funny, so this episode can probably be enjoyed by anyone interested in data and A.I.
In today’s episode, Jeroen and Thijs detail:
• Why pandas users are rapidly switching to Polars for dataframe operations in Python.
• The inside story of how O'Reilly rejected four book proposals on Polars before accepting the fifth.
• The moment when an innocuous GitHub pull request forced a complete rewrite of an entire book chapter.
• A previously secret collaboration with NVIDIA and Dell that revealed remarkable GPU-acceleration benchmarks for Polars.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Polars: Past, Present and Future, with Polars Creator Ritchie Vink
Because of its stunning speed, Polars is an extremely popular open-source library for DataFrame operations in Python. Kinda unreal to have Ritchie Vink, Polars' creator, as today's guest!
Ritchie:
• Is CEO and Co-Founder of Polars, Inc., a startup that has raised $4m in seed funding to support his Polars open-source project.
• Previously worked as an ML Engineer, Data Scientist and Data Engineer at companies like adidas and KLM Royal Dutch Airlines.
• Holds a Master’s in Structural Engineering and worked as a civil engineer prior to catching the data-science bug.
Today’s episode will appeal most to hands-on practitioners like data scientists and ML engineers. In it, Ritchie details:
• How Polars regularly achieves 5-20x (sometimes 100x!) speed improvements over Pandas for most DataFrame operations.
• The Eager and Lazy execution APIs Polars offers and when you should use one or the other.
• Ritchie's vision for scaling Polars to handle massive distributed datasets.
• How we can continue to make data-processing efficiency gains even as Moore's Law slows down.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
DataFrame Operations 100x Faster than Pandas, with Marco Gorelli
Today's episode is all about Polars — the hot library for Python that offers up to 100x speedups for DataFrame operations relative to pandas. Marco Gorelli, a core Polars developer, is our gifted guide.
Marco is a tremendously talented communicator of complex technical topics, making him the perfect guest for this highly technical episode. He:
• Is a core developer of the popular Python libraries pandas and Polars.
• Is the creator of the Narwhals library.
• Has spoken at several major Python conferences (such as PyData), taught Polars professionally, and wrote the first complete Polars plugins tutorial.
• Currently works as Senior Software Engineer at Quansight Labs.
• Previously, worked as a data scientist and was one of the prize winners (from amongst >100,000 entrants!) of the M6 forecasting competition.
• Holds a Master’s in Mathematics and the Foundations of Computer Science from the University of Oxford.
Today’s episode will appeal primarily to hands-on technical folks like data scientists, ML engineers and software developers.
In today’s episode, Marco details:
• What the hot, fast-growing Polars library for working with DataFrames in Python is (it already has 65m downloads and 28k GitHub stars).
• How Polars offers up to 100x speed-ups relative to Pandas on DataFrame operations.
• How the lightweight, dependency-free Narwhals package he created allows for easy compatibility between different DataFrame libraries such as Polars and Pandas.
• How he got addicted to open-source development.
• The simple trick he used to be a prize-winner in super-popular forecasting competitions.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Bayesian Methods and Applications, with Alexandre Andorra
Is he a man or a country? Find out in today's episode with Alexandre Andorra — developer of the leading Bayesian library for Python, implementer of commercial Bayesian models and leading Bayesian educator/podcaster!
More on Alex:
• Co-Founder and Principal Data Scientist at PyMC Labs, a firm that develops PyMC (the leading Python library for Bayesian statistics) and consults with their clients to implement profit-increasing Bayesian models.
• Co-Founder and Instructor at an online learning platform called Intuitive Bayes that provides free Bayesian stats education.
• Creator and Host of an excellent podcast called Learning Bayesian Statistics.
Today’s episode will probably appeal most to hands-on practitioners like statisticians, data scientists and machine learning engineers, but the episode also serves as an introduction to Bayesian statistics for anyone who’d like to learn about this important, unique and powerful field.
In today’s episode, Alex details:
• What Bayesian statistics is.
• The situations where Bayesian stats can solve problems that no other approach can.
• Resources for learning Bayesian stats.
• The key Python libraries for implementing Bayesian models yourself.
• How Gaussian Processes can be incorporated into a Bayesian framework in order to allow for especially advanced and flexible models.
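As a minimal illustration of Bayesian updating (a textbook example, not drawn from the episode), here is the classic Beta-Binomial conjugate model in plain Python: a Beta(α, β) prior on a conversion rate, combined with observed successes and failures, yields a Beta(α + successes, β + failures) posterior. The `Beta` class and `update` function are hypothetical helpers; libraries such as PyMC handle models without closed-form posteriors by sampling instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over a probability p in [0, 1]."""

    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))


def update(prior: Beta, successes: int, failures: int) -> Beta:
    """Conjugate update: Beta prior x Binomial likelihood -> Beta posterior."""
    return Beta(prior.alpha + successes, prior.beta + failures)


# Weakly informative prior: the rate is probably near 50%, held loosely
prior = Beta(2, 2)
# Hypothetical data: 30 conversions out of 100 trials
posterior = update(prior, successes=30, failures=70)

print(f"prior mean={prior.mean():.3f}, posterior mean={posterior.mean():.3f}")
print(f"posterior sd={posterior.variance() ** 0.5:.3f}")
```

The posterior mean (32/104, about 0.31) sits between the prior mean (0.5) and the observed rate (0.30), pulled mostly toward the data because 100 observations outweigh the prior's small pseudo-counts.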
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
In Case You Missed It in April 2024
Other than excessive maleness and paleness*, April 2024 was an excellent month for the podcast, packed with outstanding guests. ICYMI, today's episode highlights the most fascinating moments of my convos with them.
Specifically, conversation highlights include:
1. Iconic open-source developer Dr. Hadley Wickham putting the "R vs Python" argument to bed.
2. Aleksa Gordić, creator of a digital A.I.-learning community of 160k+ people, on the movement from formal to self-directed education.
3. World-leading futurist Bernard Marr on how we can work with A.I. as opposed to it lording it over us.
4. Educator of millions of data scientists, Kirill Eremenko, on why gradient boosting is so powerful for making informed business decisions.
5. Prof. Barrett Thomas on how drones could transform same-day delivery.
*Remedied in May!
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
NumPy, SciPy and the Economics of Open-Source, with Dr. Travis Oliphant
Huge episode today with iconic Dr. Travis Oliphant, creator of NumPy and SciPy, the standard libraries for numeric operations (downloaded 8 million and 3 million times PER DAY, respectively). Hear about the future of open-source, including the impact of GenAI.
More on Travis:
• Founded Anaconda, Inc., the company behind the also-ubiquitous Anaconda Python distribution and its conda package manager.
• Founded the massive PyData conferences and communities as well as their associated non-profit foundation, NumFOCUS.
• Currently serves as the CEO of two firms: OpenTeams and Quansight.
• Holds a PhD in biomedical engineering from the Mayo Clinic in Minnesota.
Today’s episode will primarily be of interest to hands-on practitioners like data scientists, software developers and machine learning engineers.
In it, Travis details:
• How his journey creating open-source software began and how NumPy and SciPy grew to become the most popular foundational Python libraries for working with data.
• How he identifies commercial opportunities to support his vast open-source efforts and communities.
• How AI, particularly generative AI, is transforming open-source development.
• Where open-source innovation is headed in the years to come.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Brewing Beer with A.I., with Beau Warren
In today's episode, Beau Warren of the innovative "Species X" brewery details how we collaborated on an A.I. model to craft the perfect beer. The result is a lager dubbed "Krohn&Borg", and you can join us in Columbus, Ohio on Thursday night to try it yourself! 🍻
Pandas for Data Analysis and Visualization
Today's episode is jam-packed with practical tips on using the Pandas library in Python for data analysis and visualization. Super-sharp Stefanie Molin — a bestselling author and sought-after instructor on these topics — is our guide.
Stefanie:
• Is the author of the bestselling book "Hands-On Data Analysis with Pandas".
• Provides hands-on pandas and data viz tutorials at top industry conferences.
• Is a software engineer and data scientist at Bloomberg, the financial data giant, where she tackles problems revolving around data wrangling/visualization and building tools for gathering data.
• Holds a degree in operations research from Columbia University as well as a master's in computer science, with an ML specialization, from Georgia Tech.
Today’s episode is intended primarily for hands-on practitioners like data analysts, data scientists, and ML engineers, or anyone who would like to be in a technical data role like these in the future.
In this episode, Stefanie details:
• Her top tips for wrangling data in pandas.
• Which data viz situations call for pandas, matplotlib, or Seaborn.
• Why everyone who codes, including data scientists, should develop expertise in Python package creation as well as contribute to open-source projects.
• The tech stack she uses in her role at Bloomberg.
• The productivity tips she honed by simultaneously working full-time, completing a master's degree and writing a bestselling book.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Simplifying Machine Learning
Today, Mariya Sha — host of the wildly popular "Python Simplified" YouTube channel (140k subscribers!) — taps her breadth of A.I. expertise to provide a fun and fascinating finale to SuperDataScience guest episodes for 2022.
Mariya:
• Is the mind behind the "Python Simplified" YouTube channel that makes advanced concepts (e.g., ML, neural nets) simple to understand.
• Produces videos on Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and OOP.
• Is renowned for taking complex concepts such as gradient descent or unsupervised learning and explaining them in a straightforward manner that leverages hands-on, real-life examples.
• Is pursuing a bachelor's in Computer Science (with a specialization in A.I. and Machine Learning) from the University of London.
Today’s episode should appeal to anyone who’s interested in or involved with data science, machine learning, or A.I.
In this episode, Mariya details:
• How the incredible potential of ML in our lifetimes inspired her to shift her focus from web-development languages like JavaScript to Python.
• Why automation and web scraping are critical skills for data scientists.
• How to make learning any apparently complex data science concept straightforward to comprehend.
• Her favorite Python libraries and software tools.
• One rarely-mentioned topic that every data scientist would benefit from.
• The pros and cons of pursuing a 100% remote degree in computer science.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Software for Efficient Data Science
In today's episode, Dr. Jodie Burchell details a broad range of tools for working efficiently with data, including data cleaning, reproducibility, visualization, and natural language processing.
Jodie:
• Is the Data Science Developer Advocate for JetBrains, the developer-tools company behind PyCharm (one of the most widely used Python IDEs) and Datalore (their new cloud platform for collaborative data science).
• Previously was Data Scientist or Lead Data Scientist at several tech companies, developing specializations in search, recommender systems, and NLP.
• Co-authored two books on data visualization libraries: "The Hitchhiker's Guide to ggplot2" and "The Hitchhiker's Guide to Plotnine".
• Prior to entering industry, was a postdoctoral fellow in biostatistics at the University of Melbourne.
• Holds a PhD in Psychology from the Australian National University.
Today’s episode is primarily intended for a technical audience as it's packed with practical tips and software for data scientists.
In this episode, Jodie details:
• What a data science developer advocate is and why you might want to consider it as a career option.
• How to work effectively, efficiently, and confidently with real-world data.
• Her favorite Python libraries, such as ones for data viz and NLP.
• How to have reproducible data science workflows.
• The subject she would have majored in if she could go back in time.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Data Science Interviews with Nick Singh
For an episode all about tips for crushing interviews for Data Scientist roles, our guest is Nick Singh — author of the bestselling "Ace the Data Science Interview" book and creator of the DataLemur SQL interview platform.
Nick:
• Co-authored “Ace the Data Science Interview”, an interview-question guide that has sold over 16,000 copies since it was released last year.
• Created the DataLemur platform for interactively practicing interview questions involving SQL queries.
• Worked as a software engineer at Facebook, Google, and Microsoft.
• Holds a BS in engineering from the University of Virginia.
Today's episode is ideal for folks who are looking to land a data science job for the first time, level-up into a more senior data science role, or perhaps land a data science gig at a new firm.
In this episode, Nick details:
• His top tips for success in data science interviews.
• Common misconceptions about data science interviews.
• How to become comfortable with self-promotion and increase your chances of landing your dream job.
• Strategies for when interviewers ask if you have any questions for them.
• The subject areas and skills you should master before heading into a data science interview.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Automating ML Model Deployment
Relative to training a machine learning model, getting it into production typically takes several times as much time and effort. Dr. Doris Xin, the brilliant co-founder/CEO of Linea, has a near-magical, two-line solution.
In the episode, Doris details:
• How Linea reduces ML model deployment to two lines of Python code.
• The surprising extent of wasted computation she discovered when she analyzed over 3000 production pipelines at Google.
• Her experimental evidence that the total automation of ML model development is neither realistic nor desirable.
• What it’s like being the CEO of an exciting, early-stage tech start-up.
• Where she sees the field of data science going in the coming years and how you can prepare for it.
Today’s episode is more on the technical side, so it will likely appeal primarily to practicing data scientists, especially those who need to deploy ML models into production or are interested in doing so.
Doris:
• Is co-founder and CEO of Linea, an early-stage start-up that dramatically simplifies the deployment of machine learning models into production.
• Her start-up’s alpha users include the likes of Twitter, Lyft, and Pinterest.
• Her start-up’s mission was inspired by research she conducted as a PhD student in computer science at the University of California, Berkeley.
• Previously she worked in research and software engineering roles at Google, Microsoft, Databricks, and LinkedIn.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Probability & Information Theory — Subject 5 of Machine Learning Foundations
Last Wednesday, we released the final video of my Calculus course, so today we begin my all-new YouTube course on Probability and Information Theory. This first video is an orientation to the course curriculum. Enjoy!
We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.
My Favorite Calculus Resources
It's my birthday today! In celebration, I'm delighted to be releasing the final video of my "Calculus for Machine Learning" YouTube course. The first video came out in May and now, ten months later, we're done! 🎂
We published a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday since May 6th, 2021. So happy that it's now complete for you to enjoy. Playlist is here.
More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Probability, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.
Starting next Wednesday, we'll begin releasing videos for a new YouTube course of mine: "Probability for Machine Learning". Hope you're excited to get going on it :)
Effective Pandas
Seven-time bestselling author Matt Harrison reveals his top tips and tricks to enable you to get the most out of Pandas, the leading Python data analysis library. Enjoy!
Matt's books, all of which have been Amazon best-sellers, are:
1. Effective Pandas
2. Illustrated Guide to Learning Python 3
3. Intermediate Python
4. Learning the Pandas Library
5. Effective PyCharm
6. Machine Learning Pocket Reference
7. Pandas Cookbook (now in its second edition)
Beyond being a prolific author, Matt:
• Teaches "Exploratory Data Analysis with Python" at Stanford
• Has taught Python at big organizations like Netflix and NASA
• Has worked as a CTO and Senior Software Engineer
• Holds a degree in Computer Science from Stanford University
On top of Matt's tips for effective Pandas programming, we cover:
• How to squeeze more data into Pandas on a given machine.
• His recommended software libraries for working with tabular data once you have too much data to fit on a single machine.
• How having a computer science education and having worked as a software engineer has been helpful in his data science career.
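One widely used way to fit more data into pandas on a given machine (not necessarily the exact tips Matt shares in the episode) is choosing smaller dtypes: storing low-cardinality strings as `category` and downcasting small integers with `pd.to_numeric`. Here is a minimal sketch with synthetic data; the column names and sizes are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: a repetitive string column and small integers stored as int64
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame(
    {
        "city": rng.choice(["Boston", "New York", "San Francisco"], size=n),
        "visits": rng.integers(0, 100, size=n),  # int64 by default
    }
)

before = df.memory_usage(deep=True).sum()

slim = df.assign(
    # Few unique values: store small integer codes plus a lookup table
    city=df["city"].astype("category"),
    # Values 0..99 fit in int8 (-128..127), so downcast from int64
    visits=pd.to_numeric(df["visits"], downcast="integer"),
)

after = slim.memory_usage(deep=True).sum()
print(f"{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
print(slim.dtypes)
```

The values themselves are unchanged; only their in-memory representation shrinks, which is why `deep=True` is used so that string storage is actually counted.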
This episode will appeal primarily to practicing data scientists who are keen to learn about Pandas or keen to become an even deeper expert on Pandas by learning from a world-leading educator on the library.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Finding the Area Under the ROC Curve
In this week's tutorial, we use Python code to find the area under the curve of the receiver operating characteristic (the "ROC curve"). This is a machine learning-specific application of integral calculus.
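As a minimal sketch of the idea (not the code from the video, which may differ), the area under an ROC curve can be approximated with the trapezoidal rule over its (false positive rate, true positive rate) points. The helpers `roc_points` and `auc_trapezoid` are hypothetical, written in plain Python for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points by sweeping the threshold over distinct scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Predict "positive" whenever the score is at or above the threshold
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Integrate TPR with respect to FPR using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auc_trapezoid(roc_points(y_true, scores)))  # 8/9, about 0.889
```

With scikit-learn installed, `sklearn.metrics.roc_auc_score(y_true, scores)` makes a handy cross-check and should report the same value for this untied example.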
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
This is the penultimate video in my Calculus course! After ten months of publishing it, the final video will be released next week :)
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available in GitHub here.
Definite Integral Exercise
My recent videos have covered how to find Definite Integrals manually as well as how to find them computationally using Python code. This week's video is an exercise that tests comprehension of both approaches.
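For readers who want to try both approaches side by side, here is a minimal sketch in plain Python (not necessarily the approach used in the video): by the power rule, the integral of x² from 0 to 3 is 3³/3 = 9, and a composite Simpson's rule approximation recovers the same value. The `simpson` function is a hypothetical helper written for illustration.

```python
from typing import Callable


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 100) -> float:
    """Composite Simpson's rule over [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd-indexed interior points get weight 4, even-indexed get weight 2
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(x: float) -> float:
    return x**2


# Manual (analytic) result via the power rule: x**3 / 3 evaluated from 0 to 3
analytic = 3**3 / 3 - 0**3 / 3
numeric = simpson(f, 0, 3)
print(analytic, numeric)
```

Because Simpson's rule is exact for polynomials up to degree three, the two results agree up to floating-point rounding here; for other integrands, increasing `n` tightens the approximation.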
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available in GitHub here.