This week's guest is award-winning author Denis Rothman. He details how Transformer models (like GPT-3) have revolutionized Natural Language Processing (NLP) in recent years. He also explains Explainable AI (XAI).
Denis:
• Is the author of three technical books on artificial intelligence
• Won this year's Data Community Content Creator Award for technical book authorship with his most recent book, "Transformers for NLP"
• Spent 25 years as co-founder of the French A.I. company Planilog
• Has been patenting A.I. algorithms, such as those for chatbots, since 1982
In this episode, Denis fills us in on:
• What Natural Language Processing is
• What Transformer architectures are (e.g., BERT, GPT-3)
• Tools we can use to explain *why* A.I. algorithms provide a particular output
We covered audience questions from Serg, Chiara, and Jean-Charles during filming. For those we didn't get to ask, Denis is kindly answering via a LinkedIn post today!
The episode's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Does Caffeine Hurt Productivity? (Part 1)
For Five-Minute Friday this week, I lay out my hypothesis that caffeine decreases people's capacity to focus deeply on work. Next week, we'll review the results of the months-long coffee experiment I ran on myself!
(If you can't wait to see the experiment results, you can head to jonkrohn.com/coffee to check them out.)
Listen or watch here.
Upcoming guest on the SuperDataScience Podcast: Wes McKinney
Next week, I'm interviewing the monumental Wes McKinney — creator of pandas, co-creator of Apache Arrow, and bestselling author of "Python for Data Analysis" — for a SuperDataScience episode.
Got Qs for him? Tweet them @jonkrohnlearns or send them to me on LinkedIn.
Advanced Partial Derivatives
This week's video builds on the preceding ones in my ML Foundations series to advance our understanding of partial derivatives by working through geometric examples. We use paper and pencil as well as Python code.
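If you'd like to follow along with paper, pencil, and code, partial derivatives are easy to check symbolically. Here is a minimal sketch assuming SymPy is installed; the function f(x, y) = x²y + y³ is my own illustrative choice, not one from the video:

```python
import sympy as sp

# Define symbolic variables and an example two-variable function.
x, y = sp.symbols("x y")
f = x**2 * y + y**3

# Partial derivative with respect to x (treat y as a constant): 2*x*y
print(sp.diff(f, x))

# Partial derivative with respect to y (treat x as a constant): x**2 + 3*y**2
print(sp.diff(f, y))
```

Treating every other variable as a constant is exactly the pencil-and-paper procedure; SymPy simply automates the bookkeeping.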
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Data Science for Private Investing — LIVE with Drew Conway
This week's guest is prominent data scientist and author Dr. Drew Conway. Working at Two Sigma, one of the world's largest hedge funds, Drew leads data science for private markets (e.g., real estate, private equity).
If you aren't familiar with Drew already, he:
• Serves as Senior Vice President for data science at Two Sigma
• Co-authored the classic O'Reilly Media book "ML for Hackers"
• Was co-founder and CEO of Alluvium, which was acquired in 2019
• Advised countless successful data-focused startups (e.g., Yhat, Reonomy)
• Obtained a PhD in politics from New York University
In this episode, he covers:
• What private investing is
• How data science can lead to better private investment decisions
• The differences between creating and executing models for public markets (such as stock exchanges) relative to private markets
• What he looks for in the data scientists he hires and how he interviews them
This is a special SuperDataScience episode because it's the first one recorded live in front of an audience (at the New York R Conference in September). Eloquent Drew was the willing guinea pig for this experiment, which was a great success: we filmed in a single unbroken take and fielded excellent audience questions.
Listen or watch here.
Deep Reinforcement Learning
Five-Minute Friday today is an intro to (deep) reinforcement learning, which has diverse cutting-edge applications: E.g., machines defeating humans at complex strategic games and robotic hands solving Rubik’s cubes.
You can watch or listen here.
Calculating Partial Derivatives with PyTorch AutoDiff
My recent videos have detailed how to calculate partial derivatives by hand. In today's, I demo how we can compute them automatically using PyTorch, enabling us to easily differentiate complex equations like ML models.
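As a small taste of what the video demos, here is a minimal sketch of automatic differentiation with PyTorch; the function z = x² + y² is my own illustrative choice, not necessarily one used in the video:

```python
import torch

# Leaf tensors with requires_grad=True so PyTorch tracks operations on them.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

# Build the computation graph for z = x^2 + y^2.
z = x**2 + y**2

# Backpropagate: autodiff computes dz/dx and dz/dy for us.
z.backward()

print(x.grad)  # dz/dx = 2x = 4
print(y.grad)  # dz/dy = 2y = 6
```

The same `backward()` call scales to functions with millions of parameters, which is what makes autodiff so valuable for ML models.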
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Accelerating Start-up Growth with A.I. Specialists
This week's guest is the game-changing Dr. Parinaz Sobhani. She leads ML at Georgian — a private fund that sends her "special ops" data science teams into its portfolio companies to accelerate their A.I. capabilities.
In this episode, Parinaz details:
• Case studies of Georgian's A.I. approach in action across industries (e.g. insurance, law, real estate)
• Tools and techniques her team leverages, with a particular focus on the transfer learning of transformer-based models of natural language
• What she looks for in the data scientists and ML engineers she hires
• Environmental and sociodemographic considerations of A.I.
• Her academic research (Parinaz holds a PhD in A.I. from the University of Ottawa where she specialized in natural language processing)
Listen or watch here.
...and thanks to Maureen for making this connection to Parinaz!
Building Your Ant Hill
Five-Minute Friday today features my 91-year-old grandmother sharing her insightful life philosophy that centers around an analogy of ants building ant hills.
Listen here.
Partial Derivative Exercises
Last week's YouTube tutorial was an epic intro to Partial Derivative Calculus — a critical foundation for understanding Machine Learning. This week's video features coding exercises that test your comprehension of that material.
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Bayesian Statistics
Expert Rob Trangucci joins me this week to provide an introduction to Bayesian Statistics, a uniquely powerful data-modeling approach.
If you haven't heard of Bayesian Stats before, today's episode introduces it from the ground up. It also covers why, in many common situations, it can be more effective than other data-modeling approaches like Machine Learning and Frequentist Statistics.
Today's episode is a rich resource on:
• The centuries-old history of Bayesian Stats
• Its particular strengths
• Real-world applications, including to Covid epidemiology (Rob's particular focus at the moment)
• The best software libraries for applying Bayesian Statistics yourself
• Pros and cons of pursuing a PhD in the data science field
Rob is a core developer of the open-source Stan project, a leading Bayesian software library. He previously worked as a statistician in renowned professor Andrew Gelman's lab at Columbia University in the City of New York and is now pursuing a PhD in statistics at the University of Michigan.
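For a tiny taste of the Bayesian idea ahead of the episode, here is a toy example of my own (not one from the show): updating a prior belief about a coin's heads-probability after seeing data, using the Beta-Binomial conjugacy.

```python
# Start from a Beta(1, 1) prior, i.e., a uniform belief over heads-probability.
prior_a, prior_b = 1, 1

# Observe some flips.
heads, tails = 7, 3

# Conjugate update: the posterior is Beta(prior_a + heads, prior_b + tails),
# so updating amounts to simply adding the observed counts.
post_a = prior_a + heads
post_b = prior_b + tails

# Posterior mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = post_a / (post_a + post_b)
print(posterior_mean)  # 8/12 ≈ 0.667
```

The appeal is that the output is a full distribution over the unknown quantity, not just a point estimate, so uncertainty comes for free.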
Listen or watch here.
Supervised vs Unsupervised Learning
Five-Minute Friday this week is a high-level intro to the two largest categories of Machine Learning approaches: Supervised Learning and Unsupervised Learning.
Listen or watch here.
What Partial Derivatives Are
Here is my brand-new 30-minute intro to Partial Derivative Calculus. To make comprehension as easy as possible, we use colorful illustrations, hands-on code demos in Python, and an interactive click-and-point curve-plotting tool.
This is an epic video covering a massively foundational topic underlying nearly all statistical and machine learning approaches. I hope you enjoy it!
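If you want a quick, dependency-free taste before watching, here is a small sketch of my own (not one from the video) that estimates partial derivatives numerically and checks them against the hand-derived formulas ∂z/∂x = 2x and ∂z/∂y = -2y for z = x² - y²:

```python
def z(x, y):
    return x**2 - y**2

def partial(f, point, var, h=1e-6):
    # Central-difference estimate of the partial derivative of f at `point`
    # with respect to `var`, holding the other variable fixed.
    x, y = point
    if var == "x":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial(z, (3.0, 2.0), "x"))  # ≈ 2 * 3 = 6
print(partial(z, (3.0, 2.0), "y"))  # ≈ -2 * 2 = -4
```

Holding the other variable fixed while nudging one is the whole idea of a partial derivative; the video builds the same intuition visually.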
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
From Data Science to Cinema
SuperDataScience SuperStar Hadelin returns to report on his journey from multi-million-selling video instructor to mainstream-film actor — and he details the traits that allow data scientists to succeed at anything.
Hadelin has created and presented 30 extremely popular Udemy courses on machine learning topics, selling over two million copies so far. Before that prolific run of course publishing, he studied math, engineering, and A.I. at the Université Paris-Saclay and worked as a data engineer at Google. More recently, Hadelin wrote the book "A.I. Crash Course" and was co-founder and CEO of BlueLife AI.
Today's episode focuses on:
• Hadelin's recent shift toward acting in mainstream films
• The characteristics that enable an outstanding data scientist to excel in any pursuit
• How to cultivate your passion and achieve your dreams
• Bollywood vs Hollywood
• How to prepare for the TensorFlow Certificate Program
• Software modules for deploying deep learning models into production
Listen or watch here.
Classification vs Regression
Five-Minute Friday this week is a high-level introduction to Classification and Regression problems — two of the main categories of problems tackled by Machine Learning algorithms.
Listen or watch here.
Calculus II: Partial Derivatives & Integrals – Subject 4 of Machine Learning Foundations
Every few months, we begin a new subject in my Machine Learning Foundations course on YouTube and today is one of those days! This video introduces Subject 4 (of 8), which covers Partial Derivatives and Integrals.
This subject-intro video provides a preview of all the content that will be covered in this subject. It also reviews the Single-Variable Calculus you need to be familiar with (from the preceding subject in the ML Foundations series) in order to understand Partial Derivatives (a.k.a. Multi-Variable Calculus).
The thumbnail illustration of my ever-learning puppy Oboe is by the wonderful artist Aglae Bassens.
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Deep Reinforcement Learning for Robotics with Pieter Abbeel
Very special guest this week! Pieter Abbeel is a serial A.I. entrepreneur, host of the star-studded Robot Brains Podcast, and the world's preeminent researcher of Deep Reinforcement Learning applications.
As a professor of Electrical Engineering and Computer Science at the University of California, Berkeley, Pieter directs the Berkeley Robot Learning Lab and co-directs the Berkeley A.I. Research Lab.
As an entrepreneur, he's been exceptionally successful at applying machine learning for commercial value. Gradescope, a machine learning company in the education technology space that he co-founded, was acquired in 2018. And the A.I. robotics firm Covariant, which he co-founded more recently, has raised $147 million so far, including an $80 million Series C round in July.
In this episode, Pieter eloquently discusses:
• His exciting current research in the field of Deep Reinforcement Learning
• Top learning resources and skills for becoming an expert in A.I. robotics
• How academic robotics research is vastly different from R&D for industry
• Productivity tips
• Traits he looks for in data scientists he hires
• Skills to succeed as a data scientist in the coming decades
He also had time to answer thoughtful questions from distinguished SuperDataScience listeners Serg Masís and Hsieh-Yu Li.
Listen or watch here.
Managing Imposter Syndrome
The Five-Minute Friday episode this week is on Imposter Syndrome, including what it is and how to manage it.
Thanks to Nikolay for the episode idea and Micayla for doing most of my homework for it!
Listen or watch here.
Machine Learning from First Principles, with AutoDiff
Today's brand-new, epic 40-minute YouTube tutorial ties together the preceding 27 Calculus videos to enable us to perform Machine Learning from first principles and fit a line to data points.
To make learning interactive and intuitive, this video focuses on hands-on code demos featuring PyTorch, the popular Python library for Automatic Differentiation.
If you're familiar with differential calculus but not machine learning, this video will make clear for you how ML works. If you're not familiar with differential calculus, the preceding videos in my "Calculus for Machine Learning" course will provide you with all of the foundational theory you need for ML.
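As a taste of the approach, here is a minimal sketch of fitting a line to points with PyTorch autodiff and gradient descent; the data and hyperparameters below are my own illustrative choices, not those from the video:

```python
import torch

# Toy data lying on the line y = 2x + 1.
xs = torch.tensor([0., 1., 2., 3., 4.])
ys = torch.tensor([1., 3., 5., 7., 9.])

# Slope and intercept, initialized to zero; autodiff will track their gradients.
m = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

for _ in range(2000):
    loss = ((m * xs + b - ys) ** 2).mean()  # mean squared error
    loss.backward()                         # autodiff computes dL/dm and dL/db
    with torch.no_grad():
        m -= 0.05 * m.grad                  # gradient-descent step
        b -= 0.05 * b.grad
        m.grad.zero_()                      # clear accumulated gradients
        b.grad.zero_()

print(m.item(), b.item())  # approaches m ≈ 2, b ≈ 1
```

This tiny loop is, at its core, the same recipe used to train deep neural networks: compute a loss, backpropagate, and step the parameters downhill.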
We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available on GitHub here.
Statistical Programming with Friends with Jared Lander
This week's guest is THE Jared Lander! He fills us in on real-life communities that support learning about — and effectively applying — open-source statistical-programming languages like Python and R.
In addition, Jared:
• Overviews what data-science consulting is like (with fascinating use cases ranging from industrial metallurgy to "Moneyball"-ing for the Minnesota Vikings)
• Details the hard and soft skills of successful data-science consultants
• Ventures eloquently into the age-old R versus Python debate
Jared leads the New York Open Statistical Programming Meetup, the world's largest R meetup (though it also features other open-source languages like Python), which hosts talks from global leaders in data science and machine learning. Jared also runs the R Conference, which holds its seventh annual edition next week, Sep 9-10.
Jared also wrote the bestselling book "R for Everyone" and teaches stats at both Columbia University in the City of New York and Princeton University. And none of the massive responsibilities that I've just mentioned are Jared's day job! Nope, for that he's the CEO and Chief Data Scientist of Lander Analytics, a data-science consulting firm.
Watch or listen here.
P.S.: Jared is kindly providing 20% off admission to next week's R Conference with promo code SDS20. See rstats.nyc for more details, including the first-ever live episode of SuperDataScience (with Drew Conway as guest)!