Today’s episode isn’t specifically about GPT-3, however. It’s about the issue of how massive these large language models are and how we can prune these models to compress them.
Filtering by Category: Data Science
Introduction to Machine Learning
After a multi-year hiatus, Hadelin and Kirill — the most popular data science instructors on Udemy, with 2+ million students — have released a new ML course. In this episode, they introduce what ML is from scratch.
Kirill Eremenko:
• Is Founder and CEO of SuperDataScience, an e-learning platform.
• Founded the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins two years ago.
Hadelin de Ponteves:
• Was a data engineer at Google before becoming a content creator.
• In 2020, took a break from Data Science content to produce and star in a Bollywood film featuring "Miss Universe" Harnaaz Sandhu.
Together, Kirill and Hadelin:
• Have created dozens of data science courses.
• Are the most popular data science instructors on the Udemy platform, with over two million students.
• Recently published a new course, “Machine Learning in Python: Level 1”, after a multi-year hiatus from creating courses.
This episode serves as an introduction to machine learning, so it will primarily appeal to folks who aren’t already experts in ML. That said, I’ve been doing ML for over 15 years and still learned a few critical new pieces of information during filming, so this episode could also serve as a fun, light-hearted refresher for experts.
In this episode, Kirill and Hadelin introduce ML concepts such as:
• Supervised vs unsupervised learning
• Classification errors
• Logistic regression
• Feature scaling
• The Adjusted R-Squared metric
• The assumptions of linear regression
• The Elbow Method
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Is Data Science Still Sexy?
Had far too much fun filming today's episode with Prof. Tom Davenport, many-time author of bestselling books on analytics and coiner of data science as "sexiest job of the century". A decade on, does he still think so?
Tom:
• Has published over 20 books, such as the bestselling "Competing on Analytics", "The A.I. Advantage", and "Analytics at Work".
• Has penned 300+ articles in publications like the Harvard Business Review and writes regular columns for Forbes and The Wall Street Journal.
• Is President's Distinguished Professor of IT and Management at Babson College.
• Is Visiting Professor at the Saïd Business School, University of Oxford.
• Is Senior Advisor to the A.I. practice for the global professional services giant Deloitte.
• Is recognized as a LinkedIn Top Voice, with nearly 300k followers.
Today’s episode is well-suited to technical and non-technical listeners alike. Every part of it should appeal to anyone who’s keen to hear about the leading edge of commercial applications of A.I.
In this episode, Prof. Davenport details:
• The discrete A.I. maturity levels of organizations.
• How organizations become A.I. fueled.
• Which jobs are susceptible to replacement by A.I.
• Which jobs are ripe for augmenting with A.I.
• What roles other than data scientist are required to deploy effective machine learning models.
• What the future of data science will look like and, having coined data science as “the sexiest job of the 21st century” a decade ago, whether he still thinks it is today.
Machine Learning for Video Games
Carly Taylor — Lead ML Engineer for the "Call of Duty" franchise — joined me for today's fun, super informative episode on low-latency software engineering, real-time ML, and the future of gaming.
Carly:
• Grew rapidly from a Sr Data Scientist role to simultaneously holding "Expert ML Engineer" and "Sr Mgr — Security Strategy" titles since joining Activision two years ago.
• At Activision, specifically works on Call of Duty, one of the top-grossing video game franchises of all time, with over $30 billion in sales and 250m global users annually.
• Prior to Activision, rapidly grew from Analyst to Data Scientist roles.
• Has amassed a LinkedIn following of 75k+ by regularly posting fruitful tips on breaking into a data science career and progressing within it.
• Advocates for women in STEM, tech, and gaming careers.
• Offers 1:1 career consulting to anyone who desires it.
• Holds a Masters in Computational Chemistry from the University of Colorado and completed the Galvanize Data Science Immersive program.
Today’s episode has technical tidbits throughout that will be useful to hands-on practitioners, but much of the wide-ranging conversation will be fascinating to any listener, particularly if you have an interest in video games, the so-called metaverse, or real-time machine learning.
In this episode, Carly details:
• What the future of gaming holds.
• Why low latency is critical for an optimal gaming experience and the tools that online engineers use to make it happen.
• Her favorite operating systems, software packages, and keyboards.
• How to transition effectively from a quantitative academic background into data science.
• How to file a patent.
• Why she’s called the “Rebel Data Scientist”.
A.I. for Medicine
Machine learning is ushering in a new era of medicine, e.g., by predicting the shape of therapeutic drugs and assisting in their design. Witty Prof. Charlotte Deane of the University of Oxford and Exscientia explains how.
Charlotte:
• Is a world-leading expert on using ML for designing therapeutic drugs.
• Has been faculty at the University of Oxford for over 20 years, where she serves as Professor of Structural Bioinformatics and heads the 25-person Protein Informatics Lab.
• Is Chief Scientist Biologics A.I. at Exscientia, a NASDAQ-listed pharmatech company that uses computational approaches to drive drug development in a fraction of the time of traditional drug companies.
• Was COVID-Response Director for UK Research and Innovation, resulting in Queen Elizabeth II honoring her as a Member of the Most Excellent Order of the British Empire.
Today’s episode should appeal to technical and non-technical folks alike as it features an absolutely brilliant scientist and communicator describing how we can use A.I. to speed the discovery of new molecules that help our body fight off ailments as diverse as viruses and cancer.
In this episode, Prof. Deane details:
• How your immune system works.
• What biologics are and why they’re such an important class of drugs.
• What’s holding back the widespread use of precision medicines that are pinpoint-customized to a specific tumor in a specific person.
• What the celebrated AlphaFold algorithm does exquisitely and where it (and all other computational models of protein folding) still needs to improve.
• How she used data to marshal the UK’s scientific response to COVID.
• How data and machine learning will transform drug development over the coming years.
Data Science Trends for 2023
Happy New Year! To kick it off, the entrepreneur, futurist, and mega-popular Machine Learning instructor Sadie St. Lawrence joins me to predict the biggest data science trends of 2023 🍾
We start the episode off by looking back at how our predictions for 2022 panned out from a year ago and then we dive into our predictions for the year ahead. Specific trends we discuss include:
• Data as a product
• Multimodal models
• Decentralization of enterprise data
• A.I. policy
• Environmental sustainability
This episode will appeal to technical and non-technical folks alike — anyone who’d like to understand the trends that will shape the field of data science and the broader world not only in 2023 but also in the years beyond.
Sadie:
• Has created data science and ML courses enjoyed by 350k+ students.
• Is Founder and CEO of Women In Data, a community of over 20k women across 17 countries.
• Serves on multiple start-up boards.
• Hosts the Data Bytes podcast.
Simplifying Machine Learning
Today, Mariya Sha — host of the wildly popular "Python Simplified" YouTube channel (140k subscribers!) — taps her breadth of A.I. expertise to provide a fun and fascinating finale to SuperDataScience guest episodes for 2022.
Mariya:
• Is the mind behind the "Python Simplified" YouTube channel that makes advanced concepts (e.g., ML, neural nets) simple to understand.
• Covers Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and OOP in her videos.
• Is renowned for taking complex concepts such as gradient descent or unsupervised learning and explaining them in a straightforward manner that leverages hands-on, real-life examples.
• Is pursuing a bachelor's in Computer Science (with a specialization in A.I. and Machine Learning) from the University of London.
Today’s episode should appeal to anyone who’s interested in or involved with data science, machine learning, or A.I.
In this episode, Mariya details:
• How the incredible potential of ML in our lifetimes inspired her to shift her focus from web-development languages like JavaScript to Python.
• Why automation and web scraping are critical skills for data scientists.
• How to make learning any apparently complex data science concept straightforward to comprehend.
• Her favorite Python libraries and software tools.
• One rarely-mentioned topic that every data scientist would benefit from.
• The pros and cons of pursuing a 100% remote degree in computer science.
How to Influence Others with Your Data
If you ever use data to make decisions or to persuade those around you to make data-driven decisions, today’s episode is jam-packed with relevant, practical tips from data presentation guru Ann K. Emery.
Ann:
• Is an internationally-acclaimed speaker who delivers 100+ keynotes, workshops, and webinars each year to enable people to share data-driven insights more effectively.
• Has consulted on data visualization, data reporting, and data presentation with over 200 organizations, including the United Nations, the US Centers for Disease Control, and Harvard University.
• Holds a BA in Psychology and Spanish from the University of Virginia and a Masters in Educational Psychology Evaluation, Assessment, and Testing from George Mason University.
I rarely say that everyone should listen to an episode, but this is one of those rare cases.
In this episode, Ann details:
• What data storytelling is.
• Best practices for data visualization.
• Surprising tricks you can pull off with spreadsheet software.
• How to report on data effectively.
• Her top tips for presenting data in a slideshow.
The Equality Machine
Many recent books and articles spread fear about data collection and A.I. Today's guest, Prof. Orly Lobel, offers the antidote with her book "The Equality Machine" — an optimistic take on the future of data science.
Liquid Neural Networks
Liquid Neural Networks are a new, biology-inspired deep learning approach that could be transformative. I think they're super cool and Adrian Kosowski, PhD introduced them to me for today's Five-Minute Friday episode.
Data Analytics Career Orientation
Considering a Data Analytics career? Today's episode with YouTube icon Luke Barousse (273k subscribers) will be particularly appealing to you, but the terrifically interesting guest makes for an episode that anyone will love.
Luke:
• Is a full-time YouTuber, creating highly educational — but nevertheless hilarious — videos focused on Data Analytics.
• Previously worked as a Lead Data Analyst and Data Engineer at BASF.
• Worked for seven years in the US Navy on nuclear-powered submarines.
• Holds a degree in mechanical engineering, a graduate qualification in nuclear engineering, and an MBA in business analytics.
In this episode, Luke details:
• The must-have skills for entry-level data analyst roles.
• The data analyst skills mistakenly pursued by many folks considering the career.
• How his submariner experience prepared him well for a data career.
• His favorite tools for creating interactive data dashboards.
• His favorite scraping libraries for collecting data from the web.
• The skills to learn now to be prepared for the data careers of the future.
• The benefits of CrossFit beyond just the fitness improvements.
Resilient Machine Learning
Machine learning is often fragile in production. For today's Five-Minute Friday episode, Dr. Dan Shiebler details how we can make ML more resilient.
Software for Efficient Data Science
In today's episode, Dr. Jodie Burchell details a broad range of tools for working efficiently with data, including data cleaning, reproducibility, visualization, and natural language processing.
Jodie:
• Is the Data Science Developer Advocate for JetBrains, the developer-tools company behind PyCharm (one of the most widely-used Python IDEs) and DataLore (their new cloud platform for collaborative data science).
• Previously was Data Scientist or Lead Data Scientist at several tech companies, developing specializations in search, recommender systems, and NLP.
• Co-authored two books on data visualization libraries: "The Hitchhiker's Guide to ggplot2" and "The Hitchhiker's Guide to Plotnine".
• Prior to entering industry, was a postdoctoral fellow in biostatistics at the University of Melbourne.
• Holds a PhD in Psychology from the Australian National University.
Today’s episode is primarily intended for a technical audience as it's packed with practical tips and software for data scientists.
In this episode, Jodie details:
• What a data science developer advocate is and why you might want to consider it as a career option.
• How to work effectively, efficiently, and confidently with real-world data.
• Her favorite Python libraries, such as ones for data viz and NLP.
• How to have reproducible data science workflows.
• The subject she would have majored in if she could go back in time.
The Critical Human Element of Successful A.I. Deployments
For today's episode, I sat down with the prolific data-science instructor, author and practitioner Keith McCormick to discuss how critical user considerations are for developing a successful A.I. application.
AutoML: Automated Machine Learning
AutoML with Erin LeDell — it rhymes! In today's episode, H2O.ai's Chief ML Scientist guides us through what Automated Machine Learning is and why it's an advantageous technique for data scientists to adopt.
Dr. LeDell:
• Has been working at H2O.ai — the cloud A.I. firm that has raised over $250m in venture capital and is renowned for its open-source AutoML library — for eight years.
• Founded Women in Machine Learning & Data Science (WiMLDS; 100+ chapters worldwide).
• Co-founded R-Ladies Global, a community for genders currently underrepresented amongst R users.
• Is celebrated for her talks at leading A.I. conferences.
• Previously was Principal Data Scientist at two acquired A.I. startups.
• Holds a Ph.D. from UC Berkeley focused on ML and computational stats.
Today’s episode is relatively technical so will primarily appeal to technical listeners, but it would also provide context to anyone who’s interested in understanding how key aspects of data science work are becoming increasingly automated.
In this episode, Erin details:
• What AutoML — automated machine learning — is and why it’s an advantageous technique for data scientists to adopt.
• How the open-source H2O AutoML platform works.
• What the “No Free Lunch Theorem” is.
• What Admissible Machine Learning is and how it can reduce the biases present in many data science models.
• The new software tools she’s most excited about.
• How data scientists can prepare for the increasingly automated data science field of the future.
Subword Tokenization with Byte-Pair Encoding
When working with written natural language data as we do with many natural language processing models, a step we typically carry out while preprocessing the data is tokenization. In a nutshell, tokenization is the conversion of a long string of characters into smaller units that we call tokens.
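To make the idea concrete, here is a minimal Python sketch. It first splits a string into word-level tokens on whitespace, then learns a couple of byte-pair-encoding-style merges at the character level. The toy corpus and all function names (`whitespace_tokenize`, `learn_bpe`, and so on) are hypothetical and chosen for illustration only; this is a simplified sketch, not the implementation used by any particular library or model.

```python
from collections import Counter


def whitespace_tokenize(text):
    """Convert a long string of characters into word-level tokens."""
    return text.split()


def most_frequent_pair(words):
    """Return the most common adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return None
    # Ties are broken by first-seen order (dict insertion order).
    return max(pairs, key=pairs.get)


def merge_pair(words, pair):
    """Replace each occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(text, num_merges):
    """Learn up to `num_merges` merges from a toy corpus (illustrative only)."""
    # Start with every word represented as a tuple of single characters.
    words = dict(Counter(tuple(w) for w in whitespace_tokenize(text)))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


if __name__ == "__main__":
    merges, vocab = learn_bpe("low low low lower lowest", num_merges=2)
    print(merges)  # [('l', 'o'), ('lo', 'w')]
    print(vocab)
```

On this toy corpus, the two learned merges build the subword "low", so "lower" ends up represented as the tokens "low", "e", "r". Real tokenizers typically learn thousands of merges from far larger corpora and often operate on bytes rather than characters.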
Analyzing Blockchain Data and Cryptocurrencies
As real-time, publicly-available ledgers of transactions, blockchains provide exciting new data analytics opportunities. Kimberly Grauer leads us through the tools and approaches for blockchain analytics.
Kim:
• Is Director of Research at Chainalysis Inc., the world’s leading crypto analytics firm.
• Previously worked in an economic research and analysis group for NYC.
• Holds a Masters in Political Theory from the University of Oxford and a Master of Public Administration from the London School of Economics, and completed the General Assembly Data Science bootcamp.
Today’s episode will appeal primarily to folks who are interested in blockchains and cryptocurrencies, particularly those keen to perform data analysis on blockchain data.
In this episode, Kim details:
• The unique real-time economic-data analytics opportunities that blockchains provide.
• Examples of her own research on blockchain data, such as analyses of illegal activity and global crypto adoption.
• The tools and approaches she uses daily to analyze and report on blockchain data.
• Where the evolutions of crypto, blockchains, and data science are going together.
• Why a data science bootcamp could be exactly the right thing for you if you’re looking to break into the field.
Imagen Video: Incredible Text-to-Video Generation
For today’s Five-Minute Friday episode, it’s my pleasure to introduce you to the Imagen Video model published just a few weeks ago by researchers from Google.
Data Analyst, Data Scientist, and Data Engineer Career Paths
Keen to become a Data Analyst? Get promoted to Sr Data Analyst? Or explore Data Engineer/Scientist options? Shashank, a YouTube expert on these questions (>100k subscribers!), tackles them in today's episode.
Shashank:
• Has an exceptional YouTube channel focused on helping people break into a data analyst career.
• Works as a Senior Data Engineer at digital sports platform Fanatics, Inc.
• Was previously Data Analyst at luxury retailer Nordstrom and other firms.
• Holds a degree in chemistry from Emory University in Atlanta.
Today’s episode will appeal primarily to folks who are interested in becoming a data analyst, or who are interested in transitioning from a data analyst role into a data science or data engineering role.
In this episode, Shashank details:
• How you can land an entry-level data analyst role in just a few weeks, regardless of your educational and professional background.
• The hard and soft skills you need to progress from a junior data analyst to a senior data analyst position.
• What it takes to transition from data analyst to a typically more lucrative role as a data scientist or data engineer.
• His favorite resources for learning the essential skills for data scientists.
• What he looks for when he’s interviewing candidates.
Burnout: Causes and Solutions
What really is Burnout? What causes it? And how can you prevent or treat it? Prof. Christina Maslach — world-leading researcher and author on Burnout — joins me for today's episode to unpack these questions.