Five-Minute Friday this week is a fun one! My top music/audio recommendations for you while you "deep work" 🎶
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
TEDxDrexelU on Deep Learning
I'm giving my first TED-format talk at TEDxDrexelU in Philadelphia on May 21. I'll provide a visual intro to Deep Learning and to the momentous opportunity we have to shape a bewilderingly prosperous world with A.I.
There are only 100 tickets available for sale (for $15!) but my understanding is that my talk will eventually be made available on the TED YouTube channel if you can't make it in person.
The other compelling speakers are:
• Ebony White, PhD
• Adit Gupta
• Dale Moss
• Nadia Christina Jagessar, MBA
• Dr. Nyree Dardarian
• Raja Schaar, IDSA
Event and ticket details available here.
Automating ML Model Deployment
Relative to training a machine learning model, getting it into production typically takes several times as much time and effort. Dr. Doris Xin, the brilliant co-founder/CEO of Linea, has a near-magical, two-line solution.
In the episode, Doris details:
• How Linea reduces ML model deployment to two lines of Python code.
• The surprising extent of wasted computation she discovered when she analyzed over 3000 production pipelines at Google.
• Her experimental evidence that the total automation of ML model development is neither realistic nor desirable.
• What it’s like being the CEO of an exciting, early-stage tech start-up.
• Where she sees the field of data science going in the coming years and how you can prepare for it.
Today’s episode is more on the technical side, so it will likely appeal primarily to practicing data scientists, especially those who need to deploy ML models into production or are interested in doing so.
Doris:
• Is co-founder and CEO of Linea, an early start-up that dramatically simplifies the deployment of machine learning models into production.
• Her alpha users include the likes of Twitter, Lyft, and Pinterest.
• Her start-up’s mission was inspired by research she conducted as a PhD student in computer science at the University of California, Berkeley.
• Previously she worked in research and software engineering roles at Google, Microsoft, Databricks, and LinkedIn.
Daily Habit #9: Avoiding Messages Until a Set Time Each Day
This article was originally adapted from a podcast, which you can check out here.
At the beginning of the new year, in Episode #538, I introduced the practice of habit tracking and provided you with a template habit-tracking spreadsheet. Then, we had a series of Five-Minute Fridays that revolved around daily habits and we’ve been returning to this daily-habit theme periodically since.
The habits we covered in January and February were related to my morning routine. In March, we began coverage of habits on intellectual stimulation and productivity, such as reading and carrying out a daily math or computer science exercise.
Exercises on Event Probabilities
In recent weeks, my YouTube videos have covered Probability concepts like Events, Sample Spaces, and Combinatorics. Today's video features exercises to test and cement your understanding of those concepts.
We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.
Collaborative, No-Code Machine Learning
Emerging tools allow real-time, highly visual collaboration on data science projects — even in ways that allow those who code and those who don't to work together. Tim Kraska fills us in on how ML models enable this.
Tim:
• Is Associate Professor in the revered CSAIL lab at the Massachusetts Institute of Technology.
• Co-founded Einblick, a visual data computing platform that has received $6m in seed funding.
• Was previously a professor at Brown University, a visiting researcher at Google, and a postdoctoral researcher at Berkeley.
• Holds a PhD in computer science from ETH Zürich in Switzerland.
Today’s episode gets into technical aspects here and there, but will largely appeal to anyone who’s interested in hearing about the visual, collaborative future of machine learning.
In this episode, Tim details:
• How a tool like Einblick can simultaneously support folks who code as well as folks who’d like to leverage data and ML without code.
• How this dual no-code/Python code environment supports visual, real-time, point-and-click collaboration on data science projects.
• The clever database and ML tricks under the hood of Einblick that enable the tool to run effectively in real time.
• How to make data models more widely available in organizations.
• How university environments like MIT’s CSAIL support long-term innovations that can be spun out to make game-changing impacts.
DALL-E 2: Stunning Photorealism from Any Text Prompt
OpenAI just released their "DALL-E 2" multimodal model that defines "state of the art" A.I.: Provide it with (even extremely bizarre) natural-language requests for an image and it generates it! Hear about it in today's episode, and check out this interactive post from OpenAI that demonstrates DALL-E 2's mind-boggling capabilities.
Combinatorics
Combinatorics is a field of math devoted to counting. In this week's YouTube video, we use examples with real numbers to bring Combinatorics to life and relate it to Probability Theory.
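A minimal sketch of how counting feeds into probability, assuming a fair coin flipped five times (the scenario and variable names are illustrative and not taken from the video). It counts the flip sequences with exactly two heads using the standard-library `math.comb`, then divides by the number of equally likely sequences:

```python
from math import comb

n_flips = 5   # assumed number of fair coin flips
n_heads = 2   # number of heads we want to count

# Number of distinct flip sequences containing exactly n_heads heads
favorable = comb(n_flips, n_heads)

# Each flip has two outcomes, so there are 2**n_flips equally likely sequences
total = 2 ** n_flips

# Probability = favorable outcomes / all equally likely outcomes
probability = favorable / total
print(favorable, total, probability)
```

The division only gives a valid probability because every sequence is equally likely, which is what the fair-coin assumption buys us.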
A.I. For Crushing Humans at Poker and Board Games
The first SuperDataScience episode filmed with a live audience! Award-winning researcher Dr. Noam Brown from Meta AI was the guest, filling us in on A.I. systems that beat the world's best at poker and other games.
We shot this episode on stage at MLconf in New York. This means that you’ll hear audience reactions in real-time and, near the end of the episode, many great questions from audience members once I opened the floor up to them.
This episode has some moments here and there that get deep into the weeds of machine learning theory, but for the most part today’s episode will appeal to anyone who’s interested in understanding the absolute cutting-edge of A.I. capabilities today.
In this episode, Noam details:
• What Meta AI (formerly Facebook AI Research) is and how it fits into Meta.
• His award-winning no-limit poker-playing algorithms.
• What game theory is and how he integrates it into his models.
• The algorithm he recently developed that can beat the world’s best players at “no-press” Diplomacy, a complex strategy board game.
• The real-world implications of his game-playing A.I. breakthroughs.
• Why he became a researcher at a big tech firm instead of academia.
Noam:
• Develops A.I. systems that can defeat the best humans at complex games that computers have hitherto been unable to succeed at.
• During his Ph.D. in computer science at Carnegie Mellon University, developed A.I. systems that defeated the top human players of no-limit poker — earning him a Science Magazine cover story.
• Also holds a master’s in robotics from Carnegie Mellon and a bachelor’s degree in math and computer science from Rutgers.
• Previously worked for DeepMind and the U.S. Federal Reserve Board.
Thanks to Alexander Holden Miller for introducing me to Noam and to Hannah Gräfin von Waldersee for introducing me to Alex!
PaLM: Google's Breakthrough Natural Language Model
This month, Google announced a large natural language model called PaLM that provides staggering results on tasks like common-sense reasoning and solving Python-coding questions. Hear all about it in today's episode!
Multiple Independent Observations
In this week's YouTube tutorial, we consider probabilistic events where we have multiple independent observations — such as flipping a coin two or more times instead of just once.
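As a rough sketch of the coin example, assuming a fair coin and two flips (the names below are illustrative, not from the video): for independent observations the probabilities multiply, and we can confirm the result by enumerating every possible sequence of flips.

```python
from fractions import Fraction
from itertools import product

p_heads = Fraction(1, 2)  # assumed probability of heads on a single fair flip
n_flips = 2               # number of independent flips

# Independence lets us multiply the single-flip probabilities
p_all_heads = p_heads ** n_flips

# Cross-check by listing every equally likely sequence of flips
sample_space = list(product("HT", repeat=n_flips))
n_all_heads = sum(1 for outcome in sample_space if all(flip == "H" for flip in outcome))
p_enumerated = Fraction(n_all_heads, len(sample_space))

print(p_all_heads, p_enumerated)
```

Using `Fraction` keeps the arithmetic exact, so the two approaches can be compared with a plain equality check.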
Open-Access Publishing
This week Dr. Amy Brand, the pioneering Director of The MIT Press and executive producer of documentary films, leads a discussion of the benefits of, and innovations in, open-access publishing.
In the episode, Amy details:
• What open-access means.
• Why open-access papers, books, data, and code are invaluable for data scientists and anyone else doing research and development.
• The new metadata standard she developed to resolve issues around accurate attribution of who did what for a given academic publication.
• How we can change the STEM fields to be welcoming to everyone, including historically underrepresented groups.
• What it’s like to devise and create an award-winning documentary film.
Amy:
• Leads one of the world’s most influential university presses as the Director and Publisher of the MIT Press.
• Created a new open-access business model called Direct to Open.
• Is Co-Founder of Knowledge Futures Group, a non-profit that provides technology to empower organizations to build the digital infrastructure required for open-access publishing.
• Launched MIT Press Kids, the first university+kids publishers collab.
• Was the executive producer of "Picture A Scientist", a documentary that was selected to premiere at the prestigious Tribeca Film Festival and was recognized with the 2021 Kavli Science Journalism Award.
• She holds a PhD in Cognitive Science from MIT.
Today’s episode is well-suited to a broad audience, not just data scientists.
Events and Sample Spaces
In this week's YouTube tutorial, I introduce the most fundamental atoms of probability theory: events and sample spaces. Enjoy 😀
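To make the two terms concrete, here is a small sketch assuming a single roll of a fair six-sided die (the die example and the `probability` helper are hypothetical, chosen for illustration). The sample space is the set of all possible outcomes, and an event is any subset of it:

```python
from fractions import Fraction

# Sample space: every possible outcome of one roll of a six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# Event: the subset of outcomes where the roll is even
event_even = {outcome for outcome in sample_space if outcome % 2 == 0}

def probability(event, space):
    """Probability of an event when all outcomes in the space are equally likely."""
    return Fraction(len(event & space), len(space))

p_even = probability(event_even, sample_space)
print(sorted(event_even), p_even)
```

The helper only applies when outcomes are equally likely, as with a fair die.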
AGI: The Apocalypse Machine
Jeremie Harris's work on A.I. could dramatically alter your perspective on the field of data science and the bewildering — perhaps downright frightening — impact you and A.I. could make together on the world.
Jeremie:
• Recently co-founded Mercurius, an A.I. safety company.
• Has briefed senior political and policy leaders around the world on long-term risks from A.I., including senior members of the U.K. Cabinet Office, the Canadian Cabinet, as well as the U.S. Departments of State, Homeland Security and Defense.
• Is Host of the excellent Towards Data Science podcast.
• He previously co-founded SharpestMinds, a Y Combinator-backed mentorship marketplace for data scientists.
• He proudly dropped out of his quantum mechanics PhD to found SharpestMinds.
• He holds a Master’s in biological physics from the University of Toronto.
In this episode, Jeremie details:
• What Artificial General Intelligence (AGI) is.
• How the development of AGI could happen in our lifetime and could present an existential risk to humans, perhaps even to all life on the planet as we know it.
• How, alternatively, if engineered properly, AGI could herald a moment called the singularity that brings with it a level of prosperity that is not even imaginable today.
• What it takes to become an AI safety expert yourself in order to help align AGI with benevolent human goals.
• His forthcoming book on quantum mechanics.
• Why almost nobody should do a PhD.
Today’s episode is deep and intense, but as usual it does still have a lot of laughs, and it should appeal broadly, no matter whether you’re a technical data science expert already or not.
Clem Delangue on Hugging Face and Transformers
In today's SuperDataScience episode, Hugging Face CEO Clem Delangue fills us in on how open-source transformer architectures are accelerating ML capabilities. Recorded for yesterday's ScaleUp:AI conference in NY.
What Probability Theory Is
This week, we start digging into the actual, uh, theory of Probability Theory. I also highlight the field's relevance to Machine Learning and Statistics. Enjoy 😀
How to Rock at Data Science — with Tina Huang
Can you tell I had fun filming this episode with Tina Huang, YouTube data science superstar (293k subscribers)? In it, we laugh while discussing how to get started in data science and her learning/productivity tricks.
Tina:
• Creates YouTube videos with millions of views on data science careers, learning to code, SQL, productivity, and study techniques.
• Is a data scientist at one of the world's largest tech companies (she keeps the firm anonymous so she can publish more freely).
• Previously worked at Goldman Sachs and the Ontario Institute for Cancer Research.
• Holds a Master’s in Computer and Information Technology from the University of Pennsylvania and a bachelor’s in Pharmacology from the University of Toronto.
In this episode, Tina details:
• Her guidance for preparing for a career in data science from scratch.
• Her five steps for consistently doing anything.
• Her strategies for learning effectively and efficiently.
• What the day-to-day is like for a data scientist at one of the world’s largest tech companies.
• The software languages she uses regularly.
• Her SQL course.
• How her science and computer science backgrounds help her as a data scientist today.
Today’s episode should be appealing to a broad audience, whether you’re thinking of getting started in data science, are already an experienced data scientist, or you’re more generally keen to pick up career and productivity tips from a light-hearted conversation.
Thanks to Serg Masís, Brindha Ganesan and Ken Jee for providing questions for Tina... in Ken's case, a very silly question indeed.
Daily Habit #8: Math or Computer Science Exercise
This article was originally adapted from a podcast, which you can check out here.
At the beginning of the new year, in Episode #538, I introduced the practice of habit tracking and provided you with a template habit-tracking spreadsheet. Then, we had a series of Five-Minute Fridays that revolved around daily habits I espouse, and that theme continues today. The habits we covered in January and February were related to my morning routine.
Starting last week, we began coverage of habits on intellectual stimulation and productivity. Specifically, last week’s habit was “reading two pages”. This week, we’re moving onward with doing a daily technical exercise; in my case, this is a mathematics, computer science, or programming exercise.
The reason I have this daily-technical-exercise habit is that data science is both a limitlessly broad field and an ever-evolving one. If we keep learning on a regular basis, we can expand our capabilities and open doors to new professional opportunities. This is one of the driving ideas behind the #66daysofdata hashtag, which, if you haven’t heard of it before, is detailed in episode #555 with Ken Jee, who originated the now-ubiquitous hashtag.
A Brief History of Probability Theory
This week's YouTube video is a quick introduction to the fascinating history of Probability Theory. Next week, we'll actually start digging into Probability Theory, uh, theory 😉
Engineering Data APIs
How you design a data API from scratch and how a data API can leverage machine learning to improve the quality of healthcare delivery are topics covered by Ribbon Health CTO Nate Fox in this week's episode.
Ribbon Health is a New York-based API platform for healthcare data that has raised $55m, including from some of the biggest names in venture capital like Andreessen Horowitz and General Catalyst.
Prior to Ribbon, Nate:
• Worked as an Analytics Engineer at the marketing start-up Unified.
• Was a Product Marketing Manager at Microsoft.
• Obtained a mechanical engineering degree from the Massachusetts Institute of Technology and an MBA from Harvard Business School.
In this episode, Nate details:
• What APIs ("application programming interfaces") are.
• How you design a data API from scratch.
• How Ribbon Health’s data API leverages machine learning models to improve the quality of healthcare delivery.
• How to ensure the uptime and reliability of APIs.
• How scientists and engineers can make a big social impact in health technology.
• His favorite tool for easily scaling up the impact of a data science model to any number of users.
• What he looks for in the data scientists he hires.
Today’s episode has some technical data science and software engineering elements here and there, but much of the conversation should be interesting to anyone who’s keen to understand how data science can play a big part in improving healthcare.