A team of researchers from Sakana AI, a Japanese AI startup founded last year by Google alumni and reportedly valued at over $1 billion in June, this week published a paper titled "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery" that is making big waves and could revolutionize how we conduct scientific research.
Filtering by Tag: ml
Deep Utopia: AI Could Solve All Human Problems in Our Lifetime
Today’s episode focuses on Nick Bostrom's latest book, Deep Utopia. Published a couple of weeks ago, it delves into the possibilities of a future where artificial intelligence has solved humanity's deepest problems.
Generative Deep Learning, with David Foster
Today, bestselling author David Foster provides a fascinating technical introduction to cutting-edge Generative A.I. concepts including variational autoencoders, diffusion models, contrastive learning, GANs and (my favorite!) "world models".
David:
• Wrote the O'Reilly book “Generative Deep Learning”; the first edition from 2019 was a bestseller while the second edition was released just last week.
• Is a Founding Partner of Applied Data Science Partners, a London-based consultancy specialized in end-to-end data science solutions.
• Holds a Master’s in Mathematics from the University of Cambridge and a Master’s in Management Science and Operational Research from the University of Warwick.
Today’s episode is deep in the weeds on generative deep learning pretty much from beginning to end and so will appeal most to technical practitioners like data scientists and ML engineers.
In the episode, David details:
• How generative modeling is different from the discriminative modeling that dominated machine learning until just the past few months.
• The range of application areas of generative A.I.
• How autoencoders work and why variational autoencoders are particularly effective for generating content.
• What diffusion models are and how latent diffusion in particular results in photorealistic images and video.
• What contrastive learning is.
• Why “world models” might be the most transformative concept in A.I. today.
• What transformers are, how variants of them power different classes of generative models such as BERT architectures and GPT architectures, and how blending generative adversarial networks with transformers supercharges multi-modal models.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
A.I. for Medicine
Machine learning is ushering in a new era of medicine, e.g., by predicting the shape of therapeutic drugs and assisting in their design. Witty Prof. Charlotte Deane of the University of Oxford and Exscientia explains how.
Charlotte:
• Is a world-leading expert on using ML for designing therapeutic drugs.
• Has been faculty at the University of Oxford for over 20 years, where she serves as Professor of Structural Bioinformatics and heads the 25-person Protein Informatics Lab.
• Is Chief Scientist Biologics A.I. at Exscientia, a NASDAQ-listed pharmatech company that uses computational approaches to drive drug development in a fraction of the time of traditional drug companies.
• Was COVID-Response Director for UK Research and Innovation, resulting in Queen Elizabeth II honoring her as a Member of the Most Excellent Order of the British Empire.
Today’s episode should appeal to technical and non-technical folks alike as it features an absolutely brilliant scientist and communicator describing how we can use A.I. to speed the discovery of new molecules that help our body fight off ailments as diverse as viruses and cancer.
In this episode, Prof. Deane details:
• How your immune system works.
• What biologics are and why they’re such an important class of drugs.
• What’s holding back the widespread use of precision medicines that are pinpoint-customized to a specific tumor in a specific person.
• What the celebrated AlphaFold algorithm does exquisitely and where it (and all other computational models of protein folding) still needs to improve.
• How she used data to marshal the UK’s scientific response to Covid.
• How data and machine learning will transform drug development over the coming years.
Data Science Trends for 2023
Happy New Year! To kick it off, the entrepreneur, futurist, and mega-popular Machine Learning instructor Sadie St. Lawrence joins me to predict the biggest data science trends of 2023 🍾
We start the episode off by looking back at how our predictions for 2022 panned out from a year ago and then we dive into our predictions for the year ahead. Specific trends we discuss include:
• Data as a product
• Multimodal models
• Decentralization of enterprise data
• A.I. policy
• Environmental sustainability
This episode will appeal to technical and non-technical folks alike — anyone who’d like to understand the trends that will shape the field of data science and the broader world not only in 2023 but also in the years beyond.
Sadie:
• Has created data science and ML courses enjoyed by 350k+ students.
• Is Founder and CEO of Women In Data, a community of over 20k women across 17 countries.
• Serves on multiple start-up boards.
• Hosts the Data Bytes podcast.
Simplifying Machine Learning
Today, Mariya Sha — host of the wildly popular "Python Simplified" YouTube channel (140k subscribers!) — taps her breadth of A.I. expertise to provide a fun and fascinating finale to SuperDataScience guest episodes for 2022.
Mariya:
• Is the mind behind the "Python Simplified" YouTube channel that makes advanced concepts (e.g., ML, neural nets) simple to understand.
• Covers Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and OOP in her videos.
• Is renowned for taking complex concepts such as gradient descent or unsupervised learning and explaining them in a straightforward manner that leverages hands-on, real-life examples.
• Is pursuing a bachelor's in Computer Science (with a specialization in A.I. and Machine Learning) from the University of London.
Today’s episode should appeal to anyone who’s interested in or involved with data science, machine learning, or A.I.
In this episode, Mariya details:
• How the incredible potential of ML in our lifetimes inspired her to shift her focus from web-development languages like JavaScript to Python.
• Why automation and web scraping are critical skills for data scientists.
• How to make learning any apparently complex data science concept straightforward to comprehend.
• Her favorite Python libraries and software tools.
• One rarely-mentioned topic that every data scientist would benefit from.
• The pros and cons of pursuing a 100% remote degree in computer science.
The Equality Machine
Many recent books and articles spread fear about data collection and A.I. Today's guest, Prof. Orly Lobel, offers the antidote with her book "The Equality Machine" — an optimistic take on the future of data science.
Software for Efficient Data Science
In today's episode, Dr. Jodie Burchell details a broad range of tools for working efficiently with data, including data cleaning, reproducibility, visualization, and natural language processing.
Jodie:
• Is the Data Science Developer Advocate for JetBrains, the developer-tools company behind PyCharm (one of the most widely-used Python IDEs) and DataLore (their new cloud platform for collaborative data science).
• Previously was Data Scientist or Lead Data Scientist at several tech companies, developing specializations in search, recommender systems, and NLP.
• Co-authored two books on data visualization libraries: "The Hitchhiker's Guide to ggplot2" and "The Hitchhiker's Guide to Plotnine".
• Prior to entering industry, was a postdoctoral fellow in biostatistics at the University of Melbourne.
• Holds a PhD in Psychology from the Australian National University.
Today’s episode is primarily intended for a technical audience as it's packed with practical tips and software for data scientists.
In this episode, Jodie details:
• What a data science developer advocate is and why you might want to consider it as a career option.
• How to work effectively, efficiently, and confidently with real-world data.
• Her favorite Python libraries, such as ones for data viz and NLP.
• How to have reproducible data science workflows.
• The subject she would have majored in if she could go back in time.
The Critical Human Element of Successful A.I. Deployments
For today's episode, I sat down with the prolific data-science instructor, author and practitioner Keith McCormick to discuss how critical user considerations are for developing a successful A.I. application.
AutoML: Automated Machine Learning
AutoML with Erin LeDell — it rhymes! In today's episode, H2O.ai's Chief ML Scientist guides us through what Automated Machine Learning is and why it's an advantageous technique for data scientists to adopt.
Dr. LeDell:
• Has been working at H2O.ai — the cloud A.I. firm that has raised over $250m in venture capital and is renowned for its open-source AutoML library — for eight years.
• Founded Women in Machine Learning & Data Science (WiMLDS), which has 100+ chapters worldwide.
• Co-founded R-Ladies Global, a community for genders currently underrepresented amongst R users.
• Is celebrated for her talks at leading A.I. conferences.
• Previously was Principal Data Scientist at two acquired A.I. startups.
• Holds a Ph.D. from UC Berkeley focused on ML and computational stats.
Today’s episode is relatively technical so will primarily appeal to technical listeners, but it would also provide context to anyone who’s interested to understand how key aspects of data science work are becoming increasingly automated.
In this episode, Erin details:
• What AutoML — automated machine learning — is and why it’s an advantageous technique for data scientists to adopt.
• How the open-source H2O AutoML platform works.
• What the “No Free Lunch Theorem” is.
• What Admissible Machine Learning is and how it can reduce the biases present in many data science models.
• The new software tools she’s most excited about.
• How data scientists can prepare for the increasingly automated data science field of the future.
Tools for Deploying Data Models into Production
Today's guest is mighty Erik Bernhardsson — creator of Spotify's music recommender, prolific open-source developer, world-leading technical blogger, and now model-deployment-tool entrepreneur via Modal Labs.
Erik:
• Is the Founder and CEO of Modal Labs, a startup building innovative tools and infrastructure for data teams.
• Previously was CTO of the real estate startup Better, where he grew the engineering team from the size of 1 — himself — to 300 people.
• Was also previously an Engineering Manager at Spotify, where he created their now-ubiquitous music-recommendation algorithm.
• Is a prolific open-sourcer, having created the popular Luigi and Annoy libraries, among several others.
• Is an industry-leading blogger with posts that frequently feature on the front page of Hacker News.
Today’s episode gets deep into the weeds at points, so it will be particularly appealing to practicing data scientists, ML engineers, and the like, but much of the fascinating, wide-ranging conversation in this episode will appeal to any curious listener.
In this episode, Erik details:
• How the Spotify music recommender he built works so well at scale.
• The litany of new data science and engineering tools he’s excited about and thinks you should be excited about too.
• What open-source library he would develop next.
• Why he founded Modal Labs and how its tools empower data teams.
• His top tips for succeeding as both an interviewer and an interviewee, drawn from having interviewed more than 2000 candidates for engineering roles.
Causal Machine Learning
Causal ML is today's focus with Dr. Emre Kiciman — Senior Principal Researcher at Microsoft, developer of the DoWhy causal modeling library for Python, and a leader in applying causal research to social sciences.
Emre:
• Has worked within prestigious Microsoft Research for over 17 years.
• Leads Microsoft’s research on Causal Machine Learning.
• Leads development of the DoWhy open-source causal modeling library for Python (part of the PyWhy GitHub project).
• Pioneered the use of social media data to answer causal questions in the social sciences, such as with respect to physical and mental health.
• Has published 100+ papers and been cited 8000+ times.
• Holds a PhD in Computer Science from Stanford University.
Today’s episode is relatively technical, so will probably appeal primarily to folks with technical backgrounds like data scientists, ML engineers, and software developers.
In this episode, Emre details:
• What Causal ML is and how it’s different from "correlational" ML.
• The four key steps of causal inference and how they impact ML.
• The types of data that are most amenable to causal methods and those that aren’t yet… but may be soon.
• Exciting real-world applications of Causal ML.
• The software tools he most highly recommends.
• What he looks for in the data science researchers he hires.
The A.I. Platforms of the Future
Ben Taylor returns for a third consecutive Five-Minute Friday! This week, he helps us look ahead and dig into what we can expect from the A.I. platforms of the future.
Why CEOs Care About A.I. More than Other Technologies
Ben Taylor is back for another Five-Minute Friday this week, this time to fill us in on why CEOs care more about A.I. than any other technology and how to sell them on your machine learning solution.
Special shout-out to my puppy Oboe who features indispensably in the video version of this episode... on Ben's lap! 🐶
How to Sell a Multimillion Dollar A.I. Contract
Starting today and running for four consecutive weeks, Five-Minute Friday episodes of SuperDataScience feature Ben Taylor as my guest. Each week, he answers a specific ML commercialization or education question.
Model Speed vs Model Accuracy
In the vast majority of real-world, commercial cases, the speed of a machine learning algorithm is more important than its accuracy. Hear why in today's Five-Minute Friday episode!
Transforming Dentistry with A.I.
Engineer and computer scientist Dr. Wardah Inam has raised $79m in venture capital to transform dentistry with machine learning. Hear about it, as well as her tips for scaling an A.I. company, in this week's episode.
Wardah:
• Is Co-Founder/CEO of Overjet, which is transforming dentistry with ML.
• Co-founded uLink Technologies, a start-up behind A.I.-driven power grids.
• Served as Lead Product Manager at Q Bio, a healthcare A.I. start-up.
• Was a Postdoc in MIT’s renowned CSAIL (Computer Science and A.I. Lab).
• Holds an MIT PhD in electrical engineering and computer science.
Today’s episode focuses more on practical applications of ML and growing an A.I. company than getting into the nitty-gritty of ML models themselves, so it should be broadly appealing to both technically-oriented and business-oriented folks.
In the episode, Wardah details:
• How Overjet not only classifies images but quantifies dental diagnoses with computer vision, enabling models to answer questions like “how large is this cavity?”
• How natural language processing can be essential for determining the correct dental diagnosis.
• The data-labeling challenges firms like Overjet need to overcome to enable ML models to learn from noisy, real-world data.
• Her tips for building a successful A.I. business.
• What she looks for in the data scientists and software engineers she hires.
Automating ML Model Deployment
Relative to training a machine learning model, getting it into production typically takes several times as much time and effort. Dr. Doris Xin, the brilliant co-founder/CEO of Linea, has a near-magical, two-line solution.
In the episode, Doris details:
• How Linea reduces ML model deployment to two lines of Python code.
• The surprising extent of wasted computation she discovered when she analyzed over 3000 production pipelines at Google.
• Her experimental evidence that the total automation of ML model development is neither realistic nor desirable.
• What it’s like being the CEO of an exciting, early-stage tech start-up.
• Where she sees the field of data science going in the coming years and how you can prepare for it.
Today’s episode is more on the technical side so will likely appeal primarily to practicing data scientists, especially those who need to deploy, or are interested in deploying, ML models into production.
Doris:
• Is co-founder and CEO of Linea, an early-stage start-up that dramatically simplifies the deployment of machine learning models into production.
• Her alpha users include the likes of Twitter, Lyft, and Pinterest.
• Her start-up’s mission was inspired by research she conducted as a PhD student in computer science at the University of California, Berkeley.
• Previously she worked in research and software engineering roles at Google, Microsoft, Databricks, and LinkedIn.
Exercises on Event Probabilities
In recent weeks, my YouTube videos have covered Probability concepts like Events, Sample Spaces, and Combinatorics. Today's video features exercises to test and cement your understanding of those concepts.
We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.
More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.
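For a flavor of the concepts these exercises draw on, here is a minimal sketch in Python. It is an illustration only and is not taken from the video or the course's GitHub code; the two-dice and five-card-hand scenarios and all variable names are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Sample space (illustrative assumption): every ordered outcome of
# rolling two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the subset of outcomes in which the two dice sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# With equally likely outcomes, P(event) = |event| / |sample space|.
p_sum_is_seven = Fraction(len(event), len(sample_space))

# Combinatorics: the number of distinct 5-card hands from a 52-card deck,
# where order does not matter ("52 choose 5").
n_hands = comb(52, 5)

print(p_sum_is_seven)  # 1/6
print(n_hands)         # 2598960
```

Using `Fraction` keeps the probability exact rather than a rounded float, which makes hand-checked answers easy to compare.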
Collaborative, No-Code Machine Learning
Emerging tools allow real-time, highly visual collaboration on data science projects — even in ways that allow those who code and those who don't to work together. Tim Kraska fills us in on how ML models enable this.
Tim:
• Is Associate Professor in the revered CSAIL lab at the Massachusetts Institute of Technology.
• Co-founded Einblick, a visual data computing platform that has received $6m in seed funding.
• Was previously a professor at Brown University, a visiting researcher at Google, and a postdoctoral researcher at Berkeley.
• Holds a PhD in computer science from ETH Zürich in Switzerland.
Today’s episode gets into technical aspects here and there, but will largely appeal to anyone who’s interested in hearing about the visual, collaborative future of machine learning.
In this episode, Tim details:
• How a tool like Einblick can simultaneously support folks who code as well as folks who’d like to leverage data and ML without code.
• How this dual no-code/Python code environment supports visual, real-time, click-and-point collaboration on data science projects.
• The clever database and ML tricks under the hood of Einblick that enable the tool to run effectively in real time.
• How to make data models more widely available in organizations.
• How university environments like MIT’s CSAIL support long-term innovations that can be spun out to make game-changing impacts.