Groundbreaking multi-agent systems (MAS, for short) are transforming the way AI models collaborate to tackle complex challenges.
MLOps: The Job and The Key Tools, with Demetrios Brinkmann
Today, global MLOps community leader Demetrios Brinkmann details why MLOps is essential, how it differs from related roles like LLMOps, DevOps and A.I. Engineering, and the best tools for deploying and scaling LLMs.
Demetrios:
• Is Founder and CEO of MLOps Community, an organization dedicated to supporting MLOps professionals that has quickly grown to over 20,000 members.
• Was previously founder of the Data on Kubernetes community.
• Before that, worked in public-facing roles at a number of European tech startups.
Today’s episode will be of interest to anyone who’s keen to better understand the critical function of MLOps in bringing machine learning models to the real world.
In today’s episode, Demetrios details:
• What exactly MLOps is and how it relates to other jobs like LLMOps, DevOps and A.I. Engineer.
• The key MLOps tools and approaches.
• What it takes to build a thriving community of tens of thousands of professionals in just a few years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Six Keys to Data Scientists’ Success, with Kirill Eremenko
For today's episode, Kirill Eremenko — who has taught more than 2.8 million people data science — fills us in on his six most valuable insights about data science careers.
More on Kirill:
• Founder and CEO of SuperDataScience, an e-learning platform that is the namesake of this very podcast.
• Launched the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins four years ago.
• Has reached more than 2.8 million students through the courses he’s published on Udemy, making him Udemy’s most popular data science instructor.
At a high level, Kirill's six data science insights are:
1. Unlike many other careers, there’s no need for formal credentials to become a data scientist.
2. Mentors can be invaluable guides in a DS career, but you should also try to give back to your mentors when you can.
3. Portfolios are the key to landing the DS job of your dreams because they showcase your DS abilities for all to see.
4. Hands-on labs are a fun, interactive way to develop your portfolio and are a great complement to classes.
5. Collaborations can make lots of aspects of DS career development fun, including learning new materials, completing labs and developing your portfolio.
6. Data scientists can come from any background and work from anywhere in the world with an Internet connection.
Math, Quantum ML and Language Embeddings, with Dr. Luis Serrano
Today, Dr. Luis Serrano (a master at making complex math and ML topics friendly) leads a mind-expanding discussion on embeddings in LLMs, Quantum ML and what the next big trends in A.I. will be. I wouldn't miss this one 🤯
Luis:
• Is the beloved creator behind the Serrano Academy, an educational YouTube channel on math and ML with over 146,000 subscribers.
• Until this month, he worked as Head of Developer Relations at Cohere, one of the world’s few A.I. labs that is actually at the frontier of LLMs.
• Prior to that, he was a Quantum A.I. Research Scientist at Zapata Computing, Lead A.I. Educator at Apple, Head of Content for A.I. at Udacity and ML Engineer at Google.
• Holds a PhD in Math from the University of Michigan.
Today’s episode should be appealing to just about anyone! In it, Luis details:
• How supposedly complex topics like math and A.I. can be made easy to understand.
• How Cohere’s focus on enterprise use cases for LLMs has led it to specialize in embeddings, the most important component of LLMs.
• The promising application areas for Quantum Machine Learning.
• What the next big trends in A.I. will be.
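The embeddings Luis discusses can be illustrated with a toy sketch: words (or whole documents) become vectors, and semantic similarity becomes geometry. The three-dimensional vectors below are made up purely for illustration; real embedding models, like those Cohere specializes in, return vectors with hundreds or thousands of dimensions.

```python
# Toy illustration of embeddings: similar meanings -> nearby vectors.
# The vectors here are invented for demonstration, not from a real model.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # much lower
```

In retrieval and semantic-search applications, this same similarity computation (at scale, over millions of stored vectors) is what surfaces the most relevant documents for a query.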
Aligning Large Language Models, with Sinan Ozdemir
For today’s quick Five-Minute Friday episode, the exceptional author, speaker and entrepreneur Sinan Ozdemir provides an overview of what it actually means for an LLM to be “aligned”.
More on Sinan:
• Is Founder and CTO of LoopGenius, a generative AI startup.
• Has authored several excellent books, including, most recently, the bestselling "Quick Start Guide to Large Language Models".
• Is a serial AI entrepreneur, including founding a Y Combinator-backed generative AI startup way back in 2015 that was later acquired.
This episode was filmed live at the Open Data Science Conference (ODSC) East in Boston last month. Thanks to ODSC for providing recording space.
The Super Data Science Podcast is available on all major podcasting platforms and a video version is on YouTube. This is episode #784!
Generative A.I. for Solar Power Installation, with Navdeep Martin
A startling 70% of solar-power projects fail. In today's episode, hear how Navdeep Martin's startup Flypower is using Generative A.I. to ensure we install renewable energy sources more effectively and efficiently.
Navdeep:
• Co-founder and CEO of Flypower, a generative A.I. startup dedicated to ensuring clean-energy projects, particularly solar-power projects, succeed.
• Previously held senior product leadership roles at VC-backed Bay Area AI startups as well as for AI products at Comcast and The Washington Post.
• Before that, was a software engineer for the CIA.
• Holds a degree in computer science from William & Mary and an MBA from the University of Virginia.
Today’s episode will appeal to anyone who’d like to hear about the evolution of generative A.I. technologies in products and applications. That includes how you can best make use of the various categories of Gen-A.I. technologies today, and how A.I. is being used to overcome the social and regulatory hurdles associated with combating climate change.
In Case You Missed It in April 2024
Other than excessive maleness and paleness*, April 2024 was an excellent month for the podcast, packed with outstanding guests. ICYMI, today's episode highlights the most fascinating moments of my convos with them.
Specifically, conversation highlights include:
1. Iconic open-source developer Dr. Hadley Wickham putting the "R vs Python" argument to bed.
2. Aleksa Gordić, creator of a digital A.I.-learning community of 160k+ people, on the movement from formal to self-directed education.
3. World-leading futurist Bernard Marr on how we can work with A.I. rather than have it lord over us.
4. Educator of millions of data scientists, Kirill Eremenko, on why gradient boosting is so powerful for making informed business decisions.
5. Prof. Barrett Thomas on how drones could transform same-day delivery.
*Remedied in May!
Ensuring Successful Enterprise AI Deployments, with Sol Rashidi
Prodigious Sol Rashidi has deployed nearly 40 large-scale data and A.I. projects at Fortune 100 companies. Her rich insights on doing this successfully fill her new book and are distilled into today's fun episode.
Sol ☀:
• Has been a C-suite data/analytics/A.I. leader at Estée Lauder, Merck pharmaceuticals, Sony Music and Royal Caribbean Cruise Lines.
• Was Senior Partner leading the Digital and Innovation Practice at EY and was the Partner who led the Watson go-to-market at IBM.
• Has been involved in over three dozen large-scale data/A.I. deployments.
• Is recognized with a string of international awards for her leadership.
• Holds eight patents with many more pending.
Today’s episode will be invaluable to anyone who’d like to succeed at deploying A.I. models commercially. In it, Sol details:
• Her straightforward system for selecting the enterprise A.I. projects that will be successfully deployed.
• What kinds of A.I. projects should always be avoided.
• Why larger enterprises drag their feet on impactful A.I. projects and how to overcome such corporate logjams.
• When you should patent an innovation.
• Why Chief Data Officers and related C-suite roles have such high turnover.
How to Become a Data Scientist, with Dr. Adam Ross Nelson
Today's episode features Dr. Adam Ross Nelson providing his #1 most useful piece of guidance on "How to Become a Data Scientist" from his book of that very name!
This was filmed live at the Open Data Science Conference (ODSC) East in Boston last week — thanks ODSC East for providing valuable conference space for us to shoot podcast episodes.
The Tidyverse of Essential R Libraries and their Python Analogues, with Dr. Hadley Wickham
Many-time bestselling author and prolific open-source R developer Hadley Wickham is our guest today. In today's episode, we discuss Posit's rebrand and why the Tidyverse belongs in every data scientist's toolkit.
More on Hadley:
• Chief Scientist at Posit PBC
• Adjunct Professor of Statistics at Stanford University, Rice University and The University of Auckland.
• Is best-known as the creator of the Tidyverse suite of open-source R libraries for data science, including the essential libraries dplyr and ggplot2.
• Has written seminal books on R programming for O'Reilly, Springer and CRC Press, including the mega-bestselling "R for Data Science".
Today’s episode will primarily be of interest to hands-on practitioners like data scientists and machine learning engineers. In it, Hadley details:
• Why the iconic open-source company RStudio rebranded to Posit.
• The philosophy of the tidyverse, amusing backstories on its most iconic packages and why every data scientist should be familiar with it.
• The open-source projects he’s most excited about today.
• How you can easily get involved with career-bolstering open-source projects yourself.
What will humans do when machines are vastly more intelligent? With Aleksa Gordić
Aleksa Gordić — the famed A.I. educator and multilingual-LLM entrepreneur — is my guest today. Brilliant and widely-read, Aleksa opines on what it will take to realize Artificial Super Intelligence and the consequences for humans.
Aleksa:
• Is Founder & CEO of Runa AI, a startup focused on building multilingual LLMs.
• Is an online educator who has built a community of 160,000 people in the A.I. space, including through his A.I. Epiphany YouTube channel.
• Previously, he was an A.I. Research Engineer at Google DeepMind in London and a Machine Learning Software Engineer at Microsoft.
• He holds a degree in Electronics and Computer Science from the University of Belgrade in Serbia.
Today’s episode contains tidbits that will appeal primarily to hands-on machine learning practitioners, but most of it should be of great interest to anyone.
In this episode, wildly-intelligent Aleksa details:
• Why multilingual LLMs provide so much value despite cutting-edge LLMs like Claude 3, Gemini Ultra and GPT-4 supporting so many languages.
• His frameworks for entrepreneurial success and for effective self-directed learning.
• His analogy for how humans are born as a checkpoint of a Bayesian model that’s fine-tuned with reinforcement learning from human feedback (RLHF).
• What he thinks it will take to realize artificial super intelligence and what it could mean for human society when it arrives.
RFM-1 Gives Robots Human-like Reasoning and Conversation Abilities
Today’s episode is all about an LLM trained for robotics applications called RFM-1 that completely blows my mind because of the implications for what can now suddenly be accomplished so easily with robotics.
In Case You Missed It in March 2024
We're trying something novel on the SuperDataScience Podcast today: an ICYMI ("in case you missed it") episode that highlights the most gripping moments from my conversations with guests over the past month.
Please let me know what you think of this! Does it work for you? What would you change about it? Should we stop doing these entirely? Let me know right here on this post; your voice matters :)
For this inaugural ICYMI episode, conversation highlights include:
1. Sebastian Raschka, PhD on how Lightning AI makes LLM training and deployment easy (from Episode #767).
2. Dr. Travis Oliphant, creator of the ubiquitous NumPy and SciPy libraries, on the future of scientific computing (#765).
3. Award-winning, A.I.-focused venture capitalist Rudina Seseri letting us know what it takes to get a VC firm to invest in you (#763).
4. Prof. Zachary Lipton on his roadmap from AI startup to long-term commercial success (#769).
Gradient Boosting: XGBoost, LightGBM and CatBoost, with Kirill Eremenko
You wanted more of Kirill Eremenko, now you've got it! Kirill returns to the show today to detail Decision Trees, Random Forests and all three of the leading gradient-boosting algorithms: XGBoost, LightGBM and CatBoost 😸
If you don’t already know him, Kirill:
• Is Founder and CEO of SuperDataScience, an e-learning platform that is the namesake of this very podcast.
• Launched the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins four years ago.
• Has reached more than 2.7 million students through the courses he’s published on Udemy, making him Udemy’s most popular data science instructor.
Today’s episode is a highly technical one focused specifically on Gradient Boosting methods and the foundational theory required to understand them. I expect this episode will be of interest primarily to hands-on practitioners like data scientists, software developers and machine learning engineers.
In this episode, Kirill details:
• Decision Trees.
• How Decision Trees are ensembled into Random Forests via Bootstrap Aggregation.
• How the AdaBoost algorithm formed a bridge from Random Forests to Gradient Boosting.
• How Gradient Boosting works for both regression and classification tasks.
• All three of the most popular Gradient Boosting approaches — XGBoost, LightGBM and CatBoost — as well as when you should choose them.
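The core Gradient Boosting idea Kirill walks through can be sketched from scratch for a 1-D regression task with squared-error loss: start from the mean, then repeatedly fit a weak learner to the residuals. Here, depth-1 "stumps" stand in for the decision trees; production libraries like XGBoost, LightGBM and CatBoost add regularization, histogram binning, categorical handling and much more. All data and parameter values below are made up for illustration.

```python
# Minimal gradient-boosting sketch: squared-error regression with stumps.

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: the single split minimizing SSE."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def predict(x, base, stumps, lr):
    """Ensemble prediction: base value plus shrunken stump corrections."""
    return base + lr * sum(stump(x) for stump in stumps)

def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    base = sum(ys) / len(ys)  # round 0: predict the mean
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - predict(x, base, stumps, lr) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps

xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
base, stumps = gradient_boost(xs, ys)
# After boosting, predictions land close to the training targets.
print(predict(3, base, stumps, lr=0.1))
```

The learning rate `lr` is the "shrinkage" that makes boosting robust: each stump corrects only a fraction of the remaining error, which is why many small trees outperform one large one.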
Generative AI for Medicine, with Prof. Zack Lipton
Generative A.I. is rapidly transforming medicine. My guest today is brilliant, inspiring Prof. Zachary Lipton — Chief Scientific Officer and CTO of Abridge, a startup that has quickly raised $208m to lead the transformation!
More on Zack:
• Assoc. Prof. in the Machine Learning Dept. of Carnegie Mellon University's Computer Science school.
• Highly-cited (23k+ citations) with research spanning core ML methods and theory, as well as applications in healthcare and NLP.
• Directs the Approximately Correct Machine Intelligence (ACMI) Lab at CMU, where his team builds robust systems for the real world.
• Is also a jazz saxophonist! 🎷
Despite Zack being such a deep technical expert, most of today’s content will be of interest to anyone who’d like to hear about the cutting edge of generative A.I. applications in healthcare.
The tech that Zack is leading development of at Abridge, which you can hear about in today's episode:
• Initial deployment uses ambient listening and generative A.I. to reduce the cognitive burden of clinical documentation, reducing burnout as well as enabling clinicians to spend less time with computers and more with patients.
• Industry-leading automatic speech recognition engine specifically designed for healthcare applications; can accurately transcribe speech in challenging environments, e.g., when there is background noise or when multiple people are speaking.
• Supports 14+ languages including handling code-switching (where speakers shift between languages) and interpreter-mediated conversations.
• In-house LLM development allows greater customization and responsible-use features, such as transparency (e.g., links to source transcript/audio) and evidence extraction (verification process).
NumPy, SciPy and the Economics of Open-Source, with Dr. Travis Oliphant
Huge episode today with iconic Dr. Travis Oliphant, creator of NumPy and SciPy, the standard libraries for numeric operations (downloaded 8 million and 3 million times PER DAY, respectively). Hear about the future of open-source, including the impact of GenAI.
More on Travis:
• Founded Anaconda, Inc., the company behind the also-ubiquitous Python package manager.
• Founded the massive PyData conferences and communities as well as their associated non-profit foundation, NumFOCUS.
• Currently serves as the CEO of two firms: OpenTeams and Quansight.
• Holds a PhD in biomedical engineering from the Mayo Clinic in Minnesota.
Today’s episode will primarily be of interest to hands-on practitioners like data scientists, software developers and machine learning engineers.
In it, Travis details:
• How his journey creating open-source software began and how NumPy and SciPy grew to become the most popular foundational Python libraries for working with data.
• How he identifies commercial opportunities to support his vast open-source efforts and communities.
• How AI, particularly generative AI, is transforming open-source development.
• Where open-source innovation is headed in the years to come.
Gemini Ultra: How to Release an A.I. Product for Billions of Users, with Google’s Lisa Cohen
Google recently released Gemini Ultra, their largest language model. I love Ultra and now use it instead of GPT-4 on many tasks. Today's guest, Lisa Cohen, leads Gemini's rollout; hear from her how a company with billions of users rolls out new A.I. products.
More on Gemini Ultra:
• The only LLM with comparable capabilities to GPT-4 (in my experience as well as on benchmark evaluations, although I know benchmarking has plenty of issues!)
• Ultra maintains attention across large context windows (Gemini 1.5 Pro has a million-token context, btw!), competently generating natural language and code.
• Like GPT-4V, Ultra is multi-modal and so accepts both an image and text as input at the same time.
• Piggybacking on Google's excellence at search, I’ve found Gemini Ultra to be particularly effective at tasks that involve real-time search (the Google "Bard" project that focused on real-time information retrieval was renamed "Gemini" when Gemini Ultra was released).
Lisa Cohen is perhaps the best person on the planet to be speaking to about the momentous Gemini releases because Lisa is Director of Data Science & Engineering for Google's Gemini, Assistant and Search Platforms. In addition, she:
• Was previously Senior Director of Data Science at Twitter and Principal Director of Data Science at Microsoft.
• Holds a Master's in Applied Math from Harvard University.
In this episode, Lisa details:
• The three LLMs in Google’s Gemini family and how the largest one, Gemini Ultra, fits in.
• The many ways you can access Gemini models today.
• How absolutely enormous LLM projects are carried out and how they’re rolled out safely and confidently to literally billions of users.
• How LLMs like Gemini Ultra are transforming life and work for everyone from data scientists to educators to children, and how this transformation will continue in the coming years.
Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko
Last month, Kirill Eremenko was on the show to detail Decoder-Only Transformers (like the GPT series). It was our most popular episode ever, so he's come right back today to detail an even more sophisticated architecture: Encoder-Decoder Transformers.
If you don’t already know him, Kirill:
• Is Founder and CEO of SuperDataScience, an e-learning platform that is the namesake of this podcast.
• Founded the Super Data Science Podcast in 2016 and hosted the show until he passed me the reins a little over three years ago.
• Has reached more than 2.7 million students through the courses he’s published on Udemy, making him Udemy’s most popular data science instructor.
Kirill was most recently on the show for Episode #747 to provide a technical introduction to the Transformer module that underpins all the major modern Large Language Models (LLMs) like the GPT, Gemini, Llama and BERT architectures. We received an unprecedented amount of positive feedback from that episode, demanding more! So here we are.
That episode, #747, and today’s are perhaps the two most technical episodes of this podcast ever, so they will probably appeal mostly to hands-on practitioners like data scientists and ML engineers, particularly those who already have some understanding of deep neural networks.
In this episode, Kirill:
• Reviews the key Transformer theory that we covered in Episode #747, namely the individual neural-network components of the Decoder-Only architecture that prevails in generative LLMs like the GPT series models.
• Builds on that to detail the full, Encoder-Decoder Transformer architecture that is used in the original Transformer by Google, in their “Attention is All You Need” paper, as well as in other models that excel at both natural-language understanding and generation such as T5 and BART.
• Discusses the performance and capability pros and cons of full Encoder-Decoder architectures relative to Decoder-Only architectures like GPT and Encoder-Only architectures like BERT.
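At the heart of every Transformer variant Kirill covers, whether Encoder-Only, Decoder-Only or full Encoder-Decoder, sits scaled dot-product attention. Here is a compact plain-Python sketch of that one operation (not Kirill's own material, and with made-up vectors for illustration). In self-attention, Q, K and V all come from the same sequence; in the cross-attention that links an Encoder to a Decoder, Q comes from the decoder while K and V come from the encoder's output.

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d)) V, row by row.
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Each query attends over all keys; output is a weighted sum of values."""
    d = len(K[0])
    output = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output

# A query aligned with the first key attends mostly to the first value row.
print(attention([[5.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

The Encoder-Decoder models discussed in the episode, such as T5 and BART, stack this mechanism in three places: encoder self-attention, masked decoder self-attention, and the encoder-to-decoder cross-attention sketched above.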
The Mamba Architecture: Superior to Transformers in LLMs
Modern, cutting-edge A.I. depends almost entirely on the Transformer. But now the first serious contender to the Transformer has emerged: Mamba. In today's episode, we dig into the full paper, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", written by researchers at Carnegie Mellon and Princeton.
How to Speak so You Blow Listeners’ Minds, with Cole Nussbaumer Knaflic
Cole Nussbaumer Knaflic's book, "storytelling with data", has sold over 500k copies... wild! In today's episode, Cole details the best tricks from her latest book, "storytelling with you" — a goldmine on how to inform and profoundly engage people.
Cole:
• Is the author of “storytelling with data”, which has sold half a million copies, been translated into over 20 languages and is used by more than 100 universities. Nearly a decade after publication, it remains a #1 bestseller in several Amazon categories.
• Also wrote the hands-on follow-up, “storytelling with data: let’s practice!”, a bestseller in its own right.
• Serves as the Founder and CEO of the storytelling with data company, which provides data-storytelling workshops and other resources.
• Previously she was a People Analytics Manager at Google.
• Holds a degree in math as well as an MBA from the University of Washington.
Today’s episode will be of interest to anyone who’d like to communicate so effectively and compellingly that people are blown away.
In this episode, Cole details:
• Her top tips for planning, creating and delivering an incredible presentation.
• A few special tips for communicating data effectively for all of you data nerds like me.