Today’s episode is on Google’s newly released (and frankly sensational) product, NotebookLM. All you need to use it is a Google login, which is as easy as having a Gmail account. Use of NotebookLM is likewise totally free.
The Skills You Need to Be an Effective Data Scientist, with Marck Vaisman
Based on extensive research and analytical evaluations, in today's episode Marck Vaisman details all the skills that are essential for today's data professional.
Marck:
• Has been at Microsoft for seven years; for 5+ years, he’s been a Senior Cloud Solutions Architect, specializing in data, data science and AI/ML.
• For nearly a decade he’s also been an adjunct professor at both Georgetown University and The George Washington University, teaching graduate-level courses on math, stats, analytics and decision sciences.
• Co-Founded a non-profit in Washington, DC that runs both the Data Science DC and Statistical Programming DC Meetups.
• Holds a Bachelor's in Mechanical Engineering from Boston University and an MBA from Vanderbilt University.
Today’s episode will be of interest to anyone who is, manages, or aspires to be a data professional.
In today’s episode, Marck details:
• The skills, competencies and personas that data scientists and related professionals (such as analysts, data engineers, ML engineers and A.I. engineers) can have.
• The academic research on why “data scientist” is such a difficult job title to define.
• A comprehensive characterization of the essential skills that every data professional needs to be effective and the skills that allow you to specialize as a particular subtype of data scientist.
• The implications of all of this for both folks hunting for a data role and the companies that are looking to hire them.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI's o1 "Strawberry" Models
Today’s episode topic, given the gravity of the event, could of course be none other than OpenAI’s new o1 series of models, which represent a tremendous leap forward in AI capabilities.
PyTorch: From Zero to Hero, with Luka Anicin
Today's episode is on Python's most popular auto-differentiation library, PyTorch, and how you can use it to design, train and deploy deep neural nets, including LLMs. Acclaimed PyTorch instructor Luka Anicin is our guide.
Luka:
• Is one of Udemy’s all-time bestselling instructors on A.I.; over 500,000 students have taken his courses.
• His latest course, available exclusively at SuperDataScience.com, is called “PyTorch: From Zero to Hero”.
• CEO of full-lifecycle A.I. consultancy Datablooz.
• Holds a Bachelor’s in Computer Science, a Master’s in Data Science and is nearing completion of his PhD in Applied A.I.
Today’s episode will probably appeal most to hands-on practitioners like data scientists, software developers and ML engineers.
In it, Luka details:
• What the popular Python library PyTorch is for.
• Why you would select PyTorch over TensorFlow or Scikit-learn.
• The tensor building blocks PyTorch provides for designing, training and deploying state-of-the-art deep neural networks, including Large Language Models (LLMs).
• His top tips for accurate and efficient deep learning.
• Guidance on PyTorch portfolio projects.
• Real-world PyTorch case-studies from his experience leading an A.I. consultancy.
In Case You Missed It in August 2024
We had a slew of eye-opening conversations in August on the SuperDataScience Podcast I host. ICYMI, today's episode highlights the most fascinating moments from my convos with our guests.
Specifically, conversation highlights include:
1. ChainML's Head of A.I. Education Shingai Manjengwa on how multiple, individual A.I. agents can come together to perform complex actions.
2. Renowned futurist and entrepreneur Dr. Daniel Hulme on how A.I. can help us become better and faster at our jobs by circumventing the traditional corporate hierarchies that today seem only to slow us down.
3. Mathematical-optimization guru Jerome Yurchisin (of Gurobi Optimization) on how continuing education will be vital in our increasingly automated work environment... and how this education will be streamlined by A.I.
4. Nick Elprin, Co-Founder and CEO of the wildly successful Domino Data Lab, on why it's essential for enterprises to clearly define their A.I. infrastructure in order for their A.I. deployments to prosper.
Check out today's episode (#818) to hear all these eye-opening conversations. The "Super Data Science Podcast with Jon Krohn" is available on all major podcasting platforms and a video version is on YouTube.
The Positron IDE, Tidy NLP and MLOps with Dr. Julia Silge
Prepare to have your brain tickled by Dr. Julia Silge. In today's episode, Julia details the IDE she's been developing for data scientists, "Tidy" NLP, and open-source libraries that make MLOps a breeze.
More on Julia:
• Engineering Manager at Posit PBC (makers of RStudio... and the company formerly known as RStudio).
• Authored the bestselling O'Reilly books “Text Mining with R” and “Tidy Modeling with R”.
• Previously worked as a Data Scientist at Stack Overflow and Datassist.
• Prior to joining industry, was an academic researcher and professor at Yale University.
• Holds a PhD in Astronomy from The University of Texas at Austin.
Today’s episode will probably appeal most to hands-on practitioners like data scientists, software developers and ML engineers. In it, Julia details:
• The brand-new IDE Positron (free to use and source-available) that she’s been developing.
• Her favorite LLMs for code generation.
• The open-source software libraries that make MLOps easy.
• Her top tips for effective Natural Language Processing, including when more traditional NLP techniques should be used instead of an LLM.
Explaining AGI to a 94-Year-Old
In today's short episode, I explain "data", "data science", "A.I." and AGI to a 94-year-old woman (my brilliant grandmother) who previously had no familiarity with the terms.
Perhaps the episode will be helpful to folks who are unfamiliar with any of these terms themselves, or to folks who'd like ideas for how to explain any of them to laypeople.
("AGI" is Artificial General Intelligence, btw!)
The "Super Data Science Podcast with Jon Krohn" is available on your favorite podcasting platform and the video version (which today is simply an audio waveform!) is on YouTube. Today's episode is #816.
DataFrame Operations 100x Faster than Pandas, with Marco Gorelli
Today's episode is all about Polars — the hot library for Python that offers up to 100x speedups for DataFrame operations relative to pandas. Marco Gorelli, a core Polars developer, is our gifted guide.
Marco is a tremendously talented communicator of complex technical topics, making him the perfect guest for this highly technical episode. He:
• Is a core developer of the popular Python libraries pandas and Polars.
• Is the creator of the Narwhals library.
• Has spoken at several major Python conferences (such as PyData), taught Polars professionally, and written the first complete Polars plugins tutorial.
• Currently works as Senior Software Engineer at Quansight Labs.
• Previously, worked as a data scientist and was one of the prize winners (from amongst >100,000 entrants!) of the M6 forecasting competition.
• Holds a Master’s in Mathematics and the Foundations of Computer Science from the University of Oxford.
Today’s episode will appeal primarily to hands-on technical folks like data scientists, ML engineers and software developers.
In today’s episode, Marco details:
• What the hot, fast-growing Polars library for working with DataFrames in Python is (it already has 65m downloads and 28k GitHub stars).
• How Polars offers up to 100x speed-ups relative to pandas on DataFrame operations.
• How the lightweight, dependency-free Narwhals package he created allows for easy compatibility between different DataFrame libraries such as Polars and pandas.
• How he got addicted to open-source development.
• The simple trick he used to be a prize-winner in super-popular forecasting competitions.
Summer Reflections
This week, I’m enjoying the tail end of the northern-hemisphere summer by spending time with my family.
Solving Business Problems Optimally with Data, with Jerry Yurchisin
For many real-world commercial problems, the best approach is not machine learning or statistics; it's Mathematical Optimization. In today's episode, hear all about optimization from the guru Jerome Yurchisin.
Jerry's an extraordinarily clear communicator of complex topics and a world-leading expert on real-world applications of mathematical optimization. He:
• Works as a Data Science Strategist at Gurobi Optimization, a leading decision-intelligence company that provides mathematical optimization solutions to the likes of Uber, Air France and the NFL (indeed, a wild 8 out of 10 Fortune 10 companies use Gurobi!)
• Previously spent eight years as a mathematical consultant where he paired mathematical optimization with machine learning, statistics and simulation to inform decision-making.
• He was also previously an instructor at the University of North Carolina at Chapel Hill, where he obtained his Master’s in Operations Research and Statistics.
• He holds an additional Master’s in Applied Math from Ohio University.
Today’s episode may appeal most to hands-on practitioners like data scientists and ML engineers, but it does also have tons of content that will be of interest to anyone who’d like to leverage data to make better commercial decisions or optimize commercial processes.
In this episode, Jerry details:
• What mathematical optimization is.
• The kinds of real-world problems where mathematical optimization is a far better approach than a machine learning or statistics approach.
• The history of mathematical optimization including why it wasn’t popular until recently.
• The cutting-edge hardware and software innovations in mathematical optimization today.
The AI Scientist: Towards Fully Automated, Open-Ended Scientific Discovery
A team of researchers from Sakana AI, a Japanese AI startup that was founded last year by Google alumni and was reportedly valued at over $1 billion in June, this week published a paper titled "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery" that is making big waves and could revolutionize how we conduct scientific research.
Scaling Data Science Teams Effectively, with Nick Elprin
Today's episode with (extremely intelligent and wildly successful ML entrepreneur) Nick Elprin covers efficiently scaling data science teams and ensuring A.I. projects are commercial wins 🥇
Nick:
• Is Co-Founder and CEO of Domino Data Lab, a colossal Bay Area startup that has raised over $200m in venture capital from some of the world’s most prestigious VC firms.
• Prior to co-founding Domino Data Lab 11 years ago, he worked as a technologist at Bridgewater Associates, the well-known hedge fund.
• He holds both a BA and MS in Computer Science from Harvard University.
Today’s episode may appeal most to technical folks but has tons of content that will be of interest to anyone in or interested in commercializing data science or A.I.
In this episode, Nick details:
• How organizations can leverage enterprise platforms to efficiently scale their data science teams and data science workflows.
• The exact team size at which integrating such a platform becomes worthwhile.
• How to ensure A.I. projects are commercially successful.
• The tech stack they use at Domino to create such a performant platform.
• His top tip for growing your own colossal data science startup.
The Five Levels of Self-Driving Cars
Back in Episode #748 earlier this year, I covered the five levels of Artificial General Intelligence. Well, today, inspired by my first-ever experience in an autonomous vehicle (a Waymo ride while in San Francisco recently), we’ve got an episode on the five levels of motor-vehicle automation.
Agentic AI, with Shingai Manjengwa
Today's episode is all about Agentic A.I. — perhaps the hottest topic in A.I. today. Astoundingly intelligent and articulate Shingai Manjengwa couldn't be a better guide for us on this hot topic 🔥
Shingai:
• Head of A.I. Education at ChainML, a prestigious startup focused on developing tools for a future powered by A.I. agents.
• Founder and former CEO of Fireside Analytics Inc. (developed online data-science courses that have been undertaken by 500,000 unique students).
• Previously was Director of Technical Education at the prominent global A.I. research center, the Vector Institute in Toronto.
• Holds an MSc in Business Analytics from New York University.
Today’s episode should be equally appealing to hands-on practitioners like data scientists as to folks who generally yearn to stay abreast of the most cutting-edge A.I. techniques.
In today’s episode, Shingai details:
• What A.I. agents are.
• Why agents are the most exciting, fastest-growing A.I. application today.
• How LLMs relate to agentic A.I.
• Why multi-agent systems are particularly powerful.
• How blockchain technology enables humans to better understand and trust A.I. agents.
In Case You Missed It in July 2024
In July, we had yet another bevy of extraordinary guests on the SuperDataScience Podcast I host. ICYMI, this episode highlights the most fascinating moments from my convos with them.
Specifically, conversation highlights include:
1. Iconic Daliana Liu (ex-AWS Senior Data Scientist; host of The Data Scientist Show) on the hard skills data scientists need most in today's market.
2. Pulitzer prize-winning journalist and many-time NY Times bestselling author Charles Duhigg on the secrets to being a "Supercommunicator", i.e., getting people invested in your ideas and opening up.
3. Arcee.ai's CEO Mark McQuade and Chief of Frontier Research Charles Goddard on the frontier "model merging" technique whereby the capabilities of multiple LLMs can be combined without increasing model size.
4. Prolific Google DeepMind researcher Dr. Rosanne Liu (no relation to Daliana!) on her landmark “Beyond the Imitation Game” paper, particularly why all LLM benchmarks are flawed.
5. Andrey Kurenkov, PhD (A.I. Engineering Lead at Astrocade and founder/co-host of my favorite podcast, "Last Week in A.I.") on how Artificial Superintelligence (ASI) may be just a few years away and what the implications could be for us as individuals and as a society.
Superintelligence and the Six Singularities, with Dr. Daniel Hulme
Artificial Superintelligence (ASI) could be realized in our lifetime... some even think within a few years. Today's brilliant guest, Dr. Daniel Hulme, details the six major ways society could be overhauled by ASI.
More on Daniel:
• Chief A.I. Officer at marketing giant WPP.
• CEO of A.I. consulting-services company Satalia.
• Entrepreneur-in-Residence at one of the world's top A.I.-research universities, UCL.
• Co-founder of Faculty and speaker at Singularity University.
• Holds an Eng.D. in computational complexity from UCL.
Today’s episode should be of interest to everyone. In it, Daniel details:
• How and when Artificial Super Intelligence (ASI) may arise.
• The six types of Singularity ASI is expected to unleash.
• Neuromorphic computing.
• How to align A.I. interests with human interests.
• Ways human work could be dramatically automated not just in the future… but this very day.
Llama 3.1 405B: The First Open-Source Frontier LLM
Meta releasing its giant (405-billion parameter) Llama 3.1 model is a game-changer: For the first time, an "open-source" LLM competes at the frontier (against proprietary models GPT-4o and Claude).
How to Be a Supercommunicator, with Charles Duhigg
Today, Pulitzer Prize winner and NY Times bestselling author Charles Duhigg reveals how you can become a "Supercommunicator", allowing you to connect with anyone, form deep bonds and get more done with others.
More on Charles:
• Pulitzer prize-winning journalist who currently writes for The New Yorker.
• His first book, "The Power of Habit", was published in 2012, spent over three years on New York Times bestseller lists and was translated into 40 languages.
• His second book, "Smarter Faster Better", was published in 2016 and was also a New York Times bestseller.
• Is a graduate of Yale University and Harvard Business School.
Today’s episode should be of great interest to everyone. In it, Charles provides the key takeaways from "Supercommunicators" including:
• Step-by-step instructions on how to connect meaningfully with anyone.
• The three types of conversation and how to ascertain which one you’re in at any given moment.
• How to have productive conflicts without the conversation spiraling out of control.
• How generative A.I. is transforming our conversations today and how the technology may transform them even more dramatically in the future.
AI x Solar Power = Abundant Energy
Seventy years ago, the iconic AT&T Bell Labs unveiled cells that could transform sunlight into power. What started as a potential replacement for batteries in remote locations has now become a global phenomenon. Today, solar panels cover an area the size of Jamaica and provide approximately 6% of the world's electricity.
How to Thrive in Your (Data Science) Career, with Daliana Liu
In today's episode, renowned Daliana Liu details how to overcome common (unhelpful!) career mindsets and thrive professionally, including finding your niche and getting promoted... all without burning out!
If you haven’t already heard of her, Daliana:
• Is well known for her content on data science careers, particularly career-growth strategies, which has earned her >280,000 LinkedIn followers.
• Hosts The Data Scientist Show, which is in the top 2% of all podcasts globally in terms of downloads.
• Specializes in 1:1 career coaching as well as coaching groups through structured programs like her upcoming "Survive and Thrive in Data Science and AI Careers" course.
• Previously worked as a Senior Data Scientist at AWS and Predibase (a Bay Area open-source LLMs startup).
• Holds a Master's in Statistics from UC Irvine.
Today’s episode is well-suited to *anyone* who’d like to thrive more than ever professionally; it will particularly appeal to data scientists and related professionals like data analysts, ML engineers and software developers… but most of the advice Daliana covers is beneficial to anyone.
In today’s episode, Daliana details:
• Common unhelpful career mindsets and how to overcome them.
• How to find the role you really want as opposed to the one you think you want.
• How to find your niche in a fast-moving field.
• How to offset common professional issues like imposter syndrome, distraction and burnout.
• Her top tips for accelerating a technical career.
• The must-know tech skills for data scientists in today’s market.