Many-time bestselling author and prolific open-source R developer Hadley Wickham is our guest today. In it, we discuss Posit's rebrand and why the Tidyverse needs to be in every data scientist's toolkit.
More on Hadley:
• Chief Scientist at Posit PBC
• Adjunct Professor of Statistics at Stanford University, Rice University and The University of Auckland.
• Is best-known as the creator of the Tidyverse suite of open-source R libraries for data science, including the essential libraries dplyr and ggplot2.
• Has written seminal books on R programming for O'Reilly, Springer and CRC Press, including the mega-bestselling "R for Data Science".
Today’s episode will primarily be of interest to hands-on practitioners like data scientists and machine learning engineers. In it, Hadley details:
• Why the iconic open-source company RStudio rebranded to Posit.
• The philosophy of the tidyverse, amusing backstories on its most iconic packages and why the tidyverse is invaluable for all data scientists to be familiar with.
• The open-source projects he’s most excited about today.
• How you can easily get involved with career-bolstering open-source projects yourself.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Filtering by Category: Professional Development
In Case You Missed It in March 2024
We're trying something novel on the SuperDataScience Podcast today: an ICMYI ("in case you missed it") episode that highlights the most gripping moments from my conversations with guests over the past month.
Please let me know what you think of this! Does it work for you? What would you change about it? Should we stop doing these entirely? Let me know right here on this post; your voice matters :)
For this inaugural ICYMI episode, conversation highlights include:
1. Sebastian Raschka, PhD on how Lightning AI makes LLM training and deployment easy (from Episode #767).
2. Dr. Travis Oliphant, creator of the ubiquitous NumPy and SciPy libraries, on the future of scientific computing (#765).
3. Award-winning, A.I.-focused venture capitalist Rudina Seseri letting us know what it takes to get a VC firm to invest in you (#763).
4. Prof. Zachary Lipton on his roadmap from AI startup to long-term commercial success (#769).
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The Best A.I. Startup Opportunities, with venture capitalist Rudina Seseri
How should an A.I. startup find product-market fit? How do some A.I. startups become spectacularly successful? The renowned (and highly technical!) A.I. venture-capital investor Rudina Seseri answers these questions and more in today's episode.
Rudina:
• Founder and Managing Partner of Glasswing Ventures in Boston.
• Led investments and/or served on the Board of Directors of more than a dozen SaaS startups, many of which were acquired.
• Was named Startup Boston's 2022 "Investor of the Year" amongst many other formal recognitions.
• Is a sought-after keynote speaker on investing in A.I. startups.
• Executive Fellow at Harvard Business School.
• Holds an MBA from Harvard University.
Today’s episode will be interesting to anyone who’s keen on scaling their impact with A.I., particularly through A.I. startups or investment.
In this episode, Rudina details:
• How data are used to assess venture capital investments.
• What makes particular AI startups so spectacularly successful.
• Her "A.I. Palette" for examining categories of machine learning models and mapping them to categories of training data.
• How Generative AI isn’t a fad, but it is still only a component of the impact that AI more broadly can make.
• The automated systems she has built for staying up to date on all of the most impactful AI developments.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Brewing Beer with A.I., with Beau Warren
In today's episode, Beau Warren of the innovative "Species X" brewery, details how we collaborated together on an A.I. model to craft the perfect beer. Dubbed "Krohn&Borg" lager, you can join us in Columbus, Ohio on Thursday night to try it yourself! 🍻
Read MoreA Code-Specialized LLM Will Realize AGI, with Jason Warner
Don't miss this mind-blowing episode with Jason Warner, who compellingly argues that code-specialized LLMs will bring about AGI. His firm, poolside, was launched to achieve this and facilitate an "AI-led, developer-assisted" coding paradigm en route.
Jason:
• Is Co-Founder and CEO of poolside, a hot venture capital-backed startup that will shortly be launching its code-specialized Large Language Model and accompanying interface that is designed specifically for people who code like software developers and data scientists.
• Previously was Managing Director at the renowned Bay-Area VC Redpoint Ventures.
• Before that, held a series of senior software-leadership roles at major tech companies including being CTO of GitHub and overseeing the Product Engineering of Ubuntu.
• Holds a degree in computer science from Penn State University and a Master's in CS from Rensselaer Polytechnic Institute.
Today’s episode should be fascinating to anyone keen to stay abreast of the state of the art in A.I. today and what could happen in the coming years.
In today’s episode, Jason details:
• Why a code-generation-specialized LLM like poolside’s will be far more valuable to humans who code than generalized LLMs like GPT-4 or Gemini.
• Why he thinks AGI itself will be brought about by a code-specialized ML model like poolside’s.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI is Disadvantaging Job Applicants, But You Can Fight Back
In today's important episode, the author, professor and journalist Hilke Schellmann details how specific HR-tech firms misuse A.I. to facilitate biased hiring, promotion, and firing decisions. She also covers how you can fight back and how A.I. can be done right!
Hilke’s book, "The Algorithm: How A.I. Decides Who Gets Hired, Monitored, Promoted, and Fired and Why We Need to Fight Back Now", was published earlier this month. In the exceptionally clear and well-written book, Hilke draws on exclusive information from whistleblowers, internal documents and real‑world tests to detail how many of the algorithms making high‑stakes decisions are biased, racist, and do more harm than good.
In addition to her book, Hilke:
• Is Assistant Professor of Journalism and A.I. at New York University.
• Previously worked in journalism roles at The Wall Street Journal, The New York Times and VICE Media.
• Holds a Master’s in investigative reporting from Columbia University.
Today’s episode will be accessible and interesting to anyone. In it, Hilke details:
• Examples of specific HR-technology firms that employ misleading Theranos-like tactics.
• How A.I. *can* be used ethically for hiring and throughout the employment lifecycle.
• What you can do to fight back if you suspect you’ve been disadvantaged by an automated process.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
2024 Data Science Trend Predictions
What are the big A.I. trends going to be in 2024? In today's episode, the magnificent data-science leader and futurist Sadie St. Lawrence fill us in by methodically making her way from the hardware layer (e.g., GPUs) up to the application layer (e.g., GenAI apps).
Read MoreHow to Integrate Generative A.I. Into Your Business, with Piotr Grudzień
Want to integrate Conversational A.I. ("chatbots") into your business and ensure it's a (profitable!) success? Then today's episode with Quickchat AI co-founder Piotr Grudzień, covering both customer-facing and internal use cases, will be perfect for you.
Piotr:
• Is Co-Founder and CTO of Quickchat AI, a Y Combinator-backed conversation-design platform that lets you quickly deploy and debug A.I. assistants for your business.
• Previously worked as an applied scientist at Microsoft.
• Holds a Master’s in computer engineering from the University of Cambridge.
Today's episode should be accessible to technical and non-technical folks alike.
In this episode, Piotr details:
• What it takes to make a conversational A.I. system successful, whether that A.I. system is externally facing (such as a customer-support agent) or internally facing (such as a subject-matter expert).
• What’s it’s been like working in the fast-developing Large Language Model space over the past several years.
• What his favorite Generative A.I. (foundation model) vendors are.
• What the future of LLMs and Generative A.I. will entail.
• What it takes to succeed as an A.I. entrepreneur.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Visualize Data Effectively, with Prof. Alberto Cairo
The renowned data-visualization professor and many-time bestselling author Dr. Alberto Cairo is today's guest! Want a copy of his fantastic new book, "The Art of Insight"? I'm giving away ten physical copies; see below for how to get one.
Alberto:
• Is the Knight Chair in Infographics and Data Visualization at the University of Miami.
• Leads visualization efforts at the University of Miami’s Institute for Data Science and Computing.
• Is a consultant for Google, the US government and many more prominent institutions.
• Has written three bestselling books on data visualization, all in the past decade.
• His fourth book, "The Art of Insight", was just published.
Today’s episode will be of interest to anyone who’d like to understand how to communicate with data more effectively.
In this episode, which tracks the themes covered in his "The Art of Insight" book, Alberto details:
• How data visualization relates to the very meaning of life.
• What it takes to enter in a meditation-like flow state when creating visualizations.
• When the “rules” of data communication should be broken.
• His data visualization tips and tricks.
• How infographics can drive social change.
• How extended reality, A.I. and other emerging technologies will change data viz in the coming years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAssistant: The Open-Source ChatGPT Alternative, with Dr. Yannic Kilcher
Yannic Kilcher — famed Machine Learning YouTuber and creator of OpenAssistant, the best-known open-source conversational A.I. — is today's rockstar guest! Hear from this luminary where the biggest A.I. opportunities are in the coming years 😎
If you’re not already aware of him, Dr. Yannic:
• Has over 230,000 subscribers on his machine learning YouTube channel.
• Is the CTO of DeepJudge, a Swiss startup that is revolutionizing the legal profession with AI tools.
• Led the development of OpenAssistant, a leading open-source alternative to ChatGPT, that has over 37,000 stars (⭐️⭐️⭐️!!!) on GitHub.
• Holds a PhD in A.I. from the outstanding Swiss technical university, ETH Zürich.
Despite being such a technical expert himself, most of today’s episode should be accessible to anyone who’s interested in A.I., whether you’re a hands-on practitioner or not.
In this episode, Yannic details:
• The behind-the-scenes stories and lasting impact of his OpenAssistant project.
• The technical and commercial lessons he’s learned while growing his A.I. startup.
• How he stays up to date on ML research.
• The important, broad implications of adversarial examples in ML.
• Where the biggest opportunities are in A.I. in the coming years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How GitHub Operationalizes AI for Teamwide Collaboration and Productivity, with GitHub COO Kyle Daigle
Today's episode features the exceptionally passionate GitHub COO Kyle Daigle detailing how generative A.I. tools improve not only the way individuals work, but also dramatically transform the way people across entire firms collaborate.
Kyle was my on-stage guest for a "fireside chat" live on stage at Insight Partners' ScaleUp:AI conference in New York. It was a terrifically slick conference and a ton of fun to collaborate on stage with Kyle! He's an energizing and inspiring speaker.
Check out the episode for all of our conversation; some of the key takeaways are:
• Generative AI tools like GitHub CoPilot are most useful and efficient when they’re part of your software-development flow.
• These kinds of in-flow generative AI tools can be used for collaboration (such as speeding up code review) not just on an individual basis.
• "Innersourcing" takes open-source principles but applies them within an organization on their proprietary assets.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Seven Factors for Successful Data Leadership
Today's episode is a fun one with the jovial EIGHT-time book author, Ben Jones. In it, Ben covers the seven factors of successful data leadership — factors he's gleaned from administering his data literacy assessment to 1000s of professionals.
Ben:
• Is the CEO of Data Literacy, a firm that specializes in training and coaching professionals on data-related topics like visualization and statistics.
• Has published eight books, including bestsellers "Communicating Data with Tableau" (O'Reilly, 2014) and "Avoiding Data Pitfalls" (Wiley, 2019).
• Has been teaching data visualization at the University of Washington for nine years.
• Previously worked for six years as a director at Tableau.
Today’s episode should be broadly accessible to any interested professional.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
ChatGPT Custom Instructions: A Major, Easy Hack for Data Scientists
Thanks to Shaan Khosla for tipping me off to a crazy easy hack to get markedly better results from GPT-4: providing Custom Instructions that prompt the algorithm to iterate upon its own output while critically evaluating and improving it.
Here's Shaan's full Custom Instructions text, which he himself has been iterating on in recent months:
"I need you to help me with a task. To help me with the task, first come up with a detailed outline of how you think you should respond, then critique the ideas in this outline (mention the advantages, disadvantages, and ways it could be improved), then use the original outline and the critiques you made to come up with your best possible solution.
"Overall, your tone should not be overly dramatic. It should be clear, professional, and direct. Don't sound robotic or like you're trying to sell something. You don't need to remind me you're a large language model, get straight to what you need to say to be as helpful as possible. Again, make sure your tone is clear, professional, and direct - not overly like you're trying to sell something."
Try it out! If you haven't used Custom Instructions before, in today's episode I talk you through how to set it up and explain why this approach is so effective. In the video version, I provide a screenshare that makes getting started foolproof.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Llama 2, Toolformer and BLOOM: Open-Source LLMs with Meta’s Dr. Thomas Scialom
Thomas Scialom, PhD is behind many of the most popular Generative A.I. projects including Llama 2, the world's top open-source LLM. Today, the Meta A.I. researcher reveals the stories behind Llama 2 and what's in the works for Llama 3.
Thomas:
• Is an A.I. Research Scientist at Meta.
• Is behind some of the world’s best-known Generative A.I. projects including Llama 2, BLOOM, Toolformer and Galactica.
• Is contributing to the development of Artificial General Intelligence (AGI).
• Has lectured at many of the top A.I. labs (e.g., Google, Stanford, MILA).
• Holds a PhD from Sorbonne University, where he specialized in Natural-Language Generation with Reinforcement Learning.
Today’s episode should be equally appealing to hands-on machine learning practitioners as well as folks who may not be hands on but are nevertheless keen to understand the state-of-the-art in A.I. from someone who’s right on the cutting edge of it all.
In this episode, Thomas details:
• Llama 2, today’s top open-source LLM, including what is what like behind the scenes developing it and what we can expect from the eventual Llama 3 and related open-source projects.
• The Toolformer LLM that learns how to use external tools.
• The Galactica science-specific LLM, why it was brought down after a few days, and how it might eventually re-emerge in a new form.
• How RLHF — reinforcement learning from human feedback — shifts the distribution of generative A.I. outputs from approximating the average of human responses to excellent, often superhuman quality.
• How soon he thinks AGI — artificial general intelligence — will be realized and how.
• How to make the most of the Generative A.I. boom as an entrepreneur.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Generative A.I. without the Privacy Risks (with Prof. Raluca Ada Popa)
Consumers and enterprises dread that Generative A.I. tools like ChatGPT breach privacy by using convos as training data, storing PII and potentially surfacing confidential data as responses. Prof. Raluca Ada Popa has all the solutions.
Today's guest, Raluca:
• Is Associate Professor of Computer Science at University of California, Berkeley.
• Specializes in computer security and applied cryptography.
• Her papers have been cited over 10,000 times.
• Is Co-Founder and President of Opaque Systems, a confidential computing platform that has raised over $31m in venture capital to enable collaborative analytics and A.I., including allowing you to securely interact with Generative A.I.
• Previously co-founded PreVeil, a now-well-established company that provides end-to-end document and message encryption to over 500 clients.
• Holds a PhD in Computer Science from MIT.
Despite Raluca being such a deep expert, she does such a stellar job of communicating complex concepts simply that today’s episode should appeal to anyone that wants to dig into the thorny issues around data privacy and security associated with Large Language Models (LLMs) and how to resolve them.
In the episode, Raluca details:
• What confidential computing is and how to do it without sacrificing performance.
• How you can perform inference with an LLM (or even train an LLM!) without anyone — including the LLM developer! — being able to access your data.
• How you can use commercial generative models OpenAI’s GPT-4 without OpenAI being able to see sensitive or personally-identifiable information you include in your API query.
• The pros and cons of open-source versus closed-source A.I. development.
• How and why you might want to seamlessly run your compute pipelines across multiple cloud providers.
• Why you should consider a career that blends academia and entrepreneurship.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How Firms Can Actually Adopt A.I., with Rehgan Avon
Rehgan Avon's DataConnect conference is this week and is getting rave reviews. In this SuperDataScience episode, Jon Krohn, the silver-tongued entrepreneur details how organizations can successfully adopt A.I.
Read MoreBrain-Computer Interfaces and Neural Decoding, with Prof. Bob Knight
In today's extraordinary episode, Prof. Bob Knight details how ML-powered brain computer interfaces (BCIs) could allow real-time thought-to-speech synthesis and the reversal of cognitive decline associated with aging.
This is a rare treat as "Dr. Bob" doesn't use social media and has only made two previous podcast appearances: on Ira Flatow's "Science Friday" and a little-known program called "The Joe Rogan Experience".
Dr. Bob:
• Is Professor of Neuroscience and Psychology at University of California, Berkeley.
• Is Adjunct Professor of Neurology and Neurosurgery at UC San Francisco.
• Over his career, has amassed tens of millions of dollars in research funding, 75 patents, and countless international awards for neuroscience and cognitive computing research.
• His hundreds of papers have together been cited over 70,000 times.
In this episode, Bob details:
• Why the “prefrontal cortex” region of our brains makes us uniquely intelligent relative to all the other species on this planet.
• The invaluable data that can be gathered by putting recording electrodes through our skulls and directly into our brains.
• How "dynamic time-warping" algorithms allow him to decode imagined sounds, even musical melodies, through recording electrodes implanted into the brain.
• How BCIs are life-changing for a broad range of illnesses today.
• The extraordinary ways that advances in hardware and machine learning could revolutionize medical care with BCIs in the coming years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Tools for Building Real-Time Machine Learning Applications, with Richmond Alake
Today, the astonishingly industrious ML Architect and entrepreneur Richmond Alake crisply describes how to rapidly develop robust and scalable Real-Time Machine Learning applications.
Richmond:
• Is a Machine Learning Architect at Slalom Build, a huge Seattle-based consultancy that builds products embedded with analytics and ML.
• Is Co-Founder of two startups: one uses computer vision to correct peoples’ form in the gym and the other is a generative A.I. startup that works with human speech.
• Creates/delivers courses for O'Reilly and writes for NVIDIA.
• Previously worked as a Computer Vision Engineer and as a Software Developer.
• Holds a Masters in Computer Vision, ML and Robotics from the University of Surrey.
Today’s episode will appeal most to technical practitioners, particularly those who incorporate ML into real-time applications, but there’s a lot in this episode for anyone who’d like to hear about the latest tools for developing real-time ML applications from a leader in the field.
In this episode, Richmond details:
• The software choices he’s made up and down the application stack — from databases to ML to the front-end — across his startups and the consulting work he does.
• The most valuable real-time ML tools he teaches in his courses.
• Why writing for the public is an invaluable career hack that everyone should be taking advantage of.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller
Today, the wildly intelligent Dr. Matar Haller introduces Contextual A.I. (which considers adjacent, often multimodal information when making inferences) as well as how to use ML to build moat around your company.
Matar:
• Is VP of Data and A.I. at ActiveFence, an Israeli firm that has raised over $100m in venture capital to protect online platforms and their users from malicious behavior and malicious content.
• Is renowned for her top-rated presentations at leading conferences.
• Previously worked as Director of Algorithmic A.I. at SparkBeyond, an analytics platform.
• Holds a PhD in neuroscience from the University of California, Berkeley.
• Prior to data science, taught soldiers how to operate tanks.
Today’s episode has some technical moments that will resonate particularly well with hands-on data science practitioners but for the most part the episode will be interesting to anyone who wants to hear from a brilliant person on cutting-edge A.I. applications.
In this episode, Matar details:
• The “database of evil” that ActiveFence has amassed for identifying malicious content.
• Contextual A.I. that considers adjacent (and potentially multimodal) information when classifying data.
• How to continuously adapt A.I. systems to real-world adversarial actors.
• The machine learning model-deployment stack she uses.
• The data she collected directly from human brains and how this research relates to the brain-computer interfaces of the future.
• Why being a preschool teacher is a more intense job than the military.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Business Intelligence Tools, with Mico Yuk
Today's guest is the straight shooter Mico Yuk, who pulls absolutely no punches in her assessment of, well, anything! ...but particularly about vendors in the business intelligence and data analytics space. Enjoy!
Mico:
• Is host of the popular Analytics on Fire Podcast (top 2% worldwide).
• Co-founded the BI Brainz Group, an analytics consulting and solutions company that has taught over 15,000 students analytics, visualization and data storytelling courses — included at major multinationals like Nestlé, FedEx and Procter & Gamble.
• Authored the "Data Visualization for Dummies" book.
• Is a sought-after keynote speaker and TV-news commentator.
In this episode, Mico details:
• Her BI (business intelligence) and analytics framework that persuades executives with data storytelling.
• What the top BI tools are on the market today.
• The BI trends she’s observed that could predict the most popular BI tools of the coming years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.