Jon Krohn


Resilient Machine Learning

Added on November 25, 2022 by Jon Krohn.

Machine learning is often fragile in production. For today's Five-Minute Friday episode, Dr. Dan Shiebler details how we can make ML more resilient.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, ML Foundations, Interview, Five-Minute Friday, Data Science Tags machinelearning, MachineLearning, machine learning, Machine Learning, five-minute-friday, Five-Minute-Friday, supe, SuperDataScience, superdatascience, DataScience, datascience

Software for Efficient Data Science

Added on November 22, 2022 by Jon Krohn.

In today's episode, Dr. Jodie Burchell details a broad range of tools for working efficiently with data, including data cleaning, reproducibility, visualization, and natural language processing.

Jodie:
• Is the Data Science Developer Advocate for JetBrains, the developer-tools company behind PyCharm (one of the most widely-used Python IDEs) and DataLore (their new cloud platform for collaborative data science).
• Previously was Data Scientist or Lead Data Scientist at several tech companies, developing specializations in search, recommender systems, and NLP.
• Co-authored two books on data visualization libraries: "The Hitchhiker's Guide to ggplot2" and "The Hitchhiker's Guide to Plotnine".
• Prior to entering industry, was a postdoctoral fellow in biostatistics at the University of Melbourne.
• Holds a PhD in Psychology from the Australian National University.

Today’s episode is primarily intended for a technical audience as it's packed with practical tips and software for data scientists.

In this episode, Jodie details:
• What a data science developer advocate is and why you might want to consider it as a career option.
• How to work effectively, efficiently, and confidently with real-world data.
• Her favorite Python libraries, such as ones for data viz and NLP.
• How to have reproducible data science workflows.
• The subject she would have majored in if she could go back in time.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, Data Science Tags DataScience, datascientist, datascience, SuperDataScience, superdatascience, python, ml, ML, developertools, data science, Data Science

The Critical Human Element of Successful A.I. Deployments

Added on November 18, 2022 by Jon Krohn.

For today's episode, I sat down with the prolific data-science instructor, author and practitioner Keith McCormick to discuss how critical user considerations are for developing a successful A.I. application.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, Five-Minute Friday, Data Science, Interview Tags ai, AI, DataScience, datascience, SuperDataScience, superdatascience, ML, ml

AutoML: Automated Machine Learning

Added on November 15, 2022 by Jon Krohn.

AutoML with Erin LeDell — it rhymes! In today's episode, H2O.ai's Chief ML Scientist guides us through what Automated Machine Learning is and why it's an advantageous technique for data scientists to adopt.

Dr. LeDell:
• Has been working at H2O.ai — the cloud A.I. firm that has raised over $250m in venture capital and is renowned for its open-source AutoML library — for eight years.
• Founded Women in Machine Learning & Data Science (WiMLDS), which has 100+ chapters worldwide.
• Co-founded R-Ladies Global, a community for genders currently underrepresented amongst R users.
• Is celebrated for her talks at leading A.I. conferences.
• Previously was Principal Data Scientist at two acquired A.I. startups.
• Holds a Ph.D. from UC Berkeley focused on ML and computational stats.

Today’s episode is relatively technical so will primarily appeal to technical listeners, but it would also provide context to anyone who’s interested to understand how key aspects of data science work are becoming increasingly automated.

In this episode, Erin details:
• What AutoML — automated machine learning — is and why it’s an advantageous technique for data scientists to adopt.
• How the open-source H2O AutoML platform works.
• What the “No Free Lunch Theorem” is.
• What Admissible Machine Learning is and how it can reduce the biases present in many data science models.
• The new software tools she’s most excited about.
• How data scientists can prepare for the increasingly automated data science field of the future.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, ML Foundations, Interview, Data Science, Computer Science Tags DataScience, datascientist, SuperDataScience, datascience, superdatascience, ml, ML, machinelearning, MachineLearning, machine learning, Machine Learning, AutoML

Subword Tokenization with Byte-Pair Encoding

Added on November 11, 2022 by Jon Krohn.

When working with written natural language data as we do with many natural language processing models, a step we typically carry out while preprocessing the data is tokenization. In a nutshell, tokenization is the conversion of a long string of characters into smaller units that we call tokens.
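As a rough illustration of how byte-pair encoding arrives at subword tokens, the sketch below learns merge rules from a toy corpus: it starts from single characters and repeatedly fuses the most frequent adjacent pair of symbols. This is a minimal sketch of the commonly described merge-learning loop, not code from the episode; the function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) and the toy corpus are assumptions made for illustration.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters, mapped to its count.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # nothing left to merge
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


# Hypothetical toy corpus, chosen only to make the merges easy to follow.
merges, vocab = learn_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(vocab)
```

After two merges on this toy corpus, the frequent stem "low" has become a single token while the rarer suffixes remain split into characters, which is the behavior that lets subword tokenizers handle unseen words.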

In YouTube, SuperDataScience, Five-Minute Friday, Data Science, Computer Science Tags tokenization, tokens, byte-pair encoding, data, DataScience, datascience, SuperDataScience, superdatascience, five-minute-friday, Five-Minute-Friday

Analyzing Blockchain Data and Cryptocurrencies

Added on November 8, 2022 by Jon Krohn.

As real-time, publicly-available ledgers of transactions, blockchains provide exciting new data analytics opportunities. Kimberly Grauer leads us through the tools and approaches for blockchain analytics.

Kim:
• Is Director of Research at Chainalysis Inc., the world’s leading crypto analytics firm.
• Previously worked in an economic research and analysis group for NYC.
• Holds a Masters in Political Theory from the University of Oxford, a Master of Public Administration from the London School of Economics, and she completed the General Assembly Data Science bootcamp.

Today’s episode will appeal primarily to folks who are interested in blockchains and cryptocurrencies, particularly those keen to perform data analysis on blockchain data.

In this episode, Kim details:
• The unique real-time economic-data analytics opportunities that blockchains provide.
• Examples of her own research on blockchain data, such as analyses of illegal activity and global crypto adoption.
• The tools and approaches she uses daily to analyze and report on blockchain data.
• Where the evolutions of crypto, blockchains, and data science are going together.
• Why a data science bootcamp could be exactly the right thing for you if you’re looking to break into the field.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, Data Science Tags crypto, blockchain, blockchains, cryptocurrencies, cryptocurrency, DataScience, datascience, SuperDataScience, superdatascience

Imagen Video: Incredible Text-to-Video Generation

Added on November 4, 2022 by Jon Krohn.

For today’s Five-Minute Friday episode, it’s my pleasure to introduce you to the Imagen Video model, published just a few weeks ago by researchers from Google.

In YouTube, SuperDataScience, Podcast, Five-Minute Friday, Data Science Tags SuperDataScience, superdatascience, imagen, GoogleImagen, Google Imagen, Imagen video, DALL

Data Analyst, Data Scientist, and Data Engineer Career Paths

Added on November 1, 2022 by Jon Krohn.

Keen to become a Data Analyst? Get promoted to Sr Data Analyst? Or explore Data Engineer/Scientist options? Shashank, a YouTube expert on these questions (>100k subscribers!) tackles them in today's episode.

Shashank:
• Has an exceptional YouTube channel focused on helping people break into a data analyst career.
• Works as a Senior Data Engineer at digital sports platform Fanatics, Inc.
• Was previously Data Analyst at luxury retailer Nordstrom and other firms.
• Holds a degree in chemistry from Emory University in Atlanta.

Today’s episode will appeal primarily to folks who are interested in becoming a data analyst, or who are interested in transitioning from a data analyst role into a data science or data engineering role.

In this episode, Shashank details:
• How you can land an entry-level data analyst role in just a few weeks, regardless of your educational and professional background.
• The hard and soft skills you need to progress from a junior data analyst to a senior data analyst position.
• What it takes to transition from data analyst to a typically more lucrative role as a data scientist or data engineer.
• His favorite resources for learning the essential skills for data scientists.
• What he looks for when he’s interviewing candidates.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Professional Development, Podcast, Interview, Data Science Tags DataScience, analytics, dataanalysis, dataanalytics, dataengineering, datascientist, datascience, SuperDataScience, superdatascience

Burnout: Causes and Solutions

Added on October 28, 2022 by Jon Krohn.

What really is Burnout? What causes it? And how can you prevent or treat it? Prof. Christina Maslach — world-leading researcher and author on Burnout — joins me for today's episode to unpack these questions.


The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Personal Improvement, Podcast, Five-Minute Friday, Data Science Tags Burnout, burnout prevention

Blockchains and Cryptocurrencies: Analytics and Data Applications

Added on October 28, 2022 by Jon Krohn.

Today's episode introduces what Blockchains are, what Crypto is, and Data Science applications of these technologies. Philip Gradwell of globally-renowned Chainalysis Inc. is our brilliant guide.

Philip:
• Is Chief Economist at Chainalysis, the world’s leading crypto analytics firm — their analysis is regularly featured by major news outlets.
• Previously worked as Principal at Vivid Economics, where he helped grow the consulting firm to 40 people, eventually culminating in its acquisition by consulting giant McKinsey & Company.
• Holds a Master’s in Economics from UCL and a PPE degree — that’s Philosophy, Politics, and Economics — from the University of Oxford.

Today’s episode will appeal to anyone looking for an introduction to the blockchain and cryptocurrencies. It’ll hold special appeal for people keen to do data science with these technologies.

In this episode, Philip details:
• Similarities and differences between analyzing cryptocurrencies and the established fiat currencies.
• His crypto data analytics pipeline.
• How he develops data products for a wide range of users, including businesses, banks, governments, and law enforcement.
• How the blockchain facilitates innovative computing and machine learning technologies.
• What he looks for in the data scientists he hires.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, Data Science Tags crypto, blockchain, blockchains, SuperDataScience, superdatascience, economics, DataScience, datascience, data science, Data Science, Data science

OpenAI Whisper: General-Purpose Speech Recognition

Added on October 21, 2022 by Jon Krohn.

One of the challenges holding machines back from approaching human-level speech recognition, as Whisper has, has been acquiring sufficiently large amounts of high-quality, labeled training data. “Labeled” in this case means audio of speech that has corresponding text associated with it. With enough of these labeled data, a machine learning model can learn to take in speech audio as an input and then output the correct corresponding text.

In SuperDataScience, YouTube, Podcast, Personal Improvement, Five-Minute Friday, Data Science Tags emails, Substack, SuperDataScience, superdatascience, DataScience, datasets

Tools for Deploying Data Models into Production

Added on October 18, 2022 by Jon Krohn.

Today's guest is mighty Erik Bernhardsson — creator of Spotify's music recommender, prolific open-source developer, world-leading technical blogger, and now model-deployment-tool entrepreneur via Modal Labs.

Erik:
• Is the Founder and CEO of Modal Labs, a startup building innovative tools and infrastructure for data teams.
• Previously was CTO of the real estate startup Better, where he grew the engineering team from the size of 1 — himself — to 300 people.
• Was also previously an Engineering Manager at Spotify, where he created their now-ubiquitous music-recommendation algorithm.
• Is a prolific open-sourcer, having created the popular Luigi and Annoy libraries, among several others.
• Is an industry-leading blogger with posts that frequently feature on the front page of Hacker News.

Today’s episode gets deep into the weeds at points, so it will be particularly appealing to practicing data scientists, ML engineers, and the like, but much of the fascinating, wide-ranging conversation in this episode will appeal to any curious listener.

In this episode, Erik details:
• How the Spotify music recommender he built works so well at scale.
• The litany of new data science and engineering tools he’s excited about and thinks you should be excited about too.
• What open-source library he would develop next.
• Why he founded Modal and how their tools empower data teams.
• His top tips, having interviewed more than 2,000 candidates for engineering roles, for succeeding both as an interviewer and as an interviewee.


The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, ML Foundations, Interview, Data Science Tags MLOPs, ML, ml, opensource

The Joy of Atelic Activities

Added on October 14, 2022 by Jon Krohn.

You might think to yourself “I could be spending this time productively!” But pushing past these inner calls for productivity and leaning into the initial discomfort of atelic activities is likely to be rewarding. When you’re consumed by telic activities, by always pursuing outcomes, you’re missing out on being, on appreciating being alive for the fleeting moments that you have.

In YouTube, SuperDataScience, Personal Improvement, Podcast, Five-Minute Friday Tags wellness, atelic, activities, Self Improvement, SuperDataScience, superdatascience

Causality in Sequential Data

Added on October 11, 2022 by Jon Krohn.

Inferring Causality is uniquely powerful when done with Sequential Data: data unfolding over time. Forecasting guru Dr. Sean Taylor — renowned for Prophet and now Motif Analytics co-founder — leads us through the topic.

Sean:
• Is Co-Founder and Chief Scientist of Motif Analytics, a startup that blends his deep expertise in causal modeling with sequential analytics.
• Previously worked as a Data Science Manager at Lyft.
• Also worked as a Research Scientist Manager at Facebook, where he led the development of the renowned open-source forecasting tool, Prophet.
• Holds a PhD in Information Systems from New York University and a BS in Economics from the University of Pennsylvania.

Today’s episode gets deep into the weeds on occasion, particularly when discussing making causal inferences, but most of the episode will resonate with any curious listener.

In this episode, Sean:
• Publicly unveils his new venture, filling us in on why now was the right time for him to co-found and lead data science at an ML startup.
• Details what causal modeling is, why every data scientist should be familiar with it, and how it can make a real-world impact, with many illustrative examples from his time at Lyft.
• Fills us in on the infrastructure and teams required for large-scale causal experimentation.
• Covers how causal modeling and forecasting can’t be fully automated today as it requires humans to make assumptions, but also how humans can make these assumptions in a more informed manner thanks to data visualizations.
• Explains what the field of Information Systems is and, having conducted several hundred interviews, what he looks for in the data scientists he hires.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Data Science Tags causalityinterference, causation, causality, analytics, DataScience, data science, Data Science, Data science, SuperDataScience, superdatascience

The Four Requirements for Expertise

Added on October 7, 2022 by Jon Krohn.

The author Malcolm Gladwell popularized the idea that it takes 10,000 hours of practice to become an expert at something, whether it be chess, piano, tennis, statistics, or software development. The problem with this notion is that it can actually be easy to spend 10,000 hours or even multiples of that on some activities without developing any expertise.


TEDx Talk: How Neuroscience Inspires A.I. Breakthroughs that will Change the World

Added on October 7, 2022 by Jon Krohn.

My first TED-format talk is live! In it, I use (A.I.-generated!) visuals to color how A.I. will transform the world in our lifetimes, with particular emphases on climate change, food security, and healthcare innovations.

Thanks to Christina, Banu, and everyone at TEDxDrexelU for inviting me to speak, organizing a slick event, and masterfully editing the footage of my talk.

Thanks to Ed, Andrew, and Shaan at Nebula.io for providing invaluable feedback on drafts of my talk. It's only due to your constructive criticism that the final version turned out as well as it did. Thanks as well to Steven and Alex at Wynden Stark for kindly covering the travel costs of any employees that came down to Philadelphia to see the talk in-person.

Finally, thanks to Taya and Hannah at OpenAI for providing me with early access to custom images from their DALL-E 2 model. These were critical to me being able to effectively convey the narrative I yearned to.

In YouTube, Professional Development, Live Training, Data Science, Computer Science Tags tedxtalks, AI, ai, DataScience, TEDtalk

Data Science Interviews with Nick Singh

Added on October 4, 2022 by Jon Krohn.

For an episode all about tips for crushing interviews for Data Scientist roles, our guest is Nick Singh — author of the bestselling "Ace the Data Science Interview" book and creator of the DataLemur SQL interview platform.

Nick:
• Co-authored “Ace the Data Science Interview”, an interview-question guide that has sold over 16,000 copies since it was released last year.
• Created the DataLemur platform for interactively practicing interview questions involving SQL queries.
• Worked as a software engineer at Facebook, Google, and Microsoft.
• Holds a BS in engineering from the University of Virginia.

Today's episode is ideal for folks who are looking to land a data science job for the first time, level-up into a more senior data science role, or perhaps land a data science gig at a new firm.

In this episode, Nick details:
• His top tips for success in data science interviews.
• Common misconceptions about data science interviews.
• How to become comfortable with self-promotion and increase your chances of landing your dream job.
• Strategies for when interviewers ask if you have any questions for them.
• The subject areas and skills you should master before heading into a data science interview.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Interview, YouTube, SuperDataScience, Professional Development, Podcast, Personal Improvement, Data Science, Computer Science Tags SuperDataScience, superdatascience, interview, AI, datascience, data science, Data Science, Data science, job, jobs, SQL, sql, python

Thriving on Information Overload

Added on September 30, 2022 by Jon Krohn.

It’s the start of something new, with the first of our extended Five-Minute Friday episodes starting this week! The author of ‘Thriving on Overload’, Ross Dawson, joins Jon to discuss his five powers for transforming information overwhelm into productivity, abundance and happiness.


Causal Machine Learning

Added on September 27, 2022 by Jon Krohn.

Causal ML is today's focus with Dr. Emre Kiciman — Senior Principal Researcher at Microsoft, developer of the DoWhy causal modeling library for Python, and a leader in applying causal research to social sciences.

Emre:
• Has worked within prestigious Microsoft Research for over 17 years.
• Leads Microsoft’s research on Causal Machine Learning.
• Leads development of the DoWhy open-source causal modeling library for Python (part of the PyWhy GitHub project).
• Pioneered the use of social media data to answer causal questions in the social sciences, such as with respect to physical and mental health.
• Has published 100+ papers and been cited 8000+ times.
• Holds a PhD in Computer Science from Stanford University.

Today’s episode is relatively technical, so will probably appeal primarily to folks with technical backgrounds like data scientists, ML engineers, and software developers.

In this episode, Emre details:
• What Causal ML is and how it’s different from "correlational" ML.
• The four key steps of causal inference and how they impact ML.
• The types of data that are most amenable to causal methods and those that aren’t yet… but may be soon.
• Exciting real-world applications of Causal ML.
• The software tools he most highly recommends.
• What he looks for in the data science researchers he hires.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In YouTube, SuperDataScience, Podcast, Data Science Tags SuperDataScience, superdatascience, causality, causalityinterference, DataScience, datascientist, datascience, ML, ml, machi, machine learning, Machine Learning, AI, ai, artificial intelligence, Artificial Inteligence

More Guests on Fridays

Added on September 23, 2022 by Jon Krohn.

Going forward, we are still going to have short, five-minute-ish episodes on Friday that feature me solo, but we will increasingly intersperse them with episodes featuring inspiring guests. And I won’t be making an effort to have these Friday guest episodes be anywhere near five minutes long. To start, I’m thinking of having them typically be 20 to 30 minutes long, but we’ll see how it goes with the guests and what the reception is like from you.

In YouTube, SuperDataScience, Podcast, Interview, Five-Minute Friday, Data Science Tags SuperDataScience, superdatascience, AI, podcast, podcasts, personaldevelopment, Self Improvement