Jon Krohn

Multiple Independent Observations

Added on April 24, 2022 by Jon Krohn.

In this week's YouTube tutorial, we consider probabilistic events where we have multiple independent observations — such as flipping a coin two or more times instead of just once.
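
As a minimal sketch of the idea, assuming a fair coin (the function name `prob_all_heads` is hypothetical and not taken from the video): when flips are independent, the probability of a particular sequence is the product of the per-flip probabilities. Enumerating the sample space for two flips gives the same answer.

```python
from fractions import Fraction
from itertools import product


def prob_all_heads(n_flips, p_heads=Fraction(1, 2)):
    """Probability that n independent flips all land heads (product rule)."""
    return p_heads ** n_flips


# Cross-check by listing every equally likely outcome of two flips:
# ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')
outcomes = list(product("HT", repeat=2))
favourable = sum(outcome == ("H", "H") for outcome in outcomes)
p_two_heads = Fraction(favourable, len(outcomes))

print(prob_all_heads(2), p_two_heads)
```

Exact fractions are used instead of floats so that the two results can be compared without rounding error.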

We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

In Data Science, ML Foundations, Probability, Professional Development, YouTube Tags machinelearning, ml, probability, statistics, video, tutorials

Open-Access Publishing

Added on April 24, 2022 by Jon Krohn.

This week Dr. Amy Brand, the pioneering Director of The MIT Press and executive producer of documentary films, leads discussion of the benefits of — and innovations in — open-access publishing.

In the episode, Amy details:
• What open-access means.
• Why open-access papers, books, data, and code are invaluable for data scientists and anyone else doing research and development.
• The new metadata standard she developed to resolve issues around accurate attribution of who did what for a given academic publication.
• How we can change the STEM fields to be welcoming to everyone, including historically underrepresented groups.
• What it’s like to devise and create an award-winning documentary film.

Amy:
• Leads one of the world’s most influential university presses as the Director and Publisher of the MIT Press.
• Created a new open-access business model called Direct to Open.
• Is Co-Founder of Knowledge Futures Group, a non-profit that provides technology to empower organizations to build the digital infrastructure required for open-access publishing.
• Launched MIT Press Kids, the first university+kids publishers collab.
• Was the executive producer of "Picture A Scientist", a documentary that was selected to premiere at the prestigious Tribeca Film Festival and was recognized with the 2021 Kavli Science Journalism Award.
• She holds a PhD in Cognitive Science from MIT.

Today’s episode is well-suited to a broad audience, not just data scientists.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Interview, Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, openaccess, publishing, film, documentaryfilm

Events and Sample Spaces

Added on April 18, 2022 by Jon Krohn.

In this week's YouTube tutorial, I introduce the most fundamental atoms of probability theory: events and sample spaces. Enjoy 😀
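
A minimal sketch of the two terms, assuming a fair six-sided die (an example chosen here for illustration, not taken from the video): the sample space is the set of all possible outcomes, an event is any subset of it, and with equally likely outcomes the probability of an event is the fraction of the sample space it covers. The helper name `probability` is hypothetical.

```python
from fractions import Fraction

# Sample space: every possible outcome of one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# An event is a subset of the sample space, here "the roll is even".
event_even = {outcome for outcome in sample_space if outcome % 2 == 0}


def probability(event, space):
    """P(event) when all outcomes in the sample space are equally likely."""
    return Fraction(len(event & space), len(space))


print(probability(event_even, sample_space))
```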

We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

In Data Science, ML Foundations, Probability, Statistics, Professional Development, YouTube Tags machinelearning, ml, statistics, probability, video, tutorials

AGI: The Apocalypse Machine

Added on April 18, 2022 by Jon Krohn.

Jeremie Harris's work on A.I. could dramatically alter your perspective on the field of data science and the bewildering — perhaps downright frightening — impact you and A.I. could make together on the world.

Jeremie:
• Recently co-founded Mercurius, an A.I. safety company.
• Has briefed senior political and policy leaders around the world on long-term risks from A.I., including senior members of the U.K. Cabinet Office, the Canadian Cabinet, as well as the U.S. Departments of State, Homeland Security and Defense.
• Is Host of the excellent Towards Data Science podcast.
• He previously co-founded SharpestMinds, a Y Combinator-backed mentorship marketplace for data scientists.
• He proudly dropped out of his quantum mechanics PhD to found SharpestMinds.
• He holds a Master’s in biological physics from the University of Toronto.

In this episode, Jeremie details:
• What Artificial General Intelligence (AGI) is
• How the development of AGI could happen in our lifetime and could present an existential risk to humans, perhaps even to all life on the planet as we know it.
• How, alternatively, if engineered properly, AGI could herald a moment called the singularity that brings with it a level of prosperity that is not even imaginable today.
• What it takes to become an AI safety expert yourself in order to help align AGI with benevolent human goals
• His forthcoming book on quantum mechanics
• Why almost nobody should do a PhD

Today’s episode is deep and intense, but as usual it does still have a lot of laughs, and it should appeal broadly, no matter whether you’re a technical data science expert already or not.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Interview, Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, ai, singularity, machinelearning, intelligence

Clem Delangue on Hugging Face and Transformers

Added on April 11, 2022 by Jon Krohn.

In today's SuperDataScience episode, Hugging Face CEO Clem Delangue fills us in on how open-source transformer architectures are accelerating ML capabilities. Recorded for yesterday's ScaleUp:AI conference in NY.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Five-Minute Friday, Interview, Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, machinelearning, ai, transformers, podcast

What Probability Theory Is

Added on April 11, 2022 by Jon Krohn.

This week, we start digging into the actual, uh, theory of Probability Theory. I also highlight the field's relevance to Machine Learning and Statistics. Enjoy 😀

We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

In Data Science, ML Foundations, Statistics, Professional Development, Probability, YouTube Tags machinelearning, ml, statistics, probability, video, tutorials

How to Rock at Data Science — with Tina Huang

Added on April 11, 2022 by Jon Krohn.

Can you tell I had fun filming this episode with Tina Huang, YouTube data science superstar (293k subscribers)? In it, we laugh while discussing how to get started in data science and her learning/productivity tricks.

Tina:
• Creates YouTube videos with millions of views on data science careers, learning to code, SQL, productivity, and study techniques.
• Is a data scientist at one of the world's largest tech companies (she keeps the firm anonymous so she can publish more freely).
• Previously worked at Goldman Sachs and the Ontario Institute for Cancer Research.
• Holds a Master's in Computer and Information Technology from the University of Pennsylvania and a bachelor's in Pharmacology from the University of Toronto.

In this episode, Tina details:
• Her guidance for preparing for a career in data science from scratch.
• Her five steps for consistently doing anything.
• Her strategies for learning effectively and efficiently.
• What the day-to-day is like for a data scientist at one of the world’s largest tech companies.
• The software languages she uses regularly.
• Her SQL course.
• How her science and computer science backgrounds help her as a data scientist today.

Today’s episode should be appealing to a broad audience, whether you’re thinking of getting started in data science, are already an experienced data scientist, or you’re more generally keen to pick up career and productivity tips from a light-hearted conversation.

Thanks to Serg Masís, Brindha Ganesan and Ken Jee for providing questions for Tina... in Ken's case, a very silly question indeed.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Interview, Podcast, SuperDataScience, YouTube Tags superdatascience, datascientist, career, sql, productivity

Daily Habit #8: Math or Computer Science Exercise

Added on April 4, 2022 by Jon Krohn.

This article was originally adapted from a podcast, which you can check out here.

At the beginning of the new year, in Episode #538, I introduced the practice of habit tracking and provided you with a template habit-tracking spreadsheet. Then, we had a series of Five-Minute Fridays that revolved around daily habits I espouse, and that theme continues today. The habits we covered in January and February were related to my morning routine.

Starting last week, we began coverage of habits on intellectual stimulation and productivity. Specifically, last week’s habit was “reading two pages”. This week, we’re moving onward with doing a daily technical exercise; in my case, this is either a mathematics, computer science, or programming exercise.

The reason I have this daily-technical-exercise habit is that data science is both a limitlessly broad field and an ever-evolving one. If we keep learning on a regular basis, we can expand our capabilities and open doors to new professional opportunities. This is one of the driving ideas behind the #66daysofdata hashtag, which — if you haven’t heard of it before — is detailed in episode #555 with Ken Jee, who originated the now-ubiquitous hashtag.

In Calculus, Computer Science, Data Science, Five-Minute Friday, Personal Improvement, Podcast, Professional Development, YouTube, SuperDataScience Tags superdatascience, datascience, machinelearning, math, habit

A Brief History of Probability Theory

Added on April 4, 2022 by Jon Krohn.

This week's YouTube video is a quick introduction to the fascinating history of Probability Theory. Next week, we'll actually start digging into Probability Theory, uh, theory 😉

We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

In Data Science, ML Foundations, Probability, Statistics, Professional Development, YouTube Tags machinelearning, ml, ai, probability, history, video

Engineering Data APIs

Added on April 4, 2022 by Jon Krohn.

How you design a data API from scratch and how a data API can leverage machine learning to improve the quality of healthcare delivery are topics covered by Ribbon Health CTO Nate Fox in this week's episode.

Ribbon Health is a New York-based API platform for healthcare data that has raised $55m, including from some of the biggest names in venture capital like Andreessen Horowitz and General Catalyst.

Prior to Ribbon, Nate:
• Worked as an Analytics Engineer at the marketing start-up Unified.
• Was a Product Marketing Manager at Microsoft.
• Obtained a mechanical engineering degree from the Massachusetts Institute of Technology and an MBA from Harvard Business School.

In this episode, Nate details:
• What APIs ("application programming interfaces") are.
• How you design a data API from scratch.
• How Ribbon Health’s data API leverages machine learning models to improve the quality of healthcare delivery.
• How to ensure the uptime and reliability of APIs.
• How scientists and engineers can make a big social impact in health technology.
• His favorite tool for easily scaling up the impact of a data science model to any number of users.
• What he looks for in the data scientists he hires.

Today’s episode has some technical data science and software engineering elements here and there, but much of the conversation should be interesting to anyone who’s keen to understand how data science can play a big part in improving healthcare.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Interview, Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, machinelearning, apidevelopment, healthcare

Daily Habit #7: Read Two Pages

Added on March 28, 2022 by Jon Krohn.

At the beginning of the new year, in Episode #538, I introduced the practice of habit tracking and provided you with a template habit-tracking spreadsheet. Then, we had a series of Five-Minute Fridays that revolved around daily habits I espouse and that theme continues today. The habits we covered in January and February were my morning habits, specifically:

  • Starting the day with a glass of water

  • Making my bed

  • Carrying out alternate-nostril breathing

  • Meditating

  • Writing morning pages

Now, we’ll continue on with habits that extend beyond just my morning with a block of habits on intellectual stimulation and productivity. Specifically, today’s habit is “reading two pages”.

In Five-Minute Friday, Personal Improvement, Podcast, SuperDataScience, YouTube Tags superdatascience, productivity, habit, reading, podcast

Probability & Information Theory — Subject 5 of Machine Learning Foundations

Added on March 28, 2022 by Jon Krohn.

Last Wednesday, we released the final video of my Calculus course, so today we begin my all-new YouTube course on Probability and Information Theory. This first video is an orientation to the course curriculum. Enjoy!

We will publish a new video from my "Probability for Machine Learning" course to YouTube every Wednesday. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Calculus, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

In Data Science, ML Foundations, Professional Development, YouTube, Probability, Statistics Tags machinelearning, ml, probability, statistics, python, video

GPT-3 for Natural Language Processing

Added on March 28, 2022 by Jon Krohn.

With its human-level capacity on tasks as diverse as question-answering, translation, and arithmetic, GPT-3 is a game-changer for A.I. This week's brilliant guest, Melanie Subbiah, was a lead author of the GPT-3 paper.

GPT-3 is a natural language processing (NLP) model with 175 billion parameters that has demonstrated unprecedented and remarkable "few-shot learning" on the diverse tasks mentioned above (translation between languages, question-answering, performing three-digit arithmetic) as well as on many more (discussed in the episode).

Melanie's paper sent shockwaves through the mainstream media and was recognized with an Outstanding Paper Award from NeurIPS (the most prestigious machine learning conference) in 2020.

Melanie:
• Developed GPT-3 while she worked as an A.I. engineer at OpenAI, one of the world’s leading A.I. research outfits.
• Previously worked as an A.I. engineer at Apple.
• Is now pursuing a PhD at Columbia University in the City of New York specializing in NLP.
• Holds a bachelor's in computer science from Williams College.

In this episode, Melanie details:
• What GPT-3 is.
• Why applications of GPT-3 have transformed not only the field of data science but also the broader world.
• The strengths and weaknesses of GPT-3, and how these weaknesses might be addressed with future research.
• Whether transformer-based deep learning models spell doom for creative writers.
• How to address the climate change and bias issues that cloud discussions of large natural language models.
• The machine learning tools she’s most excited about.

This episode does have technical elements that will appeal primarily to practicing data scientists, but Melanie and I put an effort into explaining concepts and providing context wherever we could, so hopefully much of this fun, laugh-filled episode will be engaging and informative to anyone who’s keen to learn about the state of the art in natural language processing and A.I.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Interview, Podcast, SuperDataScience, Professional Development, YouTube Tags superdatascience, machinelearning, deeplearning, nlp, gpt3

Jon’s Answers to Questions on Machine Learning

Added on March 21, 2022 by Jon Krohn.

The wonderful folks at the Open Data Science Conference (ODSC) recently asked me five great questions on machine learning. I thought you might like to hear the answers too, so here you are!

Their questions were:
1. Why does my educational content focus on deep learning and on the foundational subjects underlying machine learning?
2. Would you consider deep learning to be an “advanced” data science skill, or is it approachable to newcomers/novice data scientists?
3. What open-source deep learning software is most dominant today?
4. What open-source deep learning software are you looking forward to using more?
5. Do you have a case study where you've used deep learning in practice?

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.


ODSC's blog post of our Q&A is here.

In Data Science, Five-Minute Friday, Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, machinelearning, ml, deeplearning, podcast

SuperDataScience Podcast LIVE at MLconf NYC and ScaleUp:AI!

Added on March 21, 2022 by Jon Krohn.

It's finally happening: the first-ever SuperDataScience episodes filmed with a live audience! On March 31 and April 7 in New York, you'll be able to react to guests and ask them questions in real-time. I'm excited 🕺

The first live, in-person episode will be filmed at MLconf NYC on March 31st. The guest will be Alexander Holden Miller, an engineering manager at Facebook A.I. Research who leads bleeding-edge work at mind-blowing intersections of deep reinforcement learning, natural language processing, and creative A.I.

A week later on April 7th, another live, in-person episode will be filmed at ScaleUp:AI. I'll be hosting a panel on open-source machine learning that features Hugging Face CEO Clem Delangue.

I hope to see you at one of these conferences, the first I'll be attending in over two years! Can't wait. There are more live SuperDataScience episodes planned for New York this year and hopefully it won't be long before we're recording episodes live around the world.

In Announcement, Data Science, Podcast, Professional Development, SuperDataScience, Interview Tags superdatascience, datascience, machinelearning, opensource

My Favorite Calculus Resources

Added on March 21, 2022 by Jon Krohn.

It's my birthday today! In celebration, I'm delighted to be releasing the final video of my "Calculus for Machine Learning" YouTube course. The first video came out in May and now, ten months later, we're done! 🎂

We published a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday since May 6th, 2021. So happy that it's now complete for you to enjoy. Playlist is here.

More detail about my broader "ML Foundations" curriculum (which also covers subject areas like Linear Algebra, Probability, Statistics, Computer Science) and all of the associated open-source code is available in GitHub here.

Starting next Wednesday, we'll begin releasing videos for a new YouTube course of mine: "Probability for Machine Learning". Hope you're excited to get going on it :)

In Calculus, Data Science, ML Foundations, Podcast, Professional Development, SuperDataScience, YouTube Tags machinelearning, ml, datascience, math, calculus, python, video

Effective Pandas

Added on March 21, 2022 by Jon Krohn.

Seven-time bestselling author Matt Harrison reveals his top tips and tricks to enable you to get the most out of Pandas, the leading Python data analysis library. Enjoy!

Matt's books, all of which have been Amazon best-sellers, are:
1. Effective Pandas
2. Illustrated Guide to Learning Python 3
3. Intermediate Python
4. Learning the Pandas Library
5. Effective PyCharm
6. Machine Learning Pocket Reference
7. Pandas Cookbook (now in its second edition)

Beyond being a prolific author, Matt:
• Teaches "Exploratory Data Analysis with Python" at Stanford
• Has taught Python at big organizations like Netflix and NASA
• Has worked as a CTO and Senior Software Engineer
• Holds a degree in Computer Science from Stanford University

On top of Matt's tips for effective Pandas programming, we cover:
• How to squeeze more data into Pandas on a given machine.
• His recommended software libraries for working with tabular data once you have too many data to fit on a single machine.
• How having a computer science education and having worked as a software engineer has been helpful in his data science career.

This episode will appeal primarily to practicing data scientists who are keen to learn about Pandas or keen to become an even deeper expert on Pandas by learning from a world-leading educator on the library.

The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Interview, Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, python, pandas, programming, dataanalysis

Jon’s Machine Learning Courses

Added on March 14, 2022 by Jon Krohn.

This article was originally adapted from a podcast, which you can check out here.

For last week’s Five-Minute Friday episode, I provided a summary of the various methods of undertaking my deep learning curriculum, be it via YouTube, my book, or the associated repository of GitHub code. I mentioned at the end of the episode that while teaching this deep learning content to students online and in-person, I discovered that many folks could use a primer on the foundational subjects that underlie machine learning in general and deep learning in particular. So after publishing all my deep learning content, I set to work on creating content that covers these subjects that are critical to understanding machine learning expertly — namely, those subjects are linear algebra, calculus, probability, statistics, and computer science.

Way back in Episode #474 of this podcast, I detailed why these particular subject areas form the sturdy foundations of what I call the Machine Learning House. As a quick recap, the idea is that to be an outstanding data scientist or ML engineer, it doesn't suffice to only know how to use machine learning algorithms via the abstract interfaces that the most popular libraries (e.g., scikit-learn, Keras) provide. To train innovative models or deploy them to run performantly in production, an in-depth appreciation of machine learning theory may be helpful — or even essential. To cultivate such an in-depth appreciation of ML, one must possess a working understanding of the foundational subjects, which again are linear algebra, calculus, probability, stats, and computer science:

In Calculus, Data Science, ML Foundations, Professional Development, YouTube Tags superdatascience, machinelearning, ml, ai, math, courses

ScaleUp: AI Conference

Added on March 14, 2022 by Jon Krohn.

At ScaleUp:AI in New York next month, I'll be moderating a panel on Open-Source Software that features Hugging Face CEO Clem Delangue. Other speakers include Andrew Ng, Allie K. Miller, and William Falcon.

Thanks to the folks at Insight Partners for putting together this high-octane, two-day event, in which you'll hear from the foremost thought leaders and investors on how to unlock your firm's A.I. growth potential.

So excited to be conferencing in-person again and I hope to be able to meet you there! There is a virtual option as well if you can't make it to New York. Whether in-person or virtual, you can use my code "JKAI35" to get 35% off 😀

Conference details/registration here.

Full speaker list here.

In Announcement, Data Science, Professional Development Tags datascience, machinelearning, ai, growth, opensource, event

Finding the Area Under the ROC Curve

Added on March 14, 2022 by Jon Krohn.

In this week's tutorial, we use Python code to find the area under the curve of the receiver operating characteristic (the "ROC curve"). This is a machine learning-specific application of integral calculus.
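
As a rough sketch of what such a calculation can look like (this is not the code from the video; the function names `roc_points` and `auc_trapezoid` and the toy data are hypothetical): the ROC curve is traced by lowering the classification threshold one example at a time, and the area under it is approximated numerically with the trapezoidal rule. The sketch assumes binary labels (0 or 1), at least one example of each class, and no tied scores.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points of the ROC curve, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Rank examples from highest to lowest score; each step lowers the
    # threshold just enough to classify one more example as positive.
    ranked = sorted(zip(y_score, y_true), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data, made up for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = auc_trapezoid(roc_points(y_true, y_score))
print(auc)  # 0.75
```

Because the ROC curve of a finite dataset is a step function, the trapezoidal rule gives its area exactly here; for smooth curves it is only an approximation of the integral.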

We publish a new video from my "Calculus for Machine Learning" course to YouTube every Wednesday. Playlist is here.

This is the penultimate video in my Calculus course! After ten months of publishing it, the final video will be released next week :)

More detail about my broader "ML Foundations" curriculum and all of the associated open-source code is available in GitHub here.

In Calculus, Data Science, ML Foundations, Professional Development, YouTube Tags machinelearning, datascience, math, calculus, python, video