
Machine Learning and Data Science Resources

Machine Learning Foundations

Four subject areas provide strong foundations for understanding and applying machine learning theory: linear algebra, calculus, probability/statistics, and computer science. For my comprehensive curriculum covering all of these subject areas, check out my Courses page or my Machine Learning Foundations GitHub repository. My favorite resources on these subject areas, largely from other folks, are immediately below.

Linear Algebra

Calculus

Probability & Statistics

Computer Science

Machine Learning

You can hop straight into applying machine learning without mastering the foundational subjects (listed above) first. Indeed, this can be a fun approach to learning ML because you can become familiar with what ML can do at a high level prior to getting into the nitty-gritty of the underlying mathematics and probability. The best book for jumping straight into applications is Aurélien Géron's Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, which I had the great pleasure of reviewing and editing.

If, however, you’re already comfortable with the mathematical and probabilistic foundational subjects, my favorite ML books are:

Deep Learning

First Steps in Deep Learning

Deep learning is a specialized field within machine learning. Traditionally, one would already be comfortable with machine learning before getting into it. Modern deep learning libraries, however, make learning about artificial neural networks easy — even if you aren’t too familiar with ML or the foundational mathematical subjects underlying it (see sections above). I wrote my book Deep Learning Illustrated to be the best-possible resource for folks getting started with neural networks and artificial intelligence, including if you haven’t studied much linear algebra, calculus, probability theory, or ML before.

Based on my book, I have also published 18 hours of interactive introductory tutorials:

The code notebooks built over the course of the videos are available for free on GitHub. In addition, if you like the structure and personal nature of the in-classroom experience, I offer a comprehensive, 30-hour Deep Learning course at the NYC Data Science Academy.

Otherwise, get the lay of the land from:

  • the sequence of courses suggested by Greg Brockman

  • this (more comprehensive) introductory resource post from Ofir Press

  • this (even more comprehensive) guide from YerevaNN Research Lab

Deep Learning Books

Relative to viewing lectures, I prefer reading and working through problems. Beyond my own book, the stand-out resources for this, in the order I recommend tackling them, are:

Interactive Deep Learning Demos

Top-drawer interactive demos for developing an intuitive sense of neural networks are provided by:

  • Distill, the academic publication for visualising machine learning research

  • Chris Olah

  • the illustrious Andrej Karpathy

  • fun, concise, browser-based (i.e., JavaScript) self-driving cars

  • ML-Showcase, a curated collection of remarkable deep-learning-focused demos

  • ...in addition, I've curated introductory Jupyter notebooks across the popular libraries TFLearn, Keras, Theano, and TensorFlow here

Applications of Deep Learning

Scroll further down the page to see my recommendations for high-quality data sources as well as global issues in need of solutions. Problems worth solving with deep learning approaches in particular are curated by OpenAI. In addition, if you're at the stage where you'd like to test a deep reinforcement learning algorithm across a range of applications (e.g., games), work with:

Time Series Prediction, e.g., Financial Applications

Transformers

Academic Deep Learning Papers

If you're looking for the latest deep learning research, check out: 

Deep Learning Hardware

Here is the part list for a deep learning server that I built.

Cloud Infrastructure for quickly scripting and training Deep Learning models

Histories of Deep Learning

The Future of Deep Learning

Podcasts

  • I’m privileged to host the SuperDataScience podcast, which airs twice a week and has over 10k listeners per episode. Along with inspiring guests from a broad range of career backgrounds, we focus on the latest in machine learning and data science across both academia and industry. We have content appropriate for any listener, whether you’re simply curious about A.I. or a deep technical expert.

  • In 2020, I piloted four episodes of a lighthearted AI/ML news show called A4N: the Artificial Neural Network News Network. It was a ton of fun, and someday we may record more episodes, but for the foreseeable future I’ll be consumed by the SuperDataScience show.

  • Shivam Rana put together a beautifully well-organized website of data science podcasts called DSPods, so you can check that out for other shows, whatever you’re looking for.

YouTube Channels

Open Data Sources

When training a powerful model, the larger the data set, the better; if it's also well organised and open, that's ideal. The following repositories are standouts that meet all of these criteria:

For machine learning models that require a lot of labelled data, check out:

If none of the above data sources suit your needs, Google provides a dataset-specific search tool.

Problems Worth Solving

Medical Applications of Deep Learning

Charitable Projects

  • DataKind is a well-respected platform for finding humanitarian causes to apply your data science skills to.

  • AI for Good provides opportunities to tackle the UN’s sustainable development goals with data and ML. 

General Data Scientist Tools

As initially outlined in my post on Data Scientist Skills and Salaries, here is a list of key data science tools. With a focus on coding in Python wherever possible, they are:

It's also helpful to develop familiarity with:

Note that these tools generally fall within the open-source Hadoop cluster of tools identified in the O'Reilly Data Science Salary Survey. Based on demand and relative compensation, valuable next steps toward becoming a unicorn-variety data scientist appear to be equipping oneself with distributed computing tools (e.g., Spark) and model deployment skills (e.g., software engineering).

Fun Online Primers for Data Science Techniques

 

Lay Primers on Software and Artificial Intelligence

Excellent Lay Books on Math/Stats

 

Meetups

News

References

Clarity and Productivity

 

List of Additional Tools

  • LaTeX for creating beautiful documents, along with Beamer for slideshows and Pandoc for conversion to countless other formats (e.g., word processor formats for sharing with coworkers)

  • I love the Mathematica-based Wolfram Alpha web interface for learning about mathematical concepts interactively

  • Plotly is a free, easy-to-use GUI for collaboratively creating aesthetically pleasing visualisations

Eudaemonia

For a life of flourishing -- a life of beauty, truth, justice, play and love -- choose mathematics.