Jon Krohn, Cajoler of Datums

General Data Scientist Tools

As initially outlined in my post on Data Scientist Skills and Salaries, here is a list of key data science tools. With a focus on coding in Python wherever possible, they are:

It's also helpful to develop familiarity with:

Note that these tools generally appear in the open-source Hadoop cluster in the O'Reilly Data Science Salary Survey. Based on demand and relative compensation, it appears that valuable next steps to becoming a unicorn-variety data scientist would be to equip oneself with parallel-processing tools (e.g., Spark, Hive, and Pig).

 

 

Deep Learning

First Steps. For people in New York, I founded a Deep Learning Study Group. If you're further afield, you can track our progress via GitHub. Based on my experience with the study group, I have recorded a six-hour video series titled Deep Learning with TensorFlow LiveLessons that is available within Safari. As an intuitive, interactive introduction, the notebooks of code built over the course of the LiveLessons are available for free on GitHub. In addition, I offer a part-time in-classroom Deep Learning course at the NYC Data Science Academy.

Otherwise, get a lay of the land from: 

  • the sequence of courses suggested by Greg Brockman, or
  • this (more comprehensive) introductory resource post from Ofir Press, or
  • this (even more comprehensive) guide from YerevaNN Research Lab

Textbooks. Relative to viewing lectures, I prefer reading and working through problems. The stand-out resources for this, in the order I recommend tackling them, are:

Interactive Demos. Top-drawer interactive demos for developing an intuitive sense of neural networks are provided by:

  • Distill, the academic publication for visualising machine learning research
  • Chris Olah
  • the illustrious Andrej Karpathy 
  • fun, concise, browser-based (i.e., JavaScript) self-driving cars
  • ...in addition, I've curated introductory Jupyter notebooks across the popular libraries TFLearn, Keras, Theano, and TensorFlow here

Applications. Scroll down to see my recommendations for high-quality data sources as well as global issues in need of solutions. Problems worth solving with deep learning approaches in particular are curated by OpenAI. In addition, if you're at the stage that you'd like to test a General AI across a range of applications (e.g., games), work with: 

Academic Papers. If you're looking for the latest deep learning research, bookmark: 

The Past. Histories of Deep Learning: 

The Future. Insights into emerging trends:

 

 

Lay Primers on Software and Artificial Intelligence

 

 

Fun Online Primers for Data Science Techniques

 

 

Excellent Lay Books on Statistics

 

 

Open Data Sources

To train a powerful model, the larger the data set, the better -- if it's well-organised and open, that's ideal. The following repositories are standouts that meet all these criteria: 

For machine learning models that require a lot of labelled data, check out:

Finally, here are extensive pages on importing data from the Web into R, provided by CRAN and MRAN.

 

 

Meetups

 

 

News

 

 

References

 

 

Clarity and Productivity

 

 

Charitable Projects

DataKind is a well-respected platform for finding humanitarian causes to apply your data science skills to. 

 

 

Problems Worth Solving

 

 

List of Additional Tools

  • LaTeX for creating beautiful documents, including Beamer for slideshows and Pandoc for conversion to countless other formats (e.g., word processor formats for sharing with coworkers)
  • Amazon AWS, especially S3 buckets, EC2, and Redshift
  • I love the Mathematica-based Wolfram Alpha web interface, for funsies and for learning about mathematical concepts
  • Plotly is a free, easy-to-use GUI for collaboratively creating aesthetically-pleasing visualisations
  • if you would like a slick, professional tool for mining data from patents, companies and/or the news, check out Quid, which I used extensively for a political project

Eudaemonia

For a life of flourishing -- a life of beauty, truth, justice, play and love -- choose mathematics