The Perils of Manually Labeling Data for Machine Learning Models

Added on December 13, 2022 by Jon Krohn.

The "gold standard" in Machine Learning is to train models with manually labeled data. Shayan Mohanty details why that encodes bias in our models and provides a "weakly-supervised" solution.

Shayan: 
• Is the CEO of Watchful, a Bay Area startup he co-founded to automate the injection of subject-matter expertise into ML models. 
• Is a guest scientist at Los Alamos National Laboratory, a renowned national security lab.
• Previously worked as a data engineer at Facebook.
• Was co-founder and CEO of two other tech startups. 
• Holds a degree in economics from The University of Texas at Austin. 
 
Today’s episode will be of interest to technical data science experts and non-technical folks alike, as it addresses critical issues in creating datasets for machine learning models: issues we should be aware of regardless of whether we’re more technically or commercially oriented.

In this episode, Shayan details: 
• Why bias in general is good.
• Why degenerative bias in particular is bad. 
• Arguments against using manual labeling.
• How his company Watchful has devised a better alternative to manual labeling, including its fascinating technical underpinnings such as the Chomsky hierarchy of languages and a high-performance Monte Carlo simulation engine (a toy sketch of the general weak-supervision idea follows below).
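The episode doesn't walk through Watchful's code, so the following is a purely illustrative sketch of the general weak-supervision pattern (popularized by tools such as Snorkel): instead of hand-annotating every example, subject-matter experts write small, noisy "labeling functions" whose votes are aggregated into training labels. Everything below, including the labeling functions, label values, and majority-vote aggregator, is hypothetical and meant only to show the shape of the approach; it is not Watchful's method.

```python
# Toy weak supervision: noisy heuristics vote on labels instead of humans
# hand-annotating each example. Illustrative only.

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_mentions_refund(text: str) -> int:
    # Heuristic: a refund request suggests a complaint.
    return POSITIVE if "refund" in text.lower() else ABSTAIN

def lf_says_thanks(text: str) -> int:
    # Heuristic: gratitude suggests a non-complaint.
    return NEGATIVE if "thanks" in text.lower() else ABSTAIN

def lf_many_exclamations(text: str) -> int:
    # Heuristic: heavy punctuation suggests frustration.
    return POSITIVE if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_refund, lf_says_thanks, lf_many_exclamations]

def weak_label(text: str) -> int:
    """Combine the noisy votes by simple majority (ties broken arbitrarily).
    Production systems instead model each function's accuracy and correlations."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no heuristic fired; the example stays unlabeled
    return max(set(votes), key=votes.count)

if __name__ == "__main__":
    corpus = [
        "I want a refund now!!",
        "Thanks for the quick reply.",
        "Where is my order?",
    ]
    for doc in corpus:
        print(f"{weak_label(doc):>2}  {doc}")
```

The appeal of this style of labeling is that expertise lives in small pieces of reviewable code: if a heuristic turns out to encode an unwanted (degenerative) bias, you fix or delete one function rather than re-annotating an entire dataset.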

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
