
Jon Krohn


LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Added on July 8, 2025 by Jon Krohn.

Sensational episode for you today with the illustrious A.I. author, educator and entrepreneur Sinan Ozdemir on how LLM benchmarks are lying to you... and what you can do about it.

Sinan:

  • Is Founder and CTO of LoopGenius, a generative A.I. startup.

  • Authored several excellent books, including, most recently, the bestselling "Quick Start Guide to Large Language Models".

  • Hosts the "Practically Intelligent" podcast.

  • Was previously adjunct faculty at The Johns Hopkins University and now teaches several times a month on the O'Reilly platform.

  • Is a serial A.I. entrepreneur, including founding a Y Combinator-backed generative A.I. startup way back in 2015 that was later acquired.

  • Holds a Master’s in Pure Math from Johns Hopkins.

Today’s episode skews slightly toward our more technical listeners, but Sinan excels at explaining complex concepts clearly, so it may appeal to any listener of this podcast.

In today’s episode, Sinan details:

  • Why the A.I. benchmarks everyone relies on might be lying to you.

  • How the leading A.I. labs are gaming the benchmark system.

  • Tricks to actually effectively evaluate LLMs’ capabilities for your use cases.

  • What the future of benchmarking will involve, including how to benchmark agentic and multimodal models.

  • How a simple question about watermelon seeds reveals the 40% failure rate of even today’s most advanced A.I. models.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
