Jon Krohn

Llama 3.1 405B: The First Open-Source Frontier LLM

Added on August 7, 2024 by Jon Krohn.

Meta's release of its giant (405-billion-parameter) Llama 3.1 model is a game-changer: for the first time, an "open-source" LLM competes at the frontier with proprietary models like GPT-4o and Claude.

KEY INFO

  • The 405B member of the Llama 3.1 model family performs on par, across both benchmarks and human evaluations, with the closed-source, proprietary models at the absolute frontier of generative A.I. capabilities (OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini). Take this with a grain of salt, though: the comparisons come from Meta's own research and data.

  • As part of this Llama 3.1 release, Meta also provided 8B and 70B models, which appear to outperform similarly sized open-source competitors such as Google's Gemma 7B and Mistral AI's Mixtral 8x22B, respectively.

  • As with earlier Llama releases, Meta has additionally provided fine-tuned versions of these LLMs for instruction-following and chat applications.

  • The context window has been expanded to 128,000 tokens (approx. 100,000 words). This lags far behind Gemini's 2-million-token window but is otherwise near the context-window frontier.

  • Multilingual support for 8 languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai).
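The context-window figure above is easy to sanity-check. Using a rough rule of thumb of ~0.75 English words per token (an assumption; the actual ratio depends on the tokenizer and the text), 128,000 tokens works out to roughly 96,000–100,000 words. A minimal sketch:

```python
def estimated_tokens(n_words: float, words_per_token: float = 0.75) -> float:
    """Rough token estimate from a word count (~0.75 words/token for English)."""
    return n_words / words_per_token


def fits_in_context(n_words: float, window_tokens: int = 128_000,
                    words_per_token: float = 0.75) -> bool:
    """Does a document of n_words plausibly fit in Llama 3.1's 128k-token window?"""
    return estimated_tokens(n_words, words_per_token) <= window_tokens


# 128,000 tokens * 0.75 words/token = 96,000 words, i.e. "approx. 100,000 words"
print(fits_in_context(96_000))   # a ~96k-word book-length document fits
print(fits_in_context(200_000))  # a ~200k-word corpus does not
```

Treat the 0.75 ratio as a ballpark only; code, non-English text, and unusual vocabulary all tokenize less efficiently.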

TECHNICAL INFO

  • Trained on over 15 trillion tokens using 16k NVIDIA H100 GPUs.

  • Decoder-only transformer architecture for training stability (as opposed to, say, a mixture-of-experts approach).

  • Post-training involved supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).

  • New safety tools: "Llama Guard 3" for content moderation and "Prompt Guard" against prompt injection attacks.
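DPO, mentioned above, aligns a model to human preferences without training a separate reward model: it applies a simple classification-style loss to pairs of chosen and rejected responses. As a rough illustration (not Meta's actual training code), the per-pair DPO loss can be sketched from the policy's and a frozen reference model's log-probabilities:

```python
import math


def dpo_loss(pi_logp_chosen: float, pi_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each margin is the policy's log-prob minus the reference model's log-prob
    for that response; beta controls how far the policy may drift from the
    reference. Values here are illustrative scalars, not real model outputs.
    """
    margin = beta * ((pi_logp_chosen - ref_logp_chosen)
                     - (pi_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))


# Loss shrinks as the policy prefers the chosen response more than the reference does
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))  # policy favors chosen -> lower loss
print(dpo_loss(-2.0, -1.0, -1.5, -1.5))  # policy favors rejected -> higher loss
```

In practice the log-probabilities are summed over response tokens and the loss is averaged over a batch; libraries such as Hugging Face's TRL provide production implementations.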

IMPACT & ACCESS

  • While not truly "open-source" (only the model weights are provided, not the training data or code), releasing an LLM that competes at the frontier does raise safety concerns, since malevolent actors now have unfettered access to cutting-edge A.I. tech. For the most part, though, it should be a boon to A.I. application developers and a net positive for society, providing more flexibility for innovation across industries such as healthcare, education, and science.

  • Wide accessibility through partnerships with Amazon Web Services (AWS), Databricks, Snowflake, NVIDIA, and others, including even Google Cloud and Microsoft Azure, Meta's big-tech rivals whose proprietary models previously had sole claim to the frontier of LLM capabilities.

  • Available on GitHub and Hugging Face for immediate access, fine-tuning and deployment on your own infrastructure.
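For instance, pulling the 8B instruct variant from Hugging Face looks roughly like the sketch below. This assumes the `transformers` and `accelerate` libraries are installed, that you have accepted Meta's license for the gated repo and authenticated with a Hugging Face token, and that the repo ID matches Hugging Face's current naming (an assumption worth double-checking):

```python
# Hypothetical sketch of loading Llama 3.1 8B Instruct via Hugging Face transformers.
# Requires: pip install transformers accelerate, plus license acceptance + HF auth.
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # assumed repo name; verify on the Hub


def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Load the model (several GB; needs a capable GPU) and complete a prompt."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",    # spread across available GPUs via accelerate
        torch_dtype="auto",   # use the checkpoint's native precision
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Usage (downloads weights on first call):
# print(generate("Explain open-weight LLMs in one sentence."))
```

The 70B and 405B variants follow the same pattern but demand far more GPU memory; the 405B model in particular is typically served through the cloud partners listed above rather than on a single machine.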

WHY WOULD META DO THIS?

  • Helps them compete for top A.I. talent.

  • Undercuts big-tech rivals by commoditizing frontier GenAI.

  • Meta claims that open source increases security by allowing anyone to kick the tires.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
