Llama 2, Toolformer and BLOOM: Open-Source LLMs with Meta’s Dr. Thomas Scialom
Thomas Scialom, PhD, is behind many of the most popular Generative A.I. projects, including Llama 2, the world's top open-source LLM. Today, the Meta A.I. researcher reveals the stories behind Llama 2 and what's in the works for Llama 3.
Thomas:
• Is an A.I. Research Scientist at Meta.
• Is behind some of the world’s best-known Generative A.I. projects including Llama 2, BLOOM, Toolformer and Galactica.
• Is contributing to the development of Artificial General Intelligence (AGI).
• Has lectured at many of the top A.I. labs (e.g., Google, Stanford, MILA).
• Holds a PhD from Sorbonne University, where he specialized in Natural-Language Generation with Reinforcement Learning.
Today’s episode should appeal equally to hands-on machine learning practitioners and to folks who may not be hands-on but are nevertheless keen to understand the state of the art in A.I. from someone right on the cutting edge of it all.
In this episode, Thomas details:
• Llama 2, today’s top open-source LLM, including what it was like behind the scenes developing it and what we can expect from the eventual Llama 3 and related open-source projects.
• The Toolformer LLM that learns how to use external tools.
• The Galactica science-specific LLM, why it was brought down after a few days, and how it might eventually re-emerge in a new form.
• How RLHF — reinforcement learning from human feedback — shifts the distribution of generative A.I. outputs from approximating the average of human responses to excellent, often superhuman quality (a toy sketch of the reward-model objective behind this appears just after this list).
• How soon he thinks AGI — artificial general intelligence — will be realized and how.
• How to make the most of the Generative A.I. boom as an entrepreneur.
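To make the RLHF point above a bit more concrete, here is a minimal, hypothetical sketch (in PyTorch, not Meta's actual implementation) of the pairwise preference loss commonly used to train the reward model at the heart of RLHF: human labelers pick the better of two model responses, and the reward model learns to score the preferred response higher. Reinforcement learning then optimizes the LLM against that learned reward, pulling outputs toward the high-quality end of the distribution.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_scores: torch.Tensor,
                      rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: drives the reward model to score
    the human-preferred response above the rejected one."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy example: scalar reward-model scores for three (chosen, rejected) pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(reward_model_loss(chosen, rejected))  # shrinks as chosen pulls ahead
```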
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Code Llama
Meta's Llama 2 offered state-of-the-art performance for an "open-source"* LLM... except on tasks involving code. Now Code Llama is here and it magnificently fills that gap by outperforming all other open-source LLMs on coding benchmarks.
Big A.I. R&D Risks Reap Big Societal Rewards, with Meta’s Dr. Laurens van der Maaten
By making big research bets, the prolific Meta Senior Research Director Dr. Laurens van der Maaten has devised or supported countless world-changing machine-learning innovations across healthcare, climate change, privacy and more.
Laurens:
• Is a Senior Research Director at Meta, overseeing swathes of their high-risk, high-reward A.I. projects with application areas as diverse as augmented reality, biological protein synthesis and tackling climate change.
• Developed the "CrypTen" privacy-preserving ML framework.
• Pioneered web-scale weakly supervised training of image-recognition models.
• Along with the iconic Geoff Hinton, created the t-SNE dimensionality reduction technique (this paper alone has been cited over 36,000 times).
• In aggregate, his works have been cited nearly 100,000 times!
• Holds a PhD in machine learning from Tilburg University in the Netherlands.
Today’s episode will probably appeal primarily to hands-on data science practitioners, but there is also tons of content for anyone who’d like to appreciate the state of the art in A.I. across a broad range of socially impactful, super-cool applications.
In this episode, Laurens details:
• How he pioneered learning across billions of weakly labeled images to create a state-of-the-art machine-vision model.
• How A.I. can be applied to the synthesis of new biological proteins with implications for both medicine and agriculture.
• Specific ways A.I. is being used to tackle climate change as well as to simulate wearable materials for enhancing augmented-reality interactivity.
• CrypTen, a library just like PyTorch but where all the computations run on encrypted values (see the usage sketch just after this list).
• The wide range of applications of his ubiquitous dimensionality-reduction approach.
• His vision for the impact of A.I. on society in the coming decades.
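Since CrypTen comes up twice above, here's a quick taste of what "PyTorch, but encrypted" looks like in practice. This is a minimal sketch based on CrypTen's published examples; in a real deployment, the encrypted tensors would be secret-shared across multiple parties rather than held in a single process.

```python
import torch
import crypten

crypten.init()  # set up the secure multi-party computation backend

# Encrypt two tensors; no single party sees the raw values.
x = crypten.cryptensor(torch.tensor([1.0, 2.0, 3.0]))
y = crypten.cryptensor(torch.tensor([4.0, 5.0, 6.0]))

z = x + y            # arithmetic executes on the encrypted values
dot = (x * y).sum()  # so do reductions like dot products

print(z.get_plain_text())    # tensor([5., 7., 9.])
print(dot.get_plain_text())  # tensor(32.)
```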
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLaMA: GPT-3 performance, 10x smaller
By training (relatively) small LLMs for (much) longer, Meta AI's LLaMA architectures achieve GPT-3-like outputs at as little as a thirteenth of GPT-3's size. This means lower costs and much faster inference.
LLaMA, a clever nod to LLMs (Large Language Models), is Meta AI's latest contribution to the AI world. Guided by the Chinchilla scaling laws, LLaMA adopts a principle that veers away from the norm: unlike its predecessors, which boasted hundreds of billions of parameters, LLaMA emphasizes training smaller models for longer to achieve enhanced performance.
The Chinchilla Principle in LLaMA
The Chinchilla scaling laws, introduced by Hoffmann and colleagues, postulate that, for a fixed compute budget, training a smaller model on more data can yield superior performance. LLaMA, with its 7 billion to 65 billion parameter models, is a testament to this principle. For perspective, GPT-3 has 175 billion parameters, making the smallest LLaMA model just 4% of its size.
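As a back-of-the-envelope illustration of that trade-off: the Chinchilla paper's rule of thumb works out to roughly 20 training tokens per model parameter for compute-optimal training, while the LLaMA paper reports training the 7B and 13B models on 1.0 trillion tokens and the 33B and 65B models on 1.4 trillion. The sketch below (the 20:1 ratio is an approximation, not an exact law) shows how far past the "optimal" point the smaller LLaMA models were trained.

```python
# Chinchilla-optimal token budget (~20 tokens per parameter) versus
# the training-token counts reported in the LLaMA paper.
TOKENS_PER_PARAM = 20  # rough rule of thumb from Hoffmann et al. (2022)

llama = {7: 1.0, 13: 1.0, 33: 1.4, 65: 1.4}  # params (B) -> tokens (T)

for params_b, tokens_t in llama.items():
    optimal_t = params_b * TOKENS_PER_PARAM / 1000  # billions -> trillions
    print(f"LLaMA {params_b}B: optimal ~{optimal_t:.2f}T tokens, "
          f"trained on {tokens_t}T ({tokens_t / optimal_t:.1f}x optimal)")
```

Run this and you'll see the 7B model was trained on roughly seven times its Chinchilla-optimal token count, which is exactly the "train smaller models for longer" bet.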
Training Longer for Greater Performance
Meta AI's LLaMA pushes the boundaries by training these relatively small models for significantly longer than conventional approaches. LLaMA also contrasts with recent top models like Chinchilla, GPT-3, and PaLM, which relied on undisclosed or proprietary training data: LLaMA uses only publicly available data, including English Common Crawl, C4, GitHub, Wikipedia, and other open datasets, adding to its appeal and accessibility.
LLaMA's Remarkable Achievements
LLaMA's achievements are notable. The 13-billion-parameter model (LLaMA 13B) outperforms GPT-3 on most benchmarks despite having 13 times fewer parameters, meaning LLaMA 13B can offer GPT-3-like performance while running on a single GPU. The largest model, LLaMA 65B, competes with giants like Chinchilla 70B and PaLM, and it did so even before the release of GPT-4.
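The single-GPU claim is easy to sanity-check with rough arithmetic on the weights alone (this ignores activations, the KV cache, and framework overhead, so real requirements run somewhat higher):

```python
# Approximate memory needed just to hold LLaMA 13B's weights
# at various numeric precisions.
PARAMS = 13e9

for precision, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{precision}: ~{gib:.0f} GiB")

# fp16 comes to roughly 24 GiB, which fits on a single 40 GiB or
# 80 GiB data-center GPU; int8 quantization (~12 GiB) brings the
# model within reach of high-end consumer cards.
```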
This approach signifies a shift in the AI paradigm – achieving state-of-the-art performance without the need for enormous models. It's a leap forward in making advanced AI more accessible and environmentally friendly. The model weights, though intended for researchers, have been leaked and are available for non-commercial use, further democratizing access to cutting-edge AI.
LLaMA not only establishes a new benchmark in AI efficiency but also sets the stage for future innovations. Building on LLaMA's foundation, models like Alpaca, Vicuna, and GPT4All have emerged, fine-tuned on carefully curated instruction datasets to exceed even LLaMA's performance. These developments herald a new era in AI, one in which size doesn't always equate to capability.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Astonishing CICERO negotiates and builds trust with humans using natural language
Meta AI's CICERO algorithm — which negotiates and builds trust with humans to perform in the top decile at the game of Diplomacy — is (in my view) the most astounding A.I. feat yet. Hear all about it from Alexander.
As published in the prestigious academic journal Science in November, CICERO is capable of using natural-language conversation to coordinate with humans, develop strategic alliances, and ultimately win in Diplomacy, an extremely complex board game.
Excelling in a game with incomplete information and vastly more possible states of play than games previously conquered by A.I., like chess and Go, would be a wild feat in and of itself. But CICERO’s generative capacity to converse and negotiate in real time with six other human players in order to strategize victoriously is the truly mind-boggling capability.
To detail for you how the game of Diplomacy works, why Meta chose to tackle this game with A.I., and how they developed a model that competes in the top decile of human Diplomacy players without any other players even catching a whiff that CICERO could possibly be a machine, my guest in today's episode is Alexander Holden Miller, a co-author of the CICERO paper.
Alex:
• Has been working in Meta AI’s Fundamental AI Research group, FAIR, for nearly eight years.
• Currently serves as a Senior Research Engineering Manager within FAIR.
• Has supported researchers working in most ML sub-domains but has been especially involved in conversational A.I. research and, more recently, reinforcement learning and planning.
• Holds a degree in Computer Science from Cornell University.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.