Rehgan Avon's DataConnect conference is this week and is getting rave reviews. In this SuperDataScience episode, Jon Krohn, the silver-tongued entrepreneur, details how organizations can successfully adopt A.I.
NLP with Transformers, feat. Hugging Face’s Lewis Tunstall
Lewis Tunstall — brilliant author of the bestseller "NLP with Transformers" and an ML Engineer at Hugging Face — today details how to train and deploy your own LLMs, the race for an open-source ChatGPT, and why RLHF leads to better models.
Dr. Tunstall:
• Is an ML Engineer at Hugging Face, one of the most important companies in data science today because they provide much of the most critical infrastructure for A.I. through open-source projects such as their ubiquitous Transformers library, which has a staggering 100,000 stars on GitHub.
• Is a member of Hugging Face’s prestigious research team, where he is currently focused on bringing us closer to having an open-source equivalent of ChatGPT by building tools that support RLHF (reinforcement learning from human feedback) and large-scale model evaluation.
• Authored “Natural Language Processing with Transformers”, an exceptional bestselling book that was published by O'Reilly last year and covers how to train and deploy Large Language Models (LLMs) using open-source libraries.
• Prior to Hugging Face, was an academic at the University of Bern in Switzerland and held data science roles at several Swiss firms.
• Holds a PhD in theoretical and mathematical physics from the University of Adelaide in Australia.
Today’s episode is definitely on the technical side so will likely appeal most to folks like data scientists and ML engineers, but as usual I made an effort to break down the technical concepts Lewis covered so that anyone who’s keen to be aware of the cutting edge in NLP can follow along.
In the episode, Lewis details:
• What transformers are.
• Why transformers have become the default model architecture in NLP in just a few years.
• How to train NLP models when you have little to no labeled data available.
• How to optimize LLMs for speed when deploying them into production.
• How you can optimally leverage the open-source Hugging Face ecosystem, including their Transformers library and their hub for ML models and data.
• How RLHF aligns LLMs with the outputs users would like.
• How open-source efforts could soon meet or surpass the capabilities of commercial LLMs like ChatGPT.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
YOLO-NAS: The State of the Art in Machine Vision, with Harpreet Sahota
Deci's YOLO-NAS architecture provides today's state of the art in Machine Vision, specifically the key task of Object Detection. Harpreet Sahota joins us from Deci today to detail YOLO-NAS as well as where Computer Vision is going next.
Harpreet:
• Leads the deep learning developer community at Deci AI, an Israeli startup that has raised over $55m in venture capital and that recently open-sourced the YOLO-NAS deep learning model architecture.
• Through prolific data science content creation, including The Artists of Data Science podcast and his LinkedIn live streams, Harpreet has amassed a social-media following of more than 70,000.
• Previously worked as a lead data scientist and as a biostatistician.
• Holds a master’s in mathematics and statistics from Illinois State University.
Today’s episode will likely appeal most to technical practitioners like data scientists, but we did our best to break down technical concepts so that anyone who’d like to understand the latest in machine vision can follow along.
In the episode, Harpreet details:
• What exactly object detection is.
• How object detection models are evaluated.
• How machine vision models have evolved to excel at object detection, with an emphasis on the modern deep learning approaches.
• How a “neural architecture search” algorithm enabled Deci to develop YOLO-NAS, an optimal object detection model architecture.
• The technical approaches that will enable large architectures like YOLO-NAS to be compute-efficient enough to run on edge devices.
• His “top-down” approach to learning deep learning, including his recommended learning path.
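To give a flavor of the evaluation topic above: the standard building block of object-detection metrics is Intersection over Union (IoU), the overlap between a predicted bounding box and a ground-truth box; mAP then averages precision over IoU thresholds and classes. A minimal sketch (illustrative only, not Deci's implementation; the corner-coordinate box format is a common convention):

```python
# Intersection over Union (IoU) for two axis-aligned boxes.
# Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlapping region (zero if the boxes don't overlap)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap -> 1/7 ≈ 0.143
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.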
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Lossless LLM Weight Compression: Run Huge Models on a Single GPU
Many recent episodes have been focused on open-source Large Language Models that you can download and fine-tune to particular use cases depending on your needs or your users’ needs. I’ve particularly been highlighting LLMs with seven billion up to 13 billion model parameters because this size of model can typically be run on a single consumer GPU so it’s relatively manageable and affordable both to train and have in production.
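To make the single-GPU claim concrete, here's some back-of-the-envelope arithmetic (illustrative; real memory usage also includes activations, optimizer state, KV cache, and framework overhead):

```python
# Rough memory footprint of just the model weights, by parameter count and precision.
def weight_memory_gb(n_params_billions: float, bytes_per_param: int) -> float:
    """Approximate GB needed to hold the weights alone."""
    return n_params_billions * 1e9 * bytes_per_param / 1024**3

# A 7B-parameter model at 16-bit (2-byte) precision:
print(round(weight_memory_gb(7, 2), 1))   # 13.0 GB: within reach of a consumer GPU
# The same model at 32-bit precision doubles that:
print(round(weight_memory_gb(7, 4), 1))   # 26.1 GB
# A 13B model at 16-bit precision:
print(round(weight_memory_gb(13, 2), 1))  # 24.2 GB
```

This is why the 7B-13B range, especially at reduced precision, is the sweet spot for running on a single card.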
A.I. Accelerators: Hardware Specialized for Deep Learning
Today we’ve got an episode dedicated to the hardware we use to train and run A.I. models (particularly LLMs) such as GPUs, TPUs and AWS's Trainium and Inferentia chips. Ron Diamant may be the best guest on earth for this fascinating topic.
Ron:
• Works at Amazon Web Services (AWS) where he is Chief Architect for their A.I. Accelerator chips, which are designed specifically for training (and making inferences with) deep learning models.
• Holds over 200 patents across a broad range of processing hardware, including security chips, compilers and, of course, A.I. accelerators.
• Has been at AWS for nearly nine years – since the acquisition of the Israeli hardware company Annapurna Labs, where he served as an engineer and project manager.
• Holds a Masters in Electrical Engineering from Technion, the Israel Institute of Technology.
Today’s episode is on the technical side but doesn’t assume any particular hardware expertise. It’s primarily targeted at people who train or deploy machine learning models but might be accessible to a broader range of listeners who are curious about how computer hardware works.
In the episode, Ron details:
• CPUs versus GPUs.
• GPUs versus specialized A.I. Accelerators such as Tensor Processing Units (TPUs) and his own Trainium and Inferentia chips.
• The “AI Flywheel” effect between ML applications and hardware innovations.
• The complex tradeoffs he has to consider when embarking upon a multi-year chip-design project.
• The various ways we can split up training and inference across our available devices once we reach LLM-scale models with billions of parameters.
• How to get popular ML libraries like PyTorch and TensorFlow to interact optimally with A.I. accelerator chips.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Catch and Fix Harmful Generative A.I. Output
Today, the A.I. entrepreneur Krishna Gade joins me to detail open-source solutions for overcoming the safety and security issues associated with generative A.I. systems, such as those powered by Large Language Models (LLMs).
The remarkably well-spoken Krishna:
• Is Co-Founder and CEO of Fiddler AI, an observability platform that has raised over $45m in venture capital to build trust in A.I. systems.
• Previously worked as an engineering manager on Facebook’s Newsfeed, as Head of Data Engineering at Pinterest, and as a software engineer at both Twitter and Microsoft.
• Holds a Masters in Computer Science from the University of Minnesota.
In this episode, Krishna details:
• How the LLMs that enable Generative A.I. are prone to inaccurate statements, can be biased against protected groups and are susceptible to exposing private data.
• How these undesirable and even harmful LLM outputs can be identified and remedied with open-source solutions like the Fiddler Auditor that his team has built.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Observing LLMs in Production to Automatically Catch Issues
Today, Amber Roberts and Xander Song provide a technical deep dive into the major challenges (such as drift) that A.I. systems (particularly LLMs) face in production. They also detail solutions, such as open-source ML Observability tools.
Both Amber and Xander work at Arize AI, an ML observability platform that has raised over $60m in venture capital.
Amber:
• Serves as an ML Growth Lead at Arize, where she has also been an ML engineer.
• Prior to Arize, worked as an AI/ML product manager at Splunk and as the head of A.I. at Insight Data Science.
• Holds a Masters in Astrophysics from the Universidad de Chile in South America.
Xander:
• Serves as a developer advocate at Arize, specializing in their open-source projects.
• Prior to Arize, he spent three years as an ML engineer.
• Holds a Bachelors in Mathematics from UC Santa Barbara as well as a BA in Philosophy from the University of California, Berkeley.
Today’s episode will appeal primarily to technical folks like data scientists and ML engineers, but we made an effort to break down technical concepts so that it’s accessible to anyone who’d like to understand the major issues that A.I. systems can develop once they’re in production as well as how to overcome these issues.
In the episode, Amber and Xander detail:
• The kinds of drift that can adversely impact a production A.I. system, with a particular focus on the issues that can affect Large Language Models (LLMs).
• What ML Observability is and how it builds upon ML Monitoring to automate the discovery and resolution of production A.I. issues.
• Open-source ML Observability options.
• How frequently production models should be retrained.
• How ML Observability relates to discovering model biases against particular demographic groups.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Open-Source “Responsible A.I.” Tools, with Ruth Yakubu
In today's episode, Ruth Yakubu details what Responsible A.I. is and open-source options for ensuring we deploy A.I. models — particularly the Generative variety that are rapidly transforming industries — responsibly.
Ruth:
• Has been a cloud expert at Microsoft for nearly seven years; for the past two, she’s been a Principal Cloud Advocate specializing in A.I.
• Previously worked as a software engineer and manager at Accenture.
• Has been a featured speaker at major global conferences like Websummit.
• Studied computer science at the University of Minnesota.
In this episode, Ruth details:
• The six principles that underlie whether a given A.I. model is responsible or not.
• The open-source Responsible A.I. Toolbox that allows you to quickly assess how your model fares across a broad range of Responsible A.I. metrics.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Tools for Building Real-Time Machine Learning Applications, with Richmond Alake
Today, the astonishingly industrious ML Architect and entrepreneur Richmond Alake crisply describes how to rapidly develop robust and scalable Real-Time Machine Learning applications.
Richmond:
• Is a Machine Learning Architect at Slalom Build, a huge Seattle-based consultancy that builds products embedded with analytics and ML.
• Is Co-Founder of two startups: one uses computer vision to correct people’s form in the gym and the other is a generative A.I. startup that works with human speech.
• Creates/delivers courses for O'Reilly and writes for NVIDIA.
• Previously worked as a Computer Vision Engineer and as a Software Developer.
• Holds a Masters in Computer Vision, ML and Robotics from the University of Surrey.
Today’s episode will appeal most to technical practitioners, particularly those who incorporate ML into real-time applications, but there’s a lot in this episode for anyone who’d like to hear about the latest tools for developing real-time ML applications from a leader in the field.
In this episode, Richmond details:
• The software choices he’s made up and down the application stack — from databases to ML to the front-end — across his startups and the consulting work he does.
• The most valuable real-time ML tools he teaches in his courses.
• Why writing for the public is an invaluable career hack that everyone should be taking advantage of.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller
Today, the wildly intelligent Dr. Matar Haller introduces Contextual A.I. (which considers adjacent, often multimodal information when making inferences) as well as how to use ML to build a moat around your company.
Matar:
• Is VP of Data and A.I. at ActiveFence, an Israeli firm that has raised over $100m in venture capital to protect online platforms and their users from malicious behavior and malicious content.
• Is renowned for her top-rated presentations at leading conferences.
• Previously worked as Director of Algorithmic A.I. at SparkBeyond, an analytics platform.
• Holds a PhD in neuroscience from the University of California, Berkeley.
• Prior to data science, taught soldiers how to operate tanks.
Today’s episode has some technical moments that will resonate particularly well with hands-on data science practitioners but for the most part the episode will be interesting to anyone who wants to hear from a brilliant person on cutting-edge A.I. applications.
In this episode, Matar details:
• The “database of evil” that ActiveFence has amassed for identifying malicious content.
• Contextual A.I. that considers adjacent (and potentially multimodal) information when classifying data.
• How to continuously adapt A.I. systems to real-world adversarial actors.
• The machine learning model-deployment stack she uses.
• The data she collected directly from human brains and how this research relates to the brain-computer interfaces of the future.
• Why being a preschool teacher is a more intense job than the military.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
XGBoost: The Ultimate Classifier, with Matt Harrison
XGBoost is typically the most powerful ML option whenever you're working with structured data. In today's episode, world-leading XGBoost XPert (😂) Matt Harrison details how it works and how to make the most of it.
Matt:
• Is the author of seven best-selling books on Python and Machine Learning.
• His most recent book, "Effective XGBoost", was published in March.
• Teaches "Exploratory Data Analysis with Python" at Stanford University.
• Through his consultancy MetaSnake, he’s taught Python at leading global organizations like NASA, Netflix, and Qualcomm.
• Previously worked as a CTO and Software Engineer.
• Holds a degree in Computer Science from Stanford.
Today’s episode will appeal primarily to practicing data scientists who are keen to learn about XGBoost or keen to become an even deeper expert on XGBoost by learning about it from a world-leading educator on the library.
In this episode, Matt details:
• Why XGBoost is the go-to library for attaining the highest accuracy when building a classification model.
• Modeling situations where XGBoost should not be your first choice.
• The XGBoost hyperparameters to adjust to squeeze every bit of juice out of your tabular training data and his recommended library for automating hyperparameter selection.
• His top Python libraries for other XGBoost-related tasks such as data preprocessing, visualizing model performance, and model explainability.
• Languages beyond Python that have convenient wrappers for applying XGBoost.
• Best practices for communicating XGBoost results to non-technical stakeholders.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The A.I. and Machine Learning Landscape, with investor George Mathew
Today, razor-sharp investor George Mathew (of Insight Partners, which has a whopping $100-billion AUM 😮) brings us up to speed on the Machine Learning landscape, with a particular focus on Generative A.I. trends.
George:
• Is a Managing Director at Insight Partners, an enormous New York-based venture capital and growth equity firm ($100B in assets under management) that has invested in the likes of Twitter, Shopify, and Monday.com.
• Specializes in investing in A.I., ML and data "scale-ups" such as the enterprise database company Databricks, the fast-growing generative A.I. company Jasper, and the popular MLOps platform Weights & Biases.
• Prior to becoming an investor, was a deep operator at fast-growing companies such as Salesforce, SAP, the analytics automation platform Alteryx (where he was President & COO) and the drone-based aerial intelligence platform Kespry (where he was CEO & Chairman).
Today’s episode will appeal to technical and non-technical listeners alike — anyone who’d like to be brought up to speed on the current state of the data and machine learning landscape by a razor-sharp expert on the topic.
In this episode, George details:
• How sensational generative A.I. models like GPT-4 are bringing about a deluge of opportunity for domain-specific tools and platforms.
• The four layers of the "Generative A.I. Stack" that supports this enormous deluge of new applications.
• How RLHF — reinforcement learning from human feedback — provides an opportunity for you to build your own powerful and defensible models with your proprietary data.
• The new LLMOps field that has emerged to support the suddenly ubiquitous LLMs (Large Language Models), including generative models.
• How investment criteria differ depending on whether the prospective investment is seed stage, venture-capital stage, or growth stage.
• The flywheel that enables the best software companies to scale extremely rapidly.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
StableLM: Open-source “ChatGPT”-like LLMs you can fit on one GPU
Stability AI, known for widely popular text-to-image generators like Stable Diffusion, has now released the first models from its open-source StableLM suite of language models, marking a significant advancement in the AI domain.
Digital Analytics with Avinash Kaushik
Today's guest is an icon, a bestselling author and world-leading authority on digital analytics. In this interview, Avinash Kaushik masterfully describes how A.I. is transforming analytics and how you can capitalize to deliver joy to your customers.
Avinash:
• Is Chief Strategy Officer at Croud, a leading marketing agency.
• Was until recently Sr. Director of Global Strategic Analytics at Google, where he spent 16 years and where he launched the ubiquitous Google Analytics tool.
• Is a multi-time author, including the industry-standard book "Web Analytics 2.0".
• Is an authority on marketing analytics through his widely-read "Occam's Razor" blog and "The Marketing Analytics Intersect" newsletter (55k subscribers).
• His prodigious posting of useful analytics insights has landed him 200k Twitter followers and 300k followers on LinkedIn.
Today’s episode has a few deeply technical moments but for the most part is accessible to anyone who’d like to glean practical digital analytics insights from a world leader in the space.
In this episode, Avinash details:
• The distinction between brand analytics and performance analytics, and why both are critical for commercial success.
• His “four clusters of intent” for understanding your audience, delivering joy to them, and accelerating business profit.
• Why it’s a superpower for executives to be hands-on with data tools and programming.
• His favorite data tools and programming languages.
• How A.I. is transforming analytics today and his concrete vision for how A.I. will transform analytics in the coming years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Pandas for Data Analysis and Visualization
Today's episode is jam-packed with practical tips on using the Pandas library in Python for data analysis and visualization. Super-sharp Stefanie Molin — a bestselling author and sought-after instructor on these topics — is our guide.
Stefanie:
• Is the author of the bestselling book "Hands-On Data Analysis with Pandas".
• Provides hands-on pandas and data viz tutorials at top industry conferences.
• Is a software engineer and data scientist at Bloomberg, the financial data giant, where she tackles problems revolving around data wrangling/visualization and building tools for gathering data.
• Holds a degree in operations research from Columbia University as well as a masters in computer science, with an ML specialization, from Georgia Tech.
Today’s episode is intended primarily for hands-on practitioners like data analysts, data scientists, and ML engineers — or anyone that would like to be in a technical data role like these in the future.
In this episode, Stefanie details:
• Her top tips for wrangling data in pandas.
• In what data viz circumstances you should use pandas, matplotlib, or Seaborn.
• Why everyone who codes, including data scientists, should develop expertise in Python package creation as well as contribute to open-source projects.
• The tech stack she uses in her role at Bloomberg.
• The productivity tips she honed by simultaneously working full-time, completing a masters degree and writing a bestselling book.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)
Large Language Models (LLMs) are capable of extraordinary NLP feats, but are so large that they're too expensive for most organizations to train. The solution is Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA).
This discussion follows the introduction of models like Alpaca, Vicuña, GPT4All-J, and Dolly 2.0, which demonstrated the power of fine-tuning with thousands of instruction-response pairs.
Training LLMs, even those with tens of billions of parameters, can be prohibitively expensive and technically challenging. One significant issue is "catastrophic forgetting," where a model, after being retrained on new data, loses its ability to perform previously learned tasks. This challenge necessitates a more efficient approach to fine-tuning.
PEFT
By reducing the memory footprint and the number of parameters needed for training, PEFT methods like LoRA and AdaLoRA make it feasible to fine-tune large models on standard hardware. These techniques are not only space-efficient, with model weights requiring only megabytes of space, but they also avoid catastrophic forgetting, perform better with small data sets, and generalize better to out-of-training-set instructions. They can also be applied to other A.I. use cases — not just NLP — such as machine vision.
LoRA
LoRA stands out as a particularly effective PEFT method. It involves inserting low-rank decomposition matrices into each layer of a transformer model. These matrices represent data in a lower-dimensional space, simplifying computational processing. The key to LoRA's efficiency is freezing all original model weights except for the new low-rank matrices. This strategy reduces the number of trainable parameters by approximately 10,000 times and lowers the memory requirement for training by about three times. Remarkably, LoRA sometimes not only matches but even outperforms full-model training in certain scenarios. This efficiency does not come at the cost of effectiveness, making LoRA an attractive option for fine-tuning LLMs.
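The mechanics above can be sketched in a few lines. This is a minimal, illustrative LoRA-style layer (a sketch with hypothetical dimensions, not a real implementation; in practice libraries such as Hugging Face's peft wrap transformer layers for you):

```python
import numpy as np

d = 1024           # layer width (hypothetical)
r = 8              # low rank: r << d
alpha = 16         # LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))         # pretrained weight: FROZEN during fine-tuning
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor (small random init)
B = np.zeros((d, r))                    # trainable; zero-init so training starts exactly at W

def forward(x):
    # Effective weight is W + (alpha / r) * B @ A, but we never materialize it:
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d))
assert forward(x).shape == (2, d)

full_params = d * d          # 1,048,576 trainable parameters for full fine-tuning
lora_params = r * d + d * r  # 16,384 trainable parameters for LoRA
print(full_params // lora_params)  # -> 64x fewer trainable parameters at this size
```

Only A and B receive gradients; since they hold a tiny fraction of the weights, the fine-tuned "delta" can be stored and shipped in megabytes, as noted above.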
AdaLoRA
AdaLoRA, a recent innovation by researchers at Georgia Tech, Princeton, and Microsoft, builds on the foundations of LoRA. It differs by adaptively fine-tuning parts of the transformer architecture that benefit most from it, potentially offering enhanced performance over standard LoRA.
These developments in PEFT and the emergence of tools like LoRA and AdaLoRA mark an incredibly exciting and promising time for data scientists. With the ability to fine-tune large models efficiently, the potential for innovation and application in the field of AI is vast and continually expanding.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Taipy, the open-source Python application builder
An A.I. expert for nearly 40 years, Vincent Gosselin adores the field's lingua franca, Python. In today's episode, hear how he created the open-source Taipy library so you can easily build Python-based web apps and scalable, reusable data pipelines.
Vincent:
• Is CEO and Co-Founder of Taipy.io, the company behind the open-source Python library of the same name, which works up and down the stack to both easily build web applications and back-end data pipelines.
• Having obtained his Masters in CS and A.I. from the Université Paris-Saclay in 1987, he’s amassed a wealth of experience across a broad range of industries, including semiconductors, finance, aerospace, and logistics.
• Has held roles including Director of Software Development at ILOG, Director of Advanced Analytics at IBM, and VP of Advanced Analytics at DecisionBrain.
Today’s episode will appeal primarily to hands-on practitioners who are keen to hear about how they can be accelerating their productivity in Python, whether it’s on the front end (to build a data-driven web-application) or on the back end (to have scalable, reusable and maintainable data pipelines). That said, Vincent’s breadth of wisdom — honed over his decades-long A.I. career — may prove to be fascinating and informative to technical and non-technical listeners alike.
In this episode, Vincent details:
• The critical gaps in Python development that led him to create Taipy.
• How much potential there is for data-pipeline engineering to be improved.
• How shifting toward lower-code environments can accelerate Python development without sacrificing any flexibility.
• The 50-year-old programming language that was designed for A.I. and that he was nostalgic for until Python emerged on the scene.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Open-source “ChatGPT”: Alpaca, Vicuña, GPT4All-J, and Dolly 2.0
Want a GPT-4-style model on your own hardware and fine-tuned to your proprietary language-generation tasks? Today's episode covers the key open-source models (Alpaca, Vicuña, GPT4All-J, and Dolly 2.0) for doing this cheaply on a single GPU 🤯
We begin with a retrospective look at Meta AI's LLaMA model, which was introduced in episode #670. LLaMA's 13-billion-parameter variant achieves performance comparable to GPT-3 while being significantly smaller and more manageable. This efficiency makes it possible to fine-tune the model on a single GPU, democratizing access to advanced AI capabilities.
The focus then shifts to four models that surpass LLaMA in terms of power and sophistication: Alpaca, Vicuña, GPT4All-J, and Dolly 2.0. Each of these models presents a unique blend of innovation and practicality, pushing the boundaries of what's possible with AI:
Alpaca
Developed by Stanford researchers, Alpaca is an evolution of the 7 billion parameter LLaMA model, fine-tuned with 52,000 examples of instruction-following natural language. This model excels in mimicking GPT-3.5's instruction-following capabilities, offering high performance at a fraction of the cost and size.
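For a sense of what "instruction-following natural language" looks like in practice, here's a sketch of how an instruction-response pair can be formatted into a training prompt (modeled loosely on the template in the Stanford Alpaca repository; the exact wording below is approximate, and the example pair is invented for illustration):

```python
# Format one instruction-response pair into a single training prompt string.
def format_example(instruction: str, response: str, context: str = "") -> str:
    header = ("Below is an instruction that describes a task. "
              "Write a response that appropriately completes the request.")
    if context:
        # Some tasks carry an extra input (e.g., a passage to summarize)
        return (f"{header}\n\n### Instruction:\n{instruction}"
                f"\n\n### Input:\n{context}\n\n### Response:\n{response}")
    return (f"{header}\n\n### Instruction:\n{instruction}"
            f"\n\n### Response:\n{response}")

print(format_example("Name the capital of France.", "Paris"))
```

Fine-tuning then proceeds on tens of thousands of such strings, teaching the base model to complete the "### Response:" section for unseen instructions.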
Vicuña
Vicuña, a product of collaborative research across multiple institutions, builds on both the 7 billion and 13 billion parameter LLaMA models. It's fine-tuned on 70,000 user-shared ChatGPT conversations from the ShareGPT repository, achieving GPT-3.5-like performance with unique user-generated content.
GPT4All-J
GPT4All-J, released by Nomic AI, is based on EleutherAI's open source 6 billion parameter GPT-J model. It's fine-tuned with an extensive 800,000 instruction-response dataset, making it an attractive option for commercial applications due to its open-source nature and Apache license.
Dolly 2.0
Dolly 2.0, from database giant Databricks, builds upon EleutherAI's 12 billion parameter model. It's fine-tuned with 15,000 human-generated instruction response pairs, offering another open source, commercially viable option for AI applications.
These models represent a significant shift in the AI landscape, making it economically feasible for individuals and small teams to train and deploy powerful language models. With a few hundred to a few thousand dollars, it's now possible to create proprietary, ChatGPT-like models tailored to specific use cases.
The advancements in AI models that can be trained on a single GPU mark a thrilling era in data science. These developments not only showcase the rapid progression of AI technology but also significantly lower the barrier to entry, allowing a broader range of users to explore and innovate in the field of artificial intelligence.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Cloud Machine Learning
As ML models, particularly LLMs, have scaled up to having trillions of trainable parameters, cloud compute platforms have never been more essential. In today's episode, Hadelin and Kirill cover how data scientists can make the most of the cloud.
Kirill:
• Is Founder and CEO of SuperDataScience, an e-learning platform.
• Founded the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins in late 2020.
Hadelin:
• Was a data engineer at Google before becoming a content creator.
• Took a break from Data Science content in 2020 to produce and star in a Bollywood film.
Together, Kirill and Hadelin:
• Are the most popular data science instructors on the Udemy platform, with over two million students.
• Have created dozens of data science courses.
• Recently returned from a multi-year course-creation hiatus to publish their “Machine Learning in Python: Level 1" course as well as their brand-new course on cloud computing.
Today’s episode is all about the latter so will appeal primarily to hands-on practitioners like data scientists who are keen to be introduced to — or brush up on — analytics and ML in the cloud.
In this episode, Kirill and Hadelin detail:
• What cloud computing is.
• Why data scientists increasingly need to know how to use the key cloud computing platforms such as AWS, Azure, and the Google Cloud Platform.
• The key services the most popular cloud platform AWS offers, particularly with respect to databases and machine learning.
*Note that it is a coincidence that AWS sponsored this show with a promotional message about their hardware accelerators. Kirill and Hadelin did not receive any compensation for developing content on AWS nor for covering AWS topics in this episode.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
LLaMA: GPT-3 performance, 10x smaller
By training (relatively) small LLMs for (much) longer, Meta AI's LLaMA architectures achieve GPT-3-like outputs at as little as a thirteenth of GPT-3's size. This means cost savings and much faster execution time.
LLaMA, a clever nod to LLMs (Large Language Models), is Meta AI's latest contribution to the AI world. Based on the Chinchilla scaling laws, LLaMA adopts a principle that veers away from the norm. Unlike its predecessors, which boasted hundreds of billions of parameters, LLaMA emphasizes training smaller models for longer durations to achieve enhanced performance.
The Chinchilla Principle in LLaMA
The Chinchilla scaling laws, introduced by Hoffmann and colleagues, postulate that extended training of smaller models can lead to superior performance. LLaMA, with its 7 billion to 65 billion parameter models, is a testament to this principle. For perspective, GPT-3 has 175 billion parameters, making the smallest LLaMA model just a fraction of its size.
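A commonly cited rule of thumb from the Chinchilla work is roughly 20 training tokens per model parameter for compute-optimal training. A tiny sketch of that arithmetic (the 20:1 ratio is an approximation of the paper's findings, not an exact law):

```python
# Approximate compute-optimal training-set size under the Chinchilla rule of thumb.
CHINCHILLA_TOKENS_PER_PARAM = 20  # rough ratio; the paper fits this empirically

def chinchilla_optimal_tokens(n_params_billions: float) -> float:
    """Approximate compute-optimal training tokens, in billions."""
    return n_params_billions * CHINCHILLA_TOKENS_PER_PARAM

for size in (7, 13, 65):  # LLaMA model sizes, in billions of parameters
    print(f"{size}B params -> ~{chinchilla_optimal_tokens(size):.0f}B tokens")
```

LLaMA's trick is to train well past this point: the 7B model was reportedly trained on around a trillion tokens, several times the ~140B-token "optimal" figure above, trading extra training compute for a smaller, cheaper-to-run model.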
Training Longer for Greater Performance
Meta AI's LLaMA pushes the boundaries by training these relatively smaller models for significantly longer periods than conventional approaches. This contrasts with earlier top models like Chinchilla, GPT-3, and PaLM, which relied on undisclosed training data. LLaMA, however, uses entirely open-source data, including datasets like English Common Crawl, C4, GitHub, Wikipedia, and others, adding to its appeal and accessibility.
LLaMA's Remarkable Achievements
LLaMA's achievements are notable. The 13 billion parameter model (LLaMA 13B) outperforms GPT-3 on most benchmarks, despite having 13 times fewer parameters. This means LLaMA 13B can offer GPT-3-like performance on a single GPU. The largest LLaMA model, 65B, competes with giants like Chinchilla 70B and PaLM, and all of this preceded the release of GPT-4.
This approach signifies a shift in the AI paradigm – achieving state-of-the-art performance without the need for enormous models. It's a leap forward in making advanced AI more accessible and environmentally friendly. The model weights, though intended for researchers, have been leaked and are available for non-commercial use, further democratizing access to cutting-edge AI.
LLaMA not only establishes a new benchmark in AI efficiency but also sets the stage for future innovations. Building on LLaMA's foundation, models like Alpaca, Vicuña, and GPT4All have emerged, fine-tuned on thoughtful datasets to exceed even LLaMA's performance. These developments herald a new era in AI, where size doesn't always equate to capability.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.