Announcing today: The second book in my "Pearson AI Signature Series" is "Becoming an AI Orchestrator" by the inimitable Sadie St Lawrence!
Mixture-of-Experts and State-Space Models on Edge Devices, with Tyler Cox and Shirish Gupta
Mixture-of-experts models? State-space models? Easily running these cutting-edge LLM approaches on any device? In today's episode, deep experts Shirish and Tyler masterfully reveal all.
Shirish Gupta:
Returns to my podcast for the third time this year!
Director of A.I. Product Management at Dell Technologies, where he's been for over 20 years!
Holds a Master's in Engineering from the University of Maryland.
Tyler Cox:
Distinguished Engineer at Dell in the Client Solutions Group CTO.
Leads on-device A.I. innovation programs across a wide range of products, including A.I. PCs, workstations, and edge computing platforms.
Holds a Master's in Software Engineering from The University of Texas at Austin.
Today's episode will be particularly appealing to hands-on practitioners (e.g., data scientists, AI/ML engineers, software developers) as well as anyone weighing a shift of A.I. workloads from the cloud to local devices.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
AI Catalyst: Enterprise Agent Deployments
In two weeks, I'm hosting a special half-day event in the O'Reilly platform on "Agents in the Enterprise". The speakers — Sadie, Luca and Tony — are sensational, ensuring it'll be informative and fun :)
The full event title is "A.I. Catalyst: Enterprise Agent Deployments" and it'll be held on Tuesday November 18th at 9am PT / noon ET.
Here are the specific speakers/topics:
Lightning AI CTO Luca Antiga will kick the event off by providing hands-on demos on how to deploy enterprise-grade multi-agent systems at scale — fast.
Renowned data science instructor, serial entrepreneur and data-community builder Sadie St Lawrence will cover how executives, managers and practitioners alike can thrive in the new "A.I. era" of work.
Tony Kipkemboi will detail how enterprises can operationalize multi-agent teams with tools like CrewAI.
There will be plenty of time for interactive Q&A with each of the speakers.
This event is produced by the learning-publishing giant Pearson and part of the "A.I. Catalyst" series I've been hosting in the O'Reilly platform for a couple of years now. Thanks to Debra Williams Cauley and Dayna Isley at Pearson in particular for creating and supporting this "A.I. Catalyst" special-event series.
Learn more / register here: learning.oreilly.com/live-events/ai-catalyst-enterprise-agent-deployments/0642572250188/
If you don't have access to the O'Reilly platform already, you can get a free 30-day trial with my code "SDSPOD25".
The Two Types of Agentic Systems
There are two types of agentic systems: workflows (simpler, safer) and agents proper (more capable but tougher to control). Hear more detail on this critical distinction in today's short video.
This clip was taken from a four-hour "Agentic A.I. Engineering" workshop that I delivered with Ed Donner (that has now been watched over 128,000 times on YouTube!)
Over the coming weeks, I'll continue to release more clips from the full workshop that explain additional discrete topics and, like today's video, I'll publish each in full on LinkedIn so you can enjoy them here if you prefer.
Check out my agentic A.I. YouTube playlist, which includes the other short clips and the full four-hour workshop.
LLMs Are Delighted to Help Phishing Scams
Reuters recently tested 6 major LLMs (Grok, ChatGPT, Meta AI, Claude, DeepSeek, Gemini) to assess whether they'd create phishing content... with minor prompt adjustments, 4 out of 6 complied — yikes!
THE INVESTIGATION
Reporters from Reuters requested phishing emails targeting elderly people, fake IRS/bank messages, and tactical scam advice.
THE RESULTS
• Despite initial refusals across the board, relatively simple prompt modifications bypassed safety guardrails.
• Grok, for example, generated a fake charity phishing email targeting the elderly with urgency tactics like "Click now to act before it's too late!"
• When tested on 100 California seniors, the A.I.-generated messages successfully persuaded people to click on malicious links, often because messages seemed urgent or familiar.
REAL-WORLD IMPACT
• The FBI reports phishing is the #1 cybercrime in the U.S., with billions of messages sent daily.
• BMO Bank, as one corporate example, currently blocks 150,000-200,000 phishing emails per month targeting employees... a representative says the problem is escalating: "The numbers never go down, they only go up."
• Cybersecurity experts state criminals are already using A.I. for faster, more sophisticated phishing campaigns.
IMPLICATIONS FOR THOSE OF US IN THE AI INDUSTRY
• LLM misuse is an industry-wide challenge affecting all major frontier labs.
• Reveals fundamental tension between making AI "helpful" vs. "harmless", highlighting the need for more robust safety guardrails across AI systems.
KEY TAKEAWAYS
• For A.I. Builders: Keep security implications front and center when developing applications.
• For users: The same LLMs that help you write emails can help bad actors craft convincing scams... stay vigilant and educate vulnerable populations (e.g., seniors) about A.I.-enhanced phishing threats. They're only going to get more compelling and more frequent.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
What is an "AI Agent"? Explained in 90 Seconds
What is an "A.I. Agent" anyway? In today's 90-second video, I quickly and concretely explain.
This video clip was taken from a four-hour "Agentic A.I. Engineering" workshop that I delivered with Ed Donner... indeed, the slides I present in this clip were created by Ed!
Over the coming weeks, I'll release more clips from the full workshop that quickly explain additional discrete topics.
See here for a (today still quite short!) agentic A.I. YouTube playlist that includes the full four-hour workshop if you're interested in checking that out.
Use Contrastive Search to get Human-Quality LLM Outputs
Historically, when we deploy a machine learning model into production, the parameters that the model learned during its training on data were the sole driver of the model’s outputs. With the Generative LLMs that have taken the world by storm in the past few years, however, the model parameters alone are not enough to get reliably high-quality outputs. For that, the so-called decoding method that we choose when we deploy our LLM into production is also critical.
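Concretely, contrastive search scores each top-k candidate token by balancing the model's confidence against a "degeneration penalty": the candidate's maximum representation similarity to the tokens already generated. Here's a minimal pure-Python sketch of that scoring rule (the vectors, probabilities, and alpha value below are illustrative, not taken from any real model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_score(prob, cand_vec, context_vecs, alpha=0.6):
    """Balance model confidence against the degeneration penalty:
    the candidate's max similarity to prior context representations."""
    penalty = max(cosine(cand_vec, c) for c in context_vecs)
    return (1 - alpha) * prob - alpha * penalty

def pick_next(candidates, context_vecs, alpha=0.6):
    """candidates: list of (token, probability, hidden_vector) tuples
    for the top-k tokens. Returns the token with the best score."""
    return max(
        candidates,
        key=lambda t: contrastive_score(t[1], t[2], context_vecs, alpha),
    )[0]

# Toy example: "repeat" is the most probable token but its representation
# nearly duplicates the context, so contrastive search picks "novel".
context = [[1.0, 0.0], [0.9, 0.1]]
candidates = [
    ("repeat", 0.70, [1.0, 0.05]),  # high probability, high similarity
    ("novel", 0.25, [0.0, 1.0]),    # lower probability, dissimilar
]
print(pick_next(candidates, context))  # "novel"
```

In practice you'd reach for an off-the-shelf implementation rather than rolling your own; the Hugging Face Transformers library, for example, enables contrastive search via the `penalty_alpha` and `top_k` arguments to `generate()`.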
AI Emits Far Less Carbon Than Humans (Doing the Same Task)
There's been a lot of press about Large Language Models (LLMs), such as those behind ChatGPT, using vast amounts of energy per query. In fact, however, a person doing the same work emits 12x to 45x more carbon from their laptop alone.
Today’s "Five-Minute Friday" episode is a quick one on how “The Carbon Emissions of Writing and Illustrating Are Lower for AI than for Humans”. Everything in today’s episode is based on an ArXiV preprint paper with that title by researchers from UC Irvine, the Massachusetts Institute of Technology and other universities.
For writing a page of text, for example, the authors estimate:
• BLOOM open-source LLM (including training) produces ~1.6g CO2/query.
• OpenAI's GPT-3 (including training) produces ~2.2g CO2/query.
• Laptop usage for 0.8 hours (average time to write page) emits ~27g CO2 (that's 12x GPT-3).
• Desktop usage for the same writing time emits ~72g CO2 (32x GPT-3).
For creating a digital illustration:
• Midjourney (including training) produces ~1.9g CO2/query.
• DALL-E 2 produces ~2.2g CO2/query.
• Human takes ~3.2 hours for the same work, emitting ~100g CO2 (45x DALL-E 2) on a laptop or ~280g CO2 (127x DALL-E 2) on a desktop.
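These multipliers follow directly from the paper's per-task estimates. As a quick sanity check (truncating to whole multiples, which matches the rounded figures above):

```python
# Rough check of the multipliers from the per-task estimates (all in g CO2)
gpt3 = dalle2 = 2.2  # per query, including training, per the paper
laptop_write, desktop_write = 27, 72    # to write a page of text
laptop_draw, desktop_draw = 100, 280    # to create a digital illustration

ratios = {
    "laptop writing": int(laptop_write / gpt3),    # 12x GPT-3
    "desktop writing": int(desktop_write / gpt3),  # 32x GPT-3
    "laptop drawing": int(laptop_draw / dalle2),   # 45x DALL-E 2
    "desktop drawing": int(desktop_draw / dalle2), # 127x DALL-E 2
}
print(ratios)
```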
There are complexities here, such as what humans do with their time instead of writing or illustrating; if it’s spent driving, for example, then the net impact would be worse. As someone who’d love to see the world at net negative carbon emissions ASAP through innovations like nuclear fusion and carbon capture, however, I have been getting antsy about how much energy state-of-the-art LLMs use, but this simple article turned that perspective upside down. I’ll continue to use A.I. to augment my work wherever I can... and hopefully get my day done earlier so I can get away from my machine and enjoy some time outdoors.
Hear more detail in today's episode or check out the video version to see figures as well.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
OpenAI’s DALL-E 3, Image Chat and Web Search
Today's episode details three big releases from OpenAI: (1) DALL-E 3 text-to-image model, which "exactly" adheres to your prompt. (2) Image-to-text chat. (3) Real-time web search integrated into ChatGPT (which seems to lag behind Google's Bard).
So, first, DALL-E 3 text-to-image generation:
• Appears to generate images that are on par with Midjourney V5, the current state-of-the-art.
• The big difference is that apparently DALL-E 3 will actually generate images that adhere “exactly” to the text you provide.
• In contrast, incumbent state-of-the-art models typically ignore words or key parts of the description, even though their output quality is often stunning.
• This adherence to prompts extends even to language that you’d like to include in the image, which is mega.
• Watch today's YouTube version for examples of all the above.
In addition, using Midjourney is a really bizarre user experience because it's done through Discord where you provide prompts and get results alongside dozens of other people at the same time. DALL-E 3, in contrast, will be within the slick ChatGPT Plus environment, which could completely get rid of the need to develop text-to-image prompt-engineering expertise in order to get great results. Instead, you can simply have an iterative back-and-forth conversation with ChatGPT to produce the image of your dreams.
Next up is image-to-text chat in ChatGPT Plus:
• We've known this was coming for a while.
• Works stunningly well in the tests I've done so far.
• Today's YouTube version also shows an example of this.
Finally, real-time web search with Bing is now integrated into ChatGPT Plus:
• In my personal (anecdotal) tests, this lagged behind Google's Bard.
• Bard is also free, so if real-time web search is what you're after, there doesn't seem to be a reason to pay for ChatGPT Plus. That said, for state-of-the-art general chat plus now image generation and image-to-text chat (per the above), ChatGPT Plus is well worth the price tag.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Overcoming Adversaries with A.I. for Cybersecurity, with Dr. Dan Shiebler
Recently in Detroit, my hotel randomly had a podcast studio complete with "ON AIR" sign haha. From there, I interviewed the wildly intelligent Dr. Dan Shiebler on how machine learning is used to tackle cybercrime.
Dan:
• Is Head of Machine Learning at Abnormal Security, a cybercrime-detection firm that has grown to over $100m in annual recurring revenue in just four years, where he manages a team of over 50 engineers.
• Previously worked at Twitter, first as a Staff ML Engineer and then as an ML Engineering Manager.
• Holds a PhD in A.I. Theory from the University of Oxford and obtained a perfect 4.0 GPA in his Computer Science and Neuroscience joint Bachelor’s from Brown University.
Today’s episode is on the technical side so might appeal most to hands-on practitioners like data scientists and ML engineers, but anyone who’d like to understand the state-of-the-art in cybersecurity should give it a listen.
In this episode, Dan details:
• The machine learning approaches needed to tackle the uniquely adversarial application of cybercrime detection.
• How to carry out real-time ML modeling.
• What his PhD research on Category Theory entailed and how it applies to the real world.
• The major problems facing humanity in the coming decades that he thinks A.I. will be able to help with… and those that he thinks A.I. won’t.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Llama 2, Toolformer and BLOOM: Open-Source LLMs with Meta’s Dr. Thomas Scialom
Thomas Scialom, PhD is behind many of the most popular Generative A.I. projects including Llama 2, the world's top open-source LLM. Today, the Meta A.I. researcher reveals the stories behind Llama 2 and what's in the works for Llama 3.
Thomas:
• Is an A.I. Research Scientist at Meta.
• Is behind some of the world’s best-known Generative A.I. projects including Llama 2, BLOOM, Toolformer and Galactica.
• Is contributing to the development of Artificial General Intelligence (AGI).
• Has lectured at many of the top A.I. labs (e.g., Google, Stanford, MILA).
• Holds a PhD from Sorbonne University, where he specialized in Natural-Language Generation with Reinforcement Learning.
Today’s episode should be equally appealing to hands-on machine learning practitioners as well as folks who may not be hands on but are nevertheless keen to understand the state-of-the-art in A.I. from someone who’s right on the cutting edge of it all.
In this episode, Thomas details:
• Llama 2, today’s top open-source LLM, including what it was like behind the scenes developing it and what we can expect from the eventual Llama 3 and related open-source projects.
• The Toolformer LLM that learns how to use external tools.
• The Galactica science-specific LLM, why it was brought down after a few days, and how it might eventually re-emerge in a new form.
• How RLHF — reinforcement learning from human feedback — shifts the distribution of generative A.I. outputs from approximating the average of human responses to excellent, often superhuman quality.
• How soon he thinks AGI — artificial general intelligence — will be realized and how.
• How to make the most of the Generative A.I. boom as an entrepreneur.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Code Llama
Meta's Llama 2 offered state-of-the-art performance for an "open-source"* LLM... except on tasks involving code. Now Code Llama is here and it magnificently fills that gap by outperforming all other open-source LLMs on coding benchmarks.
LangChain: Create LLM Applications Easily in Python
Today's episode is a fun intro to the powerful, versatile LLM-development framework LangChain. In it, Kris Ograbek talks us through how to use LangChain to chat with previous episodes of SuperDataScience! 😎
Kris:
• Is a content creator who specializes in creating LLM-based projects — with Python libraries like LangChain and the Hugging Face Transformers library — and then using the projects to teach these LLM techniques.
• Previously, he worked as a software engineer in Germany.
• He holds a Master’s in Electrical and Electronics Engineering from the Wroclaw University of Science and Technology.
In this episode, Kris details:
• The exceptionally popular LangChain framework for developing LLM applications.
• Specifically, he introduces how LangChain is so powerful by walking us step-by-step through a chatbot he built that interactively answers questions about episodes of the SuperDataScience podcast.
Having listened to the podcast for years, Kris flips the script on me at the end of the episode and asks some of his burning questions for me, questions that perhaps many other listeners have wondered about too.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
NLP with Transformers, feat. Hugging Face’s Lewis Tunstall
Lewis Tunstall — brilliant author of the bestseller "NLP with Transformers" and an ML Engineer at Hugging Face — today details how to train and deploy your own LLMs, the race for an open-source ChatGPT, and why RLHF leads to better models.
Dr. Tunstall:
• Is an ML Engineer at Hugging Face, one of the most important companies in data science today because they provide much of the most critical infrastructure for A.I. through open-source projects such as their ubiquitous Transformers library, which has a staggering 100,000 stars on GitHub.
• Is a member of Hugging Face’s prestigious research team, where he is currently focused on bringing us closer to having an open-source equivalent of ChatGPT by building tools that support RLHF (reinforcement learning from human feedback) and large-scale model evaluation.
• Authored “Natural Language Processing with Transformers”, an exceptional bestselling book that was published by O'Reilly last year and covers how to train and deploy Large Language Models (LLMs) using open-source libraries.
• Prior to Hugging Face, was an academic at the University of Bern in Switzerland and held data science roles at several Swiss firms.
• Holds a PhD in theoretical and mathematical physics from the University of Adelaide in Australia.
Today’s episode is definitely on the technical side so will likely appeal most to folks like data scientists and ML engineers, but as usual I made an effort to break down the technical concepts Lewis covered so that anyone who’s keen to be aware of the cutting edge in NLP can follow along.
In the episode, Lewis details:
• What transformers are.
• Why transformers have become the default model architecture in NLP in just a few years.
• How to train NLP models when you have little to no labeled data available.
• How to optimize LLMs for speed when deploying them into production.
• How you can optimally leverage the open-source Hugging Face ecosystem, including their Transformers library and their hub for ML models and data.
• How RLHF aligns LLMs with the outputs users would like.
• How open-source efforts could soon meet or surpass the capabilities of commercial LLMs like ChatGPT.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Observing LLMs in Production to Automatically Catch Issues
Today, Amber Roberts and Xander Song provide a technical deep dive into the major challenges (such as drift) that A.I. systems (particularly LLMs) face in production. They also detail solutions, such as open-source ML Observability tools.
Both Amber and Xander work at Arize AI, an ML observability platform that has raised over $60m in venture capital.
Amber:
• Serves as an ML Growth Lead at Arize, where she has also been an ML engineer.
• Prior to Arize, worked as an AI/ML product manager at Splunk and as the head of A.I. at Insight Data Science.
• Holds a Master's in Astrophysics from the Universidad de Chile in South America.
Xander:
• Serves as a developer advocate at Arize, specializing in their open-source projects.
• Prior to Arize, he spent three years as an ML engineer.
• Holds a Bachelor's in Mathematics from UC Santa Barbara as well as a BA in Philosophy from the University of California, Berkeley.
Today’s episode will appeal primarily to technical folks like data scientists and ML engineers, but we made an effort to break down technical concepts so that it’s accessible to anyone who’d like to understand the major issues that A.I. systems can develop once they’re in production as well as how to overcome these issues.
In the episode, Amber and Xander detail:
• The kinds of drift that can adversely impact a production A.I. system, with a particular focus on the issues that can affect Large Language Models (LLMs).
• What ML Observability is and how it builds upon ML Monitoring to automate the discovery and resolution of production A.I. issues.
• Open-source ML Observability options.
• How frequently production models should be retrained.
• How ML Observability relates to discovering model biases against particular demographic groups.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
The A.I. and Machine Learning Landscape, with investor George Mathew
Today, razor-sharp investor George Mathew (of Insight Partners, which has a whopping $100-billion AUM 😮) brings us up to speed on the Machine Learning landscape, with a particular focus on Generative A.I. trends.
George:
• Is a Managing Director at Insight Partners, an enormous New York-based venture capital and growth equity firm ($100B in assets under management) that has invested in the likes of Twitter, Shopify, and Monday.com.
• Specializes in investing in A.I., ML and data "scale-ups" such as the data and A.I. platform company Databricks, the fast-growing generative A.I. company Jasper, and the popular MLOps platform Weights & Biases.
• Prior to becoming an investor, was a deep operator at fast-growing companies such as Salesforce, SAP, the analytics automation platform Alteryx (where he was President & COO) and the drone-based aerial intelligence platform Kespry (where he was CEO & Chairman).
Today’s episode will appeal to technical and non-technical listeners alike — anyone who’d like to be brought up to speed on the current state of the data and machine learning landscape by a razor-sharp expert on the topic.
In this episode, George details:
• How sensational generative A.I. models like GPT-4 are bringing about a deluge of opportunity for domain-specific tools and platforms.
• The four layers of the "Generative A.I. Stack" that supports this enormous deluge of new applications.
• How RLHF — reinforcement learning from human feedback — provides an opportunity for you to build your own powerful and defensible models with your proprietary data.
• The new LLMOps field that has emerged to support the suddenly ubiquitous LLMs (Large Language Models), including generative models.
• How investment criteria differ depending on whether the prospective investment is seed stage, venture-capital stage, or growth stage.
• The flywheel that enables the best software companies to scale extremely rapidly.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
StableLM: Open-source “ChatGPT”-like LLMs you can fit on one GPU
Stability AI is known for widely popular text-to-image generators like Stable Diffusion; the company's recent release of the first models from its open-source suite of StableLM language models marks a significant advancement in the A.I. domain.
The Chinchilla Scaling Laws
The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, I cover this ratio and the LLMs that have arisen from it (incl. the new Cerebras-GPT family).
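A widely cited approximation of the Chinchilla result is that compute-optimal training uses roughly 20 tokens per model parameter (the exact optimum depends on the compute budget, so treat this as a rule of thumb rather than the paper's full result). As a quick sketch:

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# 20 tokens per model parameter (an approximation of the paper's result).
TOKENS_PER_PARAM = 20

def chinchilla_optimal_tokens(n_params):
    """Approximate compute-optimal number of training tokens."""
    return TOKENS_PER_PARAM * n_params

# A 70B-parameter model (Chinchilla's own size) wants ~1.4 trillion tokens:
print(chinchilla_optimal_tokens(70e9) / 1e12)  # 1.4
```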
Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)
Large Language Models (LLMs) are capable of extraordinary NLP feats, but are so large that they're too expensive for most organizations to train. The solution is Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA).
This discussion comes in the wake of introducing models like Alpaca, Vicuna, GPT4All-J, and Dolly 2.0, which demonstrated the power of fine-tuning with thousands of instruction-response pairs.
Training LLMs, even those with tens of billions of parameters, can be prohibitively expensive and technically challenging. One significant issue is "catastrophic forgetting," where a model, after being retrained on new data, loses its ability to perform previously learned tasks. This challenge necessitates a more efficient approach to fine-tuning.
PEFT
By reducing the memory footprint and the number of parameters needed for training, PEFT methods like LoRA and AdaLoRA make it feasible to fine-tune large models on standard hardware. These techniques are not only space-efficient, with model weights requiring only megabytes of space, but they also avoid catastrophic forgetting, perform better with small data sets, and generalize better to out-of-training-set instructions. They can also be applied to other A.I. use cases — not just NLP — such as machine vision.
LoRA
LoRA stands out as a particularly effective PEFT method. It involves inserting low-rank decomposition matrices into each layer of a transformer model. These matrices represent data in a lower-dimensional space, simplifying computational processing. The key to LoRA's efficiency is freezing all original model weights except for the new low-rank matrices. This strategy reduces the number of trainable parameters by approximately 10,000 times and lowers the memory requirement for training by about three times. Remarkably, LoRA sometimes not only matches but even outperforms full-model training in certain scenarios. This efficiency does not come at the cost of effectiveness, making LoRA an attractive option for fine-tuning LLMs.
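The parameter savings follow directly from the shapes involved: instead of training a full d_out x d_in weight matrix W, LoRA trains a pair of small matrices A (d_out x r) and B (r x d_in) whose product is added to the frozen W. A sketch with illustrative (hypothetical) layer dimensions:

```python
# Sketch of LoRA's parameter arithmetic for a single weight matrix.
# The frozen weight W is d_out x d_in; LoRA adds trainable matrices
# A (d_out x r) and B (r x d_in) so the adapted weight is W + A @ B.
def lora_trainable_params(d_out, d_in, r):
    return d_out * r + r * d_in

d_out = d_in = 4096   # hypothetical transformer layer width
r = 8                 # low rank of the decomposition

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, r)
print(full)          # params if we fine-tuned W directly
print(lora)          # trainable params with LoRA
print(full // lora)  # 256x fewer trainable parameters for this layer
```

Note that this single-layer ratio is far smaller than the ~10,000x figure above, which comes from adapting only a subset of weight matrices, at very low rank, across an entire GPT-3-scale model.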
AdaLoRA
AdaLoRA, a recent innovation by researchers at Georgia Tech, Princeton, and Microsoft, builds on the foundations of LoRA. It differs by adaptively fine-tuning parts of the transformer architecture that benefit most from it, potentially offering enhanced performance over standard LoRA.
These developments in PEFT and the emergence of tools like LoRA and AdaLoRA mark an incredibly exciting and promising time for data scientists. With the ability to fine-tune large models efficiently, the potential for innovation and application in the field of AI is vast and continually expanding.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
NLP with GPT Architectures (ChatGPT, GPT-4, and other LLMs)
Large Language Models have revolutionized the field of Natural Language Processing, powering mind-blowing tools like ChatGPT and GPT-4. Today, we released the recording of a half-day conference I hosted on the topic.
In partnership with my publisher Pearson, the "A.I. Catalyst" conference was held earlier this month in the O'Reilly Media platform. It has now been cleaned up and released for anyone to view as a standalone three-hour video. In it, we cover the full Large Language Model (LLM) lifecycle from development to deployment.
The presenters are at the absolute vanguard on their topics:
• Sinan Ozdemir: The A.I. entrepreneur and author introduces the theory behind Transformer Architectures and LLMs like BERT, GPT, and T5.
• Melanie Subbiah: A first author on the original GPT-3 paper, Melanie leads interactive demos of the broad range of LLM capabilities.
• Shaan Khosla: A data scientist on my team at Nebula.io, he details practical tips on training, validating, and productionizing LLMs.
If you don't have access to the O'Reilly online platform through your employer or school, you can use my special code "SDSPOD23" to get a 30-day trial and enjoy the video for free!
Check it out here: learning.oreilly.com/videos/catalyst-conference-nlp/9780138224912/