Many recent episodes have focused on open-source Large Language Models (LLMs) that you can download and fine-tune to particular use cases, depending on your needs or your users’ needs. I’ve particularly been highlighting LLMs with seven billion to 13 billion model parameters because models of this size can typically be run on a single consumer GPU, making them relatively manageable and affordable both to train and to run in production.
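To make the single-GPU claim concrete, here’s a rough back-of-the-envelope sketch (my own illustration): in half precision (fp16), each parameter occupies two bytes, so a seven-billion-parameter model needs roughly 14 GB just to hold its weights.

```python
# Rough VRAM estimate for holding an LLM's weights in half precision (fp16/bf16).
# Illustrative only: real usage also needs memory for activations, the KV cache,
# and framework overhead.

def weights_vram_gb(n_params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate GPU memory (GB) needed just to hold the model weights."""
    return n_params_billions * 1e9 * bytes_per_param / 1e9

for size in (7, 13):
    print(f"{size}B params @ 2 bytes each ≈ {weights_vram_gb(size):.0f} GB")
# 7B  ≈ 14 GB: fits on a 24 GB consumer GPU (e.g., an RTX 3090/4090)
# 13B ≈ 26 GB: tight on one consumer card; usually calls for 8-bit or 4-bit quantization
```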
A.I. Accelerators: Hardware Specialized for Deep Learning
Today we’ve got an episode dedicated to the hardware we use to train and run A.I. models (particularly LLMs), such as GPUs, TPUs, and AWS’s Trainium and Inferentia chips. Ron Diamant may be the best guest on earth for this fascinating topic.
Ron:
• Works at Amazon Web Services (AWS) where he is Chief Architect for their A.I. Accelerator chips, which are designed specifically for training (and making inferences with) deep learning models.
• Holds over 200 patents across a broad range of processing hardware, including security chips, compilers and, of course, A.I. accelerators.
• Has been at AWS for nearly nine years – since the acquisition of the Israeli hardware company Annapurna Labs, where he served as an engineer and project manager.
• Holds a Master’s in Electrical Engineering from the Technion – Israel Institute of Technology.
Today’s episode is on the technical side but doesn’t assume any particular hardware expertise. It’s primarily targeted at people who train or deploy machine learning models, but it should be accessible to a broader range of listeners who are curious about how computer hardware works.
In the episode, Ron details:
• CPUs versus GPUs.
• GPUs versus specialized A.I. Accelerators such as Tensor Processing Units (TPUs) and his own Trainium and Inferentia chips.
• The “AI Flywheel” effect between ML applications and hardware innovations.
• The complex tradeoffs he has to consider when embarking upon a multi-year chip-design project.
• The various ways we can split up training and inference over our available devices when we work with LLM-scale models that have billions of parameters (see the sketch after this list).
• How to get popular ML libraries like PyTorch and TensorFlow to interact optimally with A.I. accelerator chips.
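As a minimal sketch of the device-splitting idea (my own illustration, assuming PyTorch and a machine with two GPUs, and not anything specific to Ron’s chips): when a model’s weights don’t fit on one device, you can place different layers on different devices and pass activations between them. Production LLM systems combine far more sophisticated tensor, pipeline, and data parallelism, but the core idea looks like this:

```python
import torch
import torch.nn as nn

# Naive model (pipeline) parallelism: different layers live on different GPUs,
# and activations hop between devices during the forward pass.

class TwoDeviceMLP(nn.Module):
    def __init__(self, d: int = 4096):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(d, d), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(d, d).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage0(x.to("cuda:0"))     # first half of the network on GPU 0
        return self.stage1(x.to("cuda:1"))  # activations move to GPU 1 for the rest

model = TwoDeviceMLP()
out = model(torch.randn(8, 4096))  # output tensor lives on cuda:1
```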
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
StableLM: Open-source “ChatGPT”-like LLMs you can fit on one GPU
Known for wildly popular text-to-image generators like Stable Diffusion, Stability AI has now released the first models from its open-source suite of StableLM language models, marking a significant advancement in the A.I. domain.
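As a minimal sketch of loading one of these models on a single GPU (my own illustration, assuming the Hugging Face transformers and accelerate libraries; the stabilityai/stablelm-tuned-alpha-7b model ID is my assumption, so verify the exact checkpoint name on the Hugging Face Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a 7B-parameter StableLM checkpoint in half precision so its weights
# (~14 GB) fit on a single high-memory consumer GPU.
model_id = "stabilityai/stablelm-tuned-alpha-7b"  # assumed Hub ID; check the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory vs. fp32
    device_map="auto",          # places weights on the available GPU(s)
)

inputs = tokenizer("Open-source LLMs are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in float16 rather than float32 halves the weights’ memory footprint, which is what lets a seven-billion-parameter model fit on a single high-memory consumer GPU.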
Co-Hosted SuperDataScience Podcast on 2020's Biggest Breakthroughs
Alongside Kirill Eremenko, Founder and CEO of SuperDataScience, I co-hosted this podcast episode on 2020’s biggest machine learning breakthroughs, including:
AlphaFold 2
GPT-3
The latest GPUs
We also announced an exciting podcast-host transition coming in 2021! Further details will be shared on air during the next episode.