Demo: Optimizing Gemma inference on NVIDIA GPUs with TensorRT-LLM

Channel:

Google for Developers

Subscribers:

2,510,000

Published on April 2, 2024 4:00:35 AM ● Video Link: https://www.youtube.com/watch?v=y4QljAMsXr0

Duration: 12:21

926 views

Even the smallest large language models are compute-intensive, which significantly affects the cost of your generative AI application. Your ability to increase throughput and reduce latency can make or break many business cases. NVIDIA TensorRT-LLM is an open-source tool that lets you considerably speed up execution of your models, and in this talk we demonstrate its application to Gemma.

Watch more videos of Gemma Developer Day 2024 → https://goo.gle/440EAIV
Subscribe to Google for Developers → https://goo.gle/developers

#Gemma #GemmaDeveloperDay

Other Videos By Google for Developers

2024-04-08	Did you write them? 👀
2024-04-05	Google for Games Developer Summit 2024 Recap
2024-04-05	Can the Gemini AI model use APIs? | Build with Google AI
2024-04-05	Introduction to Passes Classes and Passes Objects
2024-04-04	Help these buttons alert their actual indexes. Go!
2024-04-04	Why use Gemini API to write SQL? | Build with Google AI
2024-04-03	Use Gemini API for database queries? | Build with Google AI
2024-04-03	AI Data Agent with Gemini API | Build with Google AI
2024-04-02	You’re doing amazing, sweetie.
2024-04-02	The types of devs at I/O
2024-04-01	Demo: Optimizing Gemma inference on NVIDIA GPUs with TensorRT-LLM
2024-04-01	Open models in the Gemini era
2024-04-01	Demo: Deploying Gemma at dataflow scale
2024-04-01	Gemma: The responsible way to build
2024-04-01	Demo: Using Gemma with the Hugging Face ecosystem
2024-04-01	Demo: Gemma on-device with MediaPipe and TensorFlow Lite
2024-04-01	A fireside chat with Jeanine Banks and Oriol Vinyals
2024-04-01	Demo: Taking Gemma from prototype to production faster with Vertex AI
2024-04-01	Designing the open and safe AI future
2024-04-01	Getting started with Gemma models
2024-04-01	Demo: Building cloud-native, AI-powered applications with GKE

Tags:

Google

developers

pr_pr: Core DevRel DEI;

Purpose: Learn;

Type: Upload Only;

gds:N/A;
