How load balancing AI workloads delivers faster user response times (demo)
Video Link: https://www.youtube.com/watch?v=x_2ehkP1GJk
Join Emanuele Mazza, a Networking Product Specialist at Google Cloud, to learn how Cloud Load Balancing uses custom metrics, such as queue depth, to balance AI workloads, delivering faster responses to user prompts while optimizing TPU and GPU utilization. In this demo, load balancing is turned into a game: the attendee competes against our load balancer operating in the region. The console shows how to configure the load balancer to select the least-loaded endpoints, so requests spend less time waiting for GPUs and inference responses come back faster.
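To make the idea of balancing on queue depth concrete, here is a minimal Python sketch of least-loaded endpoint selection. It is an illustration only, not how Cloud Load Balancing is implemented or configured: the `Endpoint` class, the `queue_depth` field, and the `pick_least_loaded` and `route` functions are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical inference backend that reports its pending-request count."""
    name: str
    queue_depth: int  # number of prompts currently waiting on this backend


def pick_least_loaded(endpoints):
    """Return the endpoint with the smallest reported queue depth.

    Ties go to the endpoint that appears first in the list.
    """
    return min(endpoints, key=lambda e: e.queue_depth)


def route(requests, endpoints):
    """Assign each request to the least-loaded endpoint at that moment.

    The chosen endpoint's queue depth is incremented so the next decision
    sees the updated load. (Assumption: one request adds one unit of queue.)
    """
    assignments = []
    for request in requests:
        target = pick_least_loaded(endpoints)
        target.queue_depth += 1
        assignments.append((request, target.name))
    return assignments
```

For example, with three backends reporting queue depths of 3, 0, and 1, `route` sends new prompts to the idle backend first and only uses the busiest one after the others have caught up. A real deployment would read queue depth from metrics reported by the backends rather than counting locally.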
Learn more about Cloud Load Balancing here: https://cloud.google.com/load-balancing?e=48754805