Speed Up Inference with Mixed Precision | AI Model Optimization with Intel® Neural Compressor

Subscribers: 257,000
Published on: 2023-07-26
Video Link: https://www.youtube.com/watch?v=H7Gg-EmGpAI
Duration: 4:08
Views: 8,208


Learn one of the simplest model optimization techniques for speeding up AI inference. Mixed precision, often used to speed up training, can also speed up inference without having to sacrifice accuracy.

Mixed precision is a popular technique for speeding up the training of large AI models, and it can also be a simple way to reduce model size and inference latency. The approach mixes lower-precision floating-point formats, such as FP16 and BFloat16, with the original 32-bit floating-point parameters. Choosing how to mix formats requires assessing the effect on accuracy, knowing which formats a given device supports, and knowing which layers the model uses.
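For illustration, here is a minimal sketch of the idea in PyTorch* (the tiny model and input are stand-ins, not from the video): autocast runs supported operations in BFloat16 while the stored parameters stay in FP32.

import torch

# Stand-in FP32 model and input; substitute your own.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval()
x = torch.randn(8, 64)

# autocast executes supported ops (e.g., Linear) in bfloat16 while the
# stored parameters remain FP32: the "mixing" described above.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16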

Intel® Neural Compressor automatically mixes in the lower-precision formats supported by the hardware and the model’s layers. This video shows how to get started, whether you’re using PyTorch*, TensorFlow*, or ONNX* Runtime, and how to automatically assess the accuracy effects of lower precision.
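For reference, the conversion itself takes only a few lines with Intel® Neural Compressor. A minimal sketch, assuming the 2.x Python API, a PyTorch model, and the default BF16 target (the tiny model here is a placeholder):

import torch
from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

# Placeholder FP32 model; substitute your own PyTorch, TensorFlow,
# or ONNX Runtime model.
fp32_model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval()

# The default config targets BF16 and leaves unsupported layers in FP32.
# An evaluation function can also be passed to fit() so the tool tunes
# the mix against an accuracy criterion (see the documentation below).
conf = MixedPrecisionConfig()
converted_model = mix_precision.fit(fp32_model, conf=conf)
converted_model.save("./mixed_precision_model")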

Intel® Neural Compressor: bit.ly/3Nl6pVj

Intel® Neural Compressor GitHub: bit.ly/3NlBgkH

About Intel Software:
Intel® Developer Zone is committed to empowering and assisting software developers in creating applications for Intel hardware and software products. The Intel Software YouTube channel is an excellent resource for those seeking to enhance their knowledge. Our channel provides the latest news, helpful tips, and engaging product demos from Intel and our numerous industry partners. Our videos cover various topics; you can explore them further by following the links.

Connect with Intel Software:
INTEL SOFTWARE WEBSITE: https://intel.ly/2KeP1hD
INTEL SOFTWARE on FACEBOOK: http://bit.ly/2z8MPFF
INTEL SOFTWARE on TWITTER: http://bit.ly/2zahGSn
INTEL SOFTWARE GITHUB: http://bit.ly/2zaih6z
INTEL DEVELOPER ZONE LINKEDIN: http://bit.ly/2z979qs
INTEL DEVELOPER ZONE INSTAGRAM: http://bit.ly/2z9Xsby
INTEL GAME DEV TWITCH: http://bit.ly/2BkNshu

#intelsoftware #ai





Other Videos By Intel Software


2023-08-14  SYCL ND-Range | Intel Software
2023-08-11  Increasing Trust in Confidential Computing with Project Amber | InTechnology Podcast
2023-08-11  Social-Technical Systems with Maria Bezaitis | InTechnology Podcast | Intel Software
2023-08-10  OpenVINO Demos Overview | Intel Software
2023-08-09  Overview of Intel® Optimizations for PyTorch* | Intel Software
2023-07-28  Create Custom Layers | Intel® Graphics Performance Analyzers Framework Quick Tips | Intel Software
2023-07-27  July 2023 | oneAPI Dev News | Intel Software
2023-07-26  July 2023 | Intel Software
2023-07-26  Introduction to Intel's AI Solutions Stack | Intel Software
2023-07-26  Speed Up Inference with Mixed Precision | AI Model Optimization with Intel® Neural Compressor
2023-07-25  July 2023 | IDZ News | Intel Software
2023-07-24  Style-Transfer (Gen AI) with OpenVINO | Intel Software
2023-07-24  Create Custom Layers | Intel® Graphics Performance Analyzers Framework Quick Tips | Intel Software
2023-07-18  Unlock Generative AI with Software Powered by oneAPI | Intel Software
2023-07-18  Visual Inspection AI Reference Kit | Introduction | Intel Software
2023-07-17  Visual Inspection AI Reference Kit | The Full Flow | Intel Software
2023-07-17  Visual Inspection AI Reference Kit | Introduction | Intel Software
2023-07-13  Hugging Face + OpenVINO | Intel Software
2023-07-12  Start Post-Training Static Quantization | AI Model Optimization with Intel® Neural Compressor



Tags:
Intel Developer Zone
IDZ
Intel Software
Software Developer
Developer Tools
Software Tools
Developer
Intel
AI model optimization
deep learning
model compression
model optimization
mixed precision
bfloat16
float16
half precision
Intel Neural Compressor