Description
Since the boom of generative AI, the industry has been moving toward on-device AI inference: no longer just a trend but a necessity for cutting costs and achieving high inference performance with ultra-low latency at the lowest possible power. In this session we go over the new features added to the Qualcomm AI Stack and how it works with the public release of ExecuTorch 1.0. We will discuss how to run traditional workloads as well as GenAI use cases, including the latest version of Llama, on mobile devices using the Qualcomm Hexagon NPU.
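As a concrete starting point for the export flow the session covers, here is a minimal sketch of lowering a PyTorch model to an ExecuTorch `.pte` program. The core export APIs (`torch.export.export`, `executorch.exir.to_edge`) are part of ExecuTorch's public interface; the Qualcomm (QNN) partitioner import path and its compiler specs are assumptions based on the ExecuTorch repository layout and may differ between releases, so the backend-delegation step is shown commented out.

```python
# Minimal sketch: capture a PyTorch model, convert it to the ExecuTorch
# edge dialect, and serialize it as a .pte file for on-device execution.
import torch
from torch.export import export
from executorch.exir import to_edge

# Assumed import path for the Qualcomm (QNN) backend partitioner;
# check your ExecuTorch release before using it:
# from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner


class TinyModel(torch.nn.Module):
    """Stand-in for a real vision or language model."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# 1. Capture the model graph with torch.export.
exported_program = export(model, example_inputs)

# 2. Convert the captured graph to the ExecuTorch edge dialect.
edge_program = to_edge(exported_program)

# 3. On a Qualcomm target, supported subgraphs would be delegated to the
#    Hexagon NPU via the QNN partitioner, e.g. (hypothetical specs):
# edge_program = edge_program.to_backend(QnnPartitioner(compiler_specs))

# 4. Serialize to a .pte file that the on-device ExecuTorch runtime loads.
et_program = edge_program.to_executorch()
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```

The same capture-lower-serialize pipeline applies to both traditional workloads and GenAI models such as Llama; what changes is the model, the quantization, and the backend partitioner used at step 3.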