Description
Sparsity, like quantization, is an approximate model optimization technique, where we trade some model accuracy for increased performance.
In this talk we'll explore how to minimize the accuracy degradation of sparsifying Vision Transformer (ViT) based models with GPU-accelerable sparsity patterns like block sparsity and semi-structured (2:4) sparsity.
We'll cover the best techniques to ensure less than a 5% loss in accuracy when:
- training a sparse model from scratch
- pruning and retraining an existing dense model
- zero-shot/one-shot pruning a dense model
We've collected these techniques into a single repository, torchao, so that model optimization enthusiasts like you can sparsify your models with just a few lines of code.
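To make the semi-structured pattern mentioned above concrete, here is a minimal sketch of magnitude-based 2:4 pruning in plain PyTorch: in every contiguous group of 4 weights, the 2 smallest-magnitude entries are zeroed out. This is an illustrative example only (the function name `prune_2_to_4` is hypothetical), not the torchao API shown in the talk.

```python
import torch

def prune_2_to_4(w: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: zero the 2 smallest-magnitude entries
    in each contiguous group of 4, yielding a 2:4 sparse tensor."""
    assert w.numel() % 4 == 0, "tensor size must be divisible by 4"
    groups = w.reshape(-1, 4)
    # Indices of the 2 largest-magnitude entries per group of 4
    keep = groups.abs().topk(2, dim=1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(1, keep, True)
    # Keep only the top-2 entries; zero the rest
    return (groups * mask).reshape(w.shape)

w = torch.randn(8, 8)
sparse_w = prune_2_to_4(w)
```

On supported GPUs, weights in this pattern can be handed to sparse matmul kernels for a speedup; the point of the talk is recovering the accuracy such pruning costs.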