Description
Machine Learning research workflows are often bottlenecked by the development of compute kernels for new algorithms and GPU architectures. This process can be daunting, and often requires a careful trade-off between productivity and performance. In this talk, we will discuss how Triton -- a mid-level programming language for kernel development -- approaches this multi-objective optimization problem, and the design decisions that were made to that effect.