Mathematical Optimization for Machine Learning


In this talk we provide a user-friendly introduction to mathematical optimization for machine learning by answering three questions: (i) what is mathematical optimization, (ii) why should a machine learning researcher or practitioner learn it, and (iii) how does it actually work?

Every machine learning problem has parameters that must be tuned properly to ensure optimal learning. As a simple example, consider linear regression with one-dimensional input, where the two parameters of the linear model, its slope and intercept, are tuned by forming a 'cost function': a continuous function of both parameters that measures how well the linear model fits a dataset for a given slope and intercept. Properly tuning these parameters corresponds geometrically to finding the values that make the cost function as small as possible or, in other words, that 'minimize' the cost function. This tuning is accomplished by a set of tools known collectively as mathematical optimization.
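As a minimal sketch of the cost function described above, assuming the common least squares (mean squared error) form, the fit of a line to a dataset can be scored as follows. The function name `least_squares_cost` and the toy data are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def least_squares_cost(slope, intercept, x, y):
    """Mean squared error of the line y = slope * x + intercept on the data."""
    residuals = slope * x + intercept - y
    return np.mean(residuals ** 2)

# Hypothetical toy dataset lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# The cost is smallest at the slope and intercept that generated the data
# and grows as the parameters move away from them.
cost_at_truth = least_squares_cost(2.0, 1.0, x, y)
cost_elsewhere = least_squares_cost(0.0, 0.0, x, y)
print(cost_at_truth, cost_elsewhere)
```

Minimizing this function over `slope` and `intercept` is exactly the tuning problem the paragraph describes.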

Mathematical optimization, as the formal study of how to properly minimize cost functions, is used not only in solving virtually every machine learning problem (regression, classification, clustering, etc.) but also in a variety of other fields including operations, logistics, and physics. As a result, a mere working knowledge of how to use existing pre-packaged solvers will not be adequate for any serious machine learning developer who wants to code up their own implementation or tailor existing algorithms to a specific application.

The lion’s share of this talk is dedicated to showing how to implement widely used optimization schemes in Python. We plan to do so by introducing the concept of iterative methods and presenting two extremely popular iterative schemes: gradient descent and Newton’s method. This will be followed by a discussion of stochastic gradient descent, a variant of gradient descent that is well suited to today’s large datasets and is commonly paired with the backpropagation algorithm when training neural networks. Live Python demos will be run for all algorithms discussed here.
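The three schemes named above can be sketched on the one-dimensional linear regression example. This is an illustration under stated assumptions rather than the speakers' demo code: the least squares cost, the toy data, the step sizes, the iteration counts, and all function names are hypothetical choices.

```python
import numpy as np

# Hypothetical toy dataset lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
A = np.column_stack([x, np.ones_like(x)])  # columns: input, constant 1

def gradient(w, A, y):
    """Gradient of mean((A @ w - y)**2) with respect to w = [slope, intercept]."""
    return 2.0 * A.T @ (A @ w - y) / len(y)

def gradient_descent(w, A, y, step=0.1, iters=2000):
    """Repeatedly step against the gradient computed on the full dataset."""
    for _ in range(iters):
        w = w - step * gradient(w, A, y)
    return w

def newton_step(w, A, y):
    """One Newton step: rescale the gradient by the inverse Hessian."""
    hessian = 2.0 * A.T @ A / len(y)  # constant for a quadratic cost
    return w - np.linalg.solve(hessian, gradient(w, A, y))

def stochastic_gradient_descent(w, A, y, step=0.05, epochs=500, seed=0):
    """Step against the gradient of one data point at a time, in random order."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w = w - step * gradient(w, A[i:i + 1], y[i:i + 1])
    return w

w0 = np.zeros(2)
w_gd = gradient_descent(w0, A, y)
w_newton = newton_step(w0, A, y)
w_sgd = stochastic_gradient_descent(w0, A, y)
print(w_gd, w_newton, w_sgd)
```

Because this cost is quadratic, a single Newton step reaches the minimizer, while gradient descent approaches it over many small steps. The stochastic variant touches one data point per update, which is what makes it attractive for large datasets; it settles on the exact answer here only because the toy data fit a line perfectly, and on noisy data a decaying step size is typically needed.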

This talk is based on a forthcoming machine learning textbook (Machine Learning Refined; Cambridge University Press, 2016) co-authored by the speakers: Reza Borhani and Jeremy Watt (PhD, Computer Science, Northwestern University). This text has also been the source for a number of quarter-length university courses on machine learning, deep learning, and numerical optimization for graduate and senior-level undergraduate students. The speakers have also given, or plan to give, a number of tutorials on deep learning at major computer vision and AI conferences including CVPR, AAAI, ICIP, WACV, and more.

