### Description

We’ve all heard terms like Bayes error, the perceptron learning theorem, the fundamental theorem of statistical learning, and VC dimension. This talk is about using the math-heavy fundamentals of machine learning to understand the very solvability of classification problems. By the end of the talk, you will have a clear picture of how these ideas can be applied, in practice, to classification problems.

Why does a classifier fail to fit? This can happen for only one of two reasons:

- Because the model is not smart enough, or
- Because the training data itself is not “classifiable”.

Unfortunately, the only obvious way to determine the *classifiability*
or *separability* of a training dataset is to try a variety of
classification models across a range of hyperparameters. In other
words, separability of classes in a dataset is usually expressed only
in terms of which model worked on that dataset.
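As an illustration of this trial-and-error approach, the perceptron learning theorem gives one concrete, cheap test: on a linearly separable dataset, the perceptron rule is guaranteed to converge after finitely many updates. A minimal sketch (the function name and toy data here are illustrative, not from the talk):

```python
import numpy as np

def perceptron_separable(X, y, max_epochs=1000):
    """Try to find a separating hyperplane with the perceptron rule.

    By the perceptron convergence theorem, if the classes are linearly
    separable this loop terminates with a clean pass; otherwise it runs
    out of epochs and the result is inconclusive (in practice: probably
    not linearly separable)."""
    X = np.hstack([X, np.ones((len(X), 1))])  # absorb the bias term
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):          # labels in {-1, +1}
            if yi * (w @ xi) <= 0:        # misclassified (or on boundary)
                w += yi * xi              # perceptron update
                mistakes += 1
        if mistakes == 0:                 # a full clean pass: separable
            return True, w
    return False, w

# Toy data: a linearly separable pair of clusters vs. XOR
X_sep = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y_sep = np.array([-1, -1, 1, 1])
X_xor = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y_xor = np.array([-1, -1, 1, 1])
```

Note the asymmetry the talk highlights: a "separable" answer is a proof, while a "not separable" answer is only evidence that this particular model failed.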

This, however, does not answer the fundamental question of whether
a dataset is classifiable at all. If we keep increasing the
complexity of models and trying them on a dataset without success,
all we can infer is that the set of models we have tried *so far*
is incapable of learning the classification problem. It does not
necessarily mean that the problem is unsolvable.

Fortunately, many shallow learning models have been widely studied and
are well understood. As such, it is quite possible to place theoretical
bounds on their performance in the context of a dataset. There are a
variety of statistics that we can use *a priori* to determine the
likelihood of a model fitting a dataset.
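One classical example of such an *a priori* statistic is the leave-one-out 1-nearest-neighbour error: by the Cover–Hart bound, the Bayes error is asymptotically at least half of it, which gives a model-free floor that no classifier can beat on that distribution. A rough sketch, assuming a dataset small enough for a pairwise distance matrix (the function name and synthetic data are illustrative):

```python
import numpy as np

def one_nn_loo_error(X, y):
    """Leave-one-out 1-nearest-neighbour error rate.

    By the Cover-Hart result, asymptotically the Bayes error R*
    satisfies R* >= R_1NN / 2, so half this error is a rough lower
    bound on what any classifier can achieve."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)   # exclude each point as its own neighbour
    nn = D.argmin(axis=1)         # index of nearest neighbour
    return float(np.mean(y[nn] != y))

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs: expect a near-zero 1-NN error
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(one_nn_loo_error(X, y))
```

A high value of this statistic, computed before any model is trained, warns that the classes overlap and that no amount of model complexity will drive the error to zero.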

This talk shows how to use these results to develop a strategy, a structured approach to machine learning experiments, instead of blindly running models and hoping that one of them works. Starting from elementary results like Bayes’ theorem and the perceptron learning rule, and working all the way up to complex ideas like kernel methods and VC dimension, it develops a framework for analysing data in terms of the separability of its classes.
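To make the kernel-methods step concrete: XOR-style data is not linearly separable in its raw coordinates, but an explicit feature map that appends the product of the two coordinates makes it separable by a plain hyperplane, which is the intuition behind the kernel trick. A tiny sketch of that lift (the particular map and weights are illustrative):

```python
import numpy as np

# XOR is not linearly separable in the raw 2-D input space, but the
# explicit feature map phi(x) = (x1, x2, x1*x2) lifts it to 3-D,
# where the product coordinate alone splits the classes.
X = np.array([[-1, -1], [1, 1], [-1, 1], [1, -1]], dtype=float)
y = np.array([-1, -1, 1, 1])   # class = -sign(x1 * x2)

phi = np.column_stack([X, X[:, 0] * X[:, 1]])   # lift to 3-D
w = np.array([0.0, 0.0, -1.0])                  # hyperplane in feature space
print(np.sign(phi @ w))                         # matches y exactly
```

Kernel methods apply the same idea implicitly, so the separability question becomes one about the feature space rather than the raw inputs.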

While the talk might sound theoretical, the major focus will be on
making practical, hands-on use of these concepts to better understand
your data and your models. By the end of the talk, you will have
learnt how to *prioritize* which models to try on a given dataset,
and how to estimate the likelihood of each of them fitting the data.
As the talk will demonstrate with real-world examples, this rigorous
analysis of models and data saves a great deal of effort and money.