Scikit-learn to "learn them all"

Summary

Scikit-learn is a powerful library, providing implementations for many of the most popular machine learning algorithms. This talk will provide an overview of the "batteries" included in Scikit-learn, along with working code examples and internal insights, in order to get the best out of our machine learning code.

Description

Machine Learning is about using the right features to build the right models that achieve the right tasks [Flach, 2012]. However, defining what "right" actually means for the problem at hand requires analysing huge amounts of data and evaluating the performance of different algorithms on these data.

Deriving a working machine learning solution for a given problem is also far from being a waterfall process. It is an iterative process that requires continuous refinement of the data to be used (i.e., the right features) and of the algorithms to apply (i.e., the right models).

In this scenario, Python has proved very useful to practitioners and researchers: its high-level nature, combined with the available tools and libraries, makes it possible to rapidly implement working machine learning code without reinventing the wheel.

**Scikit-learn** is an actively developed Python library, built on top of the solid numpy and scipy packages.

Scikit-learn (sklearn) is an all-in-one software solution, providing implementations for several machine learning methods, along with datasets and (performance) evaluation algorithms.
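As a minimal sketch of what "all-in-one" means in practice, the snippet below takes a bundled dataset, a learning method, and an evaluation metric from the same library. It assumes a recent scikit-learn release (where `train_test_split` lives in `sklearn.model_selection`); the choice of the iris dataset, the RBF-kernel SVM, and the 70/30 split are illustrative assumptions, not material from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A small dataset shipped with the library (no download required).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for evaluation; the split is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Every estimator exposes the same fit/predict interface.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)

# Evaluate with one of the metrics provided by the library.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Swapping `SVC` for another estimator leaves the rest of the code unchanged, which is the practical benefit of the uniform API.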

These "batteries" included in the library, in combination with a clean and intuitive API, have made scikit-learn one of the most popular Python packages for writing machine learning code.

In this talk, a general overview of scikit-learn will be presented, along with brief explanations of the techniques provided out-of-the-box by the library.

These explanations will be supported by working code examples and by insights into the algorithms' implementations, aimed at providing hints on how to extend the library code.

Moreover, advantages and limitations of the sklearn package will be discussed in comparison with other existing machine learning Python libraries (e.g., `shogun <http://shogun-toolbox.org>`__, `pyML <http://pyml.sourceforge.net>`__, `mlpy <http://mlpy.sourceforge.net>`__).

Finally, examples of applications of scikit-learn to big data and computationally intensive tasks will also be presented.

The general outline of the talk is as follows (the order of the topics may vary):

  • Intro to Machine Learning
    • Machine Learning in Python
    • Intro to Scikit-Learn
  • Overview of Scikit-Learn
    • Comparison with other existing ML Python libraries
  • Supervised Learning with sklearn
    • Text Classification with SVM and Kernel Methods
  • Unsupervised Learning with sklearn
    • Partitional and Model-based Clustering (i.e., k-means and Mixture Models)
  • Scaling up Machine Learning
    • Parallel and Large Scale ML with sklearn
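
To give a flavour of the supervised and unsupervised topics in the outline above, here is a rough sketch assuming a recent scikit-learn release. The toy corpus, its labels, and the synthetic blob data are hypothetical stand-ins invented for illustration; they are not the examples used in the talk.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# --- Supervised: text classification with a linear SVM ---
# Hypothetical toy corpus; a real task would use far more documents.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "the new laptop ships with a faster processor",
    "the software update fixes a security bug",
    "developers released a new version of the compiler",
]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

# TF-IDF turns raw text into features; the SVM learns on top of them.
text_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
text_clf.fit(docs, labels)
predicted = text_clf.predict(["the team scored a goal in the match"])
print("Predicted topic:", predicted[0])

# --- Unsupervised: partitional and model-based clustering ---
# Synthetic 2-D data drawn around three centres.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# k-means assigns each point to its nearest centroid.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# A Gaussian mixture fits one Gaussian component per cluster.
gm = GaussianMixture(n_components=3, random_state=0).fit(X)
gm_labels = gm.predict(X)

print("k-means clusters found:", len(set(km_labels)))
print("Mixture components used:", len(set(gm_labels)))
```

Both halves use the same `fit`/`predict` conventions, so moving from a classifier to a clustering model mostly means changing the estimator.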

The talk is intended for an intermediate level audience (i.e., Advanced). It requires basic math skills and a good knowledge of the Python language.

Good knowledge of the numpy and scipy packages is also a plus.
