
Gradient Boosting for data with both numerical and text features

Description

Some problems contain different types of data, including numerical, categorical, and text data. CatBoost is the first Gradient Boosting library to support text features out of the box. This talk will walk you through the main features of the CatBoost library and explain how it deals with text data.

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.

Some problems contain different types of data, including numerical, categorical, and text data. In such cases, you can either engineer new numerical features to replace the text and categorical ones and pass them to gradient boosting, or use a library that handles these data types out of the box.

CatBoost (https://catboost.ai/) is the first Gradient Boosting library to support text features out of the box.
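
Below is a minimal sketch, not taken from the talk, of what this looks like in practice: a toy pandas DataFrame with one numerical, one categorical, and one text column is passed to CatBoost by declaring cat_features and text_features, with no manual encoding or vectorization. The column names and values are invented for illustration.

    import pandas as pd
    from catboost import CatBoostClassifier, Pool

    # Toy dataset mixing numerical, categorical, and text columns (illustrative only).
    df = pd.DataFrame({
        "price":   [10.0, 25.5, 7.3, 99.9, 15.0, 30.0, 8.1, 45.0],
        "country": ["US", "DE", "US", "FR", "DE", "US", "FR", "US"],
        "review":  ["great product", "bad product", "great product", "bad product",
                    "great product", "bad product", "great service", "bad service"],
        "label":   [1, 0, 1, 0, 1, 0, 1, 0],
    })

    train_pool = Pool(
        data=df[["price", "country", "review"]],
        label=df["label"],
        cat_features=["country"],   # handled natively, no one-hot or target encoding needed
        text_features=["review"],   # handled natively, no manual vectorization needed
    )

    model = CatBoostClassifier(iterations=100, verbose=False)
    model.fit(train_pool)
    print(model.predict(train_pool))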

CatBoost is a popular open-source gradient boosting library with a whole set of advantages:

  1. CatBoost is able to incorporate categorical features and text features in your data with no additional preprocessing.
  2. CatBoost has the fastest GPU and multi-GPU training implementations among the openly available gradient boosting libraries.
  3. CatBoost predictions are 20-60 times faster than those of other open-source gradient boosting libraries, which makes it possible to use CatBoost for latency-critical tasks.
  4. CatBoost has a variety of tools to analyze your model (a short sketch follows this list).
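
As a small illustration of the analysis tools mentioned in point 4 (an assumption about typical usage, not an excerpt from the talk), the sketch below reuses model and train_pool from the previous example to compute per-iteration metric values and inspect predicted class probabilities.

    # Reuses `model` and `train_pool` from the sketch above (illustrative only).

    # Metric values after each boosting iteration on a given dataset.
    metrics = model.eval_metrics(train_pool, metrics=["Logloss", "AUC"])
    print("Logloss after the final iteration:", metrics["Logloss"][-1])
    print("AUC after the final iteration:", metrics["AUC"][-1])

    # Predicted class probabilities, useful for inspecting individual predictions.
    proba = model.predict_proba(train_pool)
    print("P(label=1) for the first row:", proba[0, 1])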

This talk will walk you through the main features of the library, including the way it works with text.
