The news domain presents several interesting challenges for recommender systems: a continuous cold-start problem when recommending newly published articles, learning from the implicit feedback of user clicks, and user traffic that is typically high in volume yet still sparse. This talk will show how we tackled these challenges to build a content-based recommender system for financial news.
The news domain poses an interesting challenge for recommender systems: when reading news, users prefer the most recently published articles (i.e. the breaking news), and yet new articles without a reading history suffer from the cold-start problem and are more difficult to recommend. Other interesting challenges are how to learn from the implicit feedback of user clicks, how to handle the typically high volume of user traffic and transform it into training data, how to deal with sparsity in the data (most users read a small number of articles per day, and a small number of articles get the majority of user clicks), and which natural language processing tools are right for the Dutch language.
In this talk we discuss how we tackled these challenges to build a content-based recommender system for Het Financieele Dagblad, a daily Dutch newspaper focusing on business and financial news. To represent the content of each article, we implemented a wide array of enrichment techniques (e.g. representing the articles in a word vector space, sentiment analysis), using libraries such as textpipe and spaCy. The most meaningful features we found, however, were those capturing the overlap between the user profile and the article representation, such as the overlap between an article's tags and the set of tags the user reads most frequently. The talk will describe how our recommender system was modeled as gradient-boosted decision trees and implemented using the xgboost library.
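To make the tag-overlap idea concrete, here is a minimal sketch of how such a feature might be computed. It is an illustration under stated assumptions, not the implementation used at Het Financieele Dagblad: the function names `top_user_tags` and `tag_overlap`, the cutoff `k`, the choice of Jaccard similarity as the overlap measure, and the sample tags are all hypothetical.

```python
from collections import Counter


def top_user_tags(read_articles_tags, k=3):
    """Return the k tags that occur most often in a user's reading history.

    `read_articles_tags` is a list of tag lists, one per article read.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(tag for tags in read_articles_tags for tag in tags)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tag for tag, _ in ranked[:k]}


def tag_overlap(article_tags, user_tags):
    """Jaccard similarity between an article's tags and the user's top tags.

    Jaccard is an assumed choice here; a raw intersection count would be
    another reasonable overlap measure.
    """
    article_tags = set(article_tags)
    user_tags = set(user_tags)
    if not article_tags or not user_tags:
        return 0.0
    return len(article_tags & user_tags) / len(article_tags | user_tags)


# Hypothetical reading history: one list of tags per article the user read.
history = [
    ["banks", "ecb"],
    ["banks", "housing"],
    ["ecb", "inflation"],
    ["banks"],
]
profile = top_user_tags(history, k=2)

# Hypothetical candidate article, e.g. one that was just published.
candidate = ["ecb", "banks", "tech"]
score = tag_overlap(candidate, profile)
```

Because a feature like this depends only on the article's tags and the user's past reads, it can be computed for an article that has no clicks yet, which is what makes overlap features attractive under cold start. A score of this kind could then serve as one column in the feature matrix passed to a gradient-boosted model.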