At Yolt (ING), we continuously work on our transaction categorization engine to provide app users with a practical financial overview. But how should one approach this seemingly clear-cut classification task, and what are the key considerations? In this talk, I discuss all aspects, from business requirements to algorithms, starting with a simple model and moving on to label embedding neural networks.
With the advent of the second Payment Services Directive (PSD2) and a growing assortment of personal finance management apps, today’s consumers expect, more than ever, accurate categorization of their transactions so that they get a comprehensive overview of their income and spending. At Yolt, a fintech venture of ING Bank, we are continuously working to create the best financial transaction categorization engine in the world in order to deliver on this customer promise. Interestingly, from a data science perspective, this seemingly straightforward classification task comes with its own intricacies.

In this talk, I discuss the ins and outs of this real-life use case, including how evolving business requirements and product design influence modeling decisions. In particular, I highlight the trade-off between personalization and generalization, and both the complications and opportunities of feedback loops. I cover several modeling approaches, ranging from vanilla multiclass classification to a hyper-personalized learning system and label embedding deep neural network architectures. For the latter, I explain the Facebook StarSpace architecture and possible extensions, including how to create transaction embeddings from mixed feature types, metric learning concepts, triplet loss functions, and the potential for one-shot learning. Conceptual machine learning model designs are interspersed with snippets of Python code to provide practical handles.

In conclusion, I briefly discuss open challenges and provide key takeaways. After this talk, the audience will be familiar with all aspects of the industry use case of transaction categorization, from problem statement and data to metrics and modeling approaches.
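
To give a flavor of the label embedding idea mentioned above, here is a minimal sketch of a triplet-style hinge loss over cosine similarities, in the spirit of StarSpace-like models. It is not the Yolt implementation: the function names, the two-dimensional vectors, the category names, and the margin value are all hypothetical and chosen purely for illustration. In a real system, the embeddings would be learned from data rather than written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_hinge_loss(anchor, positive, negative, margin=0.2):
    """Zero once the anchor is closer to the positive label than to the
    negative label by at least `margin` (in cosine similarity)."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def predict(transaction, label_embeddings):
    """Return the label whose embedding is most similar to the transaction."""
    return max(
        label_embeddings,
        key=lambda name: cosine(transaction, label_embeddings[name]),
    )


# Hypothetical hand-written embeddings (assumption: illustration only).
label_embeddings = {
    "groceries": [1.0, 0.0],
    "travel": [0.0, 1.0],
}
transaction = [0.9, 0.1]  # stand-in for an embedded supermarket payment

# Loss with the correct label as the positive and a sampled wrong label
# as the negative.
loss = triplet_hinge_loss(
    transaction,
    positive=label_embeddings["groceries"],
    negative=label_embeddings["travel"],
)
predicted = predict(transaction, label_embeddings)
```

Because transactions and labels share one vector space, prediction reduces to a nearest-label lookup. This is also why such models are candidates for one-shot learning: a new category only needs an embedding, not a retrained output layer.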