If It Weighs the Same as a Duck: Detecting Fraud with Python and Machine Learning


Summary

Stripe's system for preventing fraudulent payments uses a mix of machine learning and data analysis. This talk will describe some of the technical challenges we've faced in building it. In particular, I will discuss how we've used (and occasionally written) various Python packages as part of a broader ecosystem to address data processing, feature engineering, and model evaluation problems.

Description

Stripe's system for preventing fraudulent payments uses a mix of machine learning and data analysis. Over the last few years, it has evolved from a collection of manually assembled ad-hoc rules to an ensemble of machine-learned models based on historical data from across the entire Stripe network. This talk will describe some of the technical challenges we've faced in building and scaling it. In particular, I will discuss how we've used (and occasionally written) various Python packages as part of a broader ecosystem to address data processing, feature engineering, and model evaluation problems.
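The description mentions feature engineering on historical data from across the network. As an illustration only, here is a minimal sketch of one common kind of fraud feature: a per-card "velocity" count of how many earlier charges the same card made within a recent time window. The feature itself, the function name `charges_in_window`, and the `(card_id, timestamp)` event format are hypothetical assumptions for this example, not details from the talk.

```python
from bisect import bisect_left
from collections import defaultdict


# Hypothetical velocity feature for illustration; not taken from the talk.
def charges_in_window(events, window_seconds):
    """For each (card_id, timestamp) event, count earlier events on the same
    card whose timestamp is >= timestamp - window_seconds.

    `events` must be sorted by timestamp. Returns a list of counts aligned
    with `events`; the current event itself is not counted.
    """
    history = defaultdict(list)  # card_id -> timestamps seen so far (sorted)
    counts = []
    for card_id, ts in events:
        seen = history[card_id]
        # Position of the oldest earlier charge still inside the window.
        start = bisect_left(seen, ts - window_seconds)
        counts.append(len(seen) - start)
        # Events arrive in time order, so appending keeps `seen` sorted.
        seen.append(ts)
    return counts


# Usage with made-up events (timestamps in seconds) and a one-hour window.
events = [("card_a", 0), ("card_a", 600), ("card_b", 900),
          ("card_a", 3000), ("card_a", 4300)]
print(charges_in_window(events, 3600))  # → [0, 1, 0, 2, 1]
```

Binary search on each card's sorted history keeps each lookup logarithmic. Holding every timestamp in memory is a simplification; a batch job over network-wide history would likely partition or stream the data instead.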

Some examples:

We use scikit-learn to train a majority of our models
We use luigi to manage long-running feature generation jobs and model training scripts
We use pandas to debug models and features that generate systematic false positives
We wrote topmodel to evaluate model performance on both production and backtested data
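topmodel is described here only as a tool for evaluating model performance on production and backtested data. The sketch below does not reproduce its interface. It is a minimal, standard-library illustration, assuming payments that each carry a model score and a binary fraud label, of the threshold metrics such an evaluation typically reports. The function name `threshold_metrics` and the sample data are hypothetical.

```python
def threshold_metrics(scores, labels, thresholds):
    """Compute precision, recall, and false-positive rate at each threshold.

    A payment is flagged when its score is >= threshold. Labels are 1 for
    fraud and 0 for legitimate. Ratios with a zero denominator are reported
    as 0.0 by convention in this sketch.
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    rows = []
    for t in thresholds:
        # Count flagged fraud (true positives) and flagged legitimate payments
        # (false positives) at this threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        rows.append({
            "threshold": t,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / total_pos if total_pos else 0.0,
            "fpr": fp / total_neg if total_neg else 0.0,
        })
    return rows


# Usage with made-up scores and labels.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
for row in threshold_metrics(scores, labels, [0.5, 0.2]):
    print(row)
```

Running the same function on production scores and on backtested scores for the same period is one way to check whether a model behaves offline the way it does in production.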
