Summary
Stripe's system for preventing fraudulent payments uses a mix of machine learning and data analysis. This talk will describe some of the technical challenges we’ve faced in building it. In particular, I will discuss how we’ve used (and occasionally written) various Python packages as part of a broader ecosystem to address data processing, feature engineering, and model evaluation problems.
Description
Stripe's system for preventing fraudulent payments uses a mix of machine learning and data analysis. Over the last few years, it has evolved from a collection of manually assembled ad hoc rules to an ensemble of machine-learned models based on historical data from across the entire Stripe network. This talk will describe some of the technical challenges we've faced in building and scaling it. In particular, I will discuss how we've used (and occasionally written) various Python packages as part of a broader ecosystem to address data processing, feature engineering, and model evaluation problems.
Some examples:
- We use scikit-learn to train most of our models
- We use luigi to manage long-running feature generation jobs and model training scripts
- We use pandas to debug models and features that generate systematic false positives
- We wrote topmodel to evaluate model performance on both production and backtested data
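To give a flavor of the pandas item above, here is a minimal sketch of one way to look for systematic false positives: group the legitimate charges by a feature and compare how often each group is blocked. It is an illustration only, not Stripe's actual tooling. The column names (`card_country`, `score`, `is_fraud`), the threshold, the helper `false_positive_rate_by`, and the data are all hypothetical.

```python
import pandas as pd

# Hypothetical scored charges; every column name and value here is invented.
charges = pd.DataFrame({
    "card_country": ["US", "US", "BR", "BR", "BR", "GB"],
    "score": [0.10, 0.85, 0.90, 0.95, 0.88, 0.05],
    "is_fraud": [False, False, False, True, False, False],
})

THRESHOLD = 0.8  # assumed score above which a charge is blocked

# A false positive is a legitimate charge that the model would have blocked.
charges["blocked"] = charges["score"] >= THRESHOLD
charges["false_positive"] = charges["blocked"] & ~charges["is_fraud"]


def false_positive_rate_by(df, column):
    """Share of legitimate charges that were blocked, per value of `column`."""
    legit = df[~df["is_fraud"]]
    rates = legit.groupby(column)["false_positive"].mean()
    return rates.sort_values(ascending=False)


fp_by_country = false_positive_rate_by(charges, "card_country")
print(fp_by_country)
```

A group whose rate sits well above the others is a candidate for closer inspection of the features that drive its scores.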