"On the internet, fraudulent and abusive behavior is considered especially heinous. At Heroku, the dedicated detectives who investigate these vicious felonies are members of an elite squad armed with large amounts of data and spare CPU cycles. These are their stories."
Bad behavior can wreak havoc on your web application. It ranges from mass signups, fraudulent orders, and spammy posts right up to automated bots designed to work around restrictions you have put in place, and it can cost you time, resources, and a lot of money. All is not lost, though. Despite the ongoing efforts of abusers, their activity still leaves fingerprints and clues that you can use to your advantage.
This talk is a 101 introduction to some of the methods you can use to separate good users from bad ones, using a combination of data mining, statistics, and some basic machine learning. Basically, I want to get you thinking like an internet detective.
Some of the topics I will be covering include:
- Collecting and preparing data sources.
- Effective methods for classifying existing users.
- Feature extraction: what works and what doesn't.
- Analyzing user-provided data to profile your users, and weed out the bad operators.
- Determining a user's intentions by looking at their access patterns.
- Making use of 'outliers' to find suspicious users and transactions.
- Stopping bad users before they can wreak havoc.
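To give a flavour of the 'outliers' topic above, here is a minimal sketch rather than a definitive method. It assumes you have already counted signups per IP address; the `signups_per_ip` data and the `flag_outliers` helper are hypothetical and made up for illustration. It flags unusually high counts with the median-based "modified z-score", using the commonly quoted cutoff of 3.5.

```python
from statistics import median

def flag_outliers(counts, threshold=3.5):
    """Return the keys whose count is an unusually high outlier.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, which is
    less distorted by extreme values than a mean-based z-score.
    """
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation (MAD) around the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Simplification: with no spread there is nothing to scale by,
        # so this sketch flags nothing.
        return []
    return [key for key, v in counts.items()
            if 0.6745 * (v - med) / mad > threshold]

# Hypothetical data: signups seen per IP address (documentation-range IPs).
signups_per_ip = {
    "192.0.2.1": 2,
    "192.0.2.2": 1,
    "192.0.2.3": 3,
    "192.0.2.4": 2,
    "192.0.2.5": 1,
    "192.0.2.6": 250,
}

suspicious = flag_outliers(signups_per_ip)
print(suspicious)
```

A median-based score is used here because a single extreme abuser inflates the mean and standard deviation enough to hide itself; whether 3.5 is the right cutoff for your data is something to check against known good and bad accounts.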
As this is a 101 topic, I will provide some basic examples, as well as links to more in-depth resources for further reading. I would recommend this talk to developers of web applications, especially those with a large number of users, the ability to process credit cards, or a 'free' offering. Attendees should have a basic understanding of SQL, Pandas, and some mathematics and statistics, although this is not essential, as I will be providing links to further reading.