Bot or Not

Description

Twitter makes money by selling ads, but it has an insidious infestation eroding its advertising credibility: bots. These bots are automatons living in the Twittersphere, ranging widely in capability. In Bot or Not, I’ll discuss how to identify bots with a classification algorithm built in scikit-learn and provide some tips on how to account for them when analyzing social media experiments.

Like many Internet giants, Twitter makes money by selling ads, but it has an insidious infestation eroding its advertising credibility: bots. More than 23 million of them. Twitter bots are automatons living in the Twittersphere, ranging widely in capability. In their simplest form, they follow you and perhaps fav or retweet your statuses. At their most complex, they troll and, ironically, troll trolls, using speech patterns that can at times fool humans. But when advertisers pay for engagement, they aren’t interested in a four-hour flame war between a Gamergate bot and a Kanye bot. When advertisers analyze social data, they want to be sure their findings are the result of human activity. In Bot or Not, I’ll discuss the taxonomy of Twitter bots, segmenting them based on “physical features” such as profile configuration, and on behavioral features: tweeting, retweeting, and fav-ing. We’ll also see how to identify bots with a classification algorithm built in scikit-learn. Finally, I’ll provide some tips to advertisers on how to account for bot behavior when analyzing and interpreting results from social media experiments.

I’ll outline my experimental design, including the process of buying bots to create the training set. Technical details I’ll discuss include using the python-twitter library to connect to the Twitter API and retrieve data, developing a bot taxonomy, and then building a classification algorithm with pandas and scikit-learn.
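To give a feel for the pandas and scikit-learn step, here is a minimal sketch of a bot classifier. It is not the model from the talk: the abstract does not name the classifier or the exact features, so the random forest, the column names (`default_profile_image`, `followers_count`, `friends_count`, `retweet_fraction`), and the synthetic data generator are all assumptions chosen for illustration. In practice the rows would come from accounts retrieved through the Twitter API, with purchased bots supplying the positive labels.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_accounts(n, is_bot, rng):
    """Generate n synthetic accounts. All distributions here are invented."""
    return pd.DataFrame({
        # "Physical" feature: does the profile still use the default image?
        "default_profile_image": rng.random(n) < (0.7 if is_bot else 0.1),
        # Hypothetical assumption: bots have few followers but follow many accounts.
        "followers_count": rng.poisson(20 if is_bot else 300, n),
        "friends_count": rng.poisson(1500 if is_bot else 250, n),
        # Behavioral feature: share of statuses that are retweets.
        "retweet_fraction": rng.beta(8, 2, n) if is_bot else rng.beta(2, 8, n),
        "is_bot": is_bot,
    })


rng = np.random.default_rng(0)
accounts = pd.concat(
    [make_accounts(200, True, rng), make_accounts(200, False, rng)],
    ignore_index=True,
)

# Separate the feature matrix from the label column.
X = accounts.drop(columns="is_bot").astype(float)
y = accounts["is_bot"]

# Hold out a quarter of the accounts to estimate out-of-sample accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")

# Feature importances hint at which traits separate bots from humans.
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

Because the synthetic classes are deliberately well separated, the held-out accuracy here says nothing about real-world performance. Real accounts overlap far more, so a labeled training set and careful validation matter much more than the choice of estimator.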

Slides available here: http://www.slideshare.net/ErinShellman/bot-or-not
