On Twitter Bots Behaving Badly: A Manual and Automated Analysis of Python Code Patterns on GitHub

Authors

  • Florian Daniel Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy https://orcid.org/0000-0003-3004-8702
  • Andrea Millimaggi Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy

DOI:

https://doi.org/10.13052/jwe1540-9589.1883

Keywords:

Bots, Harm, Abuse, Code patterns, Pattern recognition, GitHub, Twitter, Python

Abstract

Bots, i.e., algorithmically driven entities that behave like humans in online communications, are increasingly infiltrating social conversations on the Web. If not properly prevented, this presence of bots may cause harm to the humans they interact with. This article aims to understand which types of abuse may lead to harm and whether such abuse can be considered intentional. We manually review a dataset of 60 Twitter bot code repositories on GitHub, derive a set of potentially abusive actions, characterize them using a taxonomy of abstract code patterns, and assess the potential abusiveness of the patterns. The article then describes the design and implementation of a code pattern recognizer and uses it to automatically analyze a dataset of 786 Python bot code repositories. The study reveals not only the existence of 28 communication-specific code patterns, which could be used to assess the harmfulness of bot code, but also their consistent presence throughout all studied repositories.
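The abstract does not detail how the pattern recognizer works. As a rough illustration of the general idea of recognizing code patterns in Python bot repositories, the following sketch uses Python's standard `ast` module to flag Twitter API calls made inside loops, one plausible sign of bulk automated actions such as mass following or mass posting. The method names in `BULK_ACTION_METHODS` (modeled on Tweepy-style names such as `update_status`) and the pattern labels are hypothetical assumptions, not the 28 patterns identified in the article, and this simple rule-based matcher is not the authors' recognizer.

```python
import ast
from collections import Counter
from typing import Dict

# ASSUMPTION: an illustrative mini-taxonomy mapping Tweepy-style method names
# to hypothetical pattern labels. These are NOT the article's 28 patterns.
BULK_ACTION_METHODS = {
    "update_status": "post_in_loop",
    "create_friendship": "follow_in_loop",
    "retweet": "retweet_in_loop",
    "create_favorite": "like_in_loop",
}


class LoopCallPatternVisitor(ast.NodeVisitor):
    """Counts pattern labels for matching method calls nested inside loops."""

    def __init__(self) -> None:
        self.loop_depth = 0
        self.found: Counter = Counter()

    def _visit_loop(self, node: ast.AST) -> None:
        # Track nesting so calls anywhere inside a loop (including nested
        # loops) are attributed to it.
        self.loop_depth += 1
        self.generic_visit(node)
        self.loop_depth -= 1

    visit_For = _visit_loop
    visit_AsyncFor = _visit_loop
    visit_While = _visit_loop

    def visit_Call(self, node: ast.Call) -> None:
        # Only attribute calls such as api.update_status(...) are matched;
        # the receiver object is not resolved, so this is a syntactic heuristic.
        if self.loop_depth > 0 and isinstance(node.func, ast.Attribute):
            label = BULK_ACTION_METHODS.get(node.func.attr)
            if label is not None:
                self.found[label] += 1
        self.generic_visit(node)


def detect_patterns(source: str) -> Counter:
    """Parse one Python source file and count hypothetical pattern occurrences."""
    visitor = LoopCallPatternVisitor()
    visitor.visit(ast.parse(source))
    return visitor.found


def repository_profile(files: Dict[str, str]) -> Counter:
    """Aggregate pattern counts over all files of one repository."""
    total: Counter = Counter()
    for source in files.values():
        try:
            total.update(detect_patterns(source))
        except SyntaxError:
            # Files that do not parse as Python 3 (e.g., Python 2 code) are skipped.
            continue
    return total


# A made-up bot snippet; it is only parsed, never executed.
SAMPLE = """
import tweepy
api = tweepy.API(auth)
for user_id in follower_ids:
    api.create_friendship(user_id=user_id)
    api.update_status("Thanks for following!")
api.update_status("Done.")
"""

print(dict(detect_patterns(SAMPLE)))
# → {'follow_in_loop': 1, 'post_in_loop': 1}
```

The final `update_status` call in the sample lies outside the loop and is therefore not counted. Running `repository_profile` over every repository in a dataset would give per-repository pattern profiles, which is one way to check how consistently patterns occur across repositories; a realistic recognizer would need richer patterns than this purely syntactic matcher attempts.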

Author Biographies

Florian Daniel, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy

Florian Daniel has been an associate professor at the Dipartimento di Elettronica, Informazione e Bioingegneria of Politecnico di Milano since January 2016, where he currently teaches the foundations of programming to Management Engineering students. He was expected to obtain tenure as associate professor in January 2019. Before that, he held post-doc/research fellow positions at Politecnico di Milano (2007-2008) and the University of Trento (2008-2015). He worked as a visiting researcher at HP Labs, Palo Alto, California (2006), and the University of New South Wales, Sydney, Australia (2013, 2015, 2017), and as a visiting professor at the Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil (2015). His research interests include bots/chatbots, social data analysis and knowledge extraction, service-oriented computing, business process management, and blockchain. He received the Ph.D. degree in information technology from Politecnico di Milano.

Andrea Millimaggi, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy

Published

2020-01-23

How to Cite

Daniel, F., & Millimaggi, A. (2020). On Twitter Bots Behaving Badly: A Manual and Automated Analysis of Python Code Patterns on GitHub. Journal of Web Engineering, 18(8), 801–836. https://doi.org/10.13052/jwe1540-9589.1883

Issue

Vol. 18 No. 8

Section

ICWE2019