On Twitter Bots Behaving Badly:
A Manual and Automated Analysis of Python Code Patterns on GitHub
Keywords: Bots, Harm, Abuse, Code patterns, Pattern recognition, GitHub, Twitter, Python
Bots, i.e., algorithmically driven entities that behave like humans in online communications, are increasingly infiltrating social conversations on the Web. If left unchecked, their presence may cause harm to the humans they interact with. This article aims to understand which types of abuse may lead to harm and whether these can be considered intentional or not. We manually review a dataset of 60 Twitter bot code repositories on GitHub, derive a set of potentially abusive actions, characterize them using a taxonomy of abstract code patterns, and assess the potential abusiveness of the patterns. The article then describes the design and implementation of a code pattern recognizer and uses it to automatically analyze a dataset of 786 Python bot code repositories. The study reveals not only the existence of 28 communication-specific code patterns, which could be used to assess the harmfulness of bot code, but also their consistent presence throughout all studied repositories.
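To give a flavor of what recognizing code patterns in Python bot code can look like, the following is a minimal sketch and not the recognizer described in this article. It assumes that a bot's actions show up as method calls on a Twitter client object, and it uses a hypothetical mapping from Tweepy-style method names to illustrative action labels; neither the mapping nor the labels are taken from the article's taxonomy of 28 patterns.

```python
import ast
from collections import Counter

# Hypothetical mapping from client method names to illustrative action labels.
# This is an assumption for the sketch, not the article's pattern taxonomy.
ACTION_BY_METHOD = {
    "update_status": "tweet",
    "retweet": "retweet",
    "create_friendship": "follow",
    "create_favorite": "like",
    "send_direct_message": "direct_message",
}


def find_actions(source: str) -> Counter:
    """Count method-style calls whose name appears in ACTION_BY_METHOD."""
    counts = Counter()
    # Walk every node of the syntax tree and keep calls of the form obj.method(...).
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            action = ACTION_BY_METHOD.get(node.func.attr)
            if action is not None:
                counts[action] += 1
    return counts


# A made-up bot fragment used only to exercise the sketch.
SAMPLE_BOT = '''
for tweet in api.search(q="python"):
    api.retweet(tweet.id)
    api.create_friendship(tweet.user.id)
api.update_status("hello")
'''

if __name__ == "__main__":
    print(dict(find_actions(SAMPLE_BOT)))
```

Matching on method names alone is deliberately crude: it ignores which object the method is called on and whether the call sits inside a loop, both of which would matter when judging how abusive an action might be.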