Karma Chameleons: Data Collection Techniques and Account Characterization for Bot Detection on Reddit
Authors
Floam, Marissa
Issue Date
2025
Type
Thesis
Language
en_US
Keywords
bot detection, reddit, social bots, social media
Abstract
Malicious bots on social media platforms present an ever-evolving threat to the integrity of online communication. As platforms like Twitter/X, Meta, and Reddit continue to grow in popularity, so do opportunities for malicious actors to create bot accounts. These bots, which are often designed to spread deceptive material, spam, or unoriginal content, can significantly influence public opinion and suppress creativity. As a result, it is crucial that general users are able to easily identify these bots so they can be removed, protecting the authenticity of these online spaces. This thesis addresses one of the main challenges in social media bot detection: the collection of bot or human account datasets with a clearly established ground truth that can be used to train machine learning models. Without these datasets, developing accurate models to distinguish between bot and human accounts becomes difficult. In contrast to Twitter/X, bot detection research targeting Reddit is scarce, even though it is an increasingly popular platform. This thesis outlines characteristics that differentiate bot accounts from human accounts on Reddit and proposes methods for creating reliable datasets for bot detection. Human and bot datasets are combined and tested in six decision tree models to determine the accuracy of each data collection method for use in bot detection. Accurately distinguishing between human and bot accounts is essential for dataset creation, and focusing on specific characteristics allows for clearer determinations between bot and human accounts. Ultimately, in order to maintain the integrity and trustworthiness of one of the most popular social media platforms, Reddit, this thesis addresses a critical gap in bot detection research by providing a reliable framework for data collection.