An estimated two-thirds of tweeted links to popular websites are posted by automated accounts – not human beings
The role of so-called social media “bots” – automated accounts capable of posting content or interacting with other users with no direct human involvement – has been the subject of much scrutiny and attention in recent years. These accounts can play a valuable part in the social media ecosystem by answering questions about a variety of topics in real time or providing automated updates about news stories or events. At the same time, they can also be used to attempt to alter perceptions of political discourse on social media, spread misinformation, or manipulate online rating and review systems. As social media has attained an increasingly prominent position in the overall news and information environment, bots have been swept up in the broader debate over Americans’ changing news habits, the tenor of online discourse and the prevalence of “fake news” online.
In the context of these ongoing arguments over the role and nature of bots, Pew Research Center set out to better understand how many of the links being shared on Twitter – most of which refer to a site outside the platform itself – are being promoted by bots rather than humans. To do this, the Center used a list of 2,315 of the most popular websites[1] and examined the roughly 1.2 million tweets (sent by English language users) that included links to those sites during a roughly six-week period in summer 2017. The results illustrate the pervasive role that automated accounts play in disseminating links to a wide range of prominent websites on Twitter.
How does this study define a Twitter bot?
Broadly speaking, Twitter bots are accounts that can post content or interact with other users in an automated way and without direct human input.
Bots are used for many purposes. This study focuses on a particular kind of bot behavior: bots that tweet or retweet links to content around the web. In other words, these are bots that post or promote specific websites or other online content.
Many bots do not identify themselves as bots, so this study uses a tool called Botometer to estimate the proportion of Twitter links to popular sites around the web that are posted by automated or partially automated accounts. One study suggests Botometer is about 86% accurate, and Pew Research Center conducted its own independent validation tests of the Botometer system. To acknowledge the possibility of misclassification, we use the term “suspected bots” throughout this report. For details on how Botometer functions, see the methodology.
Among the key findings of this research:
- Of all tweeted links[2][3] to popular websites, 66% are shared by accounts with characteristics common among automated “bots,” rather than human users.
- Among popular news and current events websites, 66% of tweeted links are posted by suspected bots – identical to the overall average. The share of bot-posted links is even higher among certain kinds of news sites. For example, an estimated 89% of tweeted links to popular aggregation sites that compile stories from around the web are posted by suspected bots.
- A relatively small number of highly active bots are responsible for a significant share of links to prominent news and media sites. This analysis finds that the 500 most-active suspected bot accounts are responsible for 22% of the tweeted links to popular news and current events sites over the period in which this study was conducted. By comparison, the 500 most-active human users are responsible for a much smaller share (an estimated 6%) of tweeted links to these outlets.
- The study does not find evidence that automated accounts currently have a liberal or conservative “political bias” in their overall link-sharing behavior. This emerges from an analysis of the subset of news sites that contain politically oriented material. Suspected bots share roughly 41% of links to political sites shared primarily by liberals and 44% of links to political sites shared primarily by conservatives – a difference that is not statistically significant. By contrast, suspected bots share 57% to 66% of links from news and current events sites shared primarily by an ideologically mixed or centrist human audience.
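The concentration figures in the third finding (the 500 most-active suspected bots accounting for 22% of links, versus 6% for the 500 most-active humans) are ratios whose denominator is every tweeted link to the sites in question, not just the links from one type of account. The sketch below shows one way such a share could be computed. It assumes, hypothetically, that each tweeted link has already been reduced to an (account ID, suspected-bot flag) pair; the function name and toy data are illustrative and are not the Center's code.

```python
from collections import Counter


def top_n_share(links: list[tuple[str, bool]], n: int, bots: bool) -> float:
    """Share of ALL tweeted links posted by the n most active accounts of one type.

    links: one (account_id, is_suspected_bot) pair per tweeted link (hypothetical format).
    bots:  True to rank suspected-bot accounts, False to rank human accounts.
    """
    if not links:
        return 0.0
    # Count links per account, keeping only accounts of the requested type.
    counts = Counter(acct for acct, is_bot in links if is_bot == bots)
    # Sum the link counts of the n accounts that posted the most links.
    top_total = sum(count for _, count in counts.most_common(n))
    # Divide by every link in the set, bot or human, matching the report's framing.
    return top_total / len(links)


if __name__ == "__main__":
    # Toy data: 10 links, 6 from two suspected bots and 4 from two humans.
    links = ([("bot1", True)] * 4 + [("bot2", True)] * 2
             + [("h1", False)] * 3 + [("h2", False)])
    print(top_n_share(links, 1, bots=True))   # 0.4
    print(top_n_share(links, 1, bots=False))  # 0.3
```

In the study's setting, n would be 500 and the links would be restricted to those pointing at news and current events sites.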
Examples of Twitter bots in action
Bots can be used for a wide range of purposes. Here are some examples of bots that perform various tasks on Twitter:
- Netflix Bot (@netflix_bot) automatically tweets when new content has been added to the online streaming service.
- Grammar Police (@_grammar_) is a bot that identifies grammatically incorrect tweets and offers suggestions for correct usage.
- Museum Bot (@museumbot) posts random images from the Metropolitan Museum of Art.
- The CNN Breaking News Bot (@attention_cnn) is an unofficial account that sends an alert whenever CNN claims to have breaking news.
- The New York Times 4th Down Bot (@NYT4thDownBot) is a bot that provides live NFL analysis.
- PowerPost by the Washington Post (@PowerPost) is a bot that provides news about decision-makers in Washington.
These findings are based on an analysis of a random sample of about 1.2 million tweets from English language users containing links to popular websites, posted from July 27 to Sept. 11, 2017.[4] To construct the list of popular sites used in this analysis, the Center identified nearly 3,000 of the most-shared websites during the first 18 days of the study period and coded them based on a variety of characteristics.[5] After removing links that were dead, duplicated or directed to sites without sufficient information to classify their content, researchers arrived at a list of 2,315 websites.
First, these sites were categorized into six different topical groups based on their primary area of focus. The topical groupings included: adult content, sports, celebrity, commercial products or services, organizations or groups, and news and current events. For comparison with these primary categories, researchers put links that redirected to content within Twitter itself into a separate category.
Second, sites categorized as having a broad focus on news and current events (in total, 925 sites met this criterion) were subsequently coded based on three additional criteria:
- Whether a majority of the site’s content consisted of aggregated or republished material produced by other sites or publications;
- Whether the site included a politics section, and/or prominently featured political stories in its top headlines; and
- Whether the site had a contact page (a trait that can serve as a proxy for whether a site offers readers the ability to submit comments and feedback).
Third, the Center identified an additional subset of news and current events sites that featured political stories or a politics section and that primarily serve a U.S. audience. Each of these politically oriented news and current events sites was then categorized as having primarily a liberal audience, a conservative audience or a mixed readership.[6]
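The three coding stages above amount to a small, nested schema: every site gets one topical category, news and current events sites get extra yes/no codes, and only the political, U.S.-focused news sites get an audience label. The sketch below expresses that schema as Python types with two filters for the nested subsets. The class, field and function names are hypothetical, chosen for illustration, and the report does not describe how the Center stored its codes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Topic(Enum):
    ADULT = "adult content"
    SPORTS = "sports"
    CELEBRITY = "celebrity"
    COMMERCIAL = "commercial products or services"
    ORGANIZATION = "organizations or groups"
    NEWS = "news and current events"
    TWITTER_INTERNAL = "content within Twitter"  # separate comparison category


class Audience(Enum):
    LIBERAL = "liberal"
    CONSERVATIVE = "conservative"
    MIXED = "mixed"


@dataclass
class Site:
    domain: str
    topic: Topic
    # Second-stage codes; meaningful only when topic is Topic.NEWS.
    aggregator: bool = False        # mostly aggregated or republished material
    political: bool = False         # politics section or prominent political stories
    has_contact_page: bool = False  # proxy for accepting reader feedback
    # Third-stage codes; meaningful only for political news sites.
    us_audience: bool = False
    audience: Optional[Audience] = None


def news_sites(sites: list[Site]) -> list[Site]:
    """Sites whose primary focus is news and current events."""
    return [s for s in sites if s.topic is Topic.NEWS]


def political_us_news_sites(sites: list[Site]) -> list[Site]:
    """News sites with political content that primarily serve a U.S. audience."""
    return [s for s in news_sites(sites) if s.political and s.us_audience]


if __name__ == "__main__":
    sites = [  # hypothetical domains
        Site("news-a.example", Topic.NEWS, political=True, us_audience=True,
             audience=Audience.LIBERAL),
        Site("sports-b.example", Topic.SPORTS),
        Site("news-c.example", Topic.NEWS, aggregator=True),
    ]
    print(len(news_sites(sites)), len(political_us_news_sites(sites)))  # 2 1
```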
The next step was to examine each tweeted link to those sites and attempt to determine whether the link was posted from an automated account. To identify bots, the Center used a tool known as “Botometer,” developed by researchers at the University of Southern California and Indiana University. Now in its second version, Botometer estimates the likelihood that a given account is automated based on a number of criteria, including the age of the account, how frequently it posts and the characteristics of its follower network, among other factors. Accounts that Botometer rated as relatively likely to be automated, using a cutoff informed by Pew Research Center’s own tests of the system, were classified as bots for the purposes of this analysis.[7]
Collectively, the data gathering, site coding and bot detection analysis described above provide an answer to the following key research question: What proportion of tweeted links to popular websites are posted by automated accounts, rather than by human users?
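The classification step and the research question fit together as two small operations: turn per-account likelihood scores into a set of suspected bots using a cutoff, then divide the number of links posted by those accounts by the total number of links. The sketch below assumes each account already has a score between 0 and 1 (for example, one derived from Botometer output); it does not call Botometer itself, and the 0.5 threshold in the example is purely illustrative, not the cutoff the Center used.

```python
from collections.abc import Iterable, Mapping


def classify_accounts(scores: Mapping[str, float], threshold: float) -> set[str]:
    """Return IDs of accounts whose bot-likelihood score meets or exceeds threshold."""
    return {acct for acct, score in scores.items() if score >= threshold}


def bot_link_share(link_posters: Iterable[str], suspected_bots: set[str]) -> float:
    """Proportion of tweeted links whose posting account is a suspected bot.

    link_posters holds one account ID per tweeted link, so an account that
    posted five links is counted five times.
    """
    posters = list(link_posters)
    if not posters:
        return 0.0
    bot_links = sum(1 for acct in posters if acct in suspected_bots)
    return bot_links / len(posters)


if __name__ == "__main__":
    # Hypothetical scores and link posters.
    scores = {"a": 0.9, "b": 0.2, "c": 0.6}
    bots = classify_accounts(scores, threshold=0.5)
    print(sorted(bots))                                       # ['a', 'c']
    print(bot_link_share(["a", "a", "b", "c", "b", "b"], bots))  # 0.5
```

Because the share is computed per link rather than per account, a handful of prolific accounts can move the estimate substantially, which is the pattern the concentration finding describes.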
This research is part of a series of Pew Research Center reports examining the information environment on social media and the ways that users engage in these digital spaces. Previous studies have documented the nature and sources of tweets regarding immigration news, the ways in which news is shared via social media in a polarized Congress, the degree to which science information on social media is shared and trusted, the role of social media in the broader context of online harassment, how key social issues like race relations play out on these platforms, and the patterns of how different groups arrange themselves on Twitter.
It is important to note that bot accounts do not always clearly identify themselves as such in their profiles, and any bot classification system inevitably carries some risk of error. The Botometer system has been documented and validated in an array of academic publications, and researchers from the Center conducted a number of independent validation measures of its results.[8] However, some human accounts may be misclassified as automated, while some automated accounts may be misclassified as genuine. There is therefore a degree of uncertainty in these estimates of the share of traffic by suspected bot accounts.
In addition, the analysis described in this report is based on a subset of tweets collected over a specific period of time. It is not an analysis of all websites or of all media properties, but rather an analysis of popular websites and media outlets as measured by the number of links posted on Twitter to their content. This analysis does not seek to evaluate whether these links were being shared by “good” or “bad” bots, or whether those bots are controlled from inside or outside the U.S. It also does not seek to assess the reach of the tweets in question or to determine how many human users saw, clicked through or otherwise engaged with bot-generated content.
Further details on our bot-classification effort can be found in the methodology of this report.