By: Sashrika Pandey and Smitha Milli
Social media ranking algorithms personalize the content you see to keep you engaged. However, many worry that, by optimizing for engagement, these algorithms may inadvertently amplify hostile and polarizing discourse. To investigate this, we, along with Micah Carroll, Yike Wang, Sebastian Zhao, and Anca Dragan, conducted a study to identify whether Twitter’s ranking algorithm does indeed amplify divisive discourse, and if so, whether this is because of its focus on optimizing for user engagement.
We conducted a randomized controlled experiment to measure the impact of Twitter’s engagement-based ranking algorithm. Our study was pre-registered and required no collaboration with the platform, and our data and code are publicly available to the research community.
Over two weeks in February 2023, we recruited 806 unique Twitter users using CloudResearch Connect who participated in our study a total of 1,730 times over multiple waves.
Using Connect allowed us to easily interact with our participants across multiple waves of our study. Furthermore, Connect’s capability to have tasks that involve external software was necessary for our experiments, which relied on a Chrome extension for data collection.
Each time a user participated in our study, we collected the first ten tweets that they would have been shown by Twitter’s personalized ranking algorithm, and also the ten most recent tweets from people they follow (i.e., the tweets they would see under a reverse-chronological baseline). We then surveyed users about both sets of tweets in a random order, allowing us to causally estimate the impact of the engagement-based ranking algorithm on users’ survey outcomes, relative to the reverse-chronological baseline.
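Because each participant rates both sets of tweets, the effect of the algorithm can be estimated within-subject, as the average per-participant difference in outcomes between the two feeds. The sketch below is purely illustrative (the field names and toy data are hypothetical, not the study's actual analysis code):

```python
# Illustrative within-subject estimate: each participant contributes a mean
# survey score for the algorithmically ranked tweets and for the
# reverse-chronological tweets; the treatment effect is the mean of the
# per-participant differences. All names and numbers here are hypothetical.
from statistics import mean

def paired_effect(responses):
    """responses: list of dicts, one per participant, with mean survey
    scores for the 'algorithmic' and 'chronological' tweet sets."""
    diffs = [r["algorithmic"] - r["chronological"] for r in responses]
    return mean(diffs)

# Toy data: three participants' average ratings for each feed.
toy = [
    {"algorithmic": 3.2, "chronological": 2.5},
    {"algorithmic": 2.8, "chronological": 2.9},
    {"algorithmic": 3.5, "chronological": 2.7},
]
print(round(paired_effect(toy), 2))  # → 0.47
```

Pairing each participant with themselves in this way removes between-person variation, which is why surveying both feeds in a random order supports a causal comparison.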
We found that the algorithm significantly amplifies tweets expressing anger and other negative emotions (sadness and anxiety). Moreover, after reading tweets picked by the engagement-based algorithm, readers reported feeling more of all four emotions we asked about (anger, sadness, anxiety, and happiness).
The tweets picked by the engagement-based algorithm were more partisan and more likely to express out-group animosity, that is, animosity from one partisan side to the other side. Moreover, after readers read tweets selected by the engagement-based algorithm (compared to tweets from the reverse-chronological baseline), they had a better perception of their political in-group and a worse perception of their political out-group.
Overall, users slightly preferred tweets selected by the algorithm over those in the chronological baseline. However, users were less likely to prefer the political tweets selected by the algorithm; this suggests that the algorithm falls short of satisfying users’ stated preferences, especially for political content.
Given that the engagement-based algorithm amplifies divisive content and fails to meet users’ stated preferences, what would happen if we ranked content by what users say they would like to see instead of by engagement? Based on the twenty tweets that we had collected for each user, we experimented with an alternative ranking of those tweets. In particular, our “Stated Preference” timeline for each user consisted of the ten tweets that scored highest according to the user’s survey response of whether they would want to see that tweet while using Twitter.
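In essence, this alternative timeline is a top-k selection over each user's pool of twenty tweets, ranked by their own survey responses. A minimal sketch, with hypothetical field names (not the study's code):

```python
# Illustrative "Stated Preference" timeline: from the tweets collected for
# a user, keep the k with the highest survey score for "would you want to
# see this tweet while using Twitter?". Field names are hypothetical.
def stated_preference_timeline(tweets, k=10):
    """tweets: list of dicts, each with a 'stated_pref' survey score.
    Returns the k highest-scoring tweets, best first."""
    return sorted(tweets, key=lambda t: t["stated_pref"], reverse=True)[:k]

# Toy example with five tweets and a top-3 timeline.
pool = [
    {"id": "a", "stated_pref": 4},
    {"id": "b", "stated_pref": 1},
    {"id": "c", "stated_pref": 5},
    {"id": "d", "stated_pref": 3},
    {"id": "e", "stated_pref": 2},
]
top3 = stated_preference_timeline(pool, k=3)
print([t["id"] for t in top3])  # → ['c', 'a', 'd']
```

The key design choice is that the ranking signal comes from an explicit survey response rather than from behavioral engagement signals such as likes or retweets.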
Relative to Twitter’s engagement-based algorithm, ranking by stated preferences led to less angry, partisan, and out-group hostile content. Upon further investigation, we discovered that this was largely because ranking by users’ stated preferences led to fewer tweets from the reader’s out-group. This suggests that a timeline ordered by users’ explicitly stated preferences could potentially reinforce “echo chambers”. However, our findings come with an important caveat: users may be open to seeing respectful, civil tweets from the opposing side, but those simply may not have appeared in the limited pool of twenty tweets that we had for each user (especially since ten of those tweets were picked by the engagement-based algorithm).
Our study demonstrates that Twitter’s engagement-based algorithm substantially amplifies angry, partisan, and out-group hostile content, and that ranking by stated preferences, rather than engagement, reduces the amplification of such content. Thus, optimizing for a more reflective notion of user preference, such as users’ stated preferences, may lead to less divisive conversations. At the same time, we also found that ranking by stated preferences may reinforce echo chambers. Overall, our research underscores the need for a more nuanced approach to content ranking that balances engagement, users’ stated preferences, and downstream sociopolitical outcomes.