Infoscience

conference paper

The Impact of Data Persistence Bias on Social Media Studies

Elmas, Tugrulcan  
January 1, 2023
Proceedings of the 15th ACM Web Science Conference, WebSci 2023
15th ACM Web Science Conference (WebSci)

Social media studies often collect data retrospectively to analyze public opinion. Social media data may decay over time, and such decay may prevent the collection of the complete dataset. As a result, the collected dataset may differ from the complete dataset and the study may suffer from data persistence bias. Past research suggests that datasets collected retrospectively are largely representative of the original dataset in terms of textual content. However, no study has analyzed the impact of data persistence bias on social media studies such as those focusing on controversial topics. In this study, we analyze data persistence and the bias it introduces in datasets of three types: controversial topics, trending topics, and framing of issues. We report which topics are more likely to suffer from data persistence among these datasets. We quantify the data persistence bias using the change in political orientation, the presence of potentially harmful content, and topics as measures. We found that controversial datasets are more likely to suffer from data persistence and that they lean towards the political left upon recollection. The turnout of the data that contain potentially harmful content is significantly lower on non-controversial datasets. Overall, we found that the topics promoted by right-aligned users are more likely to suffer from data persistence. Account suspensions are the primary factor contributing to data removals, if not the only one. Our results emphasize the importance of accounting for data persistence bias by collecting the data in real time when the dataset employed is vulnerable to it.
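The abstract names two kinds of measurement, turnout and change in political orientation, without giving their formulas. As a rough illustration only, the sketch below assumes that turnout is the share of originally collected posts still retrievable on recollection, and that the orientation change is the mean orientation of the persisted posts minus the mean of the original dataset. The `Post` type, the score range from -1.0 (left) to +1.0 (right), and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Post:
    post_id: str
    orientation: float  # hypothetical score: -1.0 (left) to +1.0 (right)


def turnout(original, recollected_ids):
    """Share of originally collected posts still retrievable on recollection."""
    if not original:
        raise ValueError("original dataset is empty")
    kept = sum(1 for p in original if p.post_id in recollected_ids)
    return kept / len(original)


def orientation_shift(original, recollected_ids):
    """Mean orientation of persisted posts minus mean of the original dataset.

    A negative value means the recollected data leans further left than
    the data that was originally collected.
    """
    persisted = [p.orientation for p in original if p.post_id in recollected_ids]
    if not persisted:
        raise ValueError("no posts persisted")
    return mean(persisted) - mean(p.orientation for p in original)


# Toy data (invented for illustration): post "c" is gone on recollection.
original = [
    Post("a", -0.5),
    Post("b", 0.5),
    Post("c", 1.0),
    Post("d", -1.0),
]
recollected = {"a", "b", "d"}

print(turnout(original, recollected))            # 0.75
print(orientation_shift(original, recollected))  # negative: leans left
```

In this toy case, removing a single right-leaning post lowers the turnout to 0.75 and moves the mean orientation of the recollected set to the left, which is the kind of effect the abstract reports for controversial datasets.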

Type
conference paper
DOI
10.1145/3578503.3583630
Web of Science ID

WOS:001118948600020

Author(s)
Elmas, Tugrulcan  
Corporate authors
ACM
Date Issued

2023-01-01

Publisher

Assoc Computing Machinery

Publisher place

New York

Published in
Proceedings of the 15th ACM Web Science Conference, WebSci 2023
ISBN of the book

979-8-4007-0089-7

Start page

196

End page

207

Subjects

  • Technology
  • Data Persistence
  • Bias
  • Reproducibility
  • Social Media
  • Twitter
  • Deletions
  • Datasets
  • Political Orientation
  • Sampling

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LSIR  
Event name

15th ACM Web Science Conference (WebSci)

Event place

Austin, TX

Event date

April 30-May 1, 2023

Available on Infoscience
February 20, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/204728

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved