
SoBigData Articles

APIcalypse Now: Redefining data access regimes in the face of the Digital Services Act

The rise of restrictive data access policies, commonly referred to as the APIcalypse, has significantly impacted researchers' ability to collect and analyse data from social media platforms. The Digital Methods Initiative Winter School 2024 project 'APIcalypse NOW' centred these challenges, examining the evolving landscape of data access in the context of the forthcoming Digital Services Act (DSA), in particular Article 40, which outlines the procedures and restrictions for vetted researchers to access platform data for research purposes while ensuring compliance with legal and ethical standards. This post highlights the main takeaways from the research and provides insights for academics across various fields. At the time of writing, Article 40 was not yet enforced; the object of study was therefore the access that platforms had voluntarily opened up to researchers in anticipation of the DSA.

 

 

Main Takeaways

 

Redefining Data Access Regimes

The APIcalypse, as originally defined by Axel Bruns in 2019, marked a shift in how researchers access data, moving from relatively open APIs to more restrictive environments. The DSA introduces additional layers of compliance and regulation, intended to protect user privacy and ensure data security whilst giving researchers access to data that would otherwise be unobtainable without breaking platforms' Terms of Service. However, despite mandating some level of access, these measures may result in reduced data availability and increased difficulty in obtaining the necessary approvals.

Researcher Application and Eligibility

Social media platforms like X (formerly known as Twitter), Facebook, Instagram, and TikTok have stringent requirements for granting data access to researchers. These differ between individual platforms, but include:

- Institutional affiliation and approval from an Institutional / Ethical Review Board (IRB/ERB)

- Detailed application forms outlining the research purpose, data needed, and evidence of non-commercial use

- Various degrees of proof of compliance with data handling and user consent protocols

These rigorous requirements create barriers, especially for independent researchers and those from smaller institutions. They also hold the potential to severely limit the types of studies that can feasibly be conducted. 

Data Handling and API Usage

The platforms enforce strict rules on how data can be collected, stored, and used. In some cases, these rules include:

- Data must be refreshed every 15 days, with any unavailable data deleted

- Researchers must share their findings with the platform before publication

- Explicit user consent is mandatory for data usage

These restrictions hinder longitudinal studies and introduce potential biases, as data may change or become unavailable over time. 
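To make the effect of the refresh rule concrete, here is a minimal Python sketch of what one refresh cycle could look like on the researcher's side. It is an illustration under stated assumptions, not any platform's actual procedure: the function names, the record layout, and the `still_available` lookup are all hypothetical, and only the 15-day interval comes from the rules listed above.

```python
from datetime import datetime, timedelta, timezone

# Interval taken from the rule described above; everything else is hypothetical.
REFRESH_INTERVAL = timedelta(days=15)


def is_refresh_due(last_refreshed, now):
    """Return True when the stored copy is at least 15 days old."""
    return now - last_refreshed >= REFRESH_INTERVAL


def refresh_dataset(records, still_available, now):
    """Return a new dataset without records that are no longer available.

    records: dict mapping a record id to its stored data (a dict).
    still_available: hypothetical callable(record_id) -> bool, standing in
        for whatever availability check a platform's API actually offers.
    """
    kept = {}
    for record_id, data in records.items():
        if still_available(record_id):
            # Keep the record and stamp it with the time of this refresh.
            kept[record_id] = {**data, "last_refreshed": now}
    return kept


# Usage: post "b" has been removed from the platform since collection.
now = datetime(2024, 2, 1, tzinfo=timezone.utc)
collected = now - timedelta(days=16)
records = {
    "a": {"text": "post a", "last_refreshed": collected},
    "b": {"text": "post b", "last_refreshed": collected},
}
if is_refresh_due(collected, now):
    records = refresh_dataset(records, lambda record_id: record_id != "b", now)
```

After each cycle the dataset can only shrink, so a sample collected at the start of a study is not the sample analysed at the end, which is the source of the bias noted above.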

Platform-Specific Challenges

Each platform presents unique challenges:

- X (formerly known as Twitter): Researchers must comply with X/Twitter's ToS and non-commercial use policies, avoid sharing data with government entities, and follow subscription-based API access levels. Specific DSA-related policies restrict disclosure and distribution of X content to outside entities, except under vetted researcher status or legal requirements, as outlined in the Developer Agreement and the EU Digital Services Act.

- Facebook and Instagram: Researchers must comply with Meta's restrictive Terms of Service (ToS), community guidelines, and policies to avoid account disabling, potentially altering their methods and strategies, especially in quantitative research due to restrictions on data scraping and access. Compliance involves heightened sensitivity to the ephemerality of violating posts, with qualitative research being relatively easier to align with these terms.

- TikTok: Researchers must use only the TikTok Research API for data retrieval, avoid unauthorized third-party access, false identities, and excessive requests, and refresh data every 15 days. Distribution, modification, and selling of data, as well as scraping, are prohibited. Research outputs must be submitted to TikTok for review 7 days before publication, and TikTok has free use of these outputs.

 

 

Implications for Researchers

The findings of the Winter School 2024 project underscored the implications of the anticipation of DSA Article 40 enforcement for social media research.

Despite promises of improved transparency, the actual data access provisions by platforms like Meta, TikTok, and X (formerly Twitter) fall somewhat short. Gaining access to Meta's Content Library is difficult and requires high institutional credentials. Despite being marketed as 'new', it offers less data than pre-existing tools such as CrowdTangle, and at the time of writing it prohibited local data downloads. This restrictive process raises doubts about Meta's commitment to transparency, with privacy cited as a reason for limiting data access.

Researchers using TikTok's API must submit detailed preparation documents and refresh data every 15 days, complicating long-term research. TikTok also has the right to review publications before release, potentially interfering with independent research and conflicting with institutional data retention policies. The documentation on research access at X (formerly known as Twitter) is unclear, with many broken links, and no dedicated research API was available at the time of writing. This lack of transparency leads to uncertainty about the actual terms of conducting DSA-governed research, with no public records of approved applications.

Overall, these platforms can in certain circumstances be seen as engaging in 'performative compliance' with the DSA, offering limited data access and imposing restrictive terms. This trend hinders timely research and suggests a deliberate strategy to maintain control over data use and interpretation, raising concerns about the autonomy and integrity of digital research. Compliance with platform-specific terms of service and non-disclosure agreements influences research methodologies, often necessitating alternative data collection strategies such as scraping tools. Finally, platforms' increasing control over research outputs raises concerns about academic freedom and the potential for censorship, though this is not necessarily the case across the board.

 

For more detailed insights, you can access the full research report: https://shorturl.at/G63pF

 

This project was carried out as part of, and presented at, the Digital Methods Initiative Winter School 2024 data sprint by Martin Trans (facilitator, UvA), Davide Beraldo (facilitator, UvA), Luca Draisci (information designer, DensityDesign), Leila Afsahi, Madeline Brennan, Vanessa Goldschmidt, Tereza Grendarova, Hannah Hamilton, Melis Keskin, Markella Papasokratous, Giovanni Rossetti, Madelyn Webb, Jade Williams, and Honglan Xu.