A Social Scientist's Point of View in Navigating Computational Methods Against Disinformation
As a social scientist studying the intersection of disinformation and social media, I participated in the SoBigData++ TNA program at the Department of Computer Science (University of Sheffield).
During my visit, I aimed to study the outcomes of a novel alert system developed for the vera.ai project, which discovers the most-shared coordinated links on Facebook by monitoring a list of actors known to have previously shared problematic content. This exploration requires stepping beyond the confines of a single discipline and drawing on interdisciplinary skills. My time at SoBigData++ proved very beneficial in this regard.
Upon my arrival, the welcoming atmosphere created by my host and their team fostered a rich learning environment. I was involved in many of the Department's research activities, from EU project meetings to briefings on theoretical questions with Department colleagues. Within the frame of my research project and those carried out by the host research team, we explored, for example, how to define a “narrative.”
During my visit, I faced the problem of working with a dataset of Facebook posts, each containing a brief text and one or more images, which made topic modeling more challenging. One purpose of this study is to surface narrative frames from a sample of the vera.ai alerts. To this end, we explored different tools and methodologies for multimodal topic modeling. After discussing the problem with the department team, we agreed to use large language models to describe the content of each image and only then run topic modeling on the texts of the dataset. Experimenting with different generative AI models, both open-source and proprietary, we found that they struggle to describe image content accurately; for example, they often fail to recognize politicians' faces and controversial political symbols.
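The workflow described above can be sketched in a few lines. This is a minimal illustration, not the actual project code: the `describe_image` placeholder stands in for a call to a multimodal LLM (the real client, prompt, and model choice are assumptions left out here), and `build_document` simply merges the post text with the generated image descriptions so a text-only topic model can see the visual content.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    """A Facebook post with a short text and zero or more images."""
    text: str
    image_urls: list = field(default_factory=list)

def describe_image(url):
    # Placeholder for a multimodal LLM call that returns a short textual
    # description of the image. Hypothetical: swap in an actual API
    # client (open-source or proprietary) here.
    return f"[description of {url}]"

def build_document(post, caption_fn=describe_image):
    """Merge a post's text with LLM-generated image descriptions,
    producing one text document per post for downstream topic modeling."""
    captions = [caption_fn(url) for url in post.image_urls]
    return " ".join([post.text, *captions]).strip()
```

The resulting documents can then be fed to any standard text-based topic modeling pipeline.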
In the meantime, we proceeded with the detection of narrative frames on other links. We detected the language of these links and translated the entire corpus into English using GPT-4. Then, we ran topic modeling on the translated text using Latent Scope, a tool to embed, project, cluster, and explore a corpus of text.
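The preprocessing step of this pipeline can be sketched as follows. This is only an illustration under stated assumptions: `detect_lang` and `translate` are injected callables standing in for a real language detector and for the GPT-4 translation step (whose prompt and client code are not shown), and the output is the English corpus that would then be handed to a tool such as Latent Scope.

```python
def prepare_corpus(texts, detect_lang, translate):
    """Normalize a multilingual corpus to English before topic modeling.

    `detect_lang(text)` returns an ISO language code; `translate(text, lang)`
    returns an English translation. In our workflow the translation was
    backed by GPT-4 (sketch only; the actual API call is an assumption
    left out here).
    """
    english = []
    for text in texts:
        lang = detect_lang(text)
        # Keep English texts as-is; translate everything else.
        english.append(text if lang == "en" else translate(text, lang))
    return english
```

Keeping detection and translation as injected functions makes the step easy to test and lets the underlying services be swapped without touching the pipeline.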
The time I spent in the USFD Department of Computer Science helped me improve my knowledge of Natural Language Processing and computational methods. These insights will inform my future research toward a comprehensive approach to understanding and fighting disinformation. The collaborative atmosphere and diverse expertise at SoBigData++ were vital in enhancing our efforts to protect information integrity on social media platforms.