
SoBigData Event: A Research Infrastructure to Empower Data Science Analysis
A hands-on tutorial showing the services provided by the SoBigData Research Infrastructure for the new generation of responsible data science.
A dedicated Virtual Research Environment (VRE) is available for the event; attendees can subscribe to it.

09 October 2021, 14:00 - 18:00 WEST (GMT +1)

Abstract: The SoBigData RI’s service platform empowers researchers to design and execute large-scale social mining experiments. Following the FAIR (Findable, Accessible, Interoperable, Reusable) and FACT (Fair, Accountable, Confidential, Transparent) principles, the RI makes social mining experiments more efficient to design and easier to repeat, by leveraging concrete tools that operationalize ethics and incorporate values and norms for privacy, fairness, transparency, and pluralism. The tutorial also touches upon how data science helps us make more informed choices, underlining the need to achieve collective intelligence without compromising the rights of individuals.

The tutorial will show the services made available by the RI, focusing on the computational resources provided to users in the SoBigData virtual laboratory. Examples of how to use the SoBigData libraries and the method engine will be presented, and attendees will be able to follow along and repeat them on a dedicated Virtual Research Environment built for DSAA 2021.

1. Motivation

Data science is rapidly changing the way we do business, socialize, and govern society. A new paradigm is emerging in which theories and models, on the one hand, and the bottom-up discovery of knowledge from data, on the other, mutually support each other. Experiments and analyses over massive datasets serve not only to validate existing theories and models but also to discover patterns emerging from data, which can help scientists design better theories and models and gain a deeper understanding of complex social, economic, biological, technological, cultural, and natural phenomena. In short, data science is changing the way scientific research is performed.

Research infrastructures (RIs) play a crucial role in the advent and development of data science. Resources such as data and methods help domain experts and data scientists turn a research or innovation question into a responsible, data-driven analytical process. This process is executed on the platform, supporting experiments that yield scientific output, policy recommendations, or innovative proofs of concept. In this context, SoBigData RI is designed to enable multidisciplinary scientists and innovators to carry out social mining experiments and to make them reusable by the scientific community. Its components support the whole data science process, from raw data management to knowledge extraction, with particular attention to legal and ethical aspects.

The objective of the tutorial is to show how SoBigData RI can support data scientists in doing cutting-edge science and experiments. Accordingly, our target audience also includes city planners and people interested in big data analytics, computational social science, digital humanities, wellbeing, migration, sport, and health, within the legal and ethical framework for responsible data science and artificial intelligence applications. With its tools and services, SoBigData RI enables new generations of researchers to run large-scale experiments in the cloud and to make them accessible and transparent to the community. Moreover, specialized libraries developed in the SoBigData++ project will be freely accessible, supporting cutting-edge science in a cross-disciplinary environment.

This tutorial does not require any specific background, since the services are designed to support a wide range of analyses not strictly related to computer science. Nevertheless, a basic knowledge of data analysis and data mining will help attendees understand some of the more advanced concepts behind the methods used in the tutorial.

2. Table of contents

The tutorial lasts 3 hours and consists of:

  • 1 hour of presentations describing the European project SoBigData++, the RI Services, and the Responsible Data Science principles and tools;
  • 1.5 hours of hands-on use of the RI, with real examples of analysis in a dedicated Virtual Research Environment created for DSAA 2021;
  • 30 minutes of open discussion with the attendees on the topics presented.

In detail, the preliminary schedule is as follows:

SoBigData++ project: an ecosystem for Ethical Social Mining - Roberto Trasarti - CNR (10 minutes).
This talk introduces the SoBigData++ project to give participants context, presenting the project's main objectives and the consortium of experts working on its vertical contexts: Societal Debates and Online Misinformation; Sustainable Cities for Citizens; Demography, Economics & Finance 2.0; Migration Studies; Sports Data Science; Social Impact of Artificial Intelligence and Explainable Machine Learning. Part of the presentation will describe the ethical approach to data science that is a pillar of the SoBigData++ project.


SoBigData RI Services - Valerio Grossi - CNR (20 minutes).
This talk gives an overview of the SoBigData RI services, including the Exploratories (vertical research contexts), the resource catalogue, the training area, and the SoBigData Lab.


Hands-on JupyterHub service and SoBigData Libraries - Giulio Rossetti - CNR (45 minutes).
This first hands-on session focuses on the libraries and methods developed within the SoBigData consortium. Code examples and case studies will be introduced using a customized JupyterHub notebook service hosted by SoBigData. Using this freely accessible coding environment, we will discuss a subset of the functionalities available to SoBigData users for designing and running their experiments.

Hands-on computational engine & technologies - Massimiliano Assante - CNR (45 minutes).
In this second hands-on session, the tutorial will focus on the computational engine provided by SoBigData. Real examples will show how to deploy an algorithm and run it in the cloud.

Responsible Data Science: 
  • Legality Attentive Data Science: it is needed and it is possible! - Giovanni Comandé - SSSA (20 minutes)
  • FAIR: an e-learning module for GDPR compliance and ethical aspects - Francesca Pratesi - UNIPI (10 minutes)

Open discussion with participants - Moderator: Beatrice Rapisarda - CNR (30 minutes)
An open discussion providing further details on specific aspects requested by the audience that were not already addressed in the presentations or hands-on sessions.

Speakers & Short Bios:

  • Roberto Trasarti: He is a member of ISTI-CNR and of the Knowledge Discovery and Data Mining Laboratory, and is currently the coordinator of the SoBigData++ project. His research interests include data mining, spatio-temporal data analysis, artificial intelligence, and automatic reasoning.
  • Valerio Grossi: He holds a Ph.D. in Computer Science from the University of Pisa and is part of the Knowledge Discovery and Data Mining Laboratory. He is the project manager of the SoBigData++ project, and his research interests focus on the analysis of massive and complex data, including mining data streams, ontology-driven mining, business intelligence, and knowledge discovery systems.
  • Giulio Rossetti: He is a member of the Knowledge Discovery and Data Mining Laboratory, a joint research team of the Computer Science Department of the University of Pisa and ISTI-CNR. His research focuses on the definition of algorithms for complex network analysis and data science.
  • Massimiliano Assante: He is a Research Technologist at the "Istituto di Scienza e Tecnologie dell'Informazione A. Faedo" (ISTI), an institute of the Italian National Research Council (CNR). He holds a Ph.D. in Information Engineering from the University of Pisa. His research interests include e-infrastructures, Virtual Research Environments, and scientific repositories. He is responsible for the IT operations of the SoBigData e-infrastructure.
  • Francesca Pratesi: She is a member of the Knowledge Discovery and Data Mining Lab. Her research interests include data mining, data privacy, and privacy risk assessment, mainly for spatio-temporal data. Recently, she has broadened her interests towards the Ethics-by-Design paradigm and Trustworthy AI.
  • Giovanni Comandé: Full Professor of Private Comparative Law at Scuola Superiore Sant'Anna, Pisa, Italy. Ph.D. (SSSA), LL.M. (Harvard Law School). Founder and Director of the LIDER-LAB. Attorney at law (Pisa since 1995; New York Bar since 1997). He is an external scientific and ethical expert evaluator for the European Commission; the National Research Foundation – Research and Innovation Support and Advancement of South Africa; the University of Haifa (Israel); the Italian Agenzia Nazionale di Valutazione del Sistema Universitario e della Ricerca; the Hebrew University of Jerusalem; and the Fonds québécois de la recherche sur la société et la culture.
  • Beatrice Rapisarda: Head of Internal and External Communication for the SoBigData++ project. She is responsible for the design and production of material supporting communication and promotion in the scientific and technological field (websites, brochures, presentations, posters, videos, newsletters, etc.). She also helps organize events related to research activities and scientific dissemination for SoBigData++.


Organizers/Contact Persons:

Roberto Trasarti -
Valerio Grossi -