SoBigData.eu: A Research Infrastructure to Empower Data Science Analysis
A hands-on tutorial showing the services provided by the SoBigData Research Infrastructure for the new generation of responsible data science.
Dedicated VRE for the event: subscribe here!
09 October 2021, 14:00 - 18:00 WEST (GMT +1)
Abstract: The SoBigData RI’s service platform empowers researchers to design and execute large-scale social mining experiments. Promoting the FAIR (Findable, Accessible, Interoperable, Reusable) and FACT (Fair, Accountable, Confidential and Transparent) principles, the RI makes social mining experiments more efficient to design and easier to repeat by leveraging concrete tools that operationalize ethics, incorporating values and norms for privacy, fairness, transparency and pluralism. The tutorial also touches upon how data science helps us make more informed choices, underlining the need to achieve collective intelligence without compromising the rights of individuals.
The tutorial will show the services made available by the RI and will focus on the computational resources provided to the users in the SoBigData virtual laboratory. Examples of usage of the SoBigData libraries and its method engine will be presented and the users will be able to follow and repeat the experience on a dedicated Virtual Research Environment built for DSAA 2021.
1. Motivation
Data Science is rapidly changing the way we do business, socialize, and govern society. A new paradigm is emerging, where theories and models and the bottom-up discovery of knowledge from data mutually support each other. Experiments and analyses over massive datasets are becoming functional not only to the validation of existing theories and models but also to the data-driven discovery of patterns emerging from data, which can help scientists design better theories and models, yielding a deeper understanding of the complexity of social, economic, biological, technological, cultural and natural phenomena. Data science is changing the way scientific research is performed.
Research infrastructures (RIs) play a crucial role in the advent and development of data science. Resources such as data and methods help domain and data scientists to transform a research or innovation question into a responsible data-driven analytical process. This process is executed on the platform, supporting experiments that yield scientific output, policy recommendations, or innovative proofs-of-concept. In this context, SoBigData RI is designed to enable multidisciplinary scientists and innovators to carry out social mining experiments and to make them reusable by the scientific communities. All of its components have been introduced to implement data science from raw data management to knowledge extraction, with particular attention to legal and ethical aspects.
The objective of the tutorial is to show how SoBigData RI can support data scientists in doing cutting-edge science and experiments. Accordingly, our target audience also includes people interested in big data analytics, computational social science, digital humanities, city planning, wellbeing, migration, sport, and health, within the legal/ethical framework for responsible data science and artificial intelligence applications. With its tools and services, SoBigData RI lets new generations of researchers execute large-scale experiments on the cloud and make them accessible and transparent to a community. Moreover, specialized libraries developed in the SoBigData++ project will be freely accessible to support cutting-edge science in a cross-field environment.
This tutorial does not require any specific background, since the services are designed to support a wide range of analyses not strictly related to computer science. Nevertheless, basic knowledge of data analysis and data mining can help attendees understand some advanced concepts related to the methods employed in the tutorial.
2. Table of contents
The tutorial will last 3 hours, comprising:
- 1 hour of presentations describing the European project SoBigData++, the RI Services, and the Responsible Data Science principles and tools;
- 1 hour and a half of practical use of the RI with real examples of analysis in a dedicated Virtual Research Environment created for DSAA 2021;
- 30 minutes for an open discussion with the attendees on the various aspects presented.
In detail, the preliminary schedule is as follows:
- Legality Attentive Data Science: it is needed and it is possible! - Giovanni Comandé - SSSA (20 minutes)
- FAIR: an E-learning module for GDPR compliance and ethical aspects - Francesca Pratesi - UNIPI (10 minutes)
Speakers & Short Bio:
- Roberto Trasarti: He is a member of ISTI-CNR and of the Knowledge Discovery and Delivery Laboratory. He is currently the coordinator of the SoBigData++ project (https://plusplus.sobigdata.eu/). His interests include data mining, spatio-temporal data analysis, artificial intelligence, and automatic reasoning.
- Valerio Grossi: He holds a Ph.D. in Computer Science from the University of Pisa and is part of the Knowledge Discovery and Data Mining Laboratory. He is the project manager of the SoBigData++ project, and his research interests focus on the analysis of massive and complex data, including mining data streams, ontology-driven mining, business intelligence, and knowledge discovery systems.
- Giulio Rossetti: He is a member of the Knowledge Discovery and Data Mining Laboratory, a joint research team that connects the Computer Science Dept. of the University of Pisa and the ISTI-CNR. His research activity centers on the definition of algorithms for complex network analysis and data science.
- Massimiliano Assante: He is a Research Technologist at the "Istituto di Scienza e Tecnologie della Informazione A. Faedo" (ISTI), an institute of the Italian National Research Council (CNR). He holds a Ph.D. in Information Engineering from the University of Pisa. His research interests include e-infrastructures, Virtual Research Environments and Scientific Repositories. He is responsible for the IT Operations of the SoBigData e-infrastructure.
- Francesca Pratesi: She is a member of the Knowledge Discovery and Data Mining Lab. Her research interests include data mining, data privacy and privacy risk assessment, mainly in spatio-temporal data. Recently, she has broadened her interests, moving towards the Ethics-by-Design paradigm and Trustworthy AI.
- Giovanni Comandé: Full Professor of Private Comparative Law at Scuola Superiore S. Anna Pisa, Italy. PhD, SSSA; LLM, Harvard Law School. Founder and Director of the LIDER-LAB (www.lider-lab.it). Attorney at law (Pisa since 1995; New York Bar since 1997). He is an external scientific and ethical expert evaluator for the EU Commission; the National Research Foundation – Research and Innovation Support and Advancement of South Africa; the University of Haifa (Israel); the Italian Agenzia Nazionale di Valutazione del Sistema Universitario e della Ricerca; the Hebrew University of Jerusalem; and the Fonds québécois de la recherche sur la société et la culture.
- Beatrice Rapisarda: Head of Internal and External Communication for the SoBigData++ project. She is responsible for the design and production of material to support communication and promotion in the scientific/technological field (websites, brochures, presentations, posters, videos, newsletters, etc.). She also helps organize events relating to research activities and scientific dissemination for SoBigData++.
Organizers/Contact Person:
Roberto Trasarti - roberto.trasarti@isti.cnr.it
Valerio Grossi - valerio.grossi@isti.cnr.it