Semantics-enabled Transfer Learning for Mobility Analytics: a SoBigData TNA experience
The research group in Pisa
- City feature collection: Collect candidate city characteristics (features) from linked open data sources, guided by recommendations in the literature on the features most likely to influence individual mobility. City features were collected from the Linked Open Data site of the Italian National Institute of Statistics (ISTAT) and included geospatial features (city surface, minimum and maximum altitude) and socio-demographic features of the cities' populations. In particular, this dataset provides 2011 census data for a variety of statistical indicators, such as population size, gender, age, occupation, education, and number of foreign citizens.
- City feature preparation and selection: The extracted features are normalized, and a correlation analysis is performed to select the largest subset of features that are not correlated with one another.
- Clustering of cities: Cluster cities based on these characteristics. The cluster of cities similar to Pisa is of particular interest: our hypothesis is that adapting the ABC Classifier to these similar cities will be more successful than adapting it to dissimilar ones.
- City features versus mobility statistics: Analyze how the mobility statistics extracted from GPS traces vary across the clusters. From the original mobility dataset, we extracted statistical information (trip length, duration, speed) for each city about incoming journeys, outgoing journeys, and journeys inside the city. The goal was to compare the distribution of these values over all cities with the distribution within individual clusters. This analysis step is currently ongoing.
- Running and evaluating the ABC Classifier: Run the classifier on cities from the cluster similar to Pisa as well as on cities dissimilar to Pisa, and compare performance. The assumption is that performance will depend on how similar the cities are. As a first approximation, performance can be estimated by checking whether the distribution of activity types is similar to that expected from the experiments on data from Pisa; any large variation indicates low performance.
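The first-instance performance estimate in the last step can be illustrated with a small sketch. It compares the share of each predicted activity type in a target city against the shares observed for Pisa. The choice of total variation distance as the measure of "variation", the function names, and the toy labels are all assumptions made for illustration; the text above does not specify how the distributions were compared.

```python
from collections import Counter

def activity_distribution(labels):
    """Return the relative frequency of each activity type in a list of labels."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    total = sum(counts.values())
    return {activity: n / total for activity, n in counts.items()}

def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    activities = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in activities)

# Hypothetical predicted labels; real input would be the classifier output for each city.
pisa_labels = ["home", "work", "shopping", "home", "work", "leisure", "home", "work"]
other_labels = ["home", "home", "home", "work", "home", "leisure", "home", "home"]

pisa_dist = activity_distribution(pisa_labels)
other_dist = activity_distribution(other_labels)
distance = total_variation_distance(pisa_dist, other_dist)
print(distance)  # 0.375
```

A distance close to 0 would suggest that the classifier behaves in the target city much as it does in Pisa, while a large distance would flag a possible drop in performance. Any cut-off between "small" and "large" would have to be chosen empirically.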
The main output was a dataset of 522 Italian cities and their features extracted from the LOD cloud (this dataset is available in the SoBigData catalogue). This dataset was also used to perform an initial clustering of cities, as per the third step above. We relied on the KNIME tool (see the attached clustering workflow). We used k-means clustering (with 10, then 20 clusters) and manually inspected the results. We found that population size is one of the key differentiating features, leading to clusters with very small (904) or large (271,767) populations. Among cities with a similar population, the clustering often distinguishes between those with large and small surface areas. Across the clustering experiments, Pisa and Florence always fall in different clusters.
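The preparation and clustering steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the KNIME workflow used during the visit: min-max scaling stands in for the unspecified normalization, the greedy correlation filter with a 0.9 threshold is a heuristic that does not guarantee the largest uncorrelated subset, the farthest-first initialization is chosen only to make the run deterministic, and the feature names and values are invented toy data rather than ISTAT figures.

```python
import math

def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_uncorrelated(features, threshold=0.9):
    """Greedily keep each feature whose |correlation| with every kept feature is below threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def k_means(points, k, iterations=50):
    """Lloyd's algorithm with farthest-first initialization; returns one cluster index per point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    assignment = []
    for _ in range(iterations):
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment

# Hypothetical toy values for six cities, not taken from the ISTAT dataset.
raw = {
    "population": [900, 1200, 1500, 250000, 270000, 300000],
    "employed": [400, 520, 650, 110000, 120000, 130000],  # nearly proportional to population
    "surface_km2": [10, 80, 15, 90, 30, 100],
}

normalized = {name: min_max_normalize(col) for name, col in raw.items()}
kept = select_uncorrelated(normalized, threshold=0.9)
points = [[normalized[name][i] for name in kept] for i in range(len(raw["population"]))]
labels = k_means(points, k=2)
print(kept)    # ['population', 'surface_km2']
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy run the "employed" feature is dropped because it is almost perfectly correlated with population, and the two resulting clusters separate small towns from large cities, which mirrors the observation that population size dominates the clustering.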
Results and Conclusions:
The major result of this visit is an approach to semantics-supported transfer learning for mobility analytics algorithms. We investigated this concept in a concrete case (the ABC Classifier, with data about city features), which led to the following concrete results:
- A dataset of city characteristics extracted from linked open data;
- An experiment on city clustering.
The overall conclusion is that semantic data (such as linked open data) is promising as a support for transfer learning in general. This hypothesis still needs to be confirmed in future work, beyond the visit, that will investigate the performance of activity recognition in cities classified as similar to Pisa. More broadly, this approach should be further investigated for other mobility algorithms as well as in other domains.
[1] S. Rinzivillo, L. Gabrielli, M. Nanni, L. Pappalardo, D. Pedreschi and F. Giannotti, "The purpose of motion: Learning activities from Individual Mobility Networks," 2014 International Conference on Data Science and Advanced Analytics (DSAA), Shanghai, 2014, pp. 312-318.