A TNA experience on ordinal quantification

Trans-national access (TNA) @ ISTI-CNR, Pisa, Italy

While a typical objective in supervised machine learning is to predict a label for each individual instance, quantification aims instead at predicting the label distribution within a set of instances. It has been widely acknowledged that this problem is not solved by simply classifying and counting the individual instances in the set; on the contrary, quantification deserves consideration as a learning task in its own right. Applications of quantification range from the social sciences through technical support to astroparticle physics. Just recently, the topic gained momentum from the first international workshop on quantification, which was co-organized by Alejandro Moreo and Fabrizio Sebastiani from ISTI-CNR in Pisa, Italy.
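To see why classifying and counting falls short, consider a small sketch in Python. It assumes a binary classifier whose true positive rate and false positive rate are known and stay the same across samples. The rates used here (0.8 and 0.1) are made-up values, and the function names are mine, chosen only for illustration. The correction shown is the well-known "adjusted count" idea, which is one simple quantification method among many.

```python
def classify_and_count(true_prevalence, tpr, fpr):
    """Expected fraction of instances that the classifier labels positive."""
    return tpr * true_prevalence + fpr * (1.0 - true_prevalence)


def adjusted_count(cc_estimate, tpr, fpr):
    """Invert the relation above to correct a classify-and-count estimate."""
    estimate = (cc_estimate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid prevalence


TPR, FPR = 0.8, 0.1  # hypothetical classifier rates, assumed to be known

for p in (0.1, 0.5, 0.9):
    cc = classify_and_count(p, TPR, FPR)
    acc = adjusted_count(cc, TPR, FPR)
    print(f"true={p:.2f}  classify-and-count={cc:.2f}  adjusted={acc:.2f}")
```

With these rates, a sample whose true positive prevalence is 0.5 receives a classify-and-count estimate of 0.45, and the estimate drifts further from the truth as the prevalence moves towards 0 or 1. The adjusted estimate recovers the true prevalence here only because the sketch uses exact rates; in practice, the rates must themselves be estimated from data.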

Me on a day trip to Firenze, a city close to Pisa

I met Alejandro and Fabrizio for the first time at this workshop, and we realized that all three of us had been working on ordinal quantification (OQ), a quantification task where the classes are totally ordered and errors are weighted by the distance between classes. For instance, if we need to predict the label distribution of the sentiment classes {negative, neutral, positive}, we are facing an OQ problem: confusing the prevalences of negative and positive induces a larger error than confusing those of negative and neutral, or of neutral and positive.
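The sentiment example can be made concrete with a minimal sketch. As one example of an error measure that respects the class order, I use the earth mover's distance with unit distance between neighbouring classes; this is an illustrative choice, and the function name and the prevalence values below are made up.

```python
from itertools import accumulate


def ordinal_emd(p, q):
    """Earth mover's distance between two distributions over ordered classes.

    Neighbouring classes are one unit apart, so the distance equals the sum
    of absolute differences between the cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("both distributions need the same number of classes")
    cum_p = list(accumulate(p))
    cum_q = list(accumulate(q))
    # The last cumulative value is 1 for both, so it contributes nothing.
    return sum(abs(a - b) for a, b in zip(cum_p[:-1], cum_q[:-1]))


# Classes in order: negative, neutral, positive (illustrative prevalences)
truth = [0.5, 0.3, 0.2]
near = [0.4, 0.4, 0.2]  # 0.1 of the mass moved from negative to neutral
far = [0.4, 0.3, 0.3]   # 0.1 of the mass moved from negative to positive

print(ordinal_emd(truth, near), ordinal_emd(truth, far))
```

Both estimates misplace the same amount of probability mass, but moving it across two classes costs twice as much as moving it to the neighbouring class. A measure that ignores the order, such as the absolute error summed over classes, would rate the two estimates as equally wrong.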

Meeting other researchers who work on OQ is not very common. In fact, most of the existing quantification literature deals with non-ordinal settings, despite the importance of ordinality in many applications. For several years, I myself was not even aware of OQ as a task in its own right. Before I came across the quantification literature, I was eagerly working on OQ methods that had been proposed within physics research under the name "unfolding". Indeed, physicists have devised a large collection of OQ methods, which, however, have remained disconnected from the quantification literature due to the interdisciplinary gap between computer science and physics. This gap manifests itself not only in different names for the OQ problem, but also in different notational conventions and different focuses.

Alejandro, Fabrizio, and I wanted to bridge this gap. We wanted to find the commonalities and differences between OQ methods from quantification literature and physics literature. We wanted to find their strengths and weaknesses. So I applied for a TNA visit at ISTI-CNR. During my stay, we all worked intensively on the topic. After one month, our manuscript was ready for submission. And we are quite happy with the results.

First, we have created two datasets for OQ research that overcome the inadequacies of the previously available ones. Second, we have experimentally compared the most important OQ algorithms proposed in quantification literature and in physics literature. Third, we have proposed three novel OQ algorithms, which are based on the idea of preventing ordinally implausible estimates through regularization, an idea that originates within physics research. In a nutshell, we have indeed bridged the gap between quantification literature and physics literature.

My most vivid memories, however, are of all the activities that we undertook in the after-work hours. Meeting the welcoming people who work at ISTI-CNR was an incredible pleasure. Pisa turned out to be not only a beautiful city with lots of cafés, restaurants, and pubs (I am a huge fan of Italian cuisine), but also a great base for outdoor activities (of which I am an equally huge fan). The Monte Pisano, a small mountain range next to Pisa, is easily reachable by bike or even on foot. These mountains offer plenty of opportunities for hiking, rock climbing, and cycling.

For these memories, and for the honor of working with some of the most renowned quantification researchers, I am eternally grateful to SoBigData++. We are planning to continue our collaboration on quantification in the near future, to tackle several exciting issues that we have identified.

Author: Mirko Bunse, Ph.D. student and research associate at TU Dortmund University, Germany

Contact: mirko.bunse@cs.tu-dortmund.de
Twitter: @thestormisyou
