Skip to main content

SoBigData Articles

Research on learned data structures and algorithms: a SoBigData.it TNA visit

Data structures are part of the backbone of all data systems, data analytics and machine learning pipelines. These crucial components carefully organise information to make it efficiently accessible and usable by algorithms and applications, thus accelerating their performance and facilitating the discovery of new insights from data.

The two most important performance metrics of a data structure are the time efficiency of its operations (such as queries and updates) and its space occupancy. Indeed, the less efficient a data structure’s operations are, the slower the overall performance of the system or pipeline using it. The more space a data structure takes, the higher the costs to buy and operate memory and storage devices.

In the last few years, research has witnessed an upsurge of data structures that integrate machine learning models in their design to improve the two performance metrics mentioned above, resulting in the so-called learned data structures. A related trend has emerged in the study of algorithms, particularly online algorithms, whereby incorporating the predictions of a machine learning model in the form of advice enables “guiding” the algorithm towards a solution with greater quality, resulting in the so-called learning-augmented algorithms. Clearly, machine learning models can make errors for some inputs, with the possible consequence of misguiding an algorithm or data structure that harnesses them; thus, a crucial aspect of this research is to devise ways to deal with these errors and retain the worst-case guarantees.

A simple example of this learning-based approach can be given for the classic problem of searching for an element x in a collection C of elements. Here, a machine learning model could be trained to predict the location of x within C, thereby allowing us to narrow down the search for x to that location or a neighbourhood of it. Yet, the predicted location could be quite far off from the true one, thus making this search quite slow. Not to mention the time spent on computing the prediction, which is notoriously long for some machine learning models.

This example brings us to the second crucial aspect of this research: for the algorithm or data structure at hand to be truly effective and improve upon a traditional (non-learned) one, we must carefully balance the complexity of the model, the errors it makes, and the time it takes to compute a prediction. Considering these interrelated factors, it comes as no surprise that, for several algorithmic problems, simple models like piecewise linear functions have been preferred.

Furthermore, whether the learned approach is beneficial depends crucially on the data itself, which can be easy to learn with a basic model, or so hard to require a model that is too large and slow to be even remotely competitive with a traditional approach. This is part of the reason we do not yet have a clear understanding of the power and limitations of learned data structures, beyond a few theoretical results under some assumptions on the data or some experimental evidence on real datasets. 

A brick building with trees and a walkway

Description automatically generated

This summer, thanks to the TNA program by SoBigData.it, I had the opportunity for one month to do research on these aspects at the Division of Theoretical Computer Science of the KTH Royal Institute of Technology together with Asst. Prof. Ioana-Oriana Bercea. The collaboration was very stimulating, allowing me to gain new insights and explore new approaches to this research that I had not previously considered, and it also led to new interesting questions beyond what we originally planned during the TNA application step.

I loved my time in Stockholm, both at KTH, which provided an inspiring environment for doing research, and in the free time I spent exploring the city’s rich culture, beautiful landscapes, museums, and vibrant local life.

I want to thank Ioana for hosting me in KTH and for the enjoyable collaboration. I am also grateful to all the people at KTH whom I met during the TNA visit for making my time there so enriching. Finally, a huge thanks to SoBigData.it and its team for this opportunity and the support they provided.