
SoBigData Articles

Building CD-LEAD: a benchmark and leaderboard for community detection

Community detection is one of the most studied problems in social network analysis. 

Together with Giulio Rossetti and other colleagues from CNR Pisa and the KDD Lab, we have worked on this problem, notably by developing a Python library, CDlib, which includes more than 50 algorithms from the literature.

In April 2024, I spent three weeks in Pisa to collaborate on a new project related to CDlib. The objective of this project is to develop a benchmarking infrastructure that makes it possible to run and compare all the algorithms included in CDlib on common benchmarks, and to offer an online leaderboard, i.e., a public website on which any researcher can explore the results of running the various methods on each of the benchmarks.
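To make the idea of "running and comparing methods on common benchmarks" concrete, here is a minimal sketch in plain Python (standard library only). It is not CDlib's API nor the project's actual code: the planted-partition generator, the methods (a simple asynchronous label propagation and two trivial baselines), the NMI score and every function name (`planted_partition`, `nmi`, `run_benchmark`, ...) are illustrative assumptions. Each method is run on the same generated graphs, scored against the planted ground truth, and the scores are averaged over several random seeds.

```python
# Hypothetical sketch of a community-detection benchmark harness.
# Not CDlib's API: all names and parameters below are illustrative assumptions.
import math
import random
from collections import Counter
from statistics import mean, stdev


def planted_partition(n_groups, group_size, p_in, p_out, seed):
    """Random graph with a known (planted) partition into equal-size groups."""
    rng = random.Random(seed)
    n = n_groups * group_size
    truth = [v // group_size for v in range(n)]  # ground-truth community of each node
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < (p_in if truth[u] == truth[v] else p_out)]
    return n, edges, truth


def nmi(a, b):
    """Normalized mutual information between two labelings (arithmetic-mean normalization)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha + hb == 0:
        return 1.0  # both labelings put every node in a single cluster
    return 2 * mi / (ha + hb)


def label_propagation(n, edges, seed):
    """Asynchronous label propagation: each node adopts its neighbours' most frequent label."""
    rng = random.Random(seed)
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    labels = list(range(n))  # start with one label per node
    for _ in range(100):  # cap the number of sweeps
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for v in order:
            if not neighbours[v]:
                continue
            counts = Counter(labels[u] for u in neighbours[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] not in candidates:  # keep the current label on ties
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


def single_community(n, edges, seed):
    return [0] * n  # baseline: every node in one community


def singletons(n, edges, seed):
    return list(range(n))  # baseline: every node alone


def run_benchmark(methods, seeds):
    """Run every method on the same benchmark instances; return (mean, std) of NMI."""
    results = {}
    for name, method in methods.items():
        scores = []
        for seed in seeds:
            n, edges, truth = planted_partition(4, 25, 0.5, 0.02, seed)
            scores.append(nmi(truth, method(n, edges, seed)))
        results[name] = (mean(scores), stdev(scores))
    return results


METHODS = {
    "label_propagation": label_propagation,
    "single_community": single_community,
    "singletons": singletons,
}
for name, (avg, sd) in run_benchmark(METHODS, seeds=range(5)).items():
    print(f"{name:18s} NMI = {avg:.3f} +/- {sd:.3f}")
```

In a real setting, the stand-in methods and generator could be replaced by the algorithms and benchmark graphs of the actual library; keeping the harness generic (any callable that maps a graph to a labeling) is what lets every method be compared on exactly the same instances.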

These three weeks were very fruitful: through discussions, we converged on the experiments to run and the methods to include. We then developed the code for the leaderboard and for the library that runs the methods on the chosen experiments, reaching a first complete, functional prototype.

[Figure: example results]

The originality of this work is that, beyond comparing more methods than any previous study, we test specific scenarios and network properties, such as the resolution limit, or the ability to detect that some networks are random and thus contain no communities. Although some work remains, in particular running each experiment multiple times to obtain statistically significant results, we are confident that, with a few more weeks of work, we will be able to release this new tool and the leaderboard website, and to publish at least one article about it.
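The resolution limit can be illustrated with the classic ring-of-cliques example: m cliques of k nodes arranged in a ring, each joined to the next by a single edge. The natural communities are the cliques, yet once the ring is large enough, merging adjacent cliques two by two gives a higher modularity, so a method that maximizes modularity cannot return the cliques. The plain-Python sketch below computes modularity for both partitions with k = 5; the function names are illustrative assumptions, not part of CDlib or of the leaderboard code.

```python
# Hypothetical illustration of the resolution limit of modularity
# (ring of cliques); not part of CDlib or of the leaderboard code.
from collections import Counter


def ring_of_cliques(m, k):
    """Edge list of m cliques of k nodes, consecutive cliques joined by one edge in a ring."""
    edges = []
    for c in range(m):
        base = c * k
        edges += [(base + i, base + j) for i in range(k) for j in range(i + 1, k)]
        edges.append((base, ((c + 1) % m) * k))  # single link to the next clique
    return edges


def modularity(edges, membership):
    """Newman-Girvan modularity: sum over communities of l_c/L - (d_c/2L)^2."""
    L = len(edges)
    internal, degree = Counter(), Counter()
    for u, v in edges:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / L - (degree[c] / (2 * L)) ** 2 for c in degree)


k = 5  # clique size
for m in (10, 30):  # number of cliques in the ring
    edges = ring_of_cliques(m, k)
    one_per_clique = [v // k for v in range(m * k)]
    pairs_merged = [v // (2 * k) for v in range(m * k)]  # adjacent cliques merged in pairs
    print(m, round(modularity(edges, one_per_clique), 4), round(modularity(edges, pairs_merged), 4))
# 10 0.8091 0.7545
# 30 0.8758 0.8879
```

With 10 cliques the clique partition scores higher; with 30 cliques the merged partition wins, even though each clique is as dense as possible. A benchmark scenario of this kind can check directly whether a method still recovers the cliques as the ring grows.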

The idea of creating such a benchmark and leaderboard was not new, but it is only thanks to previous work such as CDlib, and to this TNA (Transnational Access) visit, that we were able to make it happen.