SoBigData Articles

Network Medicine: Local Community Detection on Inflammatory Bowel Disease

Exploratory: Network Medicine

Protein-protein interactions (PPIs) occur when two or more proteins bind together in a cell, in vitro, or in a living organism. Because the interaction interface of a protein has evolved for a specific purpose, the interactions between proteins are connected to biological functions.

It is known that proteins involved in the same cellular processes often interact with each other. Therefore, the functions of uncharacterized proteins can be predicted by comparing their interactions with those of similar known proteins, and detecting pertinent communities in PPI networks makes it possible to predict the function of an uncharacterized protein from the functions of the proteins it is grouped with.

Communities are groups of nodes (i.e. proteins) that are more densely connected to each other than to the rest of the network. Often these groups of nodes correspond to a common process, purpose, or function. Therefore, it is reasonable to hypothesize that determining communities on biological networks may shed new light on groupings of genes with common biological functions or features.
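One standard way to make "more connected to each other than to the rest of the network" precise is conductance, the quantity that the module described below tries to keep small. As a reference sketch (the symbols for the edge boundary and the volume are notation introduced here, not taken from the article), for a node set S in a graph G(V,E):

```latex
\phi(S) = \frac{|\partial S|}{\min\bigl(\mathrm{vol}(S),\ \mathrm{vol}(V \setminus S)\bigr)},
\qquad
\mathrm{vol}(S) = \sum_{v \in S} \deg(v),
\qquad
\partial S = \{\, (u,v) \in E : u \in S,\ v \notin S \,\}
```

A low conductance means that few edges leave S relative to the total number of edge endpoints inside it, which matches the intuitive description of a community given above.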

Community Detection Algorithm: We present a module for local graph partitioning using personalized PageRank vectors. Starting from a graph G(V,E), the module finds local communities with small conductance and then merges them to obtain non-overlapping communities. The module is divided into the following steps:

  1. Local Community Detection Step: Starting from an undirected graph G and a set of seeds S, it computes |S| overlapping local communities and returns the set of nodes that belong to these communities.

  2. Node Embedding Step: Each node returned by the previous step is encoded as a vector v of size |S|, in which the i-th component of v is 1 if the node belongs to the i-th community and 0 otherwise. Each vector then becomes a row of the embedding matrix U.

  3. Community Detection Step: The embedding matrix U is given as input to SVD, and the result is then clustered with the K-Means++ heuristic to find non-overlapping communities.

  4. Community Ranking Step: Each non-overlapping community c is then scored according to a set of nodes belonging to a Prior. Furthermore, very small communities (i.e. fewer than 4 nodes) are filtered out of the final ranked list.
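Steps 1 and 2 above can be illustrated with a minimal sketch. It assumes the push-based approximation of personalized PageRank followed by a sweep cut, in the spirit of reference [1]; the article does not show the module's actual code, so the function names, the parameters `alpha` and `eps`, and the toy graph are all hypothetical. The graph is assumed to have no self-loops.

```python
from collections import deque

def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `seed` via push operations."""
    p, r = {}, {seed: 1.0}            # p: PageRank estimate, r: residual mass
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u, r_u = len(adj[u]), r.get(u, 0.0)
        if deg_u == 0 or r_u < eps * deg_u:
            continue                   # not enough residual mass to push
        p[u] = p.get(u, 0.0) + alpha * r_u
        r[u] = (1 - alpha) * r_u / 2   # lazy walk: half of the rest stays at u
        share = (1 - alpha) * r_u / (2 * deg_u)
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)        # v just became eligible for a push
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p

def sweep_cut(adj, p):
    """Return the prefix of nodes (by p/degree) with the lowest conductance."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    inside, vol, cut = set(), 0, 0
    best_phi, best_size = float("inf"), 0
    for size, u in enumerate(order, start=1):
        links_in = sum(1 for v in adj[u] if v in inside)
        vol += len(adj[u])
        cut += len(adj[u]) - 2 * links_in
        inside.add(u)
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_size = cut / denom, size
    return set(order[:best_size])

def local_community(adj, seed):
    """Step 1 for a single seed."""
    return sweep_cut(adj, approximate_ppr(adj, seed))

def embed(communities):
    """Step 2: one 0/1 membership vector of size |S| per returned node."""
    nodes = sorted(set().union(*communities))
    return {n: [1 if n in c else 0 for c in communities] for n in nodes}

def two_cliques():
    """Toy graph: two 4-cliques joined by the single edge d-e."""
    left, right = "abcd", "efgh"
    adj = {n: [] for n in left + right}
    edges = [(u, v) for grp in (left, right) for u in grp for v in grp if u < v]
    for u, v in edges + [("d", "e")]:
        adj[u].append(v)
        adj[v].append(u)
    return adj

adj = two_cliques()
seeds = ["a", "h"]
communities = [local_community(adj, s) for s in seeds]
U = embed(communities)   # rows of the embedding matrix, keyed by node
```

On this toy graph each seed recovers the clique it belongs to. For step 3, the rows of `U` could be passed to an off-the-shelf SVD and a k-means++ initialized clustering (for example scikit-learn's `TruncatedSVD` and `KMeans` with `init="k-means++"`); whether the module uses those particular implementations is not stated in the article.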

Inflammatory Bowel Disease Community Validation:

We first ran the community detection algorithm and kept the communities in which at least 25% of the genes are related to Inflammatory Bowel Disease (IBD). To do so, we downloaded IBD monogenic genes and GWAS genes from the GWAS Catalog and used them as the Prior to rank communities (step 4 of the community detection algorithm).
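The ranking and filtering described here can be sketched as follows. The article does not give the exact scoring function, so this sketch assumes the score is simply the fraction of a community's genes that appear in the Prior; `rank_communities`, its parameters, and the gene labels are hypothetical placeholders, not real gene symbols or results.

```python
def rank_communities(communities, prior, min_size=4, min_fraction=0.25):
    """Score each community by its share of Prior genes and rank the survivors.

    Assumption: score = |community ∩ prior| / |community|.
    Communities smaller than `min_size` or scoring below `min_fraction`
    are dropped, mirroring the size filter and the 25% cut in the text.
    """
    ranked = []
    for community in communities:
        if len(community) < min_size:
            continue
        fraction = len(community & prior) / len(community)
        if fraction >= min_fraction:
            ranked.append((fraction, community))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked

# Toy placeholders only.
prior = {"G1", "G2", "G5"}
communities = [
    {"G1", "G2", "G3", "G4"},         # 2 of 4 genes in the Prior
    {"G5", "G6", "G7", "G8", "G9"},   # 1 of 5, below the 25% cut
    {"G1", "G2"},                     # fewer than 4 nodes
    {"G5", "G10", "G11", "G12"},      # 1 of 4, exactly 25%
]
ranked = rank_communities(communities, prior)
```

In this toy input only the first and last communities survive, and the first is ranked higher because half of its members are in the Prior.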

We validated the quality of the top communities by finding their top enriched pathways and comparing these pathways with those of the Prior. To find the enriched pathways, we used the Python package gseapy for gene set enrichment analysis and kept the pathways with a p-value lower than 1e-5. Figure 1 shows the pathway comparison between the IBD monogenic and GWAS genes and the candidate communities.
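To make the p-value threshold concrete, the sketch below does not call gseapy. It substitutes a plain hypergeometric over-representation test, a common basis for enrichment p-values, so that the example stays self-contained. The function names, the pathway labels `P1` and `P2`, and the background of 1000 synthetic gene identifiers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(overlap >= k) when n genes are drawn from N, K of which are in the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enriched_pathways(genes, pathways, background_size, threshold=1e-5):
    """Return {pathway name: p-value} for pathways with p-value below `threshold`."""
    result = {}
    for name, members in pathways.items():
        overlap = len(genes & members)
        p = hypergeom_pvalue(overlap, len(genes), len(members), background_size)
        if p < threshold:
            result[name] = p
    return result

# Synthetic background of 1000 gene identifiers (placeholders, not real genes).
pathways = {
    "P1": {f"g{i}" for i in range(10)},                   # 10 genes
    "P2": {"g7"} | {f"g{i}" for i in range(100, 199)},    # 100 genes
}
community = {f"g{i}" for i in range(8)}                    # 8 genes, all in P1
enriched = enriched_pathways(community, pathways, background_size=1000)

# The same function applied to the Prior would give a second set of pathway
# names; the comparison in the text then amounts to a set intersection.
```

Here only `P1` passes the 1e-5 cut, since all eight community genes fall inside a ten-gene pathway, while a single shared gene with the much larger `P2` is unremarkable.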

 

Figure 1. Biological validation of candidate communities

 

Written by: Leonardo Martini and Michele Gentili

 

References:

[1] Local Graph Partitioning using PageRank Vectors, Reid Andersen & Fan Chung, (2006)

[2] GSEA Subramanian, Tamayo, et al. (2005, PNAS 102, 15545-15550) and Mootha, Lindgren, et al. (2003, Nat Genet 34, 267-273).

[3] Discovery of functional and disease pathways by community detection in PPI Networks, Stephen J. Wilson, Angela D. Wilkins