Beyond Pathway Boundaries: A Degree-Aware Network Clustering Test for Gene Sets
Queme, B.; Marjoram, P.; Mi, H.
Abstract
Over-representation analysis (ORA) is the most commonly used interpretation tool for gene lists despite well-documented limitations: pathway boundaries are fixed, genes are assumed independent, and results depend on the background set. Network-based methods address these using interaction-network modularity, but introduce hub bias: highly connected genes appear clustered under naive nulls because curated networks overrepresent well-studied genes. Existing corrections are imperfect: edge permutation destroys the topology the test should condition on, and propagation methods hide the confound in parameter tuning. We introduce MANGO (Moran's Autocorrelation for Network Gene Over-representation), which asks one conditional question: does a gene set's spatial autocorrelation on a fixed biological network exceed what its degree composition alone would predict? MANGO computes Global Moran's I under a null that conditions on both the network and the binned degree distribution of the gene set, then decomposes significant signals at the component and gene level. In benchmarks, uniform nulls produce a false positive rate of 1.0 on hub-enriched gene sets with no real clustering; ten-bin degree-stratified nulls bring that to 0.0 with no power loss (AUC ≥ 0.98; on degree-typical signals, |ΔAUC| ≤ 0.004). Pathway-spiking simulations confirm detection of real biological clustering across diverse pathway sizes and degree profiles. Applied to the FIGI colorectal cancer GWAS (204 SNPs), the set is degree-typical (KS p = 0.83), yet Moran's I is highly significant (p < 0.001). Component-level jackknife localizes the entire signal to a single 24-gene module spanning TGF-β, Wnt/cadherin, and related pathways, with four bottlenecks (SMAD3, MYC, CTNNB1, PTPN1) matching established CRC driver biology.
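The conditional question posed in the abstract can be illustrated with a minimal sketch. It is not the MANGO implementation: the abstract does not specify edge weights, the binning rule, or the p-value estimator, so the sketch assumes an unweighted, undirected network stored as an adjacency dictionary, a binary membership indicator, quantile-style degree bins, and a one-sided permutation p-value with the usual +1 correction. The function names (`morans_i`, `degree_bins`, `stratified_p_value`) are hypothetical.

```python
import bisect
import random
from collections import defaultdict


def morans_i(neighbors, members):
    """Global Moran's I of a 0/1 membership indicator on an unweighted graph.

    neighbors: dict mapping node -> set of adjacent nodes (assumed symmetric).
    members:   set of nodes in the gene set (assumed non-empty and not all nodes).
    """
    n = len(neighbors)
    mean = len(members) / n
    z = {v: (1.0 if v in members else 0.0) - mean for v in neighbors}
    s0 = sum(len(nb) for nb in neighbors.values())  # total weight over ordered pairs
    cross = sum(z[u] * z[v] for u, nb in neighbors.items() for v in nb)
    var = sum(val * val for val in z.values())
    return (n / s0) * cross / var


def degree_bins(neighbors, n_bins=10):
    """Assign each node to a degree bin (assumed scheme: quantile cut points).

    Nodes with equal degree always share a bin; duplicate cut points collapse,
    so fewer than n_bins bins may be used on networks with many tied degrees.
    """
    degrees = sorted(len(nb) for nb in neighbors.values())
    n = len(degrees)
    cuts = sorted({degrees[(k * n) // n_bins] for k in range(1, n_bins)})
    return {v: bisect.bisect_right(cuts, len(nb)) for v, nb in neighbors.items()}


def stratified_p_value(neighbors, members, n_bins=10, n_perm=999, seed=0):
    """One-sided permutation p-value under a degree-stratified null.

    Each null set draws, from every degree bin, as many nodes as the observed
    set has in that bin; the network itself is never rewired.
    """
    rng = random.Random(seed)
    bins = degree_bins(neighbors, n_bins)

    pools = defaultdict(list)  # bin -> all nodes in that bin
    for v in neighbors:
        pools[bins[v]].append(v)

    quota = defaultdict(int)   # bin -> number of observed members in that bin
    for v in members:
        quota[bins[v]] += 1

    observed = morans_i(neighbors, members)
    hits = 0
    for _ in range(n_perm):
        null_set = set()
        for b, k in quota.items():
            null_set.update(rng.sample(pools[b], k))
        if morans_i(neighbors, null_set) >= observed:
            hits += 1
    return observed, (1 + hits) / (n_perm + 1)
```

Because each null set matches the observed set's count in every degree bin, a set that is merely hub-enriched is compared against equally hub-enriched random sets, which is the property the abstract attributes to the stratified null. Replacing the per-bin sampling with a single uniform draw of the same size would give the uniform null that the abstract reports as inflating false positives.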