Science Cast

GraphPop: graph-native computation decouples population genomics complexity from sample count

Jian-Feng Mao, April 15, 2026 3:56am


This paper is a preprint and has not been certified by peer review.

bioRxiv, April 14, 2026 12:00am

Authors

Estaji, E.; Zhao, S.-W.; Chen, Z.-Y.; Nie, S.; Mao, J.-F.

Abstract

Matrix-based population genomics tools scale as O(V x N), where V is the number of variants and N the number of samples, re-reading the full genotype matrix for every analysis. Here we present GraphPop, a graph database engine that reduces summary statistic complexity to O(V x K), where K is the population count, independent of sample count, by computing on pre-aggregated allele-count arrays stored as graph node properties. The same architecture enables annotation-conditioned queries via edge traversal, persistent analytical records, and multi-statistic composition. Applied to rice 3K (29.6M SNPs, 3,024 accessions) and human 1000 Genomes (3,202 samples, 22 autosomes), GraphPop reveals that all 12 rice subpopulations show piN/piS > 1.0, uncovers opposite consequence-level Fst regimes between species, and identifies KCNE1 as a candidate pre-Out-of-Africa sweep via convergence of five stored statistics. GraphPop achieves 146-327x query-time speedup for pre-aggregated statistics and 63-179x for bit-packed haplotype computation, at constant ~160 MB memory. This complexity reduction makes systematic, annotation-integrated population genomics practical for the crop, livestock, conservation, and ecological datasets that constitute the majority of the field.
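The complexity claim can be illustrated with a small sketch. It is not GraphPop's implementation or API: the record layout, the function names (`aggregate`, `pi_per_population`), and the choice of nucleotide diversity as the example statistic are all assumptions made for illustration, and the sketch further assumes biallelic diploid sites with V read as the variant count and N as the sample count. The idea shown is that one pass over the genotype matrix produces per-population allele counts, after which a summary statistic touches only V x K stored counts.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not GraphPop's schema): one record per
# variant, mapping population name -> (alt_allele_count, total_allele_count).
AlleleCounts = Dict[str, Tuple[int, int]]


def aggregate(genotypes: List[List[int]], pop_of_sample: List[str]) -> List[AlleleCounts]:
    """One-time O(V x N) pass over the genotype matrix.

    genotypes[v][n] is the number of alt alleles (0, 1, or 2) that diploid
    sample n carries at variant v.
    """
    records: List[AlleleCounts] = []
    for row in genotypes:
        counts: Dict[str, List[int]] = {}
        for g, pop in zip(row, pop_of_sample):
            entry = counts.setdefault(pop, [0, 0])
            entry[0] += g   # alt alleles observed in this population
            entry[1] += 2   # two alleles per diploid sample
        records.append({p: (ac, an) for p, (ac, an) in counts.items()})
    return records


def pi_per_population(records: List[AlleleCounts]) -> Dict[str, float]:
    """O(V x K) query: summed per-site nucleotide diversity for each population.

    Uses the unbiased per-site estimator 2 * ac * (an - ac) / (an * (an - 1))
    and never looks at individual samples.
    """
    totals: Dict[str, float] = {}
    for rec in records:
        for pop, (ac, an) in rec.items():
            totals.setdefault(pop, 0.0)
            if an < 2:
                continue  # diversity is undefined for fewer than two alleles
            totals[pop] += 2.0 * ac * (an - ac) / (an * (an - 1))
    return totals


# Toy data: 2 variants, 4 samples, 2 populations.
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
]
pop_of_sample = ["A", "A", "B", "B"]

records = aggregate(genotypes, pop_of_sample)  # done once, then stored
pi = pi_per_population(records)                # cost depends on V and K only
print(pi)
```

Once `records` is stored, repeated queries cost the same however many samples contributed to each count, which is the decoupling the abstract describes.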
