Concordance and dissonance: A genome-wide analysis of self-declared versus inferred ancestry in 10,250 participants from the HostSeq cohort
Warren, R. L.; Birol, I.
Abstract
Accurately characterizing human diversity is foundational to equitable genomics. In this study, we present a large-scale analysis comparing self-declared ancestry with genetically inferred ancestry in 10,250 participants from the pan-Canadian HostSeq cohort. Using the ntRoot algorithm on whole-genome sequencing data, we inferred both global and local continental-level ancestry and assessed concordance with self-identified sociocultural categories. High agreement was observed among individuals self-identifying as White (concordance rate = 98.8%), Black (97.2%), East Asian (96.1%), and South Asian (89.9%), while substantial discordance was found in those self-identifying as Hispanic (concordance rate = 74.6%), Middle Eastern / Central Asian (67.9%), or Indigenous (40.7%). We quantified agreement using Cohen's kappa (κ = -0.01 unweighted; 0.35 weighted) and assessed admixture complexity with Shannon entropy, revealing a strong relationship between discordance and ancestry heterogeneity. Principal component analysis further revealed that tightly clustered genetic profiles often corresponded with lower admixture complexity, whereas broader, overlapping distributions were observed in groups with more heterogeneous ancestry and complex sociocultural histories. These findings underscore the complex interplay between sociocultural identity and genomic data, with discordance patterns reflecting the historical and cultural complexity of human populations. By quantifying this relationship systematically with ntRoot, our approach provides a framework for moving beyond rigid categorical labels toward more nuanced genome-derived ancestry characterization that can improve both scientific rigor and representational equity in genomics.
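
The two summary statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy labels ("EUR", "AFR", "EAS"), and the choice of base-2 logarithms are assumptions made for illustration, and the weighting scheme behind the reported weighted kappa is not stated in the abstract, so only the unweighted form is shown.

```python
import math
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equal-length label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label marginals.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Counter returns 0 for labels absent from the second sequence.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def shannon_entropy(fractions):
    """Shannon entropy (in bits; base 2 is an assumption) of ancestry fractions.

    Fractions are normalized to sum to 1; zero fractions contribute nothing.
    """
    total = sum(fractions)
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")
    probs = [f / total for f in fractions if f > 0]
    return -sum(p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # Hypothetical toy data: self-declared labels already mapped to
    # continental codes, alongside inferred labels for the same people.
    declared = ["EUR", "EUR", "AFR", "EAS", "EAS", "AFR"]
    inferred = ["EUR", "EUR", "AFR", "EAS", "EUR", "AFR"]
    print("kappa:", cohens_kappa(declared, inferred))

    # Hypothetical per-individual ancestry fractions; a more evenly
    # admixed profile yields a higher entropy.
    print("entropy (single source):", shannon_entropy([1.0, 0.0, 0.0]))
    print("entropy (admixed):", shannon_entropy([0.5, 0.3, 0.2]))
```

Under this sketch, an individual whose inferred ancestry comes from a single continental source has an entropy of zero, while evenly split ancestry across k sources reaches the maximum of log2(k) bits.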