MarkerMatch: A Proximity-Based Probe-Matching Algorithm for Joint Analysis of Copy-Number Variants from Different Genotyping Arrays
Ivankovic, F.; Yu, D.; Shen, J.; Zhan, L.; Niarchou, M.; Kaylor, A.; Domenech, L.; Miller-Fleming, T. W.; Porras, L. M.; Giusti-Rodriguez, P.; Ophoff, R. A.; Scharf, J. M.; Mathews, C. A.
Abstract
Copy-number variants (CNVs) are a form of genetic structural variation of increasing importance in complex human disorders. Both DNA sequencing and microarray data can be used to call CNVs, which can then be used in association tests, such as tests of association between the number of CNVs and disease status. Unlike genotypes, CNVs detected from microarrays rely on the intensity signals observed at each probe, which limits imputation in analyses that span multiple array types. Thus far, a consensus set of probes (the intersection of probes present on all arrays) has been used to circumvent the problem of differing array-specific sensitivities. However, because arrays can have undesirably low overlap between their probe sets, this approach has led to an excessive reduction in the overall sensitivity of CNV calls. To overcome this limitation, we developed MarkerMatch, a proximity-based algorithm that matches probes across different genotyping microarrays to maximize the number of probes considered by the CNV-calling algorithm, thereby increasing resolution and sensitivity while preserving precision. By analyzing CNV calls from 4,906 individuals genotyped on three different arrays (Global Screening Array, Omni2.5 array, and Omni Express Exome array), we show that MarkerMatch improves sensitivity by increasing the density of probes available for CNV calling, while maintaining or improving precision relative to current practice (i.e., using consensus probes only). We further demonstrate that MarkerMatch outperforms current practice in terms of F1 score, Fowlkes-Mallows index, and Jaccard index. We also optimize the MarkerMatch parameters DMAX and Method, finding an optimal DMAX setting of 10 kb but no clear optimal choice of Method, which indicates that Method should be chosen on a per-use-case basis.
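To make the contrast between a consensus probe set and proximity-based matching concrete, here is a minimal sketch. It is not MarkerMatch's published procedure. It rests on several assumptions: DMAX is treated as a maximum probe-to-probe distance in base pairs, probe positions are on a single chromosome, and each reference probe is greedily paired with the nearest unused probe on the other array. The function names `consensus_probes` and `proximity_match` are hypothetical, and the paper's Method parameter is not modeled.

```python
from bisect import bisect_left


def consensus_probes(ref_positions, other_positions):
    """Consensus-style set: probe positions present on both arrays (exact intersection)."""
    return sorted(set(ref_positions) & set(other_positions))


def proximity_match(ref_positions, other_positions, dmax=10_000):
    """Hypothetical greedy matcher (an assumption, not MarkerMatch itself).

    Pairs each reference probe with the nearest not-yet-used probe on the other
    array, provided the distance is at most `dmax` base pairs. Positions are
    assumed to lie on one chromosome. Returns (ref_pos, other_pos) pairs.
    """
    other = sorted(other_positions)
    used = [False] * len(other)
    pairs = []
    for pos in sorted(ref_positions):
        i = bisect_left(other, pos)
        # Walk right past already-used probes that are still within dmax.
        right = i
        while right < len(other) and used[right] and other[right] - pos <= dmax:
            right += 1
        # Walk left past already-used probes that are still within dmax.
        left = i - 1
        while left >= 0 and used[left] and pos - other[left] <= dmax:
            left -= 1
        candidates = []
        if right < len(other) and not used[right] and other[right] - pos <= dmax:
            candidates.append((other[right] - pos, right))
        if left >= 0 and not used[left] and pos - other[left] <= dmax:
            candidates.append((pos - other[left], left))
        if candidates:
            # Closest candidate wins; ties go to the lower index (left side).
            _, j = min(candidates)
            used[j] = True
            pairs.append((pos, other[j]))
    return pairs


ref = [1000, 5000, 20000, 50000]
other = [1000, 5200, 35000, 90000]
print(consensus_probes(ref, other))              # [1000]
print(proximity_match(ref, other, dmax=10_000))  # [(1000, 1000), (5000, 5200)]
```

In this toy example the exact intersection keeps one probe, while matching within 10 kb keeps two. Raising `dmax` admits more distant pairs, which is the kind of resolution-versus-precision trade-off the DMAX optimization addresses.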