Optimizing genomic sampling for demographic and epidemiological inference with Markov decision processes
Rasmussen, D. A.; Bursell, M. G.; Burkhart, F.
Abstract
Inferences from population genomic data provide valuable insights into the demographic history of a population. Likewise, in genomic epidemiology, pathogen genomic data provide key insights into epidemic dynamics and potential sources of transmission. Yet it remains challenging to predict what information genomic data will provide about variables of interest, and how different sampling strategies will affect the quality of downstream inferences. As a result, population genomics largely lacks theory to guide decisions on how best to sample individuals for genomic sequencing. By adopting a sequential decision-making framework, we show how Markov decision processes (MDPs) can be applied to jointly model a population's dynamics along with the sampling process. Critically, these MDPs allow us to compute the expected long-term value of sampling in terms of information gained about estimated variables. This in turn allows us to efficiently explore candidate sampling strategies and identify optimal ones. To illustrate our framework, we develop MDPs for three common demographic and epidemiological inference problems: estimating population growth rates, minimizing the transmission distance between sampled individuals, and estimating migration rates between subpopulations. In each case, the MDP allows us to identify optimal sampling strategies that maximize the information gained from genomic data while minimizing costs.
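The idea of computing the expected long-term value of sampling can be illustrated with a minimal value-iteration sketch. The toy model below is not one of the three MDPs developed in the paper; it is a hypothetical example in which the state is the number of samples already sequenced, the two actions are "stop" and "sequence one more sample", and the reward is an assumed information gain of log(1 + n) with diminishing returns, minus a fixed per-sample cost. The names `value_iteration`, `N`, and `cost`, and all numerical values, are assumptions chosen for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, tol=1e-10, max_iter=10_000):
    """Generic value iteration for a finite MDP.

    P[a, s, t] is the probability of moving from state s to state t under
    action a; R[s, a] is the expected immediate reward.
    Returns the optimal state values and a greedy (optimal) policy.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy problem (all values are assumptions, not from the paper).
N = 6        # maximum number of samples that can be sequenced
cost = 0.2   # fixed cost of sequencing one additional sample
n_states = N + 1

P = np.zeros((2, n_states, n_states))  # action 0 = stop, action 1 = sample
R = np.zeros((n_states, 2))
for n in range(n_states):
    P[0, n, n] = 1.0                   # stopping leaves the state unchanged
    P[1, n, min(n + 1, N)] = 1.0       # sampling adds one sample (capped at N)
    # Assumed information measure log(1 + n): marginal gain shrinks with n.
    gain = np.log(n + 2) - np.log(n + 1) if n < N else 0.0
    R[n, 1] = gain - cost              # net reward of sampling; stopping gives 0

V, policy = value_iteration(P, R)
print(policy)  # one action per state: 1 = sample, 0 = stop
```

Under these assumed values, the greedy policy sequences samples until the marginal information gain drops below the per-sample cost, which happens once four samples have been collected. In a realistic setting the transitions would be stochastic and would also track the population's dynamics, so the long-term value would no longer reduce to this simple threshold rule.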