Predicting cell type-specific coverage profiles from DNA sequence

This paper is a preprint and has not been certified by peer review.

Authors

Linder, J.; Yuan, H.; Kelley, D. R.

Abstract

Predicting expression profiles from RNA-seq experiments provides a powerful approach for universal sequence-based variant effect prediction, enabling researchers to score variants that affect total gene expression and relative isoform abundances. These models can be repurposed for new prediction tasks through transfer learning. However, current base models train primarily on bulk RNA-seq profiles derived from tissues and cell lines, overlooking the wealth of single-cell 3'-seq data that captures cell type-specific gene regulation. Here, we extend the capabilities of our recently developed Borzoi model by training on single-cell 3'-seq expression profiles from the Tabula Sapiens, Tabula Muris, and the Adult Brain Atlas, aggregated by cell type. This new model, Borzoi Prime, enables accurate variant interpretation across diverse cells, spanning erythrocytes to microglia. Training on 3'-seq profiles improves the model's ability to predict cell- and tissue-specific alternative polyadenylation, even in the original bulk RNA-seq data. Through UTR-wide mutagenesis experiments of alternatively polyadenylated genes, we highlight determinants of cell type-specific 3' UTR regulation learned by the model. This cell type-resolved approach opens new possibilities for understanding genetic variant effects via multiple layers of regulation in specific cellular contexts.
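
The phrase "aggregated by cell type" can be illustrated with a minimal sketch. It assumes per-cell coverage is held as a dense cells-by-bins array and that aggregation is a plain sum over cells sharing a label; the abstract does not specify the actual procedure or normalization, and the function name and toy data here are hypothetical.

```python
import numpy as np


def aggregate_by_cell_type(coverage, cell_types):
    """Sum per-cell coverage tracks into one track per cell type.

    coverage:   array-like of shape (n_cells, n_bins), hypothetical read counts.
    cell_types: sequence of n_cells labels, one per row of `coverage`.
    Returns (labels, profiles); profiles has shape (n_types, n_bins),
    with rows ordered like the sorted unique labels.
    """
    coverage = np.asarray(coverage, dtype=float)
    # `inverse` maps each cell to the row index of its cell type label.
    labels, inverse = np.unique(np.asarray(cell_types), return_inverse=True)
    profiles = np.zeros((labels.size, coverage.shape[1]))
    # Unbuffered add, so cells sharing a label accumulate into the same row.
    np.add.at(profiles, inverse, coverage)
    return labels, profiles


# Toy data (assumed, not from the paper): three cells, three coverage bins.
toy_coverage = [[1, 0, 2], [0, 3, 1], [4, 1, 0]]
toy_types = ["microglia", "erythrocyte", "microglia"]
labels, profiles = aggregate_by_cell_type(toy_coverage, toy_types)
```

Summing is only one choice; a real pipeline might instead normalize by cell count or sequencing depth before training.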
