Characterization of selective pressures acting on protein sites with Deep Learning
Bergiron, E.; Nesterenko, L.; Barnier, J.; Veber, P.; Boussau, B.
Abstract
In molecular evolution, it is often useful to identify the selective pressures acting on a particular site of a protein to better understand its function. This is typically done with likelihood-based approaches applied to codon sequences in a phylogenetic context. However, these approaches are computationally costly. Here we adapt a linear transformer neural network architecture, which has been shown to reconstruct accurate phylogenies from sequence alignments, to identify selective pressures acting on individual amino acid sites. We design several versions of the architecture and train and test them on simulated data. We compare the results of one of our best models to the state-of-the-art approach codeml and find that our model outperforms codeml on data that resemble its training data, but performs less well on data that do not. In all cases, our approach runs at a fraction of codeml's computational cost. These results suggest that such a neural network architecture, trained on realistic simulations, could compare very favorably with state-of-the-art approaches for characterizing the selective pressures acting on coding sequences.