HAVEN: Hierarchical Attention for Viral protEin-based host iNference
Antony, B.; Haghani, M.; Lauring, A. S.; Karpatne, A.; Murali, T. M.
Abstract

It is crucial to accurately predict hosts of viruses to understand and anticipate human infectious diseases that originate from animals. There is a lack of versatile models that handle out-of-distribution factors such as unseen hosts and viruses. We develop a machine learning model for predicting the host infected by a virus, given only the sequence of a protein encoded by the genome of that virus. Our approach, HAVEN, is the first to apply to multiple hosts and to generalize to unseen hosts and viruses. HAVEN is a transformer-based architecture coupled with hierarchical self-attention that can accept sequences of highly diverse lengths. We integrate HAVEN with a prototype-based few-shot learning classifier to predict rare classes. We demonstrate the accuracy, robustness, and generalizability of HAVEN through a comprehensive series of experiments. In particular, we show that HAVEN can achieve a median AUPRC of 0.67 while predicting common hosts. Moreover, HAVEN retains this AUPRC value even for rare hosts (median prevalence as low as 0.09%). Our model performs on par with state-of-the-art foundation models, which are 65 to 5,000 times larger in size, and outperforms them in identifying hosts of SARS-CoV-2 variants of concern.