Steering Sequence Generation in Protein Language Models through Iterative Lookback Monte Carlo Sampling
Calvanese, F.; Lombardi, G.; Weigt, M.; Fernandez-de-Cossio-Diaz, J.
Abstract
Protein language models (pLMs) leverage large-scale evolutionary data to generate novel sequences, but steering generation toward desired physicochemical properties without sacrificing diversity remains a major challenge. Existing approaches often induce severe diversity loss or require computationally expensive retraining. We introduce Iterative Lookback Monte Carlo (ILMC), a training-free inference-time sampling strategy that interleaves autoregressive elongation with Metropolis–Hastings refinement to approximate sampling from a maximum-entropy target distribution balancing generative quality and steering objectives. We show theoretically that this target distribution is entropy-maximizing under fixed generative-quality and steering constraints, and empirically that ILMC produces more diverse samples than standard autoregressive baselines at matched generative quality. Using simple steering potentials, ILMC improves desired molecular properties, including generating proteins with up to 12 °C higher predicted melting temperature than compute-matched alternative strategies. ILMC naturally extends to classifier-guided steering, where it outperforms purely autoregressive guidance in diversity while maintaining comparable enrichment of target properties. We validate ILMC on family-specific pLMs and on the multi-family model ProGen3.
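The sampling scheme described above can be sketched in miniature. The code below is an illustrative toy, not the paper's implementation: it replaces the pLM with a hypothetical 4-letter autoregressive model (`cond_probs`) and the steering potential with a toy score (fraction of 'A' residues). The names `lambda_steer`, `lookback`, and `mh_steps` are assumptions. Each elongation step is followed by Metropolis–Hastings sweeps over recent positions, targeting a distribution proportional to p_model(x)·exp(λ·score(x)).

```python
import math
import random

ALPHABET = "ACDE"  # toy alphabet standing in for the 20 amino acids


def cond_probs(prefix):
    """Toy autoregressive conditional: mild preference for repeating
    the previous residue (stands in for a real pLM's next-token head)."""
    base = {a: 1.0 for a in ALPHABET}
    if prefix:
        base[prefix[-1]] += 2.0
    z = sum(base.values())
    return {a: v / z for a, v in base.items()}


def log_prob(seq):
    """Log-probability of a full sequence under the toy model."""
    return sum(math.log(cond_probs(seq[:i])[a]) for i, a in enumerate(seq))


def score(seq):
    """Toy steering potential: fraction of 'A' residues."""
    return seq.count("A") / max(len(seq), 1)


def ilmc_sample(length, lambda_steer=5.0, lookback=4, mh_steps=10, rng=random):
    """ILMC-style sketch: interleave autoregressive elongation with
    Metropolis-Hastings refinement over the last `lookback` positions,
    targeting p_model(x) * exp(lambda_steer * score(x))."""
    seq = ""
    for _ in range(length):
        # 1) Elongate: sample the next residue from the model.
        probs = cond_probs(seq)
        seq += rng.choices(ALPHABET, weights=[probs[a] for a in ALPHABET])[0]
        # 2) Lookback refinement: MH resampling of recent positions.
        for _ in range(mh_steps):
            i = rng.randrange(max(len(seq) - lookback, 0), len(seq))
            prop = seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]
            log_ratio = (log_prob(prop) + lambda_steer * score(prop)
                         - log_prob(seq) - lambda_steer * score(seq))
            if math.log(rng.random() + 1e-12) < log_ratio:
                seq = prop
    return seq
```

With a positive `lambda_steer`, accepted proposals are tilted toward high-score sequences while the model's log-probability term keeps refinements close to the generative distribution; setting `lambda_steer=0` recovers plain (lookback-resampled) model sampling.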