PromptBio-Bench: Benchmarking LLM-based Bioinformatics Agents for End-to-End Data Analysis

This paper is a preprint and has not been certified by peer review.

Authors

Guo, W.; Zhang, M.; Han, B.; Ma, Y.; Leng, Y.; Hebbar, S.; Zhou, X.; Gu, W.; Yang, X.; Dhar, S.

Abstract

Large language model (LLM)-based agents hold transformative potential for automating bioinformatics workflows; however, systematic evaluations of their capabilities remain limited, hindering a clear assessment of their readiness for real-world application. We introduce PromptBio-Bench, a comprehensive evaluation suite of 194 expert-curated tasks spanning bioinformatics and data science at varied difficulty levels, together with an evaluation framework that performs structured file comparison and scores agent outputs against expert reference answers. Benchmarking three state-of-the-art agents revealed that Biomni and ToolsGenie achieved comparable performance and that accuracy declined markedly at higher difficulty levels across all agents. As foundation models and agent frameworks continue to evolve, PromptBio-Bench provides valuable benchmark infrastructure for the community to systematically track the progress of agentic bioinformatics.
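The abstract does not describe how the structured file comparison is implemented, so the sketch below is only illustrative of what scoring an agent's output table against an expert reference might look like. The file format (TSV), the key column name, the numeric tolerance, and the cell-level scoring rule are all assumptions, not the authors' method.

```python
import csv
import math

def score_table(agent_path: str, ref_path: str,
                key: str = "gene_id", rel_tol: float = 1e-3) -> float:
    """Score an agent-produced TSV against an expert reference TSV.

    Rows are matched on a key column; numeric cells must agree within a
    relative tolerance, and all other cells must match exactly. Returns
    the fraction of reference cells the agent reproduced (0.0 to 1.0).
    Assumes both files are well-formed tables with a header row.
    """
    def load(path):
        with open(path, newline="") as fh:
            return {row[key]: row for row in csv.DictReader(fh, delimiter="\t")}

    ref, agent = load(ref_path), load(agent_path)
    total = correct = 0
    for row_key, ref_row in ref.items():
        agent_row = agent.get(row_key, {})  # missing rows score zero
        for col, ref_val in ref_row.items():
            if col == key:
                continue
            total += 1
            agent_val = agent_row.get(col)
            if agent_val is None:
                continue
            try:
                # Numeric cells: compare within a relative tolerance.
                if math.isclose(float(ref_val), float(agent_val), rel_tol=rel_tol):
                    correct += 1
            except ValueError:
                # Non-numeric cells: exact match after trimming whitespace.
                if agent_val.strip() == ref_val.strip():
                    correct += 1
    return correct / total if total else 0.0
```

A cell-level fraction like this is one plausible scoring rule; a real harness would also need to handle missing output files, column reordering, and task-specific metrics.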
