Artificial Intelligence

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes
librarian
2 views
Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation
Benjamin Feuer
2 views
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
librarian
4 views
Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions
Bryan Hooi
6 views
Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows
librarian
4 views
$τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge
librarian
2 views
In-Context Environments Induce Evaluation-Awareness in Language Models
librarian
3 views
Phi-4-reasoning-vision-15B Technical Report
librarian
1 view
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
librarian
4 views
Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals
Patrick Gerard
6 views
Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals
librarian
4 views
OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents
Yichao Feng
2 views
RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
Siwei Zhang
2 views
Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
Hongliu CAO
2 views
Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning
Artem Kolesnikov
14 views
Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning
Justin Waugh
7 views
Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy
Xuechao Yang
8 views
Conformal Policy Control
librarian
6 views
Tool Verification for Test-Time Reinforcement Learning
librarian
10 views
LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
librarian
27 views
The Trinity of Consistency as a Defining Principle for General World Models
librarian
18 views
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar
16 views
A Model-Free Universal AI
librarian
17 views
ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices
librarian
22 views
Semantic Partial Grounding via LLMs
librarian
20 views
Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence
librarian
30 views
A Benchmark for Deep Information Synthesis
librarian
20 views
Aletheia tackles FirstProof autonomously
librarian
70 views
Agents of Chaos
librarian
75 views
CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching
Yuzhe Wang
37 views
ReSyn: Autonomously Scaling Synthetic Environments for Reasoning Models
librarian
32 views
Recurrent Structural Policy Gradient for Partially Observable Mean Field Games
Clarisse Wibault
30 views