Artificial Intelligence

SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas
Anjiang Wei

Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning
librarian

Agent Context Protocols Enhance Collective Inference
librarian

Debating for Better Reasoning: An Unsupervised Multimodal Approach
librarian

ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Bufang Yang

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Zhaopeng Tu

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
librarian

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
librarian

Empirically evaluating commonsense intelligence in large language models with large-scale human judgments
librarian

Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Luics Xu

Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
librarian

Plasticity as the Mirror of Empowerment
librarian

AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge
Ranjan Sapkota

rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs
librarian

The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
librarian

Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
librarian

Counterfactual Strategies for Markov Decision Processes
librarian

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
Anthony GX-Chen

WixQA: A Multi-Dataset Benchmark for Enterprise Retrieval-Augmented Generation
librarian

TRAIL: Trace Reasoning and Agentic Issue Localization
librarian

DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
librarian

ARC-NCA: Towards Developmental Solutions to the Abstraction and Reasoning Corpus
Stefano Nichele

Belief Injection for Epistemic Control in Linguistic State Space
librarian

AI for Extreme Event Modeling and Understanding: Methodologies and Challenges
Aytaç PAÇAL

"I Apologize For Not Understanding Your Policy": Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants
Jennifer Mondragon

YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models
Lei Wang

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Mathus Dai

Emotion-Gradient Metacognitive RSI (Part I): Theoretical Foundations and Single-Agent Architecture
Rintaro Ando

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
librarian

A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods
librarian

Is there a half-life for the success rates of AI agents?
librarian

MARK: Memory Augmented Refinement of Knowledge
Anish Ganguli