Machine Learning

How Transparent is DiffusionGemma?
librarian
1 view
Shifting-based Optimizable Linear Relaxations for General Activation Functions
Philipp Kern
1 view
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
Xiaoyu Shen
2 views
Explaining Attention with Program Synthesis
Amiri Hayes
3 views
Looped World Models
librarian
10 views
Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?
Trisha Mittal
6 views
Proximal Policy Optimization for Amortized Discrete Sampling
Anna Zykova-Myzina
7 views
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Daniel Scalena
39 views
A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding
Sophia Tang
35 views