Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling Paper • 2606.12370 • Published 11 days ago • 21
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published Apr 7 • 122
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 295
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 508
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale Paper • 2601.08225 • Published Jan 13 • 53
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training Paper • 2509.25758 • Published Sep 30, 2025 • 25
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs Paper • 2506.14429 • Published Jun 17, 2025 • 44
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13, 2025 • 30
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster Paper • 2503.09662 • Published Mar 12, 2025 • 33
CompAct: Compressing Retrieved Documents Actively for Question Answering Paper • 2407.09014 • Published Jul 12, 2024 • 1
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Paper • 2503.04644 • Published Mar 6, 2025 • 22
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information Paper • 2502.14258 • Published Feb 20, 2025 • 26
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models Paper • 2401.15269 • Published Jan 27, 2024 • 2
System Message Generation for User Preferences using Open-Source Models Paper • 2502.11330 • Published Feb 17, 2025 • 15
Monet: Mixture of Monosemantic Experts for Transformers Paper • 2412.04139 • Published Dec 5, 2024 • 13