What’s Happening in My Field
Reasoning
Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents
2026-03-10
Reasoning
DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning
2026-03-09
Self-Improvement
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
2026-03-06
RL Training
When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On
2026-03-05
Audio Reasoning
A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs
2026-03-04
Self-Improvement
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
2026-03-03
Reasoning
PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference
2026-03-03
Self-Improvement
Provable and Practical In-Context Policy Optimization for Self-Improvement
2026-03-02
Reasoning
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
2026-02-26