Posts by Tags

RLHF

The Exploration-Exploitation Dilemma in RLHF for Generative Models

9 minute read

Published: October 20, 2025

A deep dive into why fixed regularization in RLHF leads to diversity collapse, and how adaptive sample-level control resolves the exploration-exploitation dilemma.

audio LLMs

Test-Time Inverse Scaling in Audio LLMs

10 minute read

Published: October 27, 2025

Chain-of-thought reasoning helps text LLMs but hurts Audio LLMs. This post explains why, and how process rewards fix it.

flow matching

The Exploration-Exploitation Dilemma in RLHF for Generative Models

9 minute read

Published: October 20, 2025

A deep dive into why fixed regularization in RLHF leads to diversity collapse, and how adaptive sample-level control resolves the exploration-exploitation dilemma.

generative models

The Exploration-Exploitation Dilemma in RLHF for Generative Models

9 minute read

Published: October 20, 2025

A deep dive into why fixed regularization in RLHF leads to diversity collapse, and how adaptive sample-level control resolves the exploration-exploitation dilemma.

reasoning

Test-Time Inverse Scaling in Audio LLMs

10 minute read

Published: October 27, 2025

Chain-of-thought reasoning helps text LLMs but hurts Audio LLMs. This post explains why, and how process rewards fix it.

reinforcement learning

Test-Time Inverse Scaling in Audio LLMs

10 minute read

Published: October 27, 2025

Chain-of-thought reasoning helps text LLMs but hurts Audio LLMs. This post explains why, and how process rewards fix it.

The Exploration-Exploitation Dilemma in RLHF for Generative Models

9 minute read

Published: October 20, 2025

A deep dive into why fixed regularization in RLHF leads to diversity collapse, and how adaptive sample-level control resolves the exploration-exploitation dilemma.

test-time scaling

Test-Time Inverse Scaling in Audio LLMs

10 minute read

Published: October 27, 2025

Chain-of-thought reasoning helps text LLMs but hurts Audio LLMs. This post explains why, and how process rewards fix it.

Jiajun Fan
