#rlhf

Can we really scale RL?

Yes and no. LLM reasoning research is just a big pile of math. We stir the math every once in a while, and it starts doing crazy stuff. For months, the community has argued that RL post-training just polishes ideas an LLM already had. ProRL politely s...

Jun 8, 2025 · 11 min read
