This is a brief review of “Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning (HICRA)”.
You can see the paper at this link.

Overview

This work argues that RL improves LLM reasoning through an emergent two‑phase hierarchy: early training fixes low‑level procedural tokens, then later gains come from high‑level strategic planning. Based on this, the authors introduce HICRA, which increases credit assignment on planning tokens (rather than all tokens as in GRPO), leading to stronger performance.

Key Ideas

  • Reinterprets ‘aha moments’ and response-length scaling as surface signs of an emergent planning hierarchy.
  • Hierarchy‑Aware Credit Assignment (HICRA) amplifies gradients on planning tokens.
  • Outperforms GRPO-style baselines by concentrating credit on the true bottleneck: strategy.
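The contrast with GRPO can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the amplification factor `alpha` and the boolean `planning_mask` (which tokens count as "planning") are assumptions for illustration only.

```python
# Hypothetical sketch of HICRA-style credit assignment (not the paper's code).
# In GRPO, every token of a sampled response shares one group-normalized
# advantage; the HICRA idea is to additionally amplify the advantage on
# tokens identified as planning tokens, by an assumed factor alpha.

def grpo_advantages(rewards):
    """Group-relative advantage: reward minus group mean, scaled by group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # fall back to 1.0 when all rewards are equal
    return [(r - mean) / std for r in rewards]

def hicra_token_advantages(token_advs, planning_mask, alpha=0.5):
    """Boost credit on planning tokens; procedural tokens keep the base advantage."""
    return [a * (1 + alpha) if is_plan else a
            for a, is_plan in zip(token_advs, planning_mask)]
```

Under this sketch, a response's planning tokens receive a proportionally larger gradient signal while procedural tokens are left at the GRPO baseline, which is the targeted-credit idea the paper describes.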

Why it matters

The paper clarifies why RL improves reasoning and offers a targeted algorithm (HICRA) that can be plugged into many RL setups for further gains.
