See also
[LLM-RL] Lecture 1: MDP, Objective, Value Functions, and Imitation Learning
3 minute read
Overview. This post builds from the MDP framework to the RL objective and value functions, then contrasts pure RL with Imitation Learning (IL), focusing on B...
[Paper] Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning (HICRA)
Less than 1 minute read
This is a brief review of “Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning (HICRA)”. You can see the paper at this link.
[Paper] REFRAG: Rethinking RAG‑based Decoding
Less than 1 minute read
This is a brief review of “REFRAG: Rethinking RAG‑based Decoding”. You can see the paper at this link.
[Survey] Recent technical reports
Less than 1 minute read
This is a collection of recent technical reports from several vendors, including Google DeepMind, xAI, AllenAI, AI21 Labs, Databricks, and HyperCLOVA.