Yejin Kim

Study it and apply all!
Like to see error code because it is much better than warrning!

[KMOOC 강화학습] Week 03-3 Knapsack Problem

최대 1 분 소요

해당 강의는 K-MOOC의 “강화학습의 수학적 기초와 알고리즘 이해” 수업을 수강하며 기록한 내용입니다. 강의는 링크에서 확인하실 수 있습니다.

개괄

동적계획법

최단경로 문제(shortest path problem)
외판원 문제(Traveling Salesman Probelm, TSP)
배낭문제(knapsack problem)

배낭문제란?

배낭에 물건을 챙길 때 물건의 가치와 무게를 동시에 고려해서 가방을 담아야 하는 문제로, 무게의 최대치를 넘지 않으면서 가치의 합이 최대가 되는 것을 찾는다!

EX.

예산이 제한적일 때 프로젝트를 실행한다면

Problem Formulation

상태, 행동, 가치함수를 결정
- 상태 : $n$ : 첫 n개의 제품 / $x$ 가방의 무게 / $v$ 현재 무게가 $x$일 때 얻을 수 있는 물품의 최대치
- 행동 : 물건을 선택
- 가치 함수 : $v(n, x)$
- EX. $v(7,15)$ : 앞에 7개의 아이템이 있고 가방에 담을 수 있는 무게가 15일 때 담을 수 있는 가치의 최대치
- 여기서 단계별로 바뀌는건 $n$이다! 하나씩 더 넣게 되니!
Optimal substructure 조건을 만족하면서 재귀방정식을 도출하기!
- $n$번째 아이템을 넣지 않았을 때 : $v(n-1, x)$ 는 $n-1$개 중에서 x만큼 무거울 때 담을 수 있는 가치의 최대치
- n번째 물품의 가치를 $V_{n}$, $n$번째 아이템의 무게를 $w_{n}$라고 할 때
- $n$번째 아이템을 넣었을 때 : $V_{n} + v(n-1,x-w_{n})$
- 그 결과 : $v(n,x) = \max[v(n-1, x), V_{n}+v(n-1,x-w_{n})]$
- 즉 n번째 아이템을 넣지 말지 선택하는 걸로 재귀식을 구현한 것이다!

공유하기

Twitter Facebook LinkedIn

댓글남기기

참고

[Paper Review] TRPO and PPO

2 분 소요

🧠 Paper Review: TRPO & PPO

[Paper] Mamba: Linear-Time Sequence Modeling with Selective State Spaces

최대 1 분 소요

This is a brief review for “Mamba: Linear-Time Sequence Modeling with Selective State Spaces”. You can see the paper at this paper link.

[SSM] Modeling Sequences with Structured State Spaces - Part I

10 분 소요

1.1 Deep sequence model Definition 1.1 (Informal). We use sequence model to refer to a parameterized map on sequences $y=f_\theta(x)$ where inputs and o...

[LLM-RL] Lecture 2: Value Functions

2 분 소요

Overview. This post focuses on Value Functions in reinforcement learning. We begin with a quick recap of the RL setup, then introduce state-value / action-va...