- Korea Univ. (Graduate School) / Network Simulation / ECE645 / 2023 Fall
- Reinforcement Learning
- Lecture by Prof. Hwangnam Kim, School of Electrical Engineering, Korea University
- Folders whose names start with 'assignment' contain my assignments for this course.
- Folders whose names start with 'ch' follow my reference book, Reinforcement Learning using Stable Baselines, written by Prof. Yousung Park of the Department of Statistics at Korea University.

Lecture chapters:

Chapter | Contents | Details |
---|---|---|
1 | Introduction to Reinforcement Learning | What is Reinforcement Learning? |
2 | Introduction to ML, DL, and RL | ML vs DL |
| | | Convolutional Network |
| | | Recurrent Neural Network |
| | | Reinforcement Learning |
3 | Mathematics for Reinforcement Learning | Random Process |
| | | Markov Process |
| | | Markov Reward Process & Markov Decision Process |
| | | Optimization |
| | | Gradient Descent Algorithms |
| | | Optimization Algorithms for Training Deep Neural Networks |
| | | Information Theory |
| | | Parameter Estimation Concept |
4 | Reinforcement Learning Concept | Reinforcement Learning Concept |
| | | Reinforcement Learning Components |
| | | Long-Term Reward and Value Function |
5 | MDP and DP | Markov Decision Process |
| | | Dynamic Programming |
| | | Policy Evaluation |
| | | Optimal Policies Revisited |
| | | Finding Optimal Policies: Dynamic Programming |
6 | Model Free Algorithm | Model-Free RL |
| | | Monte-Carlo Method Prediction and Control |
| | | Monte-Carlo Policy Control |
| | | Exploration More |
| | | Temporal Difference for Prediction |
| | | Temporal Differences Extended: N-Step Prediction |
| | | On-Policy Control: SARSA |
| | | Off-Policy Learning: Q-Learning |
| | | Comparison: SARSA and Q-Learning |
| | | Off-Policy Learning with Importance Sampling |
7 | Function Approximation | Function Approximation |
| | | Incremental Methods |
| | | Coarse Coding |
| | | Prediction with Value Function Approximation |
| | | Control with Value Function Approximation |
| | | Batch Methods |
8 | Extension of Q-Learning | Key Variants and Extensions of Q-Learning |
| | | Fitted Q-Learning |
| | | Deep Q-Network |
| | | Double Q-Learning |
| | | Double DQN |
| | | Prioritized Experience Replay |
| | | Dueling Network Architectures |
| | | N-Step Q-Learning |
| | | Distributional vs. Distributed Q-Learning |
| | | Noisy Nets |
| | | Rainbow Q-Learning: Combining Improvements in Deep Reinforcement Learning |
| | | Asynchronous Q-Learning |
| | | Optimistic Q-Learning |
| | | Faster Deep Reinforcement Learning by Optimality Tightening |
| | | Practical Skills |
9 | Policy Based Algorithm | Policy Gradient |
| | | Policy Optimization |
| | | Policy Gradient |
| | | A Structure for Reinforce and Actor-Critic |
| | | Reinforce |
| | | Actor Critic |
| | | Summary |
10 | Model-Based Reinforcement Learning | Model-Based Reinforcement Learning |
| | | Model-free and model-based approach: Integrated Architecture |
| | | Simulation for Planning |
11 | Case Studies in Policy Based Algorithm | Policy Gradient Theorem Revisited |
| | | A2C |
| | | A3C |
| | | PPO |
| | | DDPG |

Assignments (folders starting with 'assignment'):

Chapter | Contents |
---|---|
1 | Value Iteration and Policy Iteration Coding using OpenAI Gym - FrozenLake |
2 | Monte Carlo Prediction and Control Coding |
3 | SARSA and Q-Learning Control Coding |
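
As a rough illustration of the first assignment topic, value iteration can be sketched on a small grid without any external dependency. This is not the assignment code: the grid below is a hypothetical, deterministic stand-in for FrozenLake (no slipping), and the names `GRID`, `step`, `lookahead`, `value_iteration`, and `greedy_policy` are assumptions made for this example.

```python
# Hypothetical 4x4 grid in the spirit of FrozenLake (not the Gym environment):
# S = start, F = frozen, H = hole (terminal), G = goal (terminal, reward 1).
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
# Action index -> (row change, column change): left, down, right, up.
ACTIONS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(state, action):
    """Deterministic transition: return (next_state, reward, done)."""
    row, col = divmod(state, N)
    if GRID[row][col] in "HG":  # terminal states absorb
        return state, 0.0, True
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row, 0), N - 1)  # walls keep the agent in place
    new_col = min(max(col + d_col, 0), N - 1)
    cell = GRID[new_row][new_col]
    return new_row * N + new_col, float(cell == "G"), cell in "HG"


def lookahead(values, state, action, gamma):
    """One-step backup: immediate reward plus discounted next-state value."""
    next_state, reward, done = step(state, action)
    return reward + (0.0 if done else gamma * values[next_state])


def value_iteration(gamma=0.9, tol=1e-8):
    """Sweep all states until the largest value change falls below tol."""
    values = [0.0] * (N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            row, col = divmod(s, N)
            if GRID[row][col] in "HG":
                continue  # terminal values stay at 0
            best = max(lookahead(values, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values, gamma=0.9):
    """For each state, pick the action with the highest one-step backup."""
    return [max(ACTIONS, key=lambda a, s=s: lookahead(values, s, a, gamma))
            for s in range(N * N)]


values = value_iteration()
policy = greedy_policy(values)
```

With the default discount of 0.9, a state's value in this deterministic grid is 0.9 raised to the number of steps remaining after the next move on a shortest safe path to the goal. On the real FrozenLake environment the transitions are stochastic, so the backup would average over next states instead of following a single one.
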

Reference book chapters (folders starting with 'ch'):

Chapter | Contents |
---|---|
1 | Introduction to Reinforcement Learning |
2 | Bellman Equation and Dynamic Programming |
3 | OpenAI Gym |
4 | Monte-Carlo Estimation |
5 | TD and Action |
6 | Deep Q Networks |
7 | Policy-based Reinforcement Learning |
8 | Actor-Critic Reinforcement Learning |
9 | Stable Baselines |
10 | TRPO, PPO, ACKTR |
11 | DDPG, TD3, SAC |
12 | Imitation Learning and Inverse Reinforcement Learning |
13 | Probability Distribution-based Reinforcement Learning |
Appendix | Reinforcement Learning Algorithm |

Reference book appendix (algorithms):

Number | Contents |
---|---|
1 | Monte-Carlo Policy Iteration |
2 | Off-Policy Monte-Carlo Algorithm |
3 | SARSA Algorithm |
4 | Q-Learning Algorithm |
5 | DQN Algorithm |
6 | REINFORCE Algorithm |
7 | Policy Gradient with Baseline Algorithm |
8 | A2C Algorithm |
9 | TRPO Algorithm |
10 | PPO-clipped Algorithm |
11 | PPO-penalty Algorithm |
12 | DDPG Algorithm |
13 | TD3 Algorithm |
14 | SAC Algorithm |
15 | DAgger Algorithm |
16 | DQfD Algorithm |
17 | IRL Algorithm |
18 | Categorical DQN Algorithm |
19 | D4PG Algorithm |
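
For orientation, the difference between entries 3 and 4 above (SARSA and Q-Learning) comes down to the bootstrap target used in the temporal-difference update. The sketch below shows the standard tabular forms; it is not taken from the book or the course code, and the function names and the list-based `next_q` argument are assumptions made for this example.

```python
def sarsa_target(reward, next_q, next_action, gamma, done):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward if done else reward + gamma * next_q[next_action]


def q_learning_target(reward, next_q, gamma, done):
    """Off-policy target: bootstrap from the greedy (maximum) next action value."""
    return reward if done else reward + gamma * max(next_q)


def td_update(q_value, target, alpha):
    """Move the current estimate a fraction alpha toward the target."""
    return q_value + alpha * (target - q_value)


# Hypothetical numbers: two actions in the next state, exploratory action 0 chosen.
next_q = [0.2, 1.0]
sarsa = sarsa_target(reward=0.5, next_q=next_q, next_action=0, gamma=0.9, done=False)
greedy = q_learning_target(reward=0.5, next_q=next_q, gamma=0.9, done=False)
updated = td_update(q_value=0.0, target=greedy, alpha=0.5)
```

When the behaviour policy picks a non-greedy next action, as in the numbers above, the SARSA target is lower than the Q-Learning target; the two coincide whenever the next action is the greedy one.
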