Here's a 9-Hour Crash Course on RLHF (Reinforcement Learning from Human Feedback).
A comprehensive guide to post-training alignment of Large Language Models (LLMs).
All resources are free.
Part 1: Introduction to RLHF, Tutorials and Overviews - (2h 0m)
Gain foundational knowledge in Reinforcement Learning, human feedback mechanisms, reward modeling, policy optimization, and practical AI applications. A small reward-model loss sketch follows the resource list below.
Resources:
- 📖 20m: Introduction to RLHF by Chip Huyen Read here
- 📺 40m: RLHF: From Zero to ChatGPT by Nathan Lambert Watch here
- 📺 60m: Lecture on RLHF by Hyung Won Chung Watch here
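As a quick preview of the reward-modeling idea these resources cover, here is a minimal PyTorch sketch of the pairwise preference loss typically used to train an RLHF reward model. The function name and the example scores are invented for illustration; this is a sketch of the idea, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss used to train RLHF reward models.

    score_chosen / score_rejected are the scalar rewards the model assigns to
    the human-preferred and the rejected completion for the same prompt.
    Minimizing this loss pushes the chosen score above the rejected score.
    """
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy usage with made-up scores for a batch of 3 preference pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, -0.5])
print(reward_model_loss(chosen, rejected))  # a single scalar loss value
```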
Part 2: Research - (1h 30m)
Dive into key papers that discuss RLHF implementations and the underlying theories.
- 📖 30m: InstructGPT Paper - applying RLHF to a general language model Read here
- 📖 30m: DPO Paper by Rafael Rafailov (a minimal loss sketch follows this list) Read here
- 📖 30m: Artificial Intelligence, Values, and Alignment - an essential paper Read here
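To make the DPO paper easier to follow, here is a minimal PyTorch sketch of its core loss. The log-probabilities would come from your trainable policy and a frozen reference model; the function name and the toy values below are invented for illustration only.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss (Rafailov et al.).

    Each argument is the summed log-probability of the chosen or rejected
    completion under the trainable policy or the frozen reference model.
    beta controls how far the policy is allowed to drift from the reference.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with made-up log-probabilities for a batch of 2 preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -8.0]),
                torch.tensor([-12.5, -9.0]), torch.tensor([-13.0, -8.5]))
print(loss)
```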
Part 3: Python Code and Implementations - (5h 0m)
Explore hands-on coding projects and practical implementations.
- 📖 2h: Detoxifying a Language Model using PPO (see the reward-shaping sketch after this list) Read here
- 📖 2h: RLHF with DPO & Hugging Face Read here
- 📖 1h: TRL Library for Fast Implementation - minimal and efficient Read here
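Before diving into the notebooks, the sketch below shows, in plain PyTorch with invented numbers, the kind of KL-penalized reward that PPO-based detoxification optimizes: a sequence-level reward from a toxicity scorer combined with a per-token penalty for drifting away from the reference model. Libraries such as TRL wrap this logic for you, so treat this only as an illustration of the idea, not the library's API.

```python
import torch

def kl_shaped_rewards(toxicity_reward: torch.Tensor,
                      policy_logprobs: torch.Tensor,
                      ref_logprobs: torch.Tensor,
                      kl_coef: float = 0.2) -> torch.Tensor:
    """Per-token rewards of the kind PPO-based RLHF fine-tuning optimizes.

    toxicity_reward: one scalar per sequence (e.g. a negated toxicity-classifier
                     score), credited only at the final generated token.
    policy_logprobs / ref_logprobs: per-token log-probs of the generated tokens
                     under the current policy and the frozen reference model.
    The KL penalty keeps the policy close to the reference model while the
    sequence-level reward pushes it toward less toxic generations.
    """
    # Penalize divergence from the reference model at every token.
    rewards = -kl_coef * (policy_logprobs - ref_logprobs)
    # Add the sequence-level reward on the last token of each sequence.
    rewards[:, -1] += toxicity_reward
    return rewards

# Toy usage: batch of 2 sequences, 4 generated tokens each (values invented).
policy_lp = torch.tensor([[-1.0, -0.5, -2.0, -0.8], [-0.7, -1.2, -0.9, -1.5]])
ref_lp    = torch.tensor([[-1.1, -0.6, -1.8, -0.9], [-0.8, -1.0, -1.0, -1.4]])
print(kl_shaped_rewards(torch.tensor([0.9, 0.2]), policy_lp, ref_lp))
```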
Part 4: Online Course - (1h 0m)
Engage in a structured learning experience for deeper insights.
- 📖 1h: Reinforcement Learning from Human Feedback by Nikita Namjoshi Course link
Feel free to go through these sections at your own pace to build a comprehensive understanding of RLHF.