Here's a 9-Hour Crash Course on RLHF (Reinforcement Learning from Human Feedback).
A comprehensive guide to post-training alignment of Large Language Models (LLMs).
All resources are free.
Part 1: Introduction to RLHF, Tutorials and Overviews - (2h 0m)
Gain foundational knowledge in Reinforcement Learning, human feedback mechanisms, reward modeling, policy optimization, and practical AI applications. A small reward-model loss sketch follows the resource list below.
Resources:
- 📖 20m: Introduction to RLHF by Chip Huyen Read here
- 📺 40m: RLHF: From Zero to ChatGPT by Nathan Lambert Watch here
- 📺 60m: Lecture on RLHF by Hyung Won Chung Watch here
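As a quick preview of the reward-modeling idea these resources cover, here is a minimal PyTorch sketch of the pairwise preference loss typically used to train an RLHF reward model. The function name and the example scores are invented for illustration; this is a sketch of the idea, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss used to train RLHF reward models.

    score_chosen / score_rejected are the scalar rewards the model assigns to
    the human-preferred and the rejected completion for the same prompt.
    Minimizing this loss pushes the chosen score above the rejected score.
    """
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy usage with made-up scores for a batch of 3 preference pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, -0.5])
print(reward_model_loss(chosen, rejected))  # a single scalar loss value
```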
Part 2: Research - (1h 30m)
Dive into key papers that discuss RLHF implementations and the underlying theories.
- 📖 30m: InstructGPT Paper - applying RLHF to a general language model Read here
- 📖 30m: DPO Paper by Rafael Rafailov (a minimal loss sketch follows this list) Read here
- 📖 30m: Artificial Intelligence, Values, and Alignment - an essential paper Read here
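To make the DPO paper easier to follow, here is a minimal PyTorch sketch of its core loss. The log-probabilities would come from your trainable policy and a frozen reference model; the function name and the toy values below are invented for illustration only.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss (Rafailov et al.).

    Each argument is the summed log-probability of the chosen or rejected
    completion under the trainable policy or the frozen reference model.
    beta controls how far the policy is allowed to drift from the reference.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with made-up log-probabilities for a batch of 2 preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -8.0]),
                torch.tensor([-12.5, -9.0]), torch.tensor([-13.0, -8.5]))
print(loss)
```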
Part 3: Python Code and Implementations - (5h 0m)
Explore hands-on coding projects and practical implementations.
- 📖 2h: Detoxifying a Language Model using PPO (see the reward-shaping sketch after this list) Read here
- 📖 2h: RLHF with DPO & Hugging Face Read here
- 📖 1h: TRL Library for Fast Implementation - minimal and efficient Read here
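Before diving into the notebooks, the sketch below shows, in plain PyTorch with invented numbers, the kind of KL-penalized reward that PPO-based detoxification optimizes: a sequence-level reward from a toxicity scorer combined with a per-token penalty for drifting away from the reference model. Libraries such as TRL wrap this logic for you, so treat this only as an illustration of the idea, not the library's API.

```python
import torch

def kl_shaped_rewards(toxicity_reward: torch.Tensor,
                      policy_logprobs: torch.Tensor,
                      ref_logprobs: torch.Tensor,
                      kl_coef: float = 0.2) -> torch.Tensor:
    """Per-token rewards of the kind PPO-based RLHF fine-tuning optimizes.

    toxicity_reward: one scalar per sequence (e.g. a negated toxicity-classifier
                     score), credited only at the final generated token.
    policy_logprobs / ref_logprobs: per-token log-probs of the generated tokens
                     under the current policy and the frozen reference model.
    The KL penalty keeps the policy close to the reference model while the
    sequence-level reward pushes it toward less toxic generations.
    """
    # Penalize divergence from the reference model at every token.
    rewards = -kl_coef * (policy_logprobs - ref_logprobs)
    # Add the sequence-level reward on the last token of each sequence.
    rewards[:, -1] += toxicity_reward
    return rewards

# Toy usage: batch of 2 sequences, 4 generated tokens each (values invented).
policy_lp = torch.tensor([[-1.0, -0.5, -2.0, -0.8], [-0.7, -1.2, -0.9, -1.5]])
ref_lp    = torch.tensor([[-1.1, -0.6, -1.8, -0.9], [-0.8, -1.0, -1.0, -1.4]])
print(kl_shaped_rewards(torch.tensor([0.9, 0.2]), policy_lp, ref_lp))
```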
Part 4: Online Course - (1h 0m)
Engage in a structured learning experience for deeper insights.
- 📖 1h: Reinforcement Learning from Human Feedback by Nikita Namjoshi Course link
Feel free to go through these sections at your own pace to build a comprehensive understanding of RLHF.