Goal Representations for Instruction Following

A longstanding goal of the field of robot learning…

Rethinking the Role of PPO in RLHF

TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preference in the form of comparisons, and the RL fine-tuning…