Index
Reinforcement Learning
Improving LLMs with Reinforcement Learning.
Reinforcement Learning from Human Feedback / RLHF
A method that fine-tunes an LLM with reinforcement learning based on human feedback.
- Reinforcement Learning from Human Feedback / RLHF
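The idea above can be illustrated with a toy sketch: a softmax policy over candidate responses is updated with REINFORCE toward the responses a reward function prefers. The `reward` function here is a hypothetical stand-in for a reward model trained on human preference data; real RLHF operates on token sequences with PPO, not on a fixed response list.

```python
import math
import random

random.seed(0)

# Candidate responses and the policy's parameters (one logit each).
# All names and the reward function are illustrative, not from any library.
responses = ["helpful answer", "rude answer", "off-topic answer"]
logits = [0.0, 0.0, 0.0]

def reward(response):
    # Stand-in for a reward model learned from human preference data.
    return 1.0 if response == "helpful answer" else -1.0

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

lr = 0.5
for step in range(200):
    probs = softmax(logits)
    i = random.choices(range(len(responses)), weights=probs)[0]
    r = reward(responses[i])
    # REINFORCE for a softmax policy:
    # d log pi(i) / d logit_j = 1[j == i] - probs[j], scaled by the reward.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * r * grad

probs = softmax(logits)
best = responses[probs.index(max(probs))]
print(best)
```

After training, probability mass concentrates on the response the reward function favors, which is the core loop RLHF builds on (with a KL penalty against the initial model added in practice to keep outputs fluent).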
Algorithms
Interactive Textual Environment / BabyAI-Text / 2023
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
- [2023]
- arxiv.org
Directional Stimulus Prompting / DSP / 2023
- Directional Stimulus Prompting / DSP
Libraries
TRL
Hugging Face's Transformer Reinforcement Learning library for fine-tuning LLMs with methods such as PPO and reward modeling.
Research
Reward Design with Language Models
- Reward Design with Language Models
- [2023]
- arxiv.org
- github.com