Uploaded: 2023-02-13 11:53 | Tags: Artificial Intelligence, Academic Papers, Reinforcement Learning, RLHF

Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning

Julia Kreutzer¹, Joshua Uyheng³, and Stefan Riezler¹,²

¹Computational Linguistics & ²IWR, Heidelberg University, Germany