Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
Julia Kreutzer¹ and Joshua Uyheng³ and Stefan Riezler¹,²
¹Computational Linguistics & ²IWR, Heidelberg University, Germany
{kreutzer,riezler}@cl.uni-heidelberg.de
³Departments of Psychology & Mathematics, Ateneo de Manila University, Philippines
juyheng@ateneo.edu