In the `inverse_learning_error` module, an attribute `true_reward` is passed and set at initialization of the `ILE` class. However, this attribute is never used.
Instead, `ILE`'s `evaluate` method uses the environment passed at initialization when calling `ValueIteration`. If this environment is a `RewardWrapper`, the returned values are based on the wrapped rather than the true reward (via `get_reward_matrix`).
The upshot is that whenever `ILE` is initialized with a `RewardWrapper` as its first positional argument, we get results that aren't based on the true reward. If I recall correctly, in the experiment script that produced our results, `ILE` was in fact initialized with a `RewardWrapper` whose `reward_function` was constant zero.
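To make the suspected failure mode concrete, here is a minimal sketch of the wrapper behavior. All class and method bodies are hypothetical reconstructions (the real `RewardWrapper` and `get_reward_matrix` may differ); it only illustrates why a constant-zero `reward_function` hides the true reward from anything that queries the wrapper:

```python
import numpy as np

class GridEnv:
    """Toy stand-in for a base environment with a true reward matrix."""
    n_states, n_actions = 4, 2

    def get_reward_matrix(self):
        # True reward: some nonzero, state/action-dependent values.
        return np.arange(self.n_states * self.n_actions, dtype=float).reshape(
            self.n_states, self.n_actions
        )

class RewardWrapper:
    """Hypothetical reconstruction: overrides the reward with reward_function."""
    def __init__(self, env, reward_function):
        self.env = env
        self.reward_function = reward_function
        self.n_states, self.n_actions = env.n_states, env.n_actions

    def get_reward_matrix(self):
        # Built from reward_function, NOT from the base env's true reward.
        return np.array([
            [self.reward_function(s, a) for a in range(self.n_actions)]
            for s in range(self.n_states)
        ])

true_reward = GridEnv().get_reward_matrix()
wrapped = RewardWrapper(GridEnv(), reward_function=lambda s, a: 0.0)

# Anything that calls get_reward_matrix on the wrapper (as ValueIteration
# would, if handed the wrapped env) sees all zeros, not the true reward.
print(np.allclose(wrapped.get_reward_matrix(), 0.0))         # True
print(np.allclose(wrapped.get_reward_matrix(), true_reward)) # False
```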
I think we need a more general solution to keep track of when we're using the true vs. a custom reward function. Otherwise I worry that we'll see many similar bugs. Our current way of handling this seems too opaque; e.g., for the above analysis I had to look at three different functions in three different modules, and in the end the crucial part happened in a complicated nested for loop. (I'm still only 80% confident that I identified the bug correctly.)
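One possible direction, sketched under assumed interfaces (the names `value_iteration`, the `(S, A, S)` transition layout, and the `reward=` parameter are all hypothetical, not the library's actual API): make the reward an explicit argument everywhere, so `evaluate` passes `self.true_reward` instead of relying on whatever reward the (possibly wrapped) environment exposes:

```python
import numpy as np

def value_iteration(transitions, reward, gamma=0.9, tol=1e-8):
    """Tabular value iteration over an explicitly supplied reward matrix.

    transitions: (S, A, S) array of transition probabilities.
    reward: (S, A) array -- the caller decides whether this is the true
    or a custom reward, so the choice is visible at every call site.
    """
    v = np.zeros(reward.shape[0])
    while True:
        q = reward + gamma * transitions @ v   # (S, A) state-action values
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

# With reward=1 everywhere and gamma=0.9, values converge to 1/(1-0.9)=10;
# a constant-zero reward yields all-zero values, making a mix-up obvious.
S, A = 3, 2
rng = np.random.default_rng(0)
transitions = rng.dirichlet(np.ones(S), size=(S, A))

v_true = value_iteration(transitions, np.ones((S, A)))
v_zero = value_iteration(transitions, np.zeros((S, A)))
print(v_true)  # ~[10. 10. 10.]
print(v_zero)  # [0. 0. 0.]
```

The design point is that no code path silently inherits a reward from a wrapper: whoever calls `value_iteration` must name the reward it wants, which would have surfaced the constant-zero `reward_function` immediately.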