Handle datasets with no terminating states #394
Comments
Disregard my last comment, I misunderstood. This is tricky because you need terminating states to have a reasonable model. I think in this case you need another model that predicts the total value remaining for a user, and you use this second model to augment the data so that ReAgent sees complete episodes. The second model could be an estimate based on prior users who have abandoned.
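A minimal sketch of the "second model" idea above: fold a predicted remaining value into the last observed transition of an abandoned trajectory and mark it terminal, so the learner sees a complete episode. All names here (`Transition`, `predict_remaining_value`, `close_episode`) are hypothetical illustrations, not ReAgent APIs, and the remaining-value estimate is a placeholder.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    state: list
    action: int
    reward: float
    terminal: bool

def predict_remaining_value(state) -> float:
    """Stand-in for a model trained on prior abandoned users that
    estimates the total discounted reward still to come."""
    return 2.5  # placeholder estimate

def close_episode(episode: List[Transition]) -> List[Transition]:
    """Fold the predicted remaining value into the last observed
    transition and mark it terminal, completing the episode."""
    closed = list(episode)
    last = closed[-1]
    closed[-1] = Transition(
        state=last.state,
        action=last.action,
        reward=last.reward + predict_remaining_value(last.state),
        terminal=True,
    )
    return closed

episode = [
    Transition([0.0], 1, 1.0, False),
    Transition([1.0], 0, 0.5, False),  # user abandoned here
]
completed = close_episode(episode)
```

With this augmentation the trailing transition carries `reward + estimated remaining value` and `terminal=True`, so Q-learning's terminal bootstrap (Q = 0 past the end) no longer underestimates the user's future value.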
I have the same question and a proposal to work around the problem. Imagine that trajectories (episodes) are infinite in general, but information about steps arrives incrementally (the user interacts with the system and new data is generated), so we want to update the agent's policy incrementally as well, to take better actions in the future. Suppose we receive new data with three new steps for a particular trajectory. In this case, would it be correct to update the policy (DQN in particular) using these steps as an episode, but without treating the third (i.e. last) step as terminal, since its Q-value would be computed incorrectly because that step is, in fact, not terminal? In more detail, for this example we will have:

Proposal:

Does this seem like a correct training flow?
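If I understand the proposal, the key difference is in how the TD target is built for the trailing step of the fragment. A minimal sketch (hypothetical tabular setup, not ReAgent code): the fragment's last step keeps its bootstrap term `gamma * max_a Q(s', a)` unless the trajectory has truly ended.

```python
import numpy as np

GAMMA = 0.9

def td_targets(rewards, next_q_max, last_step_is_really_terminal):
    """TD(0) targets for a fragment of an ongoing trajectory.

    rewards: rewards for the new steps, in order
    next_q_max: max_a Q(s', a) for each step's next state
    """
    terminal = np.zeros(len(rewards), dtype=bool)
    # Only zero out the bootstrap term if the trajectory truly ended.
    terminal[-1] = last_step_is_really_terminal
    return rewards + GAMMA * np.where(terminal, 0.0, next_q_max)

rewards = np.array([1.0, 0.0, 0.5])
next_q_max = np.array([2.0, 2.0, 2.0])

# Ongoing trajectory: the third step still bootstraps from Q(s', a).
ongoing = td_targets(rewards, next_q_max, last_step_is_really_terminal=False)

# Truly terminal: the third target collapses to the bare reward.
ended = td_targets(rewards, next_q_max, last_step_is_really_terminal=True)
```

Here `ongoing[-1]` is `0.5 + 0.9 * 2.0 = 2.3`, while `ended[-1]` is just `0.5`, which is exactly the distinction the proposal is asking about.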
In this case, there will be a large error in the model when you deploy it. The model will think that every state has a lot of future reward, because it has never seen one that terminates. The Q-values will be really high.
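A small numeric illustration of this failure mode: if no transition is ever marked terminal, repeated bootstrapping on a self-loop inflates the value estimate toward `r / (1 - gamma)`, far above any single reward the model has actually observed.

```python
GAMMA = 0.99
REWARD = 1.0

q = 0.0
for _ in range(2000):
    # done=False everywhere, so the target always bootstraps:
    # target = r + gamma * Q(s', a')
    q = REWARD + GAMMA * q

# q converges toward REWARD / (1 - GAMMA) = 100.0, i.e. 100x the
# per-step reward, even though no trajectory ever paid out that much.
```

With a terminal state anywhere in the data, the geometric accumulation is cut off and the Q-values stay bounded by the true returns.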
Imagine that we use ReAgent to train a personalization policy, and the workflow is as follows:
The question is how to handle the initial conditions correctly: DQN assumes Q(s,a) = 0 for the final states of episodes, but here we extend the episodes with new transitions at each update. Does ReAgent handle this correctly?
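One way to frame the "episode keeps growing" setting is to keep a per-user buffer and never mark the newest transition terminal, since more data for that trajectory may still arrive; the terminal flag is set only once the trajectory is known to have truly ended. This is a sketch of a possible data-preparation step under that assumption, not ReAgent's actual preprocessing.

```python
from collections import defaultdict

# user_id -> list of (state, action, reward, terminal) tuples
buffers = defaultdict(list)

def append_steps(user_id, steps):
    """Append new (state, action, reward) observations; the trailing
    transition is deliberately left non-terminal because the episode
    may be extended by a later update."""
    for s, a, r in steps:
        buffers[user_id].append((s, a, r, False))

def finalize(user_id):
    """Call only when the trajectory has truly ended (e.g. the user's
    account is closed): mark the last transition terminal so Q(s,a)=0
    applies past it."""
    s, a, r, _ = buffers[user_id][-1]
    buffers[user_id][-1] = (s, a, r, True)

append_steps("u1", [("s0", 0, 1.0), ("s1", 1, 0.5)])
append_steps("u1", [("s2", 0, 0.0)])  # a later update extends the episode
```

Until `finalize` is called, every transition bootstraps from the next state's Q-values, so no state is wrongly treated as the end of the episode.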