MaxEnt estimates the initial state distribution rather than getting it from the env #38

maxmdaniel · 2018-10-12T23:45:52Z

MaxEnt IRL currently estimates the initial state distribution based on the provided expert trajectories:
    for traj in self.expert_trajs:
        mu[traj['states'][0], 0] += 1
    mu[:, 0] = mu[:, 0] / len(self.expert_trajs)

Is there a way to instead get the exact initial state distribution from the gym.Env, similar to the way we extract the exact transition dynamics?
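
One possible direction, offered as a rough sketch rather than a proposed patch: the generic `gym.Env` interface has no attribute for the initial state distribution, but tabular environments built on older Gym's `DiscreteEnv` carry an `isd` array next to the transition table `P`. Assuming the environment is of that kind, the exact distribution could be read from `isd`, with the current count-based estimate kept as a fallback. The helper name `initial_state_distribution` and its arguments are hypothetical and not part of this repository.

```python
import numpy as np


def initial_state_distribution(env, expert_trajs, n_states):
    """Return P(s_0) as a length-n_states vector.

    Assumption: tabular envs expose an `isd` attribute (as Gym's old
    DiscreteEnv does). If it is missing, fall back to counting the
    first state of each expert trajectory, as the current code does.
    """
    # Look through gym wrappers if the env provides `unwrapped`.
    base_env = getattr(env, "unwrapped", env)
    isd = getattr(base_env, "isd", None)
    if isd is not None:
        isd = np.asarray(isd, dtype=float)
        return isd / isd.sum()  # renormalize defensively

    # Fallback: empirical estimate from the expert trajectories.
    first_states = [traj["states"][0] for traj in expert_trajs]
    counts = np.bincount(first_states, minlength=n_states).astype(float)
    return counts / len(expert_trajs)
```

With something like this, `mu[:, 0]` could be set from the returned vector, so environments that expose `isd` get the exact distribution and everything else keeps the present behavior.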
