PPO Training Algorithm and Training Result #5

zxcvbnjs · 2024-03-17T14:05:01Z

Dear Author,

I tried to implement the PPO algorithm to replace the random policy in the provided example, but the predator or prey only learns to move in a straight line instead of taking more flexible actions. I may have made a mistake at some stage, but I do not know how to find and fix it, and I cannot reproduce results similar to those in your paper. Could you release more experimental details and show how to train the prey and predator agents, either jointly or separately?
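For context, here is a minimal sketch of the kind of update in question, assuming the standard clipped PPO surrogate objective with an entropy bonus. It is not code from this repository or the paper: the function names and the default `clip_eps` and `entropy_coef` values are hypothetical choices for illustration only. A policy that repeats a single action (such as always moving in a straight line) can be a symptom of the action distribution's entropy collapsing toward zero, so the entropy term is one thing that may be worth inspecting.

```python
import math


def categorical_entropy(probs):
    """Entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_clip_loss(new_logp, old_logp, advantages, entropies,
                  clip_eps=0.2, entropy_coef=0.01):
    """Mean clipped-surrogate PPO loss (to be minimised) with an entropy bonus.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    clip_eps and entropy_coef are assumed defaults, not values from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv, ent in zip(new_logp, old_logp, advantages, entropies):
        ratio = math.exp(lp_new - lp_old)                      # pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        surrogate = min(ratio * adv, clipped * adv)            # pessimistic bound
        total += -surrogate - entropy_coef * ent               # minimise the negative
    return total / len(advantages)


# A uniform policy over four moves has maximal entropy; a policy that always
# picks the same move has zero entropy.
uniform_entropy = categorical_entropy([0.25, 0.25, 0.25, 0.25])
collapsed_entropy = categorical_entropy([1.0, 0.0, 0.0, 0.0])
```

Logging the mean entropy per update alongside the loss would show whether the policy is collapsing early in training; if it is, a larger entropy coefficient is one possible adjustment to try.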

I would sincerely appreciate any help or reply.

Thank you very much!
