You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I try to implement the PPO algorithm to replace the random policy in the given example but I find that the predator or prey only learns to go along a straight line rather than take a more flexible action. I might go to some wrong stages but I do not know how to fix the error and get some similar experimental results like your uploaded paper. So I want to ask whether you can release more experiment details and show how to implement the prey and predator algorithm to train the agent together or separately.
I sincerely appreciate your help and reply if it is possible!
Thank you very much!
The text was updated successfully, but these errors were encountered:
Dear Author,
I try to implement the PPO algorithm to replace the random policy in the given example but I find that the predator or prey only learns to go along a straight line rather than take a more flexible action. I might go to some wrong stages but I do not know how to fix the error and get some similar experimental results like your uploaded paper. So I want to ask whether you can release more experiment details and show how to implement the prey and predator algorithm to train the agent together or separately.
I sincerely appreciate your help and reply if it is possible!
Thank you very much!
The text was updated successfully, but these errors were encountered: