The internal logic of this environment seems to have issues. I want to increase the number of predators, which requires modifying a large number of built-in parameters. After the modifications, the program only runs for a short while and cannot complete a large number of episodes. Even when I try to reproduce the setup from your paper, using the default parameters or setting the number of prey to 8, I still hit failures like "assert len(observations) == self.number_of_predator_observations" after running for a while. I have tried MADDPG, PPO, and MATD3, and they all exhibit the same problem. In other words, whenever an episode runs for more steps, these issues inevitably arise, preventing me from adequately training the agents. Do you have any suggestions for resolving this? Is it possible to reproduce the scenario presented in your paper?
The issue I was facing has been resolved; I am keeping the question open in the hope that it will help others. In the environment's source code, there is a misplaced parenthesis in this line: "observations += [0] * self.obs_size * (n_nearest_shark - len(observations))". It should be "observations += [0] * (self.obs_size * n_nearest_shark - len(observations))". Moreover, according to the paper, prey and predators should return observations in the same way, i.e. their logic for observing the environment is identical. In the source code, however (which looks like it was written by different authors), the predators' observation logic differs from the prey's, which can lead to errors during prolonged training. I would therefore recommend standardizing their observation methods.
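For reference, here is a minimal sketch of the padding fix. The helper name and the example values are hypothetical; `obs_size` and `n_nearest_shark` follow the snippet quoted above, and the rest of the environment code is not reproduced.

```python
def pad_observations(observations, obs_size, n_nearest_shark):
    """Pad a flat observation list with zeros up to the fixed length
    obs_size * n_nearest_shark (hypothetical helper illustrating the fix)."""
    # Buggy version: multiplies obs_size by the difference, so the padding
    # is wrong (too long, or negative) whenever observations is non-empty:
    # observations += [0] * obs_size * (n_nearest_shark - len(observations))

    # Fixed version: pad only the missing entries up to the expected total.
    observations += [0] * (obs_size * n_nearest_shark - len(observations))
    assert len(observations) == obs_size * n_nearest_shark
    return observations

# Example: 2 sharks observed out of 3 nearest, each contributing 4 values.
obs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(pad_observations(obs, obs_size=4, n_nearest_shark=3))  # padded to length 12
```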
Thank you for bringing this to our attention. We are actively working on the v1.0 release, where this will be fixed. After the experiments in the paper we did a major refactor of the whole environment, which unfortunately broke many things.