Question on relaxation vs alchemical steps #20
Those issues aren't really relevant if you are (1) using alchemical systems generated with one of our alchemical factories, and (2) disabling the long-range dispersion calculation for the alchemically-modified atoms in the alchemical factory with the

The issues @gregoryross noted were mainly due to the fact that he was instead (1) calling
I'd love to hear about how you guys are doing with BLUES. I think a few weeks back we had asked if you had looked at the stddev of the protocol work as a function of the number of protocol steps, since this is the main way to examine what is going on with protocol efficiency, but I've never seen any plots.
Specifically, quantifying this for hopping a ligand around in a box of solvent would be sufficient to diagnose any issues. @gregoryross also found that continuing to use a barostat during the NCMC protocol was important in getting good acceptance rates as well. Not sure if this applies to your case too.
Ah, OK, so I misunderstood slightly. I thought that the thread was saying there was still a major speed difference even aside from the alchemical dispersion correction.
We've been working through figuring out how to get reasonable acceptance in the "new world" where we no longer use the buggy
I think I'd missed this suggestion, but I'm checking with Sam.
Why? This is very interesting -- we have not pursued that at all. Maybe we should look at it.
We've done a bit of that specific test now and can at least get acceptance; we also see the work distribution shift towards "less unfavorable" as we do more relaxation, but have not looked specifically at the standard deviation.
One other thing I've been interested in is whether it would make sense to do more propagation near the "physical" states rather than at the alchemical intermediate states. At least for some simple problems (like rotating a ligand in-place in a cavity in a protein) I can convince myself this would make sense -- for example, if you relax a great deal when the ligand is non-interacting, then you may have the issue where the protein cavity collapses around the ligand, and then you have to push it back out of the way again as you turn the ligand back on. In that case, I can wave my hands and convince myself that relaxing more closer to the "physical" states and less towards the intermediate state might result in better acceptance. However, I'm not convinced it's worth exploring this much, nor that it would really improve things much.
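The idea above can be prototyped by allocating a fixed propagation budget non-uniformly across the alchemical path, concentrating relaxation near the physical endpoints. A minimal sketch, where the function name, the weighting form, and the `sharpness` parameter are all illustrative assumptions (nothing here comes from BLUES or openmmtools):

```python
import numpy as np

def endpoint_weighted_schedule(n_perturbations, total_propagation_steps, sharpness=2.0):
    """Distribute a fixed budget of propagation steps across the alchemical
    path, weighting toward the physical endpoints (lambda near 0 or 1).

    Returns an integer number of propagation steps to run after each
    perturbation step. Purely illustrative, not part of any package.
    """
    lam = np.linspace(0.0, 1.0, n_perturbations)
    # largest weight at lambda = 0 and 1, smallest at the alchemical midpoint
    weights = np.abs(lam - 0.5) ** sharpness + 1e-3
    weights /= weights.sum()
    steps = np.floor(weights * total_propagation_steps).astype(int)
    # hand the rounding remainder to the highest-weight lambda values
    remainder = total_propagation_steps - steps.sum()
    steps[np.argsort(weights)[::-1][:remainder]] += 1
    return steps

sched = endpoint_weighted_schedule(11, 100)
print(sched, sched.sum())
```

Whether such a schedule actually improves acceptance per unit cost would still have to be tested empirically, e.g. against the uniform schedule on the toluene-in-water system.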
This sounds much more in line with my experience and expectations.
There are likely ways to optimize the protocols to increase acceptance rates---potentially by orders of magnitude---but the first step is to get to the point where the work stddevs are less than a few kT. The relevant theory is explained here: http://threeplusone.com/Crooks2007c.pdf
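The "work stddevs less than a few kT" target can be made concrete: for a symmetric protocol whose work distribution is approximately Gaussian, the mean is pinned to sigma^2/2 (in kT units), and the average acceptance <min(1, e^{-w})> then has a closed form, 2*Phi(-sigma/2) = erfc(sigma/(2*sqrt(2))). A quick sketch of this standard Gaussian-limit result (the function name is made up):

```python
import math

def gaussian_acceptance(sigma_kt):
    """<min(1, e^{-w})> for Gaussian protocol work with stddev sigma_kt and
    mean sigma_kt**2 / 2 (both in kT), the Gaussian limit for a symmetric
    NCMC protocol."""
    return math.erfc(sigma_kt / (2.0 * math.sqrt(2.0)))

for sigma in (1.0, 2.0, 5.0, 10.0):
    print(f"sigma = {sigma:5.1f} kT  ->  <P_accept> ~ {gaussian_acceptance(sigma):.4f}")
```

This makes the cliff visible: around sigma = 2 kT roughly a third of moves are accepted, while by sigma = 10 kT acceptance is essentially zero.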
We don't know for sure, but the use of a barostat did have a measurable effect.
@jchodera Here are some histogram plots of the work distributions from NCMC using various relaxation steps for toluene in water, and toluene bound to T4 lysozyme.
@sgill2: These plots look exactly how I would expect! Thanks! You should be able to take this data and plot <P_accept> and its 95% confidence intervals as a function of the number of steps (perhaps on a log-log plot) so that you can see how the acceptance rates are changing with the number of steps. Your acceptance rates are still pretty low in lysozyme, but there are a lot of ways we can increase the acceptance probabilities. Here are just a few ideas:
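The suggested <P_accept>-with-confidence-intervals estimate can be sketched from the recorded protocol works directly. A minimal illustration, using a bootstrap for the 95% CI (the function name and the synthetic Gaussian test data are assumptions, not anything from the thread's scripts):

```python
import numpy as np

rng = np.random.default_rng(0)

def acceptance_with_ci(work_kt, n_boot=2000, rng=rng):
    """Estimate <min(1, e^{-w})> and a bootstrap 95% CI from a sample of
    protocol works (in kT units)."""
    work_kt = np.asarray(work_kt, dtype=float)
    p_acc = np.minimum(1.0, np.exp(-work_kt))
    estimate = p_acc.mean()
    # resample with replacement to get the sampling distribution of the mean
    idx = rng.integers(0, len(p_acc), size=(n_boot, len(p_acc)))
    boot_means = p_acc[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return estimate, (lo, hi)

# synthetic Gaussian works with mean sigma^2/2, the symmetric-protocol limit
sigma = 3.0
works = rng.normal(sigma**2 / 2, sigma, size=5000)
est, (lo, hi) = acceptance_with_ci(works)
print(f"<P_accept> = {est:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

Repeating this for each number of relaxation steps gives the points and error bars for the log-log plot.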
@jchodera Thanks! Those all sound like interesting suggestions. I'm excited to try some of them out and see how they affect the acceptance.
I've run the same toluene-in-water NCMC simulations (with random rotations) as above, but this time using a barostat.

[Figure: Toluene in water (with random rotations)]

The main difference is that the standard deviation doesn't seem to decrease with increased relaxation as before. I'm in the process of running the toluene/lysozyme systems with a barostat, and the shorter relaxation runs show similar behavior to the toluene/water system with a barostat. We could rationalize the acceptance increase from the barostat in the toluene/water case (since volume changes could help the water respond to the insertion/deletion of toluene), but we can't see why the acceptance for the lysozyme case would be affected. From what John said earlier about the work distributions, we're leaning towards thinking this isn't intended behavior and might be due to a bug in the barostat or in our code. Do you have any thoughts on this @jchodera?
This is actually really concerning, and I'm wondering if this indicates some sort of bug in our work-tracking scheme. Can you point to the code you're using for these tests?

@gregoryross : Can you take a look at the above? I think it is essential to compute the acceptance rate vs. number of steps (on a log-log plot) with and without the barostat.
I'll try to get a minimal working example up, hopefully later today, using only
Here's a minimal example script for toluene in water. In the zip, the
@gregoryross -- just wanted to confirm you saw this? If not, we can follow up with you by e-mail or Slack...?
I spoke to @gregoryross today. He is going to check his code (which uses

I'm working on running the

Tagging @pgrinaway too since this issue will directly impact his current
Hi @davidlmobley and @sgill2. Sorry for my late reply to all this; I've been preoccupied with sorting out my visa up to now. I'm now more able to return to this.

The results you've found are pretty odd @sgill2, especially the fact that the variance of the protocol work doesn't appear to be decreasing that much. I've taken a look at some of my saltswap benchmarking simulations to see if my results make sense. As John mentioned, my code uses the

These results are taken from simulations where there was 1 propagation step for every perturbation step. Each propagation step was 2 fs long, so that a protocol length of 20 ps has 10,000 perturbation steps. The plot shows the variance for insertions and deletions. The variance decreases with more perturbation steps, as we expect.

Here's a plot of how the acceptance rate changes with increasing the number of perturbations:

As you can see, the acceptance rate increases with the number of perturbation steps. Just to be clear, the results above have been run with the barostat switched on. I've currently set up some new simulations that repeat the above without the barostat. Tomorrow I'll add some histograms of the work distributions so that we can have a clearer picture of what's going on with my code. @sgill2, I'll also have a close look at your example script to see if I can spot anything.
@gregoryross : Presumably by "variance" you mean "standard deviation"? |
@sgill2 : I'm looking at your example right now. When you reject, you neglect to reset the box volume:

```python
def acceptReject(context, original_pos, protocol_work):
    log_randnum = math.log(np.random.random())
    if -1*protocol_work > log_randnum:
        #move accepted, don't need to do anything
        pass
    else:
        #rejected so use previous positions
        context.setPositions(original_pos)
```

You will need to store the box vectors and restore these before
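For reference, a sketch of the fix being suggested, assuming the standard OpenMM `Context` API (`setPeriodicBoxVectors`, `setPositions`); the function signature and the convention that `protocol_work` is in kT are illustrative:

```python
import math
import numpy as np

def accept_reject(context, original_positions, original_box_vectors, protocol_work):
    """NCMC Metropolis step that restores the periodic box on rejection.

    protocol_work is in kT units. original_box_vectors should be captured
    before the NCMC switching trajectory begins (e.g. from
    state.getPeriodicBoxVectors()), since a barostat can change the volume
    during the protocol.
    """
    if -protocol_work > math.log(np.random.random()):
        return True  # accepted: keep the propagated state
    # rejected: restore the box first, then the positions, so the
    # coordinates are wrapped consistently with the restored box
    context.setPeriodicBoxVectors(*original_box_vectors)
    context.setPositions(original_positions)
    return False
```

Restoring the box vectors before the positions avoids the incorrect wrapping of coordinates discussed just below.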
@jchodera Oops, I left setting the box vectors out of the example script, but did set them in my runs in BLUES. For BLUES though, I set the volume with (
Yes, that could cause the coordinates to be incorrectly wrapped and your initial conditions for NCMC to be distorted. |
@jchodera, the plot does show the variance as a function of the number of perturbations. However, the units on the y-axis are indeed wrong: it should read kT^2 instead of kT. Thanks for clearing that up.
Thanks. In that case I'll rerun my barostat examples again with that fix and let you folks know how the work distributions look after they finish. |
Let's show the stddev instead when possible. |
Please do! We're happy to keep debugging if you're still seeing odd behavior at that point. |
@jchodera After making the changes to how the periodic box is set, I still see the previously mentioned behavior with the barostat. Below are the stddev and acceptance plots for toluene in water and lysozyme.
OK, this is progress! I was hoping you could update the simple test demonstration script so we can focus on working on that together---do you have the updated version here?
Here are histograms for the protocol work distributions for removing salt from a box of water for different numbers of NCMC perturbations. These simulations were run with a barostat. (I'm still waiting for my constant volume simulations to run on the cluster.)

These distributions exhibit the behaviour I would expect, with both the mean and variance decreasing for more NCMC perturbations. Clearly this doesn't match what you're seeing @sgill2. I had a look at your minimal script, and in addition to using a different NCMC driver to the one used in
Some other observations:
Also, are you sure your protocol is symmetric? I haven't worked through this carefully:

```python
functions = { 'lambda_sterics' : 'min(1, (1/0.3)*abs(lambda-0.5))',
              'lambda_electrostatics' : 'step(0.2-lambda) - 1/0.2*lambda*step(0.2-lambda) + 1/0.2*(lambda-0.8)*step(lambda-0.8)' }
```

I'm still checking the bookkeeping to make sure we aren't doing something silly regarding barostats.
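The symmetry question can be checked numerically by evaluating each switching function at lambda and at 1-lambda. A sketch using numpy in place of OpenMM's expression parser; the `step(x) = 1 for x >= 0` convention below matches OpenMM's custom-expression semantics, and the helper names are made up:

```python
import numpy as np

def step(x):
    # OpenMM's step(): 0 if x < 0, 1 otherwise
    return (np.asarray(x) >= 0).astype(float)

def lambda_sterics(lam):
    return np.minimum(1.0, (1 / 0.3) * np.abs(lam - 0.5))

def lambda_electrostatics(lam):
    return (step(0.2 - lam)
            - (1 / 0.2) * lam * step(0.2 - lam)
            + (1 / 0.2) * (lam - 0.8) * step(lam - 0.8))

lam = np.linspace(0.0, 1.0, 1001)
for name, f in [("sterics", lambda_sterics), ("electrostatics", lambda_electrostatics)]:
    symmetric = bool(np.allclose(f(lam), f(1.0 - lam)))
    print(name, "symmetric under lambda -> 1-lambda:", symmetric)
```

Running this shows both functions are symmetric under lambda -> 1-lambda: sterics is an even function about 0.5, and electrostatics ramps from 1 to 0 over [0, 0.2] and back up over [0.8, 1], which mirror each other.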
@sgill2 : Which version of OpenMM are you using?

```shell
python -c "from simtk import openmm; print(openmm.version.version)"
```
@sgill2 : One thing to note is that, if you don't call
I've made the recommended changes to the example script here: bar_ex.zip.
I'm using
I think so. Here are the plots of the equations. One detail I left out of my last post was that those graphs were from using BLUES, so I'll probably want to rerun the same cases using the minimal example script. If everything above looks fine and there are no other suggested changes, I'll go ahead and rerun them.
Can I get an update on the latest protocol recommendations (at least in the saltswap context) in terms of the number of perturbation vs. propagation kernels? As I understand it, the issue is that perturbation remains slower than propagation, so there might be a sweet spot where there is still a lot of perturbation, but more propagation. We are exploring related issues for BLUES and don't want to reinvent the wheel.
@sgill2 and I ran across a discussion of this in openmm/openmm#1832 (comment), where @jchodera posted a graph of some (apparently older) data on how efficiency varied with the number of perturbation vs. propagation kernels. Peter Eastman made the observation that it looked like what was important was the product of the two (though John noted that this was not quite the case). John mentioned that you guys would be exploring this more since perturbation is slower than propagation, so there might be a sweet spot where one does more propagation than might otherwise be expected.
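The "sweet spot" intuition can be made concrete with a toy cost model (the per-kernel costs below are invented placeholders, not measurements from either codebase): if a perturbation kernel costs c_pert and a propagation kernel c_prop, then at a fixed product n_pert * n_prop, protocols with more propagation per perturbation are strictly cheaper whenever c_pert > 0.

```python
def protocol_cost(n_pert, n_prop_per_pert, c_pert=3.0, c_prop=1.0):
    """Wall-clock cost of one NCMC protocol under a toy model: each of the
    n_pert perturbation steps pays one perturbation kernel plus
    n_prop_per_pert propagation kernels. Costs are illustrative units."""
    return n_pert * (c_pert + n_prop_per_pert * c_prop)

# Compare schedules that hold the product n_pert * n_prop fixed
# (roughly the quantity Peter Eastman observed mattered most):
target_product = 10000
for n_prop in (1, 2, 5, 10):
    n_pert = target_product // n_prop
    print(f"n_prop={n_prop:2d}  n_pert={n_pert:5d}  cost={protocol_cost(n_pert, n_prop):.0f}")
```

Since cost = n_pert * c_pert + (n_pert * n_prop) * c_prop, the second term is constant at fixed product, so trading perturbation steps for propagation steps only reduces the first term; whether acceptance actually holds up at fixed product is the empirical question the thread is asking about.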
(cc @Limn1 )