Intent
Right now, to perform MBAR we have to compute energies over the entire system, even though the only thing that changes between evaluations is lambda. This means building the full neighborlist and doing more work than necessary. We would like to compute only the energy terms involving the ligand atoms (or any arbitrary subset of atoms). This would make energy evaluation much faster and make MBAR (as well as other tasks) cheaper.
Implementation
Modify the neighborlist code to support non-sequential atom indices, creating tiles as before but indexing atoms through a mapping array. This lets us reuse the rest of the non-bonded code unchanged and still get the performance improvement.
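To make the idea concrete, here is a minimal pure-Python sketch, not the timemachine implementation. It assumes a toy pair potential in which lambda lifts "ligand" atoms into a fourth dimension; the names `pair_term`, `full_energy`, `subset_energy`, `offsets`, and `row_idxs` are all hypothetical. The `row_idxs` list plays the role of the mapping array: it maps compact row positions to arbitrary, non-contiguous atom indices. Under these assumptions, energy differences across lambda computed from the subset rows match those from the full pairwise sum.

```python
import math

def pair_term(xi, xj, wi, wj):
    # Toy 1/r interaction in 3D plus a 4th "lambda" coordinate (an assumption).
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj)) + (wi - wj) ** 2
    return 1.0 / math.sqrt(d2)

def full_energy(x, offsets, lam):
    # Reference: sum over every unique pair in the system.
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += pair_term(x[i], x[j], lam * offsets[i], lam * offsets[j])
    return total

def subset_energy(x, offsets, lam, row_idxs):
    # Only pairs with at least one atom in row_idxs, each counted once.
    # row_idxs is the mapping array: compact row position -> atom index.
    in_rows = set(row_idxs)
    total = 0.0
    for i in row_idxs:
        for j in range(len(x)):
            if j == i or (j in in_rows and j < i):
                continue  # skip self-pairs and double-counted row-row pairs
            total += pair_term(x[i], x[j], lam * offsets[i], lam * offsets[j])
    return total

x = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (2.1, 0.0, 0.3),
     (0.0, 1.5, 0.0), (1.2, 1.1, 0.9), (2.5, 2.0, 0.1)]
ligand_idxs = [1, 4]                  # non-contiguous subset of atoms
offsets = [1 if i in ligand_idxs else 0 for i in range(len(x))]
lams = [0.0, 0.25, 0.5, 1.0]

full_diffs = [full_energy(x, offsets, lam) - full_energy(x, offsets, lams[0])
              for lam in lams]
subset_diffs = [subset_energy(x, offsets, lam, ligand_idxs)
                - subset_energy(x, offsets, lams[0], ligand_idxs)
                for lam in lams]
```

The subset version visits roughly (number of ligand atoms) x (number of atoms) pairs instead of all pairs, which is where the savings would come from when the ligand is small relative to the system.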
A bit more context for this (but since this comment expands scope, perhaps I should move it to a new issue?).
I think we would like to profile and improve performance for a few common workloads:
Loops
1. u_k = [u(x, lam, params) for lam in lams], where lams is a collection of ~10-100 lambda values
2. u_n = [u(x, lam, params) for x in xs], where xs is a collection of ~100-10000 snapshots
3. u_n = [u(x, lam, params) for x in few_particle_diff_xs], where few_particle_diff_xs is a collection of ~100-10000 snapshots that are identical except for the positions of a small number of particles
Nested loops
4. u_nk = [[u(x, lam, params) for lam in lams] for x in xs]
Currently, running this sort of loop through Python calls to timemachine custom ops takes ~10x longer than we might expect: ~2 milliseconds per iteration, when we might expect "light speed" to be closer to ~200 microseconds per iteration.
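For reference, a per-iteration number like this can be measured with a small harness along the following lines. This is only a sketch: `u` here is a hypothetical stand-in with the same call shape as in the loops above, not a timemachine custom op, and `time_per_iteration` is an illustrative helper.

```python
import time

def u(x, lam, params):
    # Hypothetical stand-in for an energy call (not the timemachine API).
    return sum(p * (xi - lam) ** 2 for xi, p in zip(x, params))

def time_per_iteration(fn, args_list, repeats=5):
    # Best-of-N wall-clock time per call, in seconds.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for args in args_list:
            fn(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(args_list))
    return best

x = [0.1 * i for i in range(100)]
params = [1.0] * 100
lams = [k / 49 for k in range(50)]

per_iter = time_per_iteration(u, [(x, lam, params) for lam in lams])
print(f"{per_iter * 1e6:.1f} microseconds per iteration")
```

Taking the best of several repeats reduces noise from warm-up and scheduling, which matters when the target is on the order of hundreds of microseconds.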
We also expect to be able to go much faster than "light speed" when we only need energy differences delta_u = u(x, lam, params) - u(x0, lam0, params0) and only a very small fraction of the pairwise interactions contribute to the difference. In some cases, JAX's vmap transformation can generate reasonably fast batched code for delta_u and its derivatives, even from a pair-list based numpy implementation like the one added in #453.
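As a rough illustration of why delta_u can be so much cheaper, here is a pure-Python sketch for the simplest case, where only positions differ (lam and params held fixed) and the energy is a sum of toy 1/r pair terms. These are assumptions for the example only; `pair_list_energy`, `pairs_touching`, and `delta_u` are hypothetical names, not the implementation from #453. Pairs in which neither atom moved cancel exactly, so only pairs touching a moved atom need to be evaluated.

```python
import math

def pair_list_energy(x, pair_idxs):
    # Sum of toy 1/r terms over an explicit list of (i, j) index pairs.
    return sum(1.0 / math.dist(x[i], x[j]) for i, j in pair_idxs)

def pairs_touching(moved_idxs, n_atoms):
    # All unique pairs (i < j) in which at least one atom has moved.
    moved = set(moved_idxs)
    return [(i, j) for i in range(n_atoms) for j in range(i + 1, n_atoms)
            if i in moved or j in moved]

def delta_u(x, x0, moved_idxs):
    # Energy difference using only the pairs that can actually change.
    pairs = pairs_touching(moved_idxs, len(x))
    return pair_list_energy(x, pairs) - pair_list_energy(x0, pairs)

x0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
      (1.0, 1.0, 0.5), (2.0, 0.5, 1.0)]
x = list(x0)
x[2] = (0.2, 1.1, -0.1)               # only atom 2 differs between snapshots

all_pairs = pairs_touching(range(len(x0)), len(x0))
delta_full = pair_list_energy(x, all_pairs) - pair_list_energy(x0, all_pairs)
delta_fast = delta_u(x, x0, moved_idxs=[2])
```

With one moved atom out of N, the restricted pair list has N - 1 entries instead of N(N - 1)/2, so the cost of delta_u scales with the number of moved particles rather than with the full system.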
@proteneer points out that the fast CUDA code in timemachine to compute exclusions accepts a pair list in essentially the same format:
const int * __restrict__ exclusion_idxs, // [E, 2] pair-list of atoms to be excluded
It may be possible to go even faster than k_nonbonded_exclusions using an approach like the one @badisa described above.
(Also, probably worth commenting on whether we only want energies u, or if we would also like any derivatives of the energy, in each of these cases. For cases (1)-(2) we have immediate uses for the usual derivatives du_dx, du_dp, for (3) we have a speculative use for du_dx restricted to the particles with varying position, and for (4) I don't think we foresee needing du_dx or du_dp.)