New class to help design experiments based on assaytools MCMC data #80
base: master
…entration in assays
…nt of variation and total error of future experiments as a function of protein concentration.
… concentrations. Cleaned up functions in AssaySimulator.
…ance, and bias for different experimental parameters. The jupyter notebook was updated to reflect this. Two superfluous files were removed.
…other is superfluous.
Tagging @sonyahanson, @MehtapIsik, and @jchodera.
Thanks! Will take a look!
Whoa, the CV of the binding free energy is 1000-9000%?! This sounds extremely horribly wrong.
And the relative error is also in the binding free energy and varies 80-180%? This all sounds suspiciously wrong. Errors this high would mean that the approach has almost no value.
I've gone through assay_simulator.py.
Currently, it seems that the only source of uncertainty in your bootstrap samples is the initial guess for DeltaG, and that the least-squares optimization results in something like a CV of 9000%, which suggests to me that the least-squares fit is so unstable as to be unusable.
There may be some other unit errors or bugs, though I didn't spot any in my pass through.
It would probably be useful to plot what is going on with the simulated fluorescence and resulting least-squares fit to see why it is getting stuck so far from the solution.
We should really be sure to incorporate the sources of uncertainty we think are important into this model. I was hoping that you could use the same pymc model to simulate the sources of uncertainty rather than have to recode everything---I think there's a way to do this---but if we do implement everything from scratch as is done here, we would want to incorporate:
- protein concentration (something like ~10% uncertain)
- ligand dispensed concentrations (~8% per well)
- measurement noise and the noise floor (or background fluorescence), since this determines minimum detectable quantity of complex
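A minimal sketch of how these three sources could be drawn for each simulated experiment. The 10% and 8% relative uncertainties come from the list above; the lognormal form, the function names, and the concentration, noise, and background values are illustrative assumptions and are not part of assaytools.

```python
import math
import random

def lognormal_around(value, rel_sigma, rng):
    """Draw a positive quantity whose relative spread is roughly rel_sigma."""
    # For small rel_sigma, a log-space standard deviation of rel_sigma
    # gives a coefficient of variation close to rel_sigma.
    return value * math.exp(rng.gauss(0.0, rel_sigma))

def perturb_experiment(p_stated, l_stated, rng, p_rel=0.10, l_rel=0.08):
    """Return one plausible (actual protein conc., actual per-well ligand concs.)."""
    p_actual = lognormal_around(p_stated, p_rel, rng)                # one draw shared by all wells
    l_actual = [lognormal_around(l, l_rel, rng) for l in l_stated]   # independent draw per well
    return p_actual, l_actual

def add_measurement_noise(signal, sigma, background, rng):
    """Add a constant background (the noise floor) and Gaussian read noise to each well."""
    return [s + background + rng.gauss(0.0, sigma) for s in signal]

rng = random.Random(0)
l_stated = [20e-6 / 2**i for i in range(12)]   # hypothetical dilution series, in molar
p_actual, l_actual = perturb_experiment(1e-6, l_stated, rng)
```

Drawing the protein concentration once per experiment and the ligand concentrations once per well mirrors the "per well" wording above; whether that correlation structure is right for the real assay is a judgment call.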
if noisy:
Fmodel += np.random.normal(loc=0.0, scale=self.sigma, size=len(self.l_total))
else:
Fmodel = self.F_PL * PL + self.F_L * L_free + self.F_P * P_free + self.F_buffer * self.path_length
Is this missing self.F_plate?
It is, adding it in now.
"""
The sum of squares between model fluorescence and the target
"""
model = self.simulate_fluorescence(DeltaG, p_total, noisy=False)
Shouldn't we be adding measurement noise here if we want to quantify the CV?
This is the loss function that is minimized during the fitting, so the prediction (called model here) must be a deterministic function of the parameters for gradient descent. The measurement noise is added a few lines above this:
target = self.simulate_fluorescence(p_total)
The target fluorescence has the noise added by default. As this is not clear, I'll change this to
target = self.simulate_fluorescence(p_total, noisy=True)
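To make the split between noisy target and deterministic loss concrete, here is a self-contained sketch that does not use the AssaySimulator code. It assumes a one-site binding model with Kd = exp(DeltaG) (DeltaG in kT), made-up concentrations and fluorescence constants, and a brute-force scan in place of the optimizer used in this PR.

```python
import math
import random

P_TOTAL = 1e-6                                  # molar, hypothetical
L_TOTAL = [20e-6 / 2**i for i in range(12)]     # molar, hypothetical dilution series
F_PL, F_L, SIGMA = 1e9, 1e7, 10.0               # hypothetical molar fluorescences and read noise
TRUE_DELTA_G = -15.0                            # kT; Kd = exp(DeltaG) in molar (assumed convention)

def complex_conc(delta_g, p, l):
    """Equilibrium [PL] for one-site binding (smaller root of the quadratic)."""
    kd = math.exp(delta_g)
    b = p + l + kd
    return 0.5 * (b - math.sqrt(b * b - 4.0 * p * l))

def simulate_fluorescence(delta_g, rng=None):
    """Deterministic model unless an rng is passed, in which case read noise is added."""
    out = []
    for l in L_TOTAL:
        pl = complex_conc(delta_g, P_TOTAL, l)
        f = F_PL * pl + F_L * (l - pl)
        if rng is not None:
            f += rng.gauss(0.0, SIGMA)
        out.append(f)
    return out

def sum_of_squares(delta_g, target):
    """Loss: noise-free prediction compared against a fixed target."""
    model = simulate_fluorescence(delta_g)
    return sum((m - t) ** 2 for m, t in zip(model, target))

def fit_delta_g(target, lo=-25.0, hi=-5.0, step=0.01):
    """Brute-force 1-D scan; a stand-in for the optimizer, not the PR's method."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda g: sum_of_squares(g, target))

rng = random.Random(0)
fits = []
for _ in range(20):
    target = simulate_fluorescence(TRUE_DELTA_G, rng)   # fresh noise for each replicate
    fits.append(fit_delta_g(target))
```

The spread of `fits` comes only from the noise in each `target`; the loss itself returns the same value every time it is called with the same arguments.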
simulator = AssaySimulator(pymc_data=pymc_data, l_total=l_total, sample_index=ind, p_total=p_total[p], **kwargs)
# Draw fluorescence data with different values of random noise
for j in range(nsamples):
    fit = simulator.fit_deltaG()
Since fit_deltaG currently doesn't add any noise, won't this omit the uncertainty in the measurements?
return np.sum((model - target)**2)

# Start the initial guess within about 10% of the "true" value
guess = self.DeltaG + np.random.normal(loc=0, scale=0.1 * np.abs(self.DeltaG))
It looks like this is the only source of uncertainty currently being added to your model---the initial starting guess for the least-squares fit---since noisy=False in sum_of_squares above. Is that right?
As described above, noise is added when the fluorescence data is simulated. https://github.com/choderalab/assaytools/blob/experiment_design/experimental_design/assay_simulator.py#L178-L179
As this isn't clear, I'll improve the documentation.
I'm wondering if there might be a way to replace Consider the following scheme:
We can then change the relevant design parameters---the stated protein concentration, the assay volume/area, and the ligand concentrations---and see how this affects things. Does that sound like it would be a viable approach? I haven't had a chance to play with this myself yet, but it seems like this kind of imputation is possible with pymc2.
- improving docstrings and a few comments
- removing extra factor of 100 from CV estimation
- added background fluorescence to data simulation when inner_filter=False
Thanks a lot for going through this, @jchodera. The CV predictions had an extra factor of 100 applied to them when plotting (I'd recently changed where I converted the fractions to percent). This has now been removed, so the minimum CV is now ~30%. With regard to some of your comments, currently the only source of noise comes from fluorescence noise parametrized by either I like your suggestion of using As for your suggestion about allowing the protein and ligand concentrations to change for each fit, I can easily add that in now if you and @sonyahanson would like.
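One way to guard against a repeated fraction-to-percent conversion is to keep the CV as a dimensionless fraction everywhere and convert once, where the value is displayed. The sketch below uses hypothetical helper names and made-up DeltaG estimates; it is not the code in this PR.

```python
import statistics

def coefficient_of_variation(samples):
    """CV as a dimensionless fraction: sample standard deviation over |mean|."""
    return statistics.stdev(samples) / abs(statistics.mean(samples))

def as_percent(fraction):
    """Convert a fraction to percent; intended to be called once, at the plotting/printing step."""
    return 100.0 * fraction

estimates = [-14.0, -16.0, -15.0, -13.0, -17.0]   # hypothetical DeltaG estimates, in kT
cv = coefficient_of_variation(estimates)          # stays a fraction for all downstream maths
label = f"CV = {as_percent(cv):.1f}%"             # "CV = 10.5%"
```

Applying `as_percent` to a value that is already in percent inflates it by exactly the factor of 100 described above.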
…tionality of AssaySimulator was slightly changed so that, instead of fitting a single binding free energy, multiple estimates of the free energy are returned.
merging master into experiment_design
Added changes that allow this to work with the new logNormal wrapper. Also added a section of the notebook to see how the CV's dependence on [P] varies with deltaG.
Three things:
@sonyahanson: I don't understand the description of the problems you list above, so come find me in meatspace if I can help.
…initial guess starts at the true value. In addition, the gradient tolerance of the BFGS optimizer was also reduced for speed.
…tted to is now output. Notebook updated accordingly.
Thanks, Greg! All three points addressed with the last two commits. Currently running a notebook to test on a higher affinity dataset.
A class and function have been written to estimate the coefficient of variation, variance, and bias of fluorescence experiments at different assay parameters, such as the protein and ligand concentrations.
This PR includes a Jupyter notebook that demonstrates these tools on assaytools PyMC data for the p38-Bosutinib complex. This PR has been opened so we can discuss what other functionality we need from this, or whether there are any glaring errors. Currently, the files are kept in a separate folder, but these can be moved to more appropriate locations before this PR is merged.
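For reference, a small sketch of how bias, variance, and total error relate for a set of repeated free-energy estimates at one design point. The function name and the numbers are hypothetical and the code is independent of the class in this PR; it only relies on the standard identity that mean squared error equals bias squared plus (population) variance.

```python
import statistics

def summarize_estimates(estimates, true_value):
    """Bias, population variance, and mean squared error of repeated estimates."""
    mean = statistics.mean(estimates)
    bias = mean - true_value
    variance = statistics.pvariance(estimates, mu=mean)
    mse = statistics.mean([(e - true_value) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}

# Hypothetical DeltaG estimates (kT) at two stated protein concentrations (molar)
estimates_by_p_total = {
    0.5e-6: [-14.0, -14.5, -13.5, -14.0],
    1.0e-6: [-15.2, -14.8, -15.1, -14.9],
}
summary = {p: summarize_estimates(e, true_value=-15.0) for p, e in estimates_by_p_total.items()}
```

Comparing `summary` across design points separates a design that is precise but biased from one that is unbiased but noisy, which a CV alone cannot do.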
To do