Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in optim(par = guess, fn = cost.dnarna, llfnDNA = ll.funs$dna, llfnRNA = ll.funs$rna, : non-finite value supplied by optim #11

Open
manueltar opened this issue Jul 17, 2021 · 5 comments

Comments

@manueltar
Copy link

Hi,

First of all, thank you very much for having done this software, the documentation and the simulated examples.

I am trying to run MPRAnalyze for my MPRA data.

I am running the following version:

$ Package : chr "MPRAnalyze"
$ Type : chr "Package"
$ Title : chr "Statistical Analysis of MPRA data"
$ Version : chr "1.10.0"

I am able to create the object of formal class MpraObject. In my case my data is pre-normalised so my input dna and rna depth is 1:

obj <- MpraObject(dnaCounts = matrix_gDNA_NOR_QC_PASS_ordered, 
                       rnaCounts = matrix_cDNA_NOR_QC_PASS_ordered, 
                       colAnnot = annot.dt,
                       controls = as.integer(indexes_CTRLS))      
  
     
obj <- setDepthFactors(obj, dnaDepth = rep(1, NCOL(matrix_gDNA_NOR_QC_PASS_ordered)),
                            rnaDepth = rep(1, NCOL(matrix_cDNA_NOR_QC_PASS_ordered)))

After googling this error I found a solution based on converting the annot data frame batch and barcode columns into factors. This is not my problem as my annot columns are factors:

annot.dt
'data.frame': 210 obs. of 3 variables:
$ barcode : Ord.factor w/ 15 levels "bc1"<"bc2"<"bc3"<..: 1 2 3 4 5 6 7 8 9 10 ...
$ batch : Ord.factor w/ 7 levels "b1"<"b2"<"b3"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ condition: Ord.factor w/ 2 levels "REF"<"ALT": 1 1 1 1 1 1 1 1 1 1 ...

Moreover, after the 'mock' normalisation step I retrieve the MpraObject with the adecuate dna/rna depth as well as the annot factors:

obj_2


> Formal class 'MpraObject' [package "MPRAnalyze"] with 18 slots
>   ..@ dnaCounts            : num [1:723, 1:210] 1303 1536 0 293 867 ...
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : chr [1:723] "enh_1" "enh_2" "enh_3" "enh_4" ...
>   .. .. ..$ : chr [1:210] "REF_b1_bc1" "REF_b1_bc2" "REF_b1_bc3" "REF_b1_bc4" ...
>   ..@ rnaCounts            : num [1:723, 1:210] 1390 1373 14 265 699 ...
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : chr [1:723] "enh_1" "enh_2" "enh_3" "enh_4" ...
>   .. .. ..$ : chr [1:210] "REF_b1_bc1" "REF_b1_bc2" "REF_b1_bc3" "REF_b1_bc4" ...
>   ..@ dnaAnnot             :'data.frame':       210 obs. of  3 variables:
>   .. ..$ barcode  : Ord.factor w/ 15 levels "bc1"<"bc2"<"bc3"<..: 1 2 3 4 5 6 7 8 9 10 ...
>   .. ..$ batch    : Ord.factor w/ 7 levels "b1"<"b2"<"b3"<..: 1 1 1 1 1 1 1 1 1 1 ...
>   .. ..$ condition: Ord.factor w/ 2 levels "REF"<"ALT": 1 1 1 1 1 1 1 1 1 1 ...
>   ..@ rnaAnnot             :'data.frame':       210 obs. of  3 variables:
>   .. ..$ barcode  : Ord.factor w/ 15 levels "bc1"<"bc2"<"bc3"<..: 1 2 3 4 5 6 7 8 9 10 ...
>   .. ..$ batch    : Ord.factor w/ 7 levels "b1"<"b2"<"b3"<..: 1 1 1 1 1 1 1 1 1 1 ...
>   .. ..$ condition: Ord.factor w/ 2 levels "REF"<"ALT": 1 1 1 1 1 1 1 1 1 1 ...
>   ..@ rowAnnot             : NULL
>   ..@ controls             : logi [1:723] FALSE FALSE FALSE FALSE FALSE FALSE ...
>   ..@ controls.forfit      : NULL
>   ..@ lib.factor           :Formal class 'factor' [package "methods"] with 3 slots
>   .. .. ..@ .Data   : int(0)
>   .. .. ..@ levels  : chr(0)
>   .. .. ..@ .S3Class: chr "factor"
>   ..@ dnaDepth             : num [1:210] 1 1 1 1 1 1 1 1 1 1 ...
>   ..@ rnaDepth             : num [1:210] 1 1 1 1 1 1 1 1 1 1 ...

After all these checks the error persists so I would like to ask you for your help solving this. I am happy to provide all the files involved. Best regards,

@manueltar
Copy link
Author

Sorry, I forgot to mention that the step that crashes is:

obj <- analyzeQuantification(obj, dnaDesign = ~ batch + barcode, 
                                  rnaDesign = ~1)

@talashuach
Copy link
Collaborator

@manueltar
Can you explain what you mean by "pre-normalized"? The MPRAnalyze model uses a negative binomial likelihood, so it expects the input data to be unnormalized counts.
Other than that the design and usage look correct, if you can provide the files (here or via email), I can troubleshoot and figure out what's going on.

Thanks!
Tal

@manueltar
Copy link
Author

Hi Tal,

Thank you for your reply. I mean that my data is already normalised, in fact I have data from different cell lines for the same enhancers and I normalised them together using MPRAnalyze "uq". I apply the dnadepth and rnadepth factors to divide my counts and check a series of metrics of the MPRA on the normalised data in all my cell lines (LogFC, ASE, normalised dna counts and rna counts etc). I do this for QC purposes.

Then I am interested in running MPRAnalyze separately in each of the cell lines as they will likely have some hits that are exclusive. So I subset my dna and rna count matrixes and I create the annot df from the subset matrixes. For the normalisation step I use the setDepthFactors as explained in the vignette. Because my data is already normalised I set the factors to 1 for the dna and rna counts. This step works but in the analyzeQuantification it thorws the error

@talashuach
Copy link
Collaborator

Hi Manuel,

thank you for the explanation and for sending over some data.

The issue is indeed the pre-normalization.
The MPRAnalyze model expects the data to be integer counts, while your input data (after you normalize) are numeric but not necessarily integers. MPRAnalyze uses a negative binomial likelihood, which expects integers, and crashes when used on non-integer numbers.

To fix this, I'd suggest one of two things:
1 - take the normalization factors you used in the "pre-normalization" stage and set those to be the depth factors. Internally, this will use them to "normalize" the counts by including these factors in the model, without changing the type of input.
2 - Run the entire analysis together. If you're interested in condition-specific activation, you can include 'condition' in your rna design, which will make the model estimate the activity separately within each condition.

Please let me know if these aren't clear or if it doesn't work.

@manueltar
Copy link
Author

Dear Tal,

Thank you very much for your help in this. As you have found out the problem was in passing the normalized data which are not integers.

I have done both my approximation and your suggestion:

  1. My approach: Normalization of all the data together and then running the analyzeQuantification + testEmpirical( useControls = TRUE) for the enhancer activity within each cell type data set. After this I run the analyzeComparative(dnaDesign = ~ batch + barcode + condition, rnaDesign = ~ condition, reducedDesign = ~ 1), again within all the replicas of the same cell type the condition being the REF or ALT barcodes for every enhancer.
  2. Your suggestion: with the complete dataset (4 different cell types): analyzeComparative (dnaDesign = ~ batch + barcode + condition, rnaDesign = ~ condition, reducedDesign = ~ 1) for the enhancer activity with testLrt. In this case the condition is the cell type. Then I run the analyzeComparative(dnaDesign = ~ batch + barcode + condition, rnaDesign = ~ condition, reducedDesign = ~ 1) for the ASE activity the condition this time being a string Cell_Type and REF or ALT labels.

The results are different between both approaches; the second one yields more significant hits. But I have questions on how to interpret them:
• First of all, I have a set of controls that I use in the analyzeQuantification that give rise to the p.val empirical column. That option is not available for the testLrt method so I assume I cannot use these controls in your approach. Am I right?
• Second, as I said I have different cell types and I get a pval that doesn’t indicate in which condition(s) the enhancer is active. I understand that the way to do this is to represent the LogFC and the ASE metrics per cell type and then decide in which condition (here cell type) it is active?
As I said, thank you very much for your time and patience and for the support in using this tool. Best regards,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants