(hom_rate > 0.4) {: missing value where TRUE/FALSE needed #200

Open
stela2502 opened this issue Aug 6, 2024 · 3 comments

@stela2502

Hi Numbat developers,

I am not used to running SNP analyses and am trying to apply your tool to our 10x data.
I have created the allele counts tar.gz file and am now trying to run the numbat_run step.
I have had other issues previously and am therefore fairly sure that this error is something totally new:

Error in if (hom_rate > 0.4) {: missing value where TRUE/FALSE needed
Traceback:

  1. runBatch(i)
  2. numbat::run_numbat(subset, ref_hca, fread(df_allele_ATC2), genome = hg38,
    . t = 1e-05, ncores = 4, min_cells = 100, plot = TRUE, out_dir = paste(sep = ,
    . ./numbat_run_, sampleid, /, batch)) # at line 28-38 of file
  3. bulk_subtrees %>% filter(sample == 0) %>% check_contam()
  4. check_contam(.)

This error is thrown by the function below, and I fear that my data is simply lacking:

#' check inter-individual contamination
#' @param bulk dataframe Pseudobulk profile
#' @return NULL
#' @keywords internal
check_contam = function(bulk) {

    hom_rate = bulk %>% filter(DP >= 8) %>%
        {mean(na.omit(.$AR == 0 | .$AR == 1))}

    if (hom_rate > 0.4) {
        msg = paste0(
            'High SNP contamination detected ',
            '(', round(hom_rate*100, 1), '%)',
            '. Please make sure that cells from only one individual are included in genotyping step.')
        message(msg)
        log_warn(msg)
    }

}

Can you tell me how I can filter my data to get a working sample into your program?

Thank you very much!

/Stefan

@frstyang

frstyang commented Jan 3, 2025

Hi, I'm facing the same issue. Were you able to resolve this problem, and if so, would you mind explaining a bit how?

Based on the location in the code where hom_rate is computed, the error seems to be because there aren't enough pseudobulks which pass the filtering in check_contam.
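
For anyone wanting to confirm this mechanism, here is a minimal sketch in base R. It assumes a toy pseudobulk with made-up `DP` and `AR` values in which no SNP reaches the `DP >= 8` cutoff, and it replaces the dplyr pipeline from `check_contam` with plain subsetting so that it runs without extra packages.

```r
# Toy pseudobulk (made-up values): no SNP reaches the DP >= 8 cutoff
bulk <- data.frame(DP = c(2, 5, 7), AR = c(0, 0.5, 1))

# Base R stand-in for `bulk %>% filter(DP >= 8)`
kept <- bulk[bulk$DP >= 8, ]

# mean() over a zero-length logical vector is NaN
hom_rate <- mean(na.omit(kept$AR == 0 | kept$AR == 1))
print(hom_rate)

# NaN > 0.4 evaluates to NA, and if (NA) raises the reported error
res <- tryCatch(
    if (hom_rate > 0.4) "contaminated" else "ok",
    error = function(e) conditionMessage(e)
)
print(res)
```

The same `NaN` would presumably also appear if rows survive the depth filter but every `AR` value among them is `NA`, since `na.omit` then leaves nothing to average.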

@teng-gao teng-gao reopened this Jan 6, 2025
@teng-gao teng-gao self-assigned this Jan 6, 2025
@teng-gao teng-gao added the tofix label Jan 6, 2025
@teng-gao
Collaborator

teng-gao commented Jan 6, 2025

Seems like if the bulk df is empty after filtering by DP it will throw this error. Need to catch this case & throw a more informative error msg

Same as #210

In the meantime, this probably means the dataset doesn't have enough SNP coverage.
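
The guard described in this comment could look roughly like the sketch below. This is a hypothetical variant, not the maintainers' fix: the name `check_contam_safe`, the `min_dp` and `max_hom_rate` arguments, the wording of the messages, and the choice to return the rate invisibly are all assumptions made for illustration. It uses base R only and `warning()` in place of the package's `log_warn`.

```r
# Hypothetical guarded variant of check_contam (illustration only)
check_contam_safe <- function(bulk, min_dp = 8, max_hom_rate = 0.4) {

    # Keep allele ratios of SNPs that pass the depth cutoff, dropping NAs
    ar <- bulk$AR[!is.na(bulk$DP) & bulk$DP >= min_dp]
    ar <- ar[!is.na(ar)]

    # Fail early with an explicit message instead of letting NaN reach if ()
    if (length(ar) == 0) {
        stop(
            "No SNPs with DP >= ", min_dp, " and a defined AR in the pseudobulk. ",
            "SNP coverage may be too low, or the allele and expression data may not match."
        )
    }

    # Fraction of covered SNPs that look homozygous
    hom_rate <- mean(ar == 0 | ar == 1)

    if (hom_rate > max_hom_rate) {
        warning("High SNP contamination detected (", round(hom_rate * 100, 1), "%)")
    }

    invisible(hom_rate)
}

# Usage with made-up values: an empty result after filtering gives a readable error
empty_msg <- tryCatch(
    check_contam_safe(data.frame(DP = c(1, 3), AR = c(0, 1))),
    error = function(e) conditionMessage(e)
)
print(empty_msg)
```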

@frstyang
Copy link

frstyang commented Jan 8, 2025

Thanks for the response! It turns out I was running into this error because of a bug in my Nextflow pipeline: the allelic data and expression data being fed into numbat came from different samples.
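
A quick sanity check before calling `run_numbat` can catch this kind of mix-up. The sketch below is a hypothetical helper (`barcode_overlap` is not part of numbat) and rests on the assumption that the allele data frame has a `cell` column of barcodes and that the count matrix uses cell barcodes as column names. The barcodes in the usage example are made up.

```r
# Hypothetical helper: compare cell barcodes between the two inputs
barcode_overlap <- function(allele_cells, count_cells) {
    allele_cells <- unique(allele_cells)
    count_cells <- unique(count_cells)
    shared <- intersect(allele_cells, count_cells)
    c(
        n_allele = length(allele_cells),
        n_counts = length(count_cells),
        n_shared = length(shared),
        frac_allele_shared = length(shared) / length(allele_cells)
    )
}

# Usage with made-up barcodes; with real inputs this would be something like
# barcode_overlap(df_allele$cell, colnames(count_mat)), assuming those names
overlap <- barcode_overlap(
    c("AAAC-1", "AAAG-1", "AAAT-1", "AAAT-1"),
    c("AAAG-1", "AAAT-1", "CCCA-1")
)
print(overlap)
```

A shared fraction close to zero would suggest that the allele counts and the expression matrix do not describe the same cells. Some overlap between unrelated samples is still possible, so a low but non-zero value is a hint to investigate rather than proof either way.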
