
Help with troubleshooting qiime dada2 denoise-paired error. [NAs produced by integer overflow...is there a fix?] #2076

Open
aron1014 opened this issue Jan 15, 2025 · 4 comments


@aron1014

I encountered the following error during the dada2 denoise-paired step after removing primers with cutadapt (the command and error are below). The amplicon has an average length of around 183 bp. I'm using qiime2-amplicon-2024.5 on a Mac with an M3 Max chip. I've run this exact code on many other libraries, all prepared the same way with the same assay, and have never encountered this error.

I'm currently running the same data through the standalone DADA2 pipeline in R (outside of QIIME 2) to see whether I get the same error, and I'll follow up once it finishes. In the meantime, I'm hoping someone here has some insight. Thanks!
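A quick back-of-the-envelope check of the truncation settings in the command below, written in R. This is a sketch that assumes the ~183 bp average refers to the amplicon after primer removal; amplicons longer than average will overlap less. It also confirms from the log's own numbers that every read used for error learning was 125 bp long.

```r
# Expected read-pair overlap for the denoise-paired settings used below.
# ASSUMPTION: 183 bp is the average primer-trimmed amplicon length.
amplicon_len <- 183   # average amplicon length reported above (bp)
trunc_len_f  <- 125   # --p-trunc-len-f
trunc_len_r  <- 125   # --p-trunc-len-r
min_overlap  <- 12    # --min_overlap passed to run_dada.R in the log

overlap <- trunc_len_f + trunc_len_r - amplicon_len
overlap                   # 67 bp of expected overlap
overlap >= min_overlap    # TRUE, so read overlap is unlikely to be the problem

# "312203250 total bases in 2497626 reads" from the error-learning step:
312203250 / 2497626       # 125, i.e. every read was truncated to 125 bp
```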

(qiime2-amplicon-2024.5) rdceradk@CERADK-MN-BB888 Trinity_Orchid_inverts_09Dec2024 % qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired-end-trimmed.qza \
--p-trunc-len-f 125 \
--p-trunc-len-r 125 \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-n-threads 15 \
--p-n-reads-learn 1000000 \
--o-representative-sequences rep-seqs-dada.qza \
--o-table table-dada.qza \
--o-denoising-stats stats-dada.qza \
--verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/forward --input_directory_reverse /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/reverse --output_path /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/output.tsv.biom --output_track /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/track.tsv --filtered_directory /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_f --filtered_directory_reverse /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_r --truncation_length 125 --truncation_length_reverse 125 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 15 --learn_min_reads 1000000

R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.12 / RcppParallel: 5.1.6
2) Filtering The filter removed all reads: /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_f/4March2024-76_L001_R1_001.fastq.gz and /var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_r/4March2024-76_L001_R2_001.fastq.gz not written.
Some input samples had no reads pass the filter.
........................................................................................................................................................................................................................x............................................................
3) Learning Error Rates
312203250 total bases in 2497626 reads from 3 samples will be used for learning the error rates.
312203250 total bases in 2497626 reads from 3 samples will be used for learning the error rates.
3) Denoise samples ................................................................Error in dada(drpF, err = err, multithread = multithread, verbose = FALSE) :
NAs in derep$quals matrix. Check that all input sequences had valid associated qualities assigned.
In addition: Warning message:
In derepQuals[sqnms, ] + out$cum_quals[sqnms, ] :
NAs produced by integer overflow
2: stop("NAs in derep$quals matrix. Check that all input sequences had valid associated qualities assigned.")
1: dada(drpF, err = err, multithread = multithread, verbose = FALSE)
Traceback (most recent call last):
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 350, in denoise_paired
run_commands([cmd])
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 37, in run_commands
subprocess.run(cmd, check=True)
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/forward', '--input_directory_reverse', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/reverse', '--output_path', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/output.tsv.biom', '--output_track', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/track.tsv', '--filtered_directory', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_f', '--filtered_directory_reverse', '/var/folders/tf/9kbmk08x4y19x3rhtkqd_cjm0000gr/T/tmputofb3vt/filt_r', '--truncation_length', '125', '--truncation_length_reverse', '125', '--trim_left', '0', '--trim_left_reverse', '0', '--max_expected_errors', '2.0', '--max_expected_errors_reverse', '2.0', '--truncation_quality_score', '2', '--min_overlap', '12', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '15', '--learn_min_reads', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in call
results = self._execute_action(
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in denoise_paired
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
outputs = self.callable_executor(
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in callable_executor
output_views = self._callable(**view_args)
File "/opt/miniconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_dada2/_denoise.py", line 363, in denoise_paired
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.
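The log above also reports that one sample (4March2024-76) had no reads pass the filter. QIIME 2 carried on past that, but in a standalone R run the missing filtered files need to be handled explicitly. A minimal sketch, assuming `filt_out` is the matrix returned by `dada2::filterAndTrim()` (one row per input file, columns `reads.in` and `reads.out`); `zero_read_samples` is a hypothetical helper and the toy counts are made up:

```r
# List the input files whose reads were all removed by filtering.
# ASSUMPTION: filt_out is the matrix returned by dada2::filterAndTrim().
zero_read_samples <- function(filt_out) {
  rownames(filt_out)[filt_out[, "reads.out"] == 0]
}

# Toy matrix standing in for real filterAndTrim() output (made-up counts).
filt_out <- matrix(c(1000, 950,
                       20,   0),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("sampleA_R1.fastq.gz", "sampleB_R1.fastq.gz"),
                                   c("reads.in", "reads.out")))
zero_read_samples(filt_out)   # "sampleB_R1.fastq.gz"

# Filtered files are not written for such samples, so downstream steps can be
# limited to files that exist, e.g.:
# filtFs <- filtFs[file.exists(filtFs)]
# filtRs <- filtRs[file.exists(filtRs)]
```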

@aron1014
Author

I got the same error running it through the standalone DADA2 pipeline in R. However, that pipeline shows which sample was being processed when the error occurred, and it's by far the largest sample (1.83 GB for the R2 file alone). Is there some kind of file size limit?
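File size on disk is only a rough proxy; the number of reads per sample is likely the more relevant quantity for the overflow discussed in this thread. A minimal base-R sketch for counting reads without loading a multi-GB FASTQ into memory; `count_fastq_reads` and `fastq_files` are hypothetical names, and it assumes standard four-line FASTQ records (no wrapped sequence lines):

```r
# Count reads in a FASTQ file (gzipped or plain) by streaming it in chunks.
count_fastq_reads <- function(path, chunk_lines = 4e6) {
  con <- gzfile(path, "rt")     # gzfile() can also read uncompressed files
  on.exit(close(con))
  n_lines <- 0                  # double, not integer, so huge files cannot overflow it
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    n_lines <- n_lines + length(lines)
  }
  n_lines / 4                   # four lines per FASTQ record
}

# Hypothetical usage: rank samples by read count.
# fastq_files <- list.files("filtered", pattern = "_R2_001.fastq.gz$", full.names = TRUE)
# counts <- vapply(fastq_files, count_fastq_reads, numeric(1))
# sort(counts, decreasing = TRUE)
```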

@benjjneb
Owner

We are starting to run into integer overflow issues with really large samples, which is what you are experiencing. We've fixed overflows in a couple of places in this commit: d8bb758

Which version of dada2 in R were you using? If you weren't using the latest devel version (the latest version of the "master" branch on GitHub), could you give that a try and see if it can handle the problem sample?
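For context on the failure mode described above, here is a minimal illustration in plain base R (not DADA2's code) of what integer overflow looks like in R. The scale estimate at the end rests on an assumption inferred from the warning text (that per-position quality scores are summed over every copy of a unique sequence); it is not a description of DADA2 internals.

```r
# R stores integers as 32-bit signed values; the largest one is:
x <- .Machine$integer.max
x                                   # 2147483647

# Going past it does not wrap around: R returns NA and emits the same
# warning that appears in the logs above.
msg <- tryCatch(x + 1L, warning = function(w) conditionMessage(w))
msg                                 # "NAs produced by integer overflow"
is.na(suppressWarnings(x + 1L))     # TRUE

# ASSUMPTION: quality scores are summed per position over all copies of a
# unique sequence. At Phred ~40, one unique sequence would then need roughly
# this many copies before a single per-position sum overflows:
x %/% 40                            # 53687091

# The same arithmetic in double precision is fine at this scale.
as.numeric(x) + 1                   # 2147483648
```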

@aron1014
Author

packageVersion("dada2")
[1] ‘1.32.0’

I'm currently running it again, but this time I split that large sample in half (now two separate files each for R1 and R2) to see if that fixes the problem. I will also try the latest version of the master branch on GitHub with the original sample. I'll post an update afterwards.

Thanks for helping!
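The split-in-half workaround mentioned in this comment can be sketched in base R. This is one possible approach, not necessarily how the split was done here: it sends odd-numbered records to one output and even-numbered records to the other, so applying it to R1 and R2 separately keeps mates paired, assuming both files list reads in the same order. `split_fastq_alternating` and the file names are hypothetical. Each half is then processed as its own sample, so its counts would need to be combined afterwards.

```r
# Split a FASTQ (gzipped or plain) into two gzipped halves by alternating
# records, streaming in chunks so large files need not fit in memory.
split_fastq_alternating <- function(in_path, out1, out2, chunk_reads = 250000) {
  con_in <- gzfile(in_path, "rt")
  con1 <- gzfile(out1, "wt")
  con2 <- gzfile(out2, "wt")
  on.exit({ close(con_in); close(con1); close(con2) })
  done <- 0                           # records processed so far (double: no overflow)
  repeat {
    lines <- readLines(con_in, n = 4 * chunk_reads)
    if (length(lines) == 0L) break
    if (length(lines) %% 4L != 0L) stop("truncated FASTQ record in ", in_path)
    # Global record number for every line in this chunk (4 lines per record).
    rec <- rep(done + seq_len(length(lines) / 4), each = 4)
    odd <- rec %% 2 == 1
    writeLines(lines[odd], con1)      # records 1, 3, 5, ...
    writeLines(lines[!odd], con2)     # records 2, 4, 6, ...
    done <- done + length(lines) / 4
  }
  invisible(done)
}

# Hypothetical usage: apply the same split to both mates of one sample.
# split_fastq_alternating("S1_R1_001.fastq.gz", "S1a_R1_001.fastq.gz", "S1b_R1_001.fastq.gz")
# split_fastq_alternating("S1_R2_001.fastq.gz", "S1a_R2_001.fastq.gz", "S1b_R2_001.fastq.gz")
```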

@aron1014
Author

I upgraded dada2 to version 1.35.6 and still get the same error on the same sample (see below). I'm currently running the split sample in QIIME 2 and will let you know if that works.

    # make lists to hold the loop output
    mergers <- vector("list", length(sample.names))
    names(mergers) <- sample.names
    ddF <- vector("list", length(sample.names))
    names(ddF) <- sample.names
    ddR <- vector("list", length(sample.names))
    names(ddR) <- sample.names

    # For each sample, get a list of merged and denoised sequences
    system.time(
      for (sam in sample.names) {
        cat("Processing:", sam, "\n")
        # Dereplicate forward reads
        derepF <- derepFastq(filtFs[[sam]])
        # Infer sequences for forward reads
        dadaF <- dada(derepF, err = errF_4, multithread = TRUE)
        ddF[[sam]] <- dadaF
        # Dereplicate reverse reads
        derepR <- derepFastq(filtRs[[sam]])
        # Infer sequences for reverse reads
        dadaR <- dada(derepR, err = errR_4, multithread = TRUE)
        ddR[[sam]] <- dadaR
        # Merge reads together
        merger <- mergePairs(ddF[[sam]], derepF, ddR[[sam]], derepR)
        mergers[[sam]] <- merger
      }
    )
    Processing: 24Sep2024-1_S151_L001
    Sample 1 - 366382 reads in 4370 unique sequences.
    Sample 1 - 366382 reads in 10246 unique sequences.
    Processing: 24Sep2024-10_S162_L001
    Sample 1 - 113214 reads in 1509 unique sequences.
    Sample 1 - 113214 reads in 6337 unique sequences.
    Processing: 24Sep2024-11_S163_L001
    Sample 1 - 1619632 reads in 13211 unique sequences.
    Sample 1 - 1619632 reads in 35897 unique sequences.
    Processing: 24Sep2024-12_S164_L001
    Sample 1 - 657193 reads in 9396 unique sequences.
    Sample 1 - 657193 reads in 25348 unique sequences.
    Processing: 24Sep2024-13_S165_L001
    Sample 1 - 2330536 reads in 16339 unique sequences.
    Sample 1 - 2330536 reads in 46466 unique sequences.
    Processing: 24Sep2024-14_S166_L001
    Sample 1 - 1795460 reads in 12057 unique sequences.
    Sample 1 - 1795460 reads in 39785 unique sequences.
    Processing: 24Sep2024-15_S167_L001
    Sample 1 - 2732068 reads in 20805 unique sequences.
    Sample 1 - 2732068 reads in 184711 unique sequences.
    Processing: 24Sep2024-16_S168_L001
    Sample 1 - 1769357 reads in 10371 unique sequences.
    Sample 1 - 1769357 reads in 22586 unique sequences.
    Processing: 24Sep2024-17_S169_L001
    Sample 1 - 1988100 reads in 10738 unique sequences.
    Sample 1 - 1988100 reads in 22943 unique sequences.
    Processing: 24Sep2024-18_S170_L001
    Sample 1 - 4037965 reads in 20439 unique sequences.
    Sample 1 - 4037965 reads in 34527 unique sequences.
    Processing: 24Sep2024-19_S171_L001
    Sample 1 - 644812 reads in 6211 unique sequences.
    Sample 1 - 644812 reads in 21463 unique sequences.
    Processing: 24Sep2024-2_S152_L001
    Sample 1 - 218087 reads in 1661 unique sequences.
    Sample 1 - 218087 reads in 6620 unique sequences.
    Processing: 24Sep2024-3_S153_L001
    Sample 1 - 964716 reads in 4905 unique sequences.
    Sample 1 - 964716 reads in 13669 unique sequences.
    Processing: 24Sep2024-4_S154_L001
    Sample 1 - 822410 reads in 4978 unique sequences.
    Sample 1 - 822410 reads in 6643 unique sequences.
    Processing: 24Sep2024-5_S155_L001
    Sample 1 - 1174988 reads in 7462 unique sequences.
    Sample 1 - 1174988 reads in 12570 unique sequences.
    Processing: 24Sep2024-7_S157_L001
    Sample 1 - 84173 reads in 817 unique sequences.
    Sample 1 - 84173 reads in 1470 unique sequences.
    Processing: 24Sep2024-8_S158_L001
    Sample 1 - 6677590 reads in 28894 unique sequences.
    Sample 1 - 6677590 reads in 35636 unique sequences.
    Processing: 24Sep2024-9_S161_L001
    Sample 1 - 4873479 reads in 20367 unique sequences.
    Sample 1 - 4873479 reads in 26701 unique sequences.
    Processing: 25Sep2024-21_S173_L001
    Sample 1 - 1613295 reads in 14976 unique sequences.
    Sample 1 - 1613295 reads in 56289 unique sequences.
    Processing: 25Sep2024-22_S174_L001
    Sample 1 - 3526705 reads in 25838 unique sequences.
    Sample 1 - 3526705 reads in 119324 unique sequences.
    Processing: 25Sep2024-23_S175_L001
    Sample 1 - 526461 reads in 4912 unique sequences.
    Sample 1 - 526461 reads in 20557 unique sequences.
    Processing: 25Sep2024-24_S176_L001
    Sample 1 - 3636422 reads in 37253 unique sequences.
    Sample 1 - 3636422 reads in 116059 unique sequences.
    Processing: 25Sep2024-25_S177_L001
    Sample 1 - 1389599 reads in 18602 unique sequences.
    Sample 1 - 1389599 reads in 70303 unique sequences.
    Processing: 25Sep2024-26_S178_L001
    Sample 1 - 5676147 reads in 35446 unique sequences.
    Sample 1 - 5676147 reads in 203440 unique sequences.
    Processing: 25Sep2024-27_S179_L001
    Sample 1 - 2308890 reads in 22212 unique sequences.
    Sample 1 - 2308890 reads in 97648 unique sequences.
    Processing: 25Sep2024-28_S180_L001
    Sample 1 - 3472811 reads in 48060 unique sequences.
    Sample 1 - 3472811 reads in 102806 unique sequences.
    Processing: 25Sep2024-29_S181_L001
    Sample 1 - 4290171 reads in 35070 unique sequences.
    Sample 1 - 4290171 reads in 226861 unique sequences.
    Processing: 25Sep2024-30_S182_L001
    Sample 1 - 4300817 reads in 29595 unique sequences.
    Sample 1 - 4300817 reads in 134245 unique sequences.
    Processing: 25Sep2024-31_S183_L001
    Sample 1 - 6035793 reads in 44455 unique sequences.
    Sample 1 - 6035793 reads in 169770 unique sequences.
    Processing: 25Sep2024-32_S184_L001
    Sample 1 - 986372 reads in 8498 unique sequences.
    Sample 1 - 986372 reads in 73858 unique sequences.
    Processing: 25Sep2024-33_S185_L001
    Sample 1 - 1670336 reads in 11777 unique sequences.
    Sample 1 - 1670336 reads in 59244 unique sequences.
    Processing: 25Sep2024-34_S186_L001
    Sample 1 - 1361203 reads in 12123 unique sequences.
    Sample 1 - 1361203 reads in 60965 unique sequences.
    Processing: 25Sep2024-35_S187_L001
    Sample 1 - 736775 reads in 8878 unique sequences.
    Sample 1 - 736775 reads in 32627 unique sequences.
    Processing: 25Sep2024-36_S188_L001
    Sample 1 - 969531 reads in 9774 unique sequences.
    Sample 1 - 969531 reads in 26768 unique sequences.
    Processing: 25Sep2024-37_S189_L001
    Sample 1 - 258547 reads in 3551 unique sequences.
    Sample 1 - 258547 reads in 20567 unique sequences.
    Processing: 25Sep2024-38_S190_L001
    Sample 1 - 617413 reads in 7546 unique sequences.
    Sample 1 - 617413 reads in 25386 unique sequences.
    Processing: 26Sep2024-41_S195_L001
    Sample 1 - 83478 reads in 2201 unique sequences.
    Sample 1 - 83478 reads in 6557 unique sequences.
    Processing: 26Sep2024-42_S196_L001
    Sample 1 - 29 reads in 14 unique sequences.
    Sample 1 - 29 reads in 17 unique sequences.
    Processing: 26Sep2024-43_S197_L001
    Sample 1 - 57879 reads in 831 unique sequences.
    Sample 1 - 57879 reads in 1131 unique sequences.
    Processing: 26Sep2024-44_S198_L001
    Sample 1 - 1084546 reads in 14627 unique sequences.
    Sample 1 - 1084546 reads in 47952 unique sequences.
    Processing: 26Sep2024-45_S199_L001
    Sample 1 - 875567 reads in 4280 unique sequences.
    Sample 1 - 875567 reads in 7629 unique sequences.
    Processing: 26Sep2024-46_S200_L001
    Sample 1 - 312290 reads in 4903 unique sequences.
    Sample 1 - 312290 reads in 14202 unique sequences.
    Processing: 26Sep2024-47_S201_L001
    Sample 1 - 284183 reads in 4291 unique sequences.
    Sample 1 - 284183 reads in 26999 unique sequences.
    Processing: 26Sep2024-48_S202_L001
    Sample 1 - 943718 reads in 4592 unique sequences.
    Sample 1 - 943718 reads in 8429 unique sequences.
    Processing: 26Sep2024-49_S203_L001
    Sample 1 - 172854 reads in 2668 unique sequences.
    Sample 1 - 172854 reads in 8299 unique sequences.
    Processing: 26Sep2024-50_S204_L001
    Sample 1 - 1925841 reads in 14508 unique sequences.
    Sample 1 - 1925841 reads in 57048 unique sequences.
    Processing: 26Sep2024-51_S205_L001
    Sample 1 - 168307 reads in 1507 unique sequences.
    Sample 1 - 168307 reads in 3910 unique sequences.
    Processing: 26Sep2024-52_S206_L001
    Sample 1 - 152820 reads in 2280 unique sequences.
    Sample 1 - 152820 reads in 5338 unique sequences.
    Processing: 26Sep2024-53_S207_L001
    Sample 1 - 341771 reads in 3926 unique sequences.
    Sample 1 - 341771 reads in 18730 unique sequences.
    Processing: 26Sep2024-54_S208_L001
    Sample 1 - 254074 reads in 2375 unique sequences.
    Sample 1 - 254074 reads in 4527 unique sequences.
    Processing: 26Sep2024-55_S209_L001
    Sample 1 - 81565 reads in 1275 unique sequences.
    Sample 1 - 81565 reads in 2529 unique sequences.
    Processing: 26Sep2024-56_S210_L001
    Sample 1 - 710867 reads in 4758 unique sequences.
    Sample 1 - 710867 reads in 11882 unique sequences.
    Processing: 27Sep2024-59_S213_L001
    Sample 1 - 94947 reads in 1735 unique sequences.
    Sample 1 - 94947 reads in 7066 unique sequences.
    Processing: 27Sep2024-60_S214_L001
    Sample 1 - 548774 reads in 8035 unique sequences.
    Sample 1 - 548774 reads in 27329 unique sequences.
    Processing: 27Sep2024-61_S215_L001
    Sample 1 - 8622449 reads in 31655 unique sequences.
    Sample 1 - 8622449 reads in 57734 unique sequences.
    Processing: 27Sep2024-62_S216_L001
    Sample 1 - 296973 reads in 3534 unique sequences.
    Sample 1 - 296973 reads in 34875 unique sequences.
    Processing: 27Sep2024-63_S217_L001
    Sample 1 - 54355371 reads in 138137 unique sequences.
    Sample 1 - 54355371 reads in 184967 unique sequences.
    Processing: 27Sep2024-64_S218_L001
    Sample 1 - 391995 reads in 5489 unique sequences.
    Sample 1 - 391995 reads in 25171 unique sequences.
    Processing: 27Sep2024-66_S220_L001
    Sample 1 - 235 reads in 23 unique sequences.
    Sample 1 - 235 reads in 60 unique sequences.
    Processing: 27Sep2024-67_S221_L001
    Sample 1 - 547762 reads in 10819 unique sequences.
    Sample 1 - 547762 reads in 22582 unique sequences.
    Processing: 27Sep2024-68_S222_L001
    Sample 1 - 525142 reads in 7930 unique sequences.
    Sample 1 - 525142 reads in 23807 unique sequences.
    Processing: 27Sep2024-69_S225_L001
    Sample 1 - 1043497 reads in 10333 unique sequences.
    Sample 1 - 1043497 reads in 30772 unique sequences.
    Processing: 27Sep2024-70_S226_L001
    Sample 1 - 1731214 reads in 9240 unique sequences.
    Sample 1 - 1731214 reads in 24491 unique sequences.
    Processing: 27Sep2024-71_S227_L001
    Error in dada(derepF, err = errF_4, multithread = TRUE) :
    NAs in derep$quals matrix. Check that all input sequences had valid associated qualities assigned.
    In addition: Warning message:
    In derepQuals[sqnms, ] + out$cum_quals[sqnms, ] :
    NAs produced by integer overflow
    Timing stopped at: 1689 67.38 1300
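Until the overflow itself is fixed, one way to keep a long per-sample loop like the one above from stopping at the first problem sample is to catch failures per sample and carry on, then deal with the failed samples separately. A minimal base-R sketch; `run_per_sample` is a hypothetical helper, not part of dada2. It also escalates any "integer overflow" warning to an error, so a sample is flagged at the step that produced the NAs rather than at the later `dada()` call.

```r
# Apply `fun` to each sample name, recording failures instead of aborting.
run_per_sample <- function(sample_names, fun) {
  results <- setNames(vector("list", length(sample_names)), sample_names)
  failed <- character(0)
  for (sam in sample_names) {
    results[[sam]] <- tryCatch(
      withCallingHandlers(
        fun(sam),
        warning = function(w) {
          # Treat integer-overflow warnings as fatal for this sample.
          if (grepl("integer overflow", conditionMessage(w))) {
            stop(conditionMessage(w), call. = FALSE)
          }
        }
      ),
      error = function(e) {
        failed <<- c(failed, sam)     # remember which sample failed
        conditionMessage(e)           # keep the error message as its result
      }
    )
  }
  list(results = results, failed = failed)
}

# Hypothetical usage with part of the loop body above:
# out <- run_per_sample(sample.names, function(sam) {
#   derepF <- derepFastq(filtFs[[sam]])
#   dada(derepF, err = errF_4, multithread = TRUE)
# })
# out$failed
```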
