
Very few subreads and consensus reads despite many reads after preprocessing #20

Open
kvg opened this issue Jun 2, 2021 · 1 comment


kvg commented Jun 2, 2021

Hello,
I'm testing out C3POa v2.2.3 on a small test dataset (176,000 reads from a much larger PromethION run). I'm hoping to use C3POa's demultiplexing feature and I've prepared a splints file with four sequences. Initial processing looks good at first:

$ python3 C3POa.py -r /data/chunk.fastq -s /data/splints.fasta -l 100 -d 500 -g 1000 -o out
Aligning splints to reads with blat
Preprocessing:  99%|█████████████████████████████████████████████████████████████████████████▌| 176/177 [02:09<00:00,  1.36it/s]
Catting psls: 100%|██████████████████████████████████████████████████████████████████████████| 176/176 [00:01<00:00, 129.95it/s]
Removing preprocessing files: 100%|█████████████████████████████████████████████████████████| 176/176 [00:00<00:00, 2590.99it/s]
Calling consensi:   0%|                                                                                 | 0/177 [02:25<?, ?it/s]
Catting consensus reads: 100%|█████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 11949.58it/s]
Catting subreads: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 7898.00it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 4450.05it/s]
Catting consensus reads: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9177.91it/s]
Catting subreads: 100%|███████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 8104.33it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 4262.17it/s]
Catting consensus reads: 100%|███████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 13879.81it/s]
Catting subreads: 100%|██████████████████████████████████████████████████████████████████████| 87/87 [00:00<00:00, 11671.71it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 87/87 [00:00<00:00, 4974.09it/s]
Catting consensus reads: 100%|████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 7833.72it/s]
Catting subreads: 100%|██████████████████████████████████████████████████████████████████████| 82/82 [00:00<00:00, 10553.00it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 82/82 [00:00<00:00, 4555.04it/s]
(lr-c3poa) root@f8924132b3ed:/#

$ cat out/c3poa.log
C3POa version: v2.2.3
Total reads: 176000
No splint reads: 37306 (21.20%)
Under len cutoff: 0 (0.00%)
Total thrown away reads: 37306 (21.20%)
Reads after preprocessing: 138694

However, in checking the output subread and consensus files, I see very few entries:

# grep -c '^[>@]' out/10x_Splint_*/*
out/10x_Splint_1/R2C2_Consensus.fasta:4
out/10x_Splint_1/R2C2_Subreads.fastq:96
out/10x_Splint_2/R2C2_Consensus.fasta:1
out/10x_Splint_2/R2C2_Subreads.fastq:60
out/10x_Splint_3/R2C2_Consensus.fasta:19
out/10x_Splint_3/R2C2_Subreads.fastq:282
out/10x_Splint_4/R2C2_Consensus.fasta:13
out/10x_Splint_4/R2C2_Subreads.fastq:325

These seem like awfully low numbers to me, and it's not clear where the reads are getting lost. Shouldn't the total number of subreads add up to the number of reads after preprocessing? And assuming 5-10 passes per subread, shouldn't the number of consensus reads be somewhere between 14k and 30k? Is there a way to find out what's happening to the rest of the reads? Or is my understanding simply incorrect?
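For reference, totting up the counts above (a quick sanity check, with the numbers copied from the `grep` output and the log; the 5-10 pass assumption is mine):

```python
# Sum the per-splint grep counts reported above and compare them to the
# "Reads after preprocessing" total from c3poa.log.
consensus_counts = [4, 1, 19, 13]     # R2C2_Consensus.fasta per splint
subread_counts = [96, 60, 282, 325]   # R2C2_Subreads.fastq per splint
reads_after_preprocessing = 138_694

total_consensus = sum(consensus_counts)  # 37
total_subreads = sum(subread_counts)     # 763

# At an assumed 5-10 subread passes per consensus, ~138k preprocessed
# reads would be expected to yield on the order of 14k-28k consensus
# reads, not 37.
print(total_consensus, total_subreads)
print(f"{total_subreads / reads_after_preprocessing:.4%} of preprocessed "
      "reads appear in the subread files")
```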

Thanks,
-Kiran

@Usamahussein551980
Hello Kiran,

I'm trying to analyze long-read data generated by an ONT sequencer following the C3POa workflow, but preprocessing doesn't continue and stops at "Calling consensi".
The tools were installed with their dependencies, and I prepared the UMI_Splints.fasta used in the experiment, but unfortunately the process stopped as shown below:

command:
(base) [ukhussein@ldragon3 C3POa-2.2.3]$ python3 C3POa.py -r ../../projects/nanopore_R2C2/10X_071_R2C2/test/dngqu0264_71_fastq_pass.tar.gz -s ./UMI_Splint.fasta/UMI_Splints.fasta -d 500 -l 100 -g 1000 -n 32 -o out2

abpoa

abpoa

Output:
pr-processing
pr-processing

Log Contents:
$ cat out2/c3poa.log
C3POa version: v2.2.3
Total reads: 1687451
No splint reads: 1505291 (89.21%)
Under len cutoff: 15 (0.00%)
Total thrown away reads: 1505306 (89.21%)
Reads after preprocessing: 182145
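For what it's worth, the log's arithmetic is internally consistent; the numbers below are copied from the log, and the check simply confirms that nearly 90% of reads were discarded because no splint was found:

```python
# Sanity-check the c3poa.log totals quoted above.
total_reads = 1_687_451
no_splint = 1_505_291
under_len = 15

thrown_away = no_splint + under_len       # 1505306, matching the log
remaining = total_reads - thrown_away     # 182145, matching the log

print(thrown_away, remaining)
print(f"{thrown_away / total_reads:.2%} of reads thrown away")
```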

Could you please help me to figure out what is the problem?
