Error when running C3Poa.py with 10x R2C2 nanopore data #15

Open
Bodako opened this issue Nov 25, 2020 · 11 comments

Comments

@Bodako

Bodako commented Nov 25, 2020

Dear @rvolden,

Hello,
I am using C3POa.py to preprocess 10x-based Nanopore R2C2 sequencing data.

python3.7 C3POa/C3POa.py -r data/R2C2_q7_pass_merged.fastq -o c3poa_output -s data/10x_UMI_splint.fasta -c data/config -l 1000 -d 500 -n 10 -g 1000

When I did the first run, I thought it was going well, but I got the following error.

Reading existing psl file
 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                 | 4404/4934 [50:17<06:03,  1.46it/s]
cat: c3poa_output/Splint1/tmp*/R2C2_Consensus.fasta: No such file or directory
cat: c3poa_output/Splint1/tmp*/subreads.fastq: No such file or directory

Also, I obtained another error when I tried to do the same thing after erasing the c3poa_output/results.

Aligning splints to reads with blat
Loaded 200 letters in 1 sequences
Searched 40289422513 bases in 4933273 sequences
  0%|                                                                                                                                                                            | 0/4934 [6:49:10<?, ?it/s]
cat: c3poa_output/Splint1/tmp*/R2C2_Consensus.fasta: No such file or directory
cat: c3poa_output/Splint1/tmp*/subreads.fastq: No such file or directory

and then,

Aligning splints to reads with blat
Traceback (most recent call last):
  File "C3POa/C3POa.py", line 227, in <module>
    main(args)
  File "C3POa/C3POa.py", line 182, in main
    adapter_dict, adapter_set, no_splint = preprocess(blat, args.out_path, tmp_dir, read_list, args.splint_file, tmp_adapter_dict)
  File "/appl/applications/cellphonedb/cpdb-venv/bin/C3POa/bin/preprocess.py", line 24, in preprocess
    with open(align_psl) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'c3poa_output/tmp/splint_to_read_alignments.psl'

Could you comment on what would be the cause of these issues and what I have to do to resolve it?

---input information---
config:

# Order doesn't matter
# If you use the config file, you should provide paths to all of the programs
# You need to include all of the example programs
# Use tabs to separate the program name from the path
racon   /appl/racon/racon/build/bin/racon
blat    /appl/blat/blatSrc/bin/blat
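The comments in the config say the program name and path must be separated by tabs, and a pasted config can easily end up with spaces instead. A minimal sketch of a sanity check follows. It assumes only what those comments state; `parse_config` and `missing_executables` are hypothetical helpers for illustration, not functions from C3POa, and C3POa's own parsing may differ.

```python
import os


def parse_config(text):
    """Return {program: path} from tab-separated config text.

    Comment lines (starting with '#') and blank lines are skipped.
    Raises ValueError if a line has no tab between name and path.
    """
    programs = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "\t" not in line:
            raise ValueError(
                f"line {lineno}: expected a tab between name and path: {line!r}"
            )
        name, path = line.split("\t", 1)
        programs[name.strip()] = path.strip()
    return programs


def missing_executables(programs):
    """List the programs whose path is not an executable file on this machine."""
    return [
        name
        for name, path in programs.items()
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]
```

Calling `missing_executables(parse_config(open("data/config").read()))` on the machine that runs C3POa should return an empty list if both `racon` and `blat` resolve to executables.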

10x_UMI_splint.fasta:

>Splint1
TGAGGCTGATGAGTTCCATANNNNNTATATNNNNNATCAC
TACTTAGTTTTTTGATAGCTTCAAGCCAGAGTTGTCTTTT
TCTCTTTGCTGGCAGTAAAAGTATTGTGTACCTTTTGCTG
GGTCAGGTTGTTCTTTAGGAGGAGTAAAAGGATCAAATGC
ACTAANNNNNTATATNNNNNGCGATCGAAAATATCCCTTT

Thank you for your support.

@rvolden
Owner

rvolden commented Nov 25, 2020

That's weird that on your first run, it didn't align with blat but still ran? So I'm guessing that wasn't your first run and it had already produced the psl file with the splint alignments. When you're removing files from a previous run, assuming the blat step was run correctly, keep the tmp directory that contains splint_to_read_alignments.psl. It looks like the psl file might be messed up based on your other errors though, so try removing your entire output directory with rm -rf c3poa_output and then rerun. Sorry for wishy washy answers, but I haven't been able to reproduce these errors

@Bodako
Author

Bodako commented Nov 26, 2020

Dear @rvolden,

Thank you for your kind answer!
I meant I did three separate runs. Sorry for confusing you.

I reran the following command after removing the old output with rm -rf c3poa_output and making a new directory.

python3.7 C3POa/C3POa.py -r data/R2C2_q7_pass_merged.fastq -o output/PM-PS-0001-T_R2C2/ -s data/10x_UMI_splint.fasta -c data/config -l 1000 -d 500 -n 10 -g 1000

However, I obtained the same error.

Aligning splints to reads with blat
Traceback (most recent call last):
  File "C3POa/C3POa.py", line 227, in <module>
    main(args)
  File "C3POa/C3POa.py", line 182, in main
    adapter_dict, adapter_set, no_splint = preprocess(blat, args.out_path, tmp_dir, read_list, args.splint_file, tmp_adapter_dict)
  File "/appl/applications/cellphonedb/cpdb-venv/bin/C3POa/bin/preprocess.py", line 24, in preprocess
    with open(align_psl) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'output/R2C2/c3poa/tmp/splint_to_read_alignments.psl'

It generates output files c3poa.log and tmp, but they are empty.

@rvolden
Owner

rvolden commented Nov 26, 2020

Weird how it goes straight from the Aligning message to the traceback. Something that doesn't make sense is that the directory it's saying for the splint isn't right. It's saying it's 'output/R2C2/c3poa/tmp/splint_to_read_alignments.psl', but the tmp directory should be output/PM-PS-0001-T_R2C2/tmp/. There should at least be some sort of message from blat. Can you do blat -noHead -stepSize=1 -t=DNA -q=DNA -minScore=15 -minIdentity=10 data/10x_UMI_splint.fasta data/R2C2_q7_pass_merged.fastq output/PM-PS-0001-T_R2C2/tmp/splint_to_read_alignments.psl? If I had to guess it might have something to do with your environment?
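One way to make sure a message from blat is not lost is to run the same command through `subprocess` and capture its output. The sketch below reuses the flags from the command above; `build_blat_cmd` and `run_blat` are hypothetical helper names, and the `blat` argument is assumed to be the path from the config file.

```python
import subprocess
from pathlib import Path


def build_blat_cmd(blat, splint_fasta, reads, out_psl):
    """Assemble the blat command line with the flags quoted in this thread."""
    return [
        blat, "-noHead", "-stepSize=1", "-t=DNA", "-q=DNA",
        "-minScore=15", "-minIdentity=10",
        splint_fasta, reads, out_psl,
    ]


def run_blat(cmd):
    """Run blat and return (returncode, stdout, stderr) so no message is lost."""
    # blat does not create the output directory, so make sure it exists first.
    Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr
```

A non-zero return code, or any text on stderr, would point to blat itself rather than to the Python side of the pipeline.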

@Bodako
Author

Bodako commented Nov 26, 2020

Dear @rvolden,

Thank you for your kind support!

I followed your advice.

/appl/blat/blatSrc/bin/blat -noHead -stepSize=1 -t=DNA -q=DNA -minScore=15 -minIdentity=10 data/10x_UMI_splint.fasta data/R2C2_q7_pass_merged.fastq data/output/PM-PS-0001-T_R2C2/c3poa/tmp/splint_to_read_alignments.psl

And I got the following message.

needLargeMem: trying to allocate 83388293410 bytes (limit: 17179869184)

Is that a memory problem for my environment?

@rvolden
Owner

rvolden commented Nov 26, 2020

Yeah that's weird - I haven't seen that error before. Looks like blat is limited to about 17 GB of memory (17179869184 bytes, i.e. 16 GiB), and it tried to allocate about 83 GB. I'm working on a workaround, I'll ping this issue when I push the changes.
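The two numbers in the `needLargeMem` message can be put side by side with a little arithmetic. The chunk estimate assumes, as a rough guess only, that blat's allocation grows linearly with the size of the input file.

```python
import math

requested = 83_388_293_410  # bytes blat tried to allocate (from the error message)
limit = 17_179_869_184      # bytes blat allows (from the error message)

GIB = 2 ** 30
requested_gib = requested / GIB
limit_gib = limit / GIB

# Assumption: memory use scales linearly with input size, so this is only
# a lower bound on how many pieces the input would have to be split into.
min_chunks = math.ceil(requested / limit)

print(f"requested {requested_gib:.1f} GiB, limit {limit_gib:.0f} GiB, "
      f"at least {min_chunks} chunks")
```

Under that assumption, splitting the reads into five or more pieces would bring each blat call under the limit.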

@rvolden
Owner

rvolden commented Nov 26, 2020

Just pushed a preprocessing version where it chunks the input for parallelized blat. This should hopefully get around your input file size and it'll speed up the preprocessing considerably.
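For readers who want to see the idea, here is a minimal sketch of chunking a FASTQ into several FASTA files, one per blat process. It is not the code that was pushed to C3POa. `read_fastq` and `chunk_to_fasta` are hypothetical names, and the sketch assumes plain four-line FASTQ records.

```python
from pathlib import Path


def read_fastq(path):
    """Yield (name, sequence) from a FASTQ file with 4 lines per record."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line, not needed for FASTA
            yield header[1:].split()[0], seq


def chunk_to_fasta(fastq_path, out_dir, n_chunks):
    """Distribute reads round-robin over n_chunks FASTA files; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i}.fasta" for i in range(n_chunks)]
    handles = [open(p, "w") for p in paths]
    try:
        for i, (name, seq) in enumerate(read_fastq(fastq_path)):
            handles[i % n_chunks].write(f">{name}\n{seq}\n")
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Round-robin assignment keeps the chunks close to equal in read count, so each blat call sees roughly `1/n_chunks` of the input.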

@Bodako
Author

Bodako commented Nov 26, 2020

Dear @rvolden,

Hello,
Thank you for your kind support!!

I updated to the new C3POa.py and ran it on the test fastq (100 reads) with a reduced thread count, -n 2.
It passed the first step, but I got the same error as in the first run.

Aligning splints to reads with blat
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  3.66it/s]
  0%|                                                                                                                                                                                 | 0/1 [00:00<?, ?it/s]
cat: data/test/test_output/Splint1/tmp*/R2C2_Consensus.fasta: No such file or directory
cat: data/test/test_output/Splint1/tmp*/subreads.fastq: No such file or directory

It generates the following output files and psl.

test_run/
├── c3poa.log
├── tmp
│   └── splint_to_read_alignments.psl
└── Splint_1
    ├── R2C2_Consensus.fasta
    └── R2C2_Subreads.fastq

splint_to_read_alignments.psl:

16      0       0       0       0       0       0       0       +       c9a51404-76b6-411b-b788-54855baf200f    1029    997     1013    Splint1 200     0       16      1       16,     997,    0,
136     2       0       10      3       6       5       13      -       c7dcaa9e-41cb-4604-84c8-38c34daf1eb1    1013    248     402     Splint1 200     0       161     7       62,21,20,11,14,8,12,    611,673,695,716,727,741,753,        0,64,88,108,121,136,149,
173     0       0       20      0       0       3       6       -       1ee48d77-a2e1-4c19-add0-fb2373182e68    1124    736     929     Splint1 200     0       199     4       51,8,22,112,    195,246,254,276,    0,54,64,87,
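To check whether a psl file like this one is well formed, each line can be split into the 21 standard PSL columns. The sketch below uses the column names from the UCSC PSL format description. `parse_psl_line` is a hypothetical helper, not part of C3POa, and it assumes header-less output as produced with `-noHead`.

```python
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}


def parse_psl_line(line):
    """Parse one header-less PSL line into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(PSL_FIELDS):
        raise ValueError(f"expected {len(PSL_FIELDS)} columns, got {len(values)}")
    record = {}
    for key, value in zip(PSL_FIELDS, values):
        if key in TEXT_FIELDS:
            record[key] = value
        elif key in LIST_FIELDS:
            # Comma-separated lists end with a trailing comma; drop the empty item.
            record[key] = [int(x) for x in value.split(",") if x]
        else:
            record[key] = int(value)
    return record
```

A line that was cut off or wrapped mid-record raises `ValueError`, which would be one sign that the psl file is damaged and should be regenerated.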

@rvolden
Owner

rvolden commented Nov 26, 2020

Remove the output and try it again with -n 10. I tested with lower thread numbers and it causes multiprocessing to throw an error.

@Bodako
Author

Bodako commented Nov 27, 2020

Dear @rvolden,

Thank you for your continuous feedback.
I removed the output and ran again with -n 10.

It gets further than before, but it still does not finish.

In the real dataset,

Aligning splints to reads with blat
  0%|                                                                                                                                                                                | 0/10 [02:37<?, ?it/s]
cat: output/c3poa/pre_tmp_*/tmp_splint_aln.psl: No such file or directory
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4934/4934 [02:02<00:00, 40.23it/s]
output/
├── c3poa.log
└── tmp
    └── splint_to_read_alignments.psl

In the test set,

Aligning splints to reads with blat
 30%|██████████████████████████████████████████████████▍                                                                                                                     | 3/10 [00:00<00:01,  5.12it/s]
  0%|                                                                                                                                                                                 | 0/1 [00:00<?, ?it/s]
cat: test/test_output/Splint1/tmp*/R2C2_Consensus.fasta: No such file or directory
cat: test/test_output/Splint1/tmp*/subreads.fastq: No such file or directory
test_run/
├── c3poa.log
└── tmp
    └── splint_to_read_alignments.psl

Is the difference between the two datasets due to memory?
blat seems to expect fasta as input. Could the problem be that my input is fastq? blat seems to treat the fastq as a list of file names.

usage:
   blat database query [-ooc=11.ooc] output.psl
where:
   database and query are each either a .fa , .nib or .2bit file,
   or a list these files one file name per line.

@rvolden
Owner

rvolden commented Nov 30, 2020

C3POa will chunk the fastq and write temporary fastas for the blat alignment. When you use -n 10, it should split your input fastq into 10 chunks, which should take care of the memory error you were getting since the input files will be 10% of what they were before. As we can see, there's still something going wrong with your alignment, which is why it can't cat the psl files after all of the chunk alignments are done. Try subsampling your fastq and converting to fasta and aligning manually with a blat command similar to what I posted before.
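The subsample-and-convert step can be sketched in a few lines. This is only an illustration: `subsample_fastq_to_fasta` is a hypothetical helper, it assumes four-line FASTQ records, and it uses reservoir sampling so the full file never has to fit in memory.

```python
import random


def subsample_fastq_to_fasta(fastq_path, fasta_path, n_reads, seed=0):
    """Write a random sample of n_reads FASTQ records to fasta_path as FASTA.

    Returns the number of records written (fewer than n_reads if the
    input is smaller than the requested sample).
    """
    rng = random.Random(seed)
    sample = []
    index = 0
    with open(fastq_path) as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line, dropped in FASTA
            record = (header[1:].split()[0], seq)
            if len(sample) < n_reads:
                sample.append(record)
            else:
                # Reservoir sampling: keep each later record with probability
                # n_reads / (index + 1) by overwriting a random slot.
                j = rng.randint(0, index)
                if j < n_reads:
                    sample[j] = record
            index += 1
    with open(fasta_path, "w") as out:
        for name, seq in sample:
            out.write(f">{name}\n{seq}\n")
    return len(sample)
```

The resulting FASTA can then be passed to blat as the query in place of the original fastq, using the same flags as in the earlier command.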

@rvolden rvolden reopened this Nov 30, 2020
@yangao07

I also ran into the same error recently.
I found that it was related to conk.
You could try reinstalling it.
