Confusion about the C3POa_preprocessing.py and C3POa_postprocessing.py results? #7

Closed
wlhCNU opened this issue Apr 1, 2019 · 3 comments

wlhCNU commented Apr 1, 2019

Hello,
I have two questions:

  1. C3POa_preprocessing.py produces two output files: R2C2_raw_reads.fastq and No_splint_reads.fastq. Can the reads in No_splint_reads.fastq be aligned directly to the appropriate genome? Do they contain incomplete splint or cDNA adapter sequences?

  2. C3POa_postprocessing.py produces one output file: R2C2_full_length_consensus_reads_R2.fasta. What does the last number in the read name mean (377 in the sequence below)? Subtracting 377 from the actual length of the sequence below gives 80. How should we interpret the number 80?

82d4e633-f330-464b-8c04-bcc093f37174_15.52_2488_2_735_377
GGCGACCAATGAGATCTTACACCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGAATTTGGTGGGAGCTTTGCTGAACTCCTCTACAGGTTCCGATTGTCTGAAGATGCCCGTCCGGGCTTTGCTAAGACTGGCTCCTGTGCTGCTTGGGAACCCACAGCCGATGGTCATGTGACCACCTCTGCCGAGGGAAACCCACCCTAGAAATGACTGCGGCACAGAACCCTGATGGCAGCAAAGTGAAGGCAGCATTTCTGTTTTGTAACCAATGGAACTCAGTTGCATACGCCTCACTGTGCTTAAAATTCATGTTGAAAATAAGACAGATAACGCTGGTGTTGTCCACGTGTCATATGATGTATAAAACATCAGTTAAAACTCACATTTTGTAACAAAGATTTTGTTTGTTTTCAAAAAAAAAAAAAAAAACATTTGCGTTGATACCACTGCTTAAAG

Owner

rvolden commented Apr 2, 2019

  1. The No_splint_reads.fastq is just a collection of reads that BLAT couldn't find a splint sequence in. It should theoretically have both splint and smartseq adapters in it, but it could just be that there were too many errors for it to be found by BLAT.

  2. The last number in the name after postprocessing should be the length of the sequence after trimming the adapters. We leave the adapters in the final product because if we put UMIs in, we can use them to demux our reads. 80 is just the total length of the 3' and 5' adapters.
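The arithmetic in point 2 can be checked with a small sketch like the one below. It assumes only what is stated above: that the last underscore-separated field of the read name is the length after adapter trimming. The function name `adapter_length` is hypothetical and not part of C3POa, and the sequence used here is a placeholder of the same length as the read in the question, not real data.

```python
def adapter_length(header: str, sequence: str) -> int:
    """Return how many bases of the sequence are attributed to adapters.

    Assumes the last "_"-separated field of the read name is the
    sequence length after adapter trimming (as described above).
    """
    trimmed_length = int(header.rsplit("_", 1)[1])
    return len(sequence) - trimmed_length


# Read name from the question; placeholder sequence of length 457.
name = "82d4e633-f330-464b-8c04-bcc093f37174_15.52_2488_2_735_377"
placeholder = "A" * 457
print(adapter_length(name, placeholder))  # 457 - 377 = 80
```

The other fields in the read name are deliberately left unparsed here, since only the meaning of the last one is covered in this thread.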

Author

wlhCNU commented Apr 20, 2019

Hi,
Once I have the postprocessed sequences, the next thing I need to do is remove the 3' and 5' adapters and the poly(A) tail. Do you have a good implementation, or can you recommend software, to do this? Thanks.

Owner

rvolden commented Apr 24, 2019

I've pushed a change that allows for trimming of adapters in postprocessing.
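For the poly(A) tail mentioned in the question, something along these lines might work as a rough starting point. This is only an illustrative sketch, not what C3POa does: it assumes the adapters have already been trimmed so that the poly(A) run sits at the very end of the read, and that the read is in sense orientation (a reverse-complemented read would carry a leading poly(T) instead). The function name `trim_poly_a` and the `min_len` threshold are hypothetical.

```python
def trim_poly_a(sequence: str, min_len: int = 10) -> str:
    """Remove a trailing run of A's if it is at least `min_len` bases long.

    Assumes adapters are already removed, so the poly(A) tail (if any)
    is the last thing in the sequence. `min_len` is an arbitrary
    threshold chosen for illustration.
    """
    stripped = sequence.rstrip("A")
    if len(sequence) - len(stripped) >= min_len:
        return stripped
    # Too few trailing A's to call it a tail; leave the read unchanged.
    return sequence


print(trim_poly_a("ACGTACGT" + "A" * 17))  # ACGTACGT
print(trim_poly_a("ACGTACGTAAA"))          # unchanged: ACGTACGTAAA
```

An exact-match rule like this does not tolerate sequencing errors inside the tail, so a dedicated trimming tool is likely a better choice for real data.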
