Number of repeats used for consensus #18

Open
OscarT32 opened this issue Mar 31, 2021 · 9 comments
Comments

@OscarT32

I have generated consensus sequences for different datasets using C3POa. I am trying to do some stats by establishing a correlation between the number of subreads and the accuracy of the consensus. When splitting the output file based on the information in the header of each consensus sequence in the C3POa output, I noticed that there is a jump from "1" to "3", with no sequences labeled "2", in all my output files. I have checked my input file and I have data that should fall into the "2" category. I am not sure why this is happening or if I am misunderstanding the output file. Thanks for your assistance!
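One way to check for the missing bin before doing any further stats is to tally the repeat count from every header. The sketch below assumes the header layout `>readName_averageQuality_originalReadLength_numberOfRepeats_subreadLength`; that layout, and the helper names `repeats_from_header` and `count_by_repeats`, are assumptions for illustration and should be checked against the headers in your own output file.

```python
from collections import Counter


def repeats_from_header(header):
    """Return the repeat count from one FASTA header line.

    Assumed layout (verify against your own output):
    >readName_averageQuality_originalReadLength_numberOfRepeats_subreadLength
    """
    # Split from the right so underscores inside the read name are harmless.
    fields = header.lstrip(">").strip().rsplit("_", 4)
    return int(fields[-2])


def count_by_repeats(fasta_lines):
    """Count consensus reads per repeat bin, looking only at header lines."""
    return Counter(
        repeats_from_header(line) for line in fasta_lines if line.startswith(">")
    )
```

To use it on a real file, pass the open file handle, for example `count_by_repeats(open("R2C2_Consensus.fasta"))`, and look at which keys are absent from the result.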

@rvolden
Owner

rvolden commented Mar 31, 2021

What version of C3POa are you using? If you're running something older, I suggest updating to the latest version (v2.2.2). I haven't seen this come up in my test dataset. This is what I see when I plot out the accuracy per coverage bin:
[image: 2repswarm]

I think what's probably happening is there's a bug in the consensus script that's used for pairwise consensus calling. As far as I know, there shouldn't be any problems with it in the most updated version. If you're on the latest C3POa version and you're still not seeing any reads with a coverage of 2, add .get() to the apply_async call on line 247. This will disable threading for the consensus calling and it will actually show you the errors.
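A minimal sketch of why `.get()` surfaces the error, not C3POa's actual code: `make_consensus` is a hypothetical worker that fails on exactly two subreads, and `ThreadPool` is used only so the example runs anywhere; it exposes the same `apply_async` interface as `multiprocessing.Pool`.

```python
from multiprocessing.pool import ThreadPool


def make_consensus(subreads):
    """Hypothetical stand-in for a consensus worker; fails on two subreads."""
    if len(subreads) == 2:
        raise ValueError("pairwise consensus failed")
    return max(subreads, key=len)


def run_silent(jobs):
    """Submit jobs without calling .get(): worker exceptions stay hidden."""
    pool = ThreadPool(2)
    results = [pool.apply_async(make_consensus, (job,)) for job in jobs]
    pool.close()
    pool.join()
    # The failed job simply produced no output; nothing was raised here.
    return results


def run_loud(jobs):
    """Call .get() on each job: waits for it and re-raises any worker error."""
    with ThreadPool(2) as pool:
        return [pool.apply_async(make_consensus, (job,)).get() for job in jobs]
```

With `run_silent`, a job that raised is only detectable by inspecting the returned result objects (for example via `successful()`), whereas `run_loud` stops at the first failing job and shows the traceback. Calling `.get()` immediately also makes the jobs run one at a time, which is why it is a debugging step rather than a permanent change.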

@OscarT32
Author

OscarT32 commented Apr 1, 2021

Thanks for your answer.
I am not using the latest version of C3POa. I was trying to install the latest version, but it seems that I have an issue installing "pyabpoa". When I use either of the two commands that you list for installing the different packages, I get the following warning (I am sorry if it's something simple, I am fairly new to this. I am using Ubuntu 18.04):

[image: pyabpoa]

@rvolden
Owner

rvolden commented Apr 1, 2021

Do you have Cython installed? pip3 install --user Cython should do the trick. To cover all of your bases, try pip3 install --user --upgrade Cython setuptools wheel. Then you can try to install pyabpoa using pip. If that doesn't work, you can clone the abPOA repo and run make install_py

@OscarT32
Author

OscarT32 commented Apr 3, 2021

Thanks for the suggestions. Installation worked properly! I have started running some data that I ran on previous versions but I am having some issues.

When I use -q to filter the input file this warning is displayed:
image

When I remove -q, C3POa starts running but the run finishes after only a few minutes (this is really fast compared to the previous version, in which the same dataset takes a few hours). When I checked the output, the "R2C2_consensus.fasta" file is really small, with only a few sequences. The log file shows that only a few sequences are actually filtered compared to the total number of sequences (I have filtered sequences by size previously):

image

This is the command line I am using to run C3POa

image

Once again thank you for your assistance

@rvolden
Owner

rvolden commented Apr 3, 2021

Can you follow the debug step seen here: #17 (comment)

For some reason python multiprocessing doesn't like passing back errors, so it'll just die silently instead of complaining.

@OscarT32
Author

OscarT32 commented Apr 3, 2021

I followed the debug step. The following error was displayed:
image

@rvolden
Owner

rvolden commented Apr 5, 2021

Seems to be a problem with pyabpoa, can you verify that your install is working correctly? It may have installed but it could still run into runtime errors
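A quick smoke test along these lines can separate "installed" from "actually works". This is a sketch, assuming the `msa_aligner` / `msa(..., out_cons, out_msa)` interface shown in the abPOA README; the helper name `check_pyabpoa` and the toy sequences are made up for illustration.

```python
def check_pyabpoa():
    """Try to import pyabpoa and build one tiny consensus.

    Returns (ok, message) instead of raising, so a broken install is
    reported rather than crashing the caller.
    """
    try:
        import pyabpoa as pa
    except Exception as err:  # covers ImportError and broken compiled extensions
        return False, f"import failed: {err!r}"

    # Toy sequences, made up for this check only.
    seqs = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
    try:
        aligner = pa.msa_aligner()
        result = aligner.msa(seqs, out_cons=True, out_msa=False)
        consensus = list(result.cons_seq)
    except Exception as err:  # runtime failure despite a successful import
        return False, f"alignment failed: {err!r}"

    if not consensus:
        return False, "alignment ran but returned no consensus"
    return True, f"consensus: {consensus[0]}"


if __name__ == "__main__":
    ok, message = check_pyabpoa()
    print("OK" if ok else "PROBLEM", "-", message)
```

If the import succeeds but the alignment step fails, the install is present but broken at runtime, which matches the situation described here.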

@OscarT32
Author

Thanks for your help. Indeed, the problem was with the pyabpoa install. Now C3POa is running properly, but when I try to use -q 9 it reports an unrecognized argument. When I use -h, the -q argument is not listed. When I do not include it, C3POa runs without any issues.
image

@rvolden
Owner

rvolden commented Apr 16, 2021

Yeah, we took out that option since ONT qscores are mostly nonsensical
