Issue with sample names #7
Hello,
Thank you for using SequelTools! Subfiles.txt should be a
file-of-filenames, which it sounds like it is in your case. These
filenames are what determine the name of each SMRTcell in the output. Are
your files all named oasis.bam? If so, changing those names to unique
identifiers should resolve the issue. Let me know if that works for you.
Best,
Dr. David E. Hufnagel
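Because the SMRTcell names in the output are taken from the filenames listed in subFiles.txt, one quick pre-flight check is to confirm that every listed file exists and that no two entries share a basename. The sketch below assumes a POSIX shell. `check_fofn` is a hypothetical helper, not part of SequelTools, and it only looks at basenames, not at how SequelTools itself parses them.

```shell
#!/bin/sh
# check_fofn: hypothetical pre-flight check for a SequelTools file-of-filenames.
# Usage: check_fofn subFiles.txt
# Returns non-zero if a listed file is missing or two entries share a basename.
check_fofn() {
  fofn="$1"
  rc=0
  # Every non-blank line should point at an existing file.
  while IFS= read -r entry; do
    [ -z "$entry" ] && continue
    if [ ! -e "$entry" ]; then
      echo "missing: $entry" >&2
      rc=1
    fi
  done < "$fofn"
  # Strip directories and look for repeated basenames.
  dups=$(grep -v '^$' "$fofn" | sed 's#.*/##' | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicate basenames: $dups" >&2
    rc=1
  fi
  return "$rc"
}
```

Running `check_fofn subFiles.txt` prints the offending entries to stderr and returns a non-zero status if either condition fails.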
On Tue, Oct 20, 2020 at 7:27 PM mldmort wrote:
Hi,
I'm running SequelTools for 8 CLR samples. I'm giving the sample names
with the -u subFiles.txt option. In subFiles.txt I put the paths
of the BAM files. This is my command:
SequelTools.sh -t Q -u subFiles.txt -n 12 -p a -g a -o $OUT_DIR
I am getting weird plots for my stats, with the same name for each BAM
file. A sample plot is attached. Also, summaryTable.txt looks like
this, with the same numbers for all samples:
SMRTcell numReadsSubread numReadsLongestSub totalBasesSubread totalBasesLongestSub meanReadLenSubread meanReadLenLongestSub medianReadLenSubread medianReadLenLongestSub n50Subread n50LongestSub l50Subread l50LongestSub PSR ZOR
oasis 1320271 181528 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.137
oasis 2578421 377887 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.147
oasis 2252172 320325 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.142
oasis 2320629 335461 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2266229 324966 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.143
oasis 2165289 302979 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.140
oasis 4398328 638727 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2499748 348122 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.139
Would you let me know what's wrong?
Thanks
n50s.pdf
<https://github.com/ISUgenomics/SequelTools/files/5412390/n50s.pdf>
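To see exactly which statistics are repeated across samples, it can help to list the columns of summaryTable.txt whose value is identical on every data row. A minimal sketch, assuming the file is whitespace-separated with a single header line as shown above; `constant_columns` is a hypothetical helper, not a SequelTools command.

```shell
#!/bin/sh
# constant_columns: hypothetical diagnostic (not part of SequelTools).
# Prints the header name of every column in a whitespace-separated table
# (such as summaryTable.txt) whose value is identical on all data rows.
constant_columns() {
  awk '
    # Header row: remember column names.
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; ncol = NF; next }
    # First data row: take its values as the reference.
    NR == 2 { for (i = 1; i <= NF; i++) { first[i] = $i; same[i] = 1 }; next }
    # Later rows: mark any column that differs from the reference.
            { for (i = 1; i <= NF; i++) if ($i != first[i]) same[i] = 0 }
    END     { for (i = 1; i <= ncol; i++) if (same[i]) print name[i] }
  ' "$1"
}
```

Usage: `constant_columns summaryTable.txt`. Applied to the table above, it would list every column except numReadsSubread, numReadsLongestSub and ZOR.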
|
Hey Arun,
I hope you can see the whole conversation here. I'm a little perplexed by
this problem. Do you have some ideas as to what's causing these issues?
Let me know,
Best, David
On Wed, Oct 21, 2020 at 12:28 PM mldmort wrote:
Hi,
my Subfiles.txt contains:
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1002--bc1002.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1003--bc1003.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1008--bc1008.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1009--bc1009.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1010--bc1010.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1011--bc1011.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1012--bc1012.bam
I thought that the names come from the BAM files, but that doesn't seem to be the case.
The name oasis appears in the output directory in the -o option:
-o
/oasis/scratch/comet/temp_project/RAT_DATA/HS_FOUNDERS/Pacbio_multiplex_all/QC/SequelToolsResults
I don't know why oasis is chosen as the name for all the files, or why the
stats of the last file are used in all cases.
So I checked and it turns out that the stats in summaryTable.txt for all
samples correspond to the last file.
Any idea why it happens?
Thanks,
|
Did this resolve the issue, mldmort?
On Wed, Oct 21, 2020 at 2:12 PM Arun Seetharam wrote:
@mldmort <https://github.com/mldmort> at first glance, it looks like
the -- in the file names is causing something unintended. Can you please
try it one more time after renaming the BAM files without the double dash?
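A minimal sketch of this suggestion, assuming a POSIX shell and the demux directory used in this thread. `undash_name` is a hypothetical helper that replaces each `--` in a file name with `_` (an arbitrary choice of replacement), and the loop only prints the `mv` commands so they can be reviewed before anything is renamed.

```shell
#!/bin/sh
# undash_name: hypothetical helper that replaces every "--" in a file name
# with "_", e.g. lima.bc1001--bc1001.bam -> lima.bc1001_bc1001.bam
undash_name() {
  printf '%s\n' "$1" | sed 's/--/_/g'
}

# Dry run: print one mv command per lima output file in the demux directory.
for bam in /projects/long_reads/HS_founders/pacbio/demux/lima.*--*.bam; do
  [ -e "$bam" ] || continue   # the glob matched nothing
  dir=$(dirname "$bam")
  base=$(basename "$bam")     # rename only the file, not the directory
  echo mv "$bam" "$dir/$(undash_name "$base")"
done
```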
|
Yes, I believe you will have to change the original names. I am doing
additional testing for a demonstration of SequelTools that I will be giving
next week, and unfortunately I'm finding that the required format for the
names of the input files is quite rigid. It has to be something like
"ID.scraps.bam" or "ID.subreads.bam", where ID is usually something like
"m54138_180610_050652". That has been the structure of all the files I've
seen come directly from PacBio sequencing machines. This software was
published just this month and we are getting lots of feedback now on issues
we did not come across before. You can expect updates coming in the next
few weeks to make SequelTools more flexible and to resolve identified bugs
and issues.
Best, David
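Following this advice, the demultiplexed files can be renamed to the "ID.subreads.bam" pattern, using each sample name as the ID, and a fresh file-of-filenames written for `-u`. The sketch below is hypothetical: `rename_to_subreads` is not part of SequelTools, the barcode-to-sample pairs are taken from the symbolic links listed in this thread, and by default (`DRY_RUN=1`) it only prints the `mv` commands. Setting `DRY_RUN=0` renames the original files. Whether a sample name is accepted in place of a movie ID such as m54138_180610_050652 is an assumption worth testing on one file first.

```shell
#!/bin/sh
# rename_to_subreads: hypothetical helper (not part of SequelTools).
# Reads "barcode sample" pairs on stdin, renames each DIR/lima.BC--BC.bam
# to DIR/SAMPLE.subreads.bam and writes the new paths, one per line, to a
# file-of-filenames for SequelTools -u.
# With DRY_RUN=1 (the default) the mv commands are only printed.
rename_to_subreads() {
  dir="$1"
  fofn="$2"
  : > "$fofn"                       # start with an empty file-of-filenames
  while read -r bc sample; do
    [ -z "$bc" ] && continue        # skip blank lines
    src="$dir/lima.$bc--$bc.bam"
    dst="$dir/$sample.subreads.bam"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo mv "$src" "$dst"
    else
      mv "$src" "$dst" || return 1
    fi
    printf '%s\n' "$dst" >> "$fofn"
  done
}

# Dry run for the eight samples in this thread; writes ./subFiles.txt.
rename_to_subreads /projects/long_reads/HS_founders/pacbio/demux subFiles.txt <<'EOF'
bc1001 ACI
bc1008 BN
bc1003 BUF
bc1010 F344
bc1002 MR
bc1009 MS20
bc1011 WKY
bc1012 WN
EOF
```

Unlike the symbolic-link approach tried in this thread, this changes the names of the original files, which is what is suggested above.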
On Thu, Oct 22, 2020 at 11:32 AM mldmort wrote:
Hi David,
No. I used symbolic links pointing to my BAM files to see if that would
solve the problem. So my new subfiles.txt file looks like:
ACI.bam
BN.bam
BUF.bam
F344.bam
MR.bam
MS20.bam
WKY.bam
WN.bam
And the files link to the original bam files like:
ACI.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam
BN.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1008--bc1008.bam
BUF.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1003--bc1003.bam
F344.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1010--bc1010.bam
MR.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1002--bc1002.bam
MS20.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1009--bc1009.bam
WKY.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1011--bc1011.bam
WN.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1012--bc1012.bam
I don't know whether linking is sufficient, but maybe the next step
is to change the original file names?
But the name oasis, which appears in the plots, most probably comes from the
-o option:
-o /oasis/scratch/comet/temp_project/RAT_DATA/HS_FOUNDERS/Pacbio_multiplex_all/QC/SequelToolsResults
That's the only place the name oasis appears.
Also the summaryTable.txt is still flawed with the same numbers for each
row:
SMRTcell numReadsSubread numReadsLongestSub totalBasesSubread totalBasesLongestSub meanReadLenSubread meanReadLenLongestSub medianReadLenSubread medianReadLenLongestSub n50Subread n50LongestSub l50Subread l50LongestSub PSR ZOR
oasis 1320271 181528 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.137
oasis 2320629 335461 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2252172 320325 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.142
oasis 2165289 302979 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.140
oasis 2578421 377887 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.147
oasis 2266229 324966 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.143
oasis 4398328 638727 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2499748 348122 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.139
Any suggestions?
Thanks,