Issue with sample names #7

Open
mldmort opened this issue Oct 21, 2020 · 7 comments
Comments

@mldmort

mldmort commented Oct 21, 2020

Hi,

I'm running SequelTools on 8 CLR samples. I pass the samples with the -u subfiles.txt option; subfiles.txt lists the paths of the bam files. This is my command:
SequelTools.sh -t Q -u subFiles.txt -n 12 -p a -g a -o $OUT_DIR
The plots of my stats look wrong: every bam file gets the same name. A sample plot is attached. summaryTable.txt also shows the same name for every sample and identical values in most columns:

SMRTcell	numReadsSubread	numReadsLongestSub	totalBasesSubread	totalBasesLongestSub	meanReadLenSubread	meanReadLenLongestSub	medianReadLenSubread	medianReadLenLongestSub	n50Subread	n50LongestSub	l50Subread	l50LongestSub	PSR	ZOR
oasis	1320271	181528	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.137
oasis	2578421	377887	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.147
oasis	2252172	320325	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.142
oasis	2320629	335461	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.145
oasis	2266229	324966	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.143
oasis	2165289	302979	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.140
oasis	4398328	638727	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.145
oasis	2499748	348122	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.139

Would you let me know what's wrong?
Thanks
n50s.pdf
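
One way to see exactly which columns are affected is to check, column by column, whether every row holds the same value. The sketch below is a minimal Python example and not part of SequelTools; the function name `constant_columns` is illustrative, and the excerpt holds only the first four columns and first two rows of the table above.

```python
import csv
import io

def constant_columns(table_text):
    """Return the names of columns whose value is identical in every data row."""
    # Skip blank lines so a trailing empty line does not break indexing.
    rows = [r for r in csv.reader(io.StringIO(table_text), delimiter="\t") if r]
    header, data = rows[0], rows[1:]
    return [
        name
        for i, name in enumerate(header)
        if len({row[i] for row in data}) == 1
    ]

# Excerpt of the summaryTable.txt shown above (first four columns, two rows).
excerpt = (
    "SMRTcell\tnumReadsSubread\tnumReadsLongestSub\ttotalBasesSubread\n"
    "oasis\t1320271\t181528\t21082583975\n"
    "oasis\t2578421\t377887\t21082583975\n"
)

print(constant_columns(excerpt))
```

The check is only meaningful with at least two data rows, since a single row makes every column trivially constant.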

@DavidEHufnagel
Collaborator

DavidEHufnagel commented Oct 21, 2020 via email

@mldmort
Author

mldmort commented Oct 21, 2020

Hi,

My subFiles.txt contains:

/projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1002--bc1002.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1003--bc1003.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1008--bc1008.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1009--bc1009.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1010--bc1010.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1011--bc1011.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1012--bc1012.bam

I thought the names came from the bam files, but that doesn't seem to be the case. The name oasis appears in the output directory given to the -o option:
-o /oasis/scratch/comet/temp_project/RAT_DATA/HS_FOUNDERS/Pacbio_multiplex_all/QC/SequelToolsResults

I checked, and the stats in summaryTable.txt for all samples correspond to the last file.
I don't know why oasis is chosen as the name for all the files, or why the stats of the last file are used for every sample.

Any idea why this happens?
Thanks,
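
The symptom is consistent with a sample label being taken from a fixed position in some path instead of from each bam file's own name, although nothing in this thread confirms that this is what SequelTools does. The Python sketch below only illustrates that failure mode; both helper functions are hypothetical and are not SequelTools code.

```python
from pathlib import PurePosixPath

def label_from_fixed_component(path, index=1):
    """Fragile (hypothetical): take the path component at a fixed position."""
    return PurePosixPath(path).parts[index]

def label_from_basename(path):
    """Sturdier (hypothetical): take the file name without its last extension."""
    return PurePosixPath(path).stem

# Paths copied from the comment above.
out_dir = (
    "/oasis/scratch/comet/temp_project/RAT_DATA/HS_FOUNDERS/"
    "Pacbio_multiplex_all/QC/SequelToolsResults"
)
bam = "/projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam"

print(label_from_fixed_component(out_dir))  # oasis
print(label_from_basename(bam))             # lima.bc1001--bc1001
```

A label derived from a fixed path position is the same for every file under that path, whereas a label derived from the file name differs per bam file.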

@DavidEHufnagel
Collaborator

DavidEHufnagel commented Oct 21, 2020 via email

@aseetharam
Collaborator

@mldmort At first glance, it looks like the -- in the file name is causing something unintended. Can you please try once more after renaming the bam files so that they don't contain a double dash?
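
A minimal Python sketch of that test, assuming symbolic links are acceptable in place of real renames: it creates links whose names have the double dash replaced by a single underscore. The helper name `link_without_double_dash`, the underscore replacement, and the link directory are illustrative choices, not something SequelTools prescribes.

```python
import tempfile
from pathlib import Path

def link_without_double_dash(bam_paths, link_dir):
    """Create a symlink for each bam file, with '--' in its name replaced by '_'."""
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for bam in map(Path, bam_paths):
        link = link_dir / bam.name.replace("--", "_")
        # Do not overwrite anything that is already there.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(bam.resolve())
        links.append(link)
    return links

# Demo with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "lima.bc1001--bc1001.bam"
    src.touch()
    links = link_without_double_dash([src], Path(tmp) / "links")
    print([link.name for link in links])
```

If a tool resolves links to their targets, it would still see the original file names, in which case an actual rename or copy would be needed instead.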

@DavidEHufnagel
Collaborator

DavidEHufnagel commented Oct 22, 2020 via email

@mldmort
Author

mldmort commented Oct 22, 2020

Hi David,

No, I used symbolic links to my bam files to see whether that solves the problem. My new subfiles.txt file looks like this:

ACI.bam
BN.bam
BUF.bam
F344.bam
MR.bam
MS20.bam
WKY.bam
WN.bam

The links point to the original bam files as follows:

ACI.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam
BN.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1008--bc1008.bam
BUF.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1003--bc1003.bam
F344.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1010--bc1010.bam
MR.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1002--bc1002.bam
MS20.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1009--bc1009.bam
WKY.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1011--bc1011.bam
WN.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1012--bc1012.bam

I don't know whether linking is sufficient; maybe the next step is to rename the original files?
However, the name oasis that appears in the plots most probably comes from the -o option:

-o /oasis/scratch/comet/temp_project/RAT_DATA/HS_FOUNDERS/Pacbio_multiplex_all/QC/SequelToolsResults

That's the only place the name oasis appears.
Also, summaryTable.txt is still flawed, with the same numbers in each row:

SMRTcell	numReadsSubread	numReadsLongestSub	totalBasesSubread	totalBasesLongestSub	meanReadLenSubread	meanReadLenLongestSub	medianReadLenSubread	medianReadLenLongestSub	n50Subread	n50LongestSub	l50Subread	l50LongestSub	PSR	ZOR
oasis	1320271	181528	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.137
oasis	2320629	335461	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.145
oasis	2252172	320325	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.142
oasis	2165289	302979	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.140
oasis	2578421	377887	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.147
oasis	2266229	324966	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.143
oasis	4398328	638727	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.145
oasis	2499748	348122	21082583975	3794848484	8434	10901	8317	9856	9304	11125	885174	122214	0.180	0.139

Any suggestions?
Thanks,

@DavidEHufnagel
Collaborator

DavidEHufnagel commented Oct 22, 2020 via email
