Sample names with spaces are not supported #199

Open
ernfrid opened this issue Jul 21, 2017 · 6 comments

Comments

@ernfrid
Contributor

ernfrid commented Jul 21, 2017

I have unfortunately encountered a CRAM file whose @RG header lines contain an SM field with a space in it. I believe this is valid according to the SAM/BAM specification.

When lumpyexpress is run on this file, the sample name in the resulting VCF is truncated at the space.
For example, the line:

@RG	ID:2895621816	CN:WUGSC	LB:lib1	PL:ILLUMINA	PU:XXXXXX.X	SM:SAMPLE -50

would result in a VCF containing the sample SAMPLE instead of SAMPLE -50.
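
For reference, here is a minimal sketch of reading the SM value without losing the part after the space. SAM header fields are tab-separated, so splitting on tabs only keeps `SAMPLE -50` whole. `extract_sm` is a hypothetical helper written for illustration; it is not part of lumpyexpress, and this thread does not establish where lumpyexpress actually drops the suffix.

```shell
# Hypothetical helper: print the SM value of one @RG header line.
# Fields are split on tabs only, so a space inside a value survives.
extract_sm() {
    printf '%s\n' "$1" | awk -F'\t' '{
        for (i = 1; i <= NF; i++)
            if (substr($i, 1, 3) == "SM:") print substr($i, 4)
    }'
}

# The header line from the report above, rebuilt with explicit tabs.
rg_line=$(printf '@RG\tID:2895621816\tCN:WUGSC\tLB:lib1\tPL:ILLUMINA\tPU:XXXXXX.X\tSM:SAMPLE -50')

sample=$(extract_sm "$rg_line")
printf '%s\n' "$sample"    # prints: SAMPLE -50
```

Note that the result only stays intact if `"$sample"` is quoted everywhere it is used afterwards.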

@ryanlayer
Collaborator

ryanlayer commented Jul 24, 2017 via email

@ernfrid
Contributor Author

ernfrid commented Jul 27, 2017

I think converting the space to an underscore would just leave me in the same position I'm in now: a different sample name in the output file, and poor interaction with tools that handle the sample name properly.

Unfortunately, this is protected-access data, so I can't share it.

@ryanlayer
Collaborator

Can you send me the command that lumpyexpress tries to run? I was thinking that it may work if the -pe and -sr options were enclosed in double quotes.
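
To illustrate the quoting idea, here is a small sketch of how the shell treats an option string that contains a space. `count_args` and the contents of `pe_opt` are made up for the example; this is not the real lumpyexpress code, and whether lumpy itself accepts an `id` containing a space is a separate question not covered here.

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

# Illustrative option string; "SAMPLE -50" stands in for a sample name with a space.
pe_opt="bam_file:sample.discordants.bam,id:SAMPLE -50"

# Unquoted expansion is word-split at the space (default IFS):
# -pe, "bam_file:...,id:SAMPLE" and "-50" arrive as three arguments.
unquoted=$(count_args -pe $pe_opt)

# Quoted expansion keeps the option string as a single argument.
quoted=$(count_args -pe "$pe_opt")

echo "unquoted: $unquoted, quoted: $quoted"    # prints: unquoted: 3, quoted: 2
```

In the unquoted case the program would see a truncated `id` plus a stray `-50` argument, which would be consistent with the truncation reported above.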

@ernfrid
Contributor Author

ernfrid commented Aug 15, 2017

Here's what is printed in the log file.

/opt/lumpy-sv//bin/lumpy -P \
    -t H_TK-D240629_-50-D240629_-50.lumpy.temp/H_TK-D240629_-50-D240629_-50.vcf.tmp \
    -msw 4 \
    -tt 0 \
     \
    -x /gscmnt/gc2802/halllab/ccdg_resources/genomes/human/GRCh38DH/annotations/exclude.cnvnator_100bp.GRCh38.20170403.bed \
     -pe bam_file:H_TK-D240629_-50-D240629_-50.discordants.bam,histo_file:H_TK-D240629_-50-D240629_-50.lumpy.temp/H_TK-D240629_-50-D240629_-50.vcf.tmp.sample1.lib1.x4.histo,mean:364.530025182,stdev:80.5523897695,read_length:151,min_non_overlap:151,discordant_z:5,back_distance:10,weight:1,id:D240629,min_mapping_threshold:20,read_group:2895621640,read_group:2895621762,read_group:2895621763,read_group:2895621774,read_group:2895621775,read_group:2895621809,read_group:2895621810,read_group:2895621816 \
     -sr bam_file:H_TK-D240629_-50-D240629_-50.splitters.bam,back_distance:10,min_mapping_threshold:20,weight:1,id:D240629,min_clip:20 \
    > H_TK-D240629_-50-D240629_-50.vcf.tmp

@ernfrid
Contributor Author

ernfrid commented Aug 15, 2017

I don't have the temp directory preserved, but I can rerun if that would help.

@ernfrid
Contributor Author

ernfrid commented Aug 31, 2017

@ryanlayer - I think I may have a solution to this (#208), but I haven't tested it extensively and my bash is pretty weak. Let me know if you see any issues with what I put together.
