-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IGV hg38 fix #63
Comments
seems like almost all paths in .xml are generated wrongly (that includes the change in the structure of the folders). Is the ClinSV project abandoned? |
Folder structures have been made for a kind of IGV server that should be run from docker ClinSV (but was cropped from v1.1-dev). I used to solve this problem by running lscr.io/linuxserver/webtop:ubuntu-kde with installed desktop IGV and making a mount point and paths according to paths from ClinSV's XML, and it's working. |
First of, thank you @akiss-me for sharing your fix. I've been working on merging some changes that should hopefully cleanly address the IGV issue. As @akiss-me mentioned, using the v1.1-dev version can handle those naming conventions for you. As for the XML files, if you are using the docker images all the non-resource paths (such as the ref genome), are file paths used in the docker container. Therefore, if you wanted to use the XML outside of the container in an IGV session you are going to have the change the paths in the XML file to the respective paths as is on your machine. Which are the same paths you mounted onto the docker container with the "-v" commands. In future I hope to implement this as a script that you can run outside the container. |
Hi @J-Bradlee, I'm happy that the project is sitll alive!
Then we have for example:
Are we sure these files are also not misslabelled:
.xml is easy to fix anyway. However can we be sure, that in the code and during the data processing, the right paths/files are used? Are the all CNVs calculated on the right coordinates system? At the end, it is the most important... Anyway, you do a great job, I'm impressed how complex the application is. ps. do you plan the handle chromosomes "1, 2, 3, 4, MT, X, Y," in hg38 (in addtion to chr1, chr2, chrM...)? I would be super great! |
Hi @thedam, Those resource paths are like that because you have run clinSV in the docker container. In the container those resource files are at those locations. However, on your own machine they will be whatever you set these environment variables to: Recall, you attached the volumes to the docker container with the "-v" command: A quick fix would be to run a sed command to on the result XML to remove the "app/ref-data" . These resource files are used throughout ClinSV accurately . I have made a feature to handle the hg38 / hg19 chormosomes and is already dockerized for you to use. Albeit, I am currently trying to fix a bug which has broken "coverage by chromosome" part of the output report. See here and here |
Yes, I've run the job with the exact Docker command. Also the the pipeline doesn't work with GRCh38 chromosome naming convention (1, 2, 3, 4 instead of chr1, chr2 chr3...) When the input is .bam file with chromosmes 1, 2, 3... the error appers:
The error appears because somewhere in ClinSV code, there is samtools command with I quess it could be fixed by providing the reference, that .bam is aligned to and it should be named as All files in |
This is probably because you are using DIY genome reference. It's okay, but for cross compatibility, researchers and medgenetics usually use GATK genome bundle, which comes from the reference with "chr" at the beginning and is kind of standard. You can add chr to the chromosome names in the bam file (https://www.biostars.org/p/13462/), but it could lead to other problems caused by unassembled contigs naming. |
Both conventions are used (unfortunately). The main service that doesn't use prefixes is VEP:( |
Hi,
I thought v1.1 supported bam files aligned to hg19 and b38 and allowed the
contigs to be named 1, 2 3 or chr1, chr2, chr3 etc; James can you
investigate that?
Most ClinSV ref files were recreated for hg38, at least the large ones that
were under our control, like the four different noise files.
v1.0.1 included a fix for the b38 reference files in the igv xml file,
which is not in v1.1.
Altering the remaining file paths that otherwise point to locations within
a docker container needs another change that I know James is working on
now.
So, look forward to v1.1.1 which will bring in all those changes.
Cheers
…On Fri, 26 Apr 2024, 1:50 am akiss-me, ***@***.***> wrote:
Yes, I've run the job with the exact Docker command. What I mean, is that
in the *b38* result .xml file the paths for *b37* apperas.
Also the the pipeline doesn't work with GRCh38 chromosome naming
convention (1, 2, 3, 4 instead of chr1, chr2 chr3...)
When the input is .bam file with chromosmes 1, 2, 3... the error appers:
### executing: sh /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.sh &> /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e ...
### finished after (hh:mm:ss): 00:00:01
### exist status: 256
***** error exist status != 0 (256), please check /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e for more information
The error appears because somewhere in ClinSV code, there is samtools
command with chr: samtools view
/app/project_folder/alignments/273858DL/273858DL.bam chr1:1000000-1100000 [main_samview]
region "chr1:1000000-1100000" specifies an invalid region or unknown
reference. Continue anyway
I quess it could be fixed by providing the reference, that .bam is aligned
to and it should be named as Homo_sapiens_assembly38.fasta (as it is
hardcoded in clinsv.py, so in the code in the Docker).
All files in refdata-b38 have also naming convention with 'chr1, chr2,
chr3' so they should be also changed?
This is probably because you are using DIY genome reference. It's okay,
but for cross compatibility, researchers and medgenetics usually use GATK
genome bundle, which comes from the reference with "chr" at the beginning
and is kind of standard. You can add chr to the chromosome names in the bam
file (https://www.biostars.org/p/13462/), but it could lead to other
problems caused by unassembled contigs naming.
—
Reply to this email directly, view it on GitHub
<#63 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEQQM7LCDX2DU4AIPYM72TY7EQ35AVCNFSM6AAAAABFJRELDSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZXGYYTSNZXGU>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Hi @thedam, Can I confirm what docker image you are using/what version of ClinSV you are running? |
Hi, I use: I have it form Docker Hub,
|
Dear all,
I fixed IGV hg38 issue, the solution is not very clean, since I did not rewrite the code, but simply replaced b37 with hg38 and the corresponding tracks.
I didn’t make changes to the working image or fork it, so for now I suggest just mounting the corrected version of the main script (please unzip first).
this is how to run:
enjoy!
clinsv_hg38.zip
The text was updated successfully, but these errors were encountered: