
IGV hg38 fix #63

Open
akiss-me opened this issue Mar 26, 2024 · 11 comments

akiss-me commented Mar 26, 2024

Dear all,

I fixed the IGV hg38 issue. The solution is not very clean, since I did not rewrite the code but simply replaced b37 with hg38 and the corresponding tracks.
I didn't make changes to the working image or fork it, so for now I suggest just mounting the corrected version of the main script (please unzip it first).

This is how to run it:

docker run --rm \
  --name clinsv \
  --mount type=bind,source=/**your_path**/clinsv_hg38,target=/app/clinsv/bin/clinsv \
  -v /your_path/ \
  --entrypoint "perl" mrbradley2/clinsv:v1.1-dev \
  /app/clinsv/bin/clinsv \
  -r all \
  -p "/**your_path**/clinsv_results/" \
  -i "/**your_path**/*.bam" \
  -ref "/**your_path**/refdata-b38/refdata-b38"

enjoy!

clinsv_hg38.zip

akiss-me changed the title from "IGV fix" to "IGV hg38 fix" on Mar 26, 2024

thedam commented Apr 23, 2024

It seems like almost all paths in the .xml are generated wrongly (that includes the change in the folder structure).

Is the ClinSV project abandoned?
The biggest pain for me is that it needs the "chr1, chr2..." naming convention instead of "1, 2, 3...".

akiss-me (Author) commented

> It seems like almost all paths in the .xml are generated wrongly (that includes the change in the folder structure).
>
> Is the ClinSV project abandoned? The biggest pain for me is that it needs the "chr1, chr2..." naming convention instead of "1, 2, 3...".

The folder structure was made for a kind of IGV server that was supposed to be run from the ClinSV docker image (but was cropped from v1.1-dev). I used to solve this problem by running lscr.io/linuxserver/webtop:ubuntu-kde with desktop IGV installed and creating a mount point with paths matching those in ClinSV's XML, and it works.

J-Bradlee (Collaborator) commented

First off, thank you @akiss-me for sharing your fix. I've been working on merging some changes that should cleanly address the IGV issue. As @akiss-me mentioned, the v1.1-dev version can handle those naming conventions for you. As for the XML files: if you are using the docker images, the non-resource paths (such as the ref genome) are file paths as used inside the docker container.

Therefore, if you want to use the XML outside of the container in an IGV session, you will have to change the paths in the XML file to the corresponding paths on your machine. These are the same paths you mounted onto the docker container with the "-v" flags.

In the future I hope to implement this as a script that you can run outside the container.
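In the meantime, the manual rewrite can be sketched as follows. This is only an illustration, not ClinSV's own tooling: the file name igv_session.xml and the two variables are placeholders for the host paths you mounted with -v.

```shell
# Hypothetical sketch: rewrite container paths in the IGV session XML
# to the host paths that were mounted into the container with -v.
refdata_path=/host/path/to/refdata        # whatever you mounted at /app/ref-data
project_folder=/host/path/to/project      # whatever you mounted at /app/project_folder

sed -e "s|/app/ref-data|$refdata_path|g" \
    -e "s|/app/project_folder|$project_folder|g" \
    igv_session.xml > igv_session.host.xml
```

The rewritten igv_session.host.xml can then be opened directly in a desktop IGV session.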


thedam commented Apr 24, 2024

Hi @J-Bradlee, I'm happy that the project is still alive!
I have some concerns; looking only at the data in the .xml file generated for hg38, we can see:

<Session genome="b37" hasGeneTrack="true" hasSequenceTrack="true" locus="1:197493130-197514036" version="8">
Here it should be genome="hg38", as @akiss-me noted.

Then we have for example:

<Resource path="/app/ref-data/refdata-b38/tracks/GRCh37GenomicSuperDup.bed.gz"/>
I believe it should be /refdata-b38/tracks/GRCh38GenomicSuperDup.bed.gz

<Resource path="/app/ref-data/refdata-b38/tracks/DGV_1kG_2015-07-18.igv.bed.gz"/>
Should it be this file: /refdata-b38/tracks/DGV_GRCh38_hg38_variants_2020-02-25.igv.bed.gz?

<Resource path="/app/ref-data/refdata-b38/tracks/GRCh37_hg19_variants_2015-07-23.igv.bed.gz"/>
no such file for hg38?

<Resource path="/app/ref-data/refdata-b38/annotation/Homo_sapiens.GRCh37.75.gff.gz"/>
This should be: /refdata-b38/annotation/Homo_sapiens.GRCh38.99.gff.gz

Are we sure these files are also not mislabelled:

<Resource path="/app/ref-data/refdata-b38/tracks/MGRB-SV.igv.bed.gz"/>
<Resource path="/app/ref-data/refdata-b38/tracks/popCovStdev.bw"/>

The .xml is easy to fix anyway. However, can we be sure that in the code, and during data processing, the right paths/files are used? Are all the CNVs calculated on the right coordinate system? In the end, that is what matters most...

Anyway, you're doing a great job; I'm impressed by how complex the application is.

P.S. Do you plan to handle chromosomes "1, 2, 3, 4, MT, X, Y" in hg38 (in addition to chr1, chr2, chrM...)? That would be super great!

J-Bradlee (Collaborator) commented

Hi @thedam,

Those resource paths are like that because you ran ClinSV in the docker container; inside the container the resource files are at those locations. On your own machine they will be whatever you set these environment variables to:

refdata_path=$PWD/clinsv/refdata-b38
input_path=$PWD
project_folder=$PWD/test_run

Recall that you attached the volumes to the docker container with the "-v" flag:

docker run \
  -v $refdata_path:/app/ref-data \
  -v $project_folder:/app/project_folder \
  -v $input_path:/app/input \
  --entrypoint "perl" mrbradley2/clinsv:v1.0 \
  /app/clinsv/bin/clinsv \
  -r all \
  -p /app/project_folder/ \
  -i "/app/input/*.bam" \
  -ref /app/ref-data/refdata-b38

A quick fix would be to run a sed command on the resulting XML to remove the "/app/ref-data" prefix.
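For instance, a minimal sketch of that quick fix (igv_session.xml is a placeholder name for the generated session file):

```shell
# Strip the container prefix "/app/ref-data" from every path in the
# session XML so the Resource paths resolve under the host's refdata
# mount. sed -i.bak keeps a .bak backup of the original file.
sed -i.bak 's|/app/ref-data||g' igv_session.xml
```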

These resource files are used accurately throughout ClinSV.

I have made a feature to handle the hg38 / hg19 chromosomes, and it is already dockerized for you to use. However, I am currently trying to fix a bug which has broken the "coverage by chromosome" part of the output report. See here and here.


thedam commented Apr 25, 2024

Yes, I've run the job with that exact Docker command.
What I mean is that b37 paths appear in the b38 result .xml file.

Also, the pipeline doesn't work with the GRCh38 chromosome naming convention (1, 2, 3, 4 instead of chr1, chr2, chr3...).

When the input is a .bam file with chromosomes 1, 2, 3..., this error appears:

### executing: sh /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.sh &> /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e  ...

 ### finished after (hh:mm:ss): 00:00:01
 ### exist status: 256



 ***** error exist status != 0 (256), please check /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e for more information

The error appears because somewhere in the ClinSV code there is a samtools command with chr:
samtools view /app/project_folder/alignments/273858DL/273858DL.bam chr1:1000000-1100000
[main_samview] region "chr1:1000000-1100000" specifies an invalid region or unknown reference. Continue anyway

I guess it could be fixed by providing the reference that the .bam is aligned to; it should be named Homo_sapiens_assembly38.fasta (as it is hardcoded in clinsv.py, i.e. in the code inside the Docker image).

All files in refdata-b38 also use the 'chr1, chr2, chr3' naming convention, so should they be changed as well?
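A quick way to check which convention a given BAM uses is to read the sequence names out of its header (this assumes samtools is installed; sample.bam is a placeholder):

```shell
# List the reference sequence names (SN fields) from the BAM header.
# Output starting with "1", "2", ... instead of "chr1", "chr2", ...
# indicates the non-prefixed convention that ClinSV trips over here.
samtools view -H sample.bam \
  | awk '$1 == "@SQ" { sub("SN:", "", $2); print $2 }'
```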

akiss-me (Author) commented

> All files in refdata-b38 also use the 'chr1, chr2, chr3' naming convention, so should they be changed as well?

This is probably because you are using a DIY genome reference. That's okay, but for cross-compatibility, researchers and medical geneticists usually use the GATK genome bundle, whose reference has "chr" at the beginning of the chromosome names and is a de facto standard. You can add chr to the chromosome names in the bam file (https://www.biostars.org/p/13462/), but it could lead to other problems caused by the naming of unassembled contigs.
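For reference, the Biostars approach can be sketched roughly like this (samtools assumed; in.bam is a placeholder). Note that hg38 calls the mitochondrion chrM, and unplaced contigs such as GL000220.1 are left untouched by this rename, which is exactly the compatibility risk mentioned above:

```shell
# Rewrite the SN fields in the header to add a "chr" prefix, then
# reheader the BAM without touching the alignment records themselves.
# MT is renamed first so it becomes chrM (hg38 style), not chrMT.
samtools view -H in.bam \
  | sed 's/SN:MT/SN:chrM/; s/SN:\([0-9XY]\)/SN:chr\1/' \
  > header.chr.sam
samtools reheader header.chr.sam in.bam > out.chr.bam
```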


thedam commented Apr 26, 2024

Both conventions are used (unfortunately). The main service that doesn't use prefixes is VEP :(

drmjc (Member) commented Apr 27, 2024 via email

J-Bradlee (Collaborator) commented

Hi @thedam,

Can I confirm which Docker image you are using and which version of ClinSV you are running?


thedam commented Apr 30, 2024

Hi,

I use: mrbradley2/clinsv v1.1-dev ad470d0fa5d2 19 months ago 7.41GB

I pulled it from Docker Hub:
docker pull mrbradley2/clinsv:v1.1-dev
However, I see it has the same image id as:
docker pull containerregistrypubliccb.azurecr.io/clinsv:v1.1-dev

containerregistrypubliccb.azurecr.io/clinsv       v1.1-dev   ad470d0fa5d2   19 months ago   7.41GB
mrbradley2/clinsv                                 v1.1-dev   ad470d0fa5d2   19 months ago   7.41GB
