Duplicate SV IDs with identical chrom/start/end positions in AnnotSV TSV output #267

Open
poddarharsh15 opened this issue Jan 7, 2025 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@poddarharsh15

Hi, @lgmgeo

Thank you for developing and maintaining AnnotSV—it’s an excellent tool for structural variant annotation. However, I encountered an issue while processing my data.

After running AnnotSV, the resulting .tsv file contains a large number of SVs with the same ID and identical chromosome, start, and end positions. I’m not sure why this duplication occurs, as I expected each SV to be uniquely identified in the output.

Could you please help clarify why this is happening? I’ve attached the TSV file and the exact command/code I used for reference.

AnnotSV -annotationsDir ${ANNOTSV_DIR} \
        -annotationMode both \
        -includeCI 0 \
        -overlap 100 \
        -overwrite 1 \
        -hpo HP:0001561,HP:0001276,HP:0002371,HP:0025313,HP:0033725,HP:0002197 \
        -genomeBuild GRCh38 \
        -tx ENSEMBL \
        -SVinputfile ${INPUT_VCF} \
        -outputFile ${OUTPUT_VCF} \
        -outputDir ${OUTPUT_DIR} \
        -variantconvertDir /home/tigem/h.poddar/structural_varinats/variantconvert \
        -vcf 1

BA013_P_1.zip

@lgmgeo
Owner

lgmgeo commented Jan 9, 2025

Please look at the README.

Full and split lines:

  • Annotation on the “full” length of the SV. Every SV is reported, even those not covering a gene. This type
    of annotation gives an estimate of the SV itself.
  • Annotation of the SV “split” by gene. This type of annotation gives an opportunity to focus on each gene
    overlapped by the SV. Thus, when an SV spans several genes, the output will contain as many annotation
    lines as genes covered. This latter annotation is extremely powerful for shortening the identification of
    mutations implicated in a specific gene.
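
To make the full/split distinction concrete: with -annotationMode both, as in the command above, an SV that overlaps genes gets one "full" line plus one "split" line per gene, and these lines share the same SV ID and coordinates, which matches the repetition reported in this issue. The following is a minimal sketch, not part of AnnotSV, that keeps only the "full" lines so that each SV appears once. It assumes the TSV has a header row containing a column named Annotation_mode whose values are full or split; the function name annotsv_full_only is hypothetical.

```shell
# Hypothetical helper (not an AnnotSV command): print the header and only
# the rows whose Annotation_mode column equals "full".
# Assumption: the TSV is tab-separated and its header row contains a
# column named "Annotation_mode".
annotsv_full_only() {
    awk -F'\t' '
        NR == 1 {
            # Locate the Annotation_mode column by name instead of by position.
            for (i = 1; i <= NF; i++) if ($i == "Annotation_mode") col = i
            print
            next
        }
        # Keep a data row only if the column was found and its value is "full".
        col && $col == "full"
    ' "$1"
}
```

It could be called as `annotsv_full_only output.tsv > output.full.tsv`, where both file names are placeholders. If the column is not found, only the header is printed. Filtering on "split" instead would give the per-gene view.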

@lgmgeo lgmgeo added the help wanted Extra attention is needed label Jan 9, 2025