Improvement: Add WDL Workflow for Automated IGV Screenshots with Customizable Resources and Inputs #472

Draft
wants to merge 49 commits into
base: main

shadizaheri (Collaborator) commented Sep 18, 2024

Overview
This PR introduces a new WDL workflow to automate the generation of IGV (Integrative Genomics Viewer) screenshots. The workflow accepts multiple BAM files, VCF files, and optional haplotype-specific alignments for visualization. Memory, disk size, and image height are exposed as inputs so the workflow can be tuned to the dataset and execution environment.

Key Features

  • Multi-file Support: Handles multiple BAM files, including haplotypes (truth_haplotype_1, truth_haplotype_2, 8x-hap1, 8x-hap2), and a targeted VCF (TRGT_VCF) for comprehensive visualization.

  • Customizable Resource Allocation: Users can specify memory, disk size, and image height to optimize the workflow for their dataset size and computational environment.

  • Optional Inputs: Some inputs such as haplotype BAM files and VCF are optional, providing flexibility in the use of this workflow across different projects.

  • Direct Screenshot Outputs: Outputs IGV screenshots directly as an array of PNG files, making it easy to review and download individual images.

Changes Included

  • New WDL Workflow (IGVScreenshotWorkflow): A workflow to automate IGV screenshot generation using a Python script.

  • Task (RunIGVScreenshot): A task that executes the IGV Python script within a Docker container, with support for a virtual display (Xvfb).

  • Updated Documentation: Includes documentation in the WDL script header for easy understanding and usage.
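The Python script that RunIGVScreenshot executes is not shown in this description. As a rough illustration of the kind of work it has to do, here is a minimal sketch that turns BED regions into an IGV batch script. The function names (`parse_bed`, `build_igv_batch`) are hypothetical, and mapping `image_height` onto IGV's `maxPanelHeight` batch command is an assumption, not something the PR confirms.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three BED columns, skipping comments and header lines."""
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def build_igv_batch(
    regions: List[Region],
    tracks: List[str],
    genome: str,
    snapshot_dir: str,
    image_height: int = 500,  # assumed default, not taken from the PR
) -> str:
    """Return IGV batch commands that snapshot each region to a PNG."""
    cmds = [
        "new",
        f"genome {genome}",
        f"snapshotDirectory {snapshot_dir}",
        f"maxPanelHeight {image_height}",
    ]
    cmds += [f"load {track}" for track in tracks]
    for chrom, start, end in regions:
        # BED is 0-based half-open; IGV loci are 1-based inclusive.
        cmds.append(f"goto {chrom}:{start + 1}-{end}")
        cmds.append(f"snapshot {chrom}_{start + 1}_{end}.png")
    cmds.append("exit")
    return "\n".join(cmds) + "\n"
```

The resulting text could be written to a file and passed to IGV's batch mode under a virtual display such as Xvfb; how the actual task wires this up may differ.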

Usage

  1. Inputs: Users can provide the following inputs:
       • BAM and BAI files (aligned_bam1, aligned_bam2)
       • BED file for regions of interest
       • Reference FASTA and FAI
       • Optional haplotype BAMs (truth_haplotype_1, 8x-hap1, etc.)
       • Optional targeted VCF and its index (TRGT_VCF, TRGT_VCF_tbi)
       • Adjustable parameters: image_height, memory_mb, disk_gb
  2. Execution: The workflow uses the provided inputs to generate IGV screenshots, saved as individual PNG files.
  3. Output: The output is an array of PNG screenshots, returned as individual files rather than as a compressed archive.
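For reviewers who want to try this, a minimal sketch of assembling an inputs JSON in the usual `WorkflowName.input_name` form used by Cromwell and Terra. The names `aligned_bam1`, `aligned_bam2`, `TRGT_VCF`, `TRGT_VCF_tbi`, `image_height`, `memory_mb`, and `disk_gb` come from the list above; `bed_file` and `ref_fasta` are hypothetical placeholders, all paths and values are made up, and it is assumed here that every input is exposed at the workflow level.

```python
import json
from typing import Any, Dict, Optional

WORKFLOW = "IGVScreenshotWorkflow"


def build_inputs(
    required: Dict[str, Any],
    optional: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Prefix input names with the workflow name; drop optional inputs set to None."""
    merged = dict(required)
    for name, value in (optional or {}).items():
        if value is not None:
            merged[name] = value
    return {f"{WORKFLOW}.{name}": value for name, value in merged.items()}


if __name__ == "__main__":
    inputs = build_inputs(
        required={
            "aligned_bam1": "gs://example-bucket/sample1.bam",  # placeholder path
            "aligned_bam2": "gs://example-bucket/sample2.bam",  # placeholder path
            "bed_file": "gs://example-bucket/regions.bed",      # hypothetical input name
            "ref_fasta": "gs://example-bucket/ref.fa",          # hypothetical input name
            "image_height": 500,                                # illustrative value
            "memory_mb": 4000,                                  # illustrative value
            "disk_gb": 100,                                     # illustrative value
        },
        optional={"TRGT_VCF": None, "TRGT_VCF_tbi": None},      # omitted when None
    )
    print(json.dumps(inputs, indent=2))
```

Leaving optional inputs out of the JSON entirely, rather than passing empty strings, is what lets the WDL side treat them as undefined.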

How to Test

  • Run the Workflow: Execute the WDL workflow on Terra or a WDL-compatible execution environment.

  • Check Outputs: Verify that the screenshots are generated correctly in the specified output directory, and ensure optional files are handled as expected.

  • Resource Customization: Test the workflow with different image_height, memory_mb, and disk_gb values to confirm customization works.
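The output check above can be partly automated. Below is a minimal sketch that verifies a local directory of downloaded screenshots: it confirms the file count and that each file starts with the PNG signature. The helper names are hypothetical, and the expected count (for example, one screenshot per BED region) is an assumption the caller must supply, since the PR does not state how screenshots map to regions.

```python
from pathlib import Path
from typing import List

# The fixed 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(path: Path) -> bool:
    """Return True if the file starts with the PNG signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == PNG_SIGNATURE


def check_screenshots(output_dir: str, expected_count: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    pngs = sorted(Path(output_dir).glob("*.png"))
    if len(pngs) != expected_count:
        problems.append(f"expected {expected_count} PNG files, found {len(pngs)}")
    for png in pngs:
        if png.stat().st_size == 0:
            problems.append(f"{png.name}: empty file")
        elif not is_png(png):
            problems.append(f"{png.name}: missing PNG signature")
    return problems
```

This only catches missing, empty, or mislabeled files; whether the tracks rendered correctly still needs a visual review.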

Potential Use Cases

  • Visualizing genomic regions for multiple samples or haplotypes.
  • Creating IGV snapshots for variant analysis and presentation.
  • Generating high-resolution images for research publications.

Additional Notes
The Docker image used in this workflow is us.gcr.io/broad-dsp-lrma/igv_screenshot_docker:v9172024.
This workflow is designed for flexibility and can be adapted to various genomic visualization needs.
