ValueError during extract_barcodes.py #85

Open
ajdesalvio opened this issue Nov 12, 2024 · 3 comments · May be fixed by #86

@ajdesalvio

Using BLR version 0.5.dev21+gc0d6996, I attempted to process a TELL-Seq sample using the Solanum tuberosum (potato) genome as a reference, indexed with bwa. I recently ran BLR successfully on a Gossypium hirsutum (cotton) sample. The only parameters that changed between the two runs were the chromosome names and the paths to the R1, I1, and genome reference files. I can upload the MultiQC report if that would be useful.

The potato sample failed at step 566 of 573 due to an apparent error within the extract_barcodes.py script stemming from a negative start coordinate (-1). I am pasting an excerpt from the error log below. Interestingly, even in the successful BLR run, I also received a syntax warning: SyntaxWarning: invalid escape sequence '\d', but the cotton sample had an exit code of 0. Since the parameters did not change between runs, I'm not sure what to modify to avoid this issue with the potato sample. If anyone has thoughts, please let me know. Happy to provide more details. Thank you!

[Wed Nov  6 17:03:29 2024]
checkpoint extract_barcodes:
    input: mtglink.gfa, final.barcodes.bam, final.barcodes.bam.bai
    output: mtglink_tmp/read_subsampling_pre
    log: mtglink_tmp/read_subsampling_pre.log
    jobid: 562
Downstream jobs will be updated after completion.

python -c "import sys; print('.'.join(map(str, sys.version_info[:2])))"
Activating conda environment: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/conda/8a51ab27e52012a0ea3e4fb4b2c72341
python /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/scripts/tmp7q2frrz6.extract_barcodes.py
Activating conda environment: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/conda/8a51ab27e52012a0ea3e4fb4b2c72341
/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/snakemake/workflow.py:1186: SyntaxWarning: invalid escape sequence '\d'
  self.global_wildcard_constraints(scatteritem="\d+-of-\d+")
/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/snakemake/sourcecache.py:22: SyntaxWarning: invalid escape sequence '\d'
  "https://raw.githubusercontent.com/snakemake/snakemake-wrappers/\d+\.\d+.\d+"
Traceback (most recent call last):
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/scripts/tmp7q2frrz6.extract_barcodes.py", line 366, in <module>
    main(outdir, gfa, bam, flanksize, minbarcocc)
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/scripts/tmp7q2frrz6.extract_barcodes.py", line 325, in main
    extractBarcodesFromChunkRegions(
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/scripts/tmp7q2frrz6.extract_barcodes.py", line 275, in extractBarcodesFromChunkRegions
    extractBarcodesWithPysam(
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/scripts/tmp7q2frrz6.extract_barcodes.py", line 194, in extractBarcodesWithPysam
    for read in reader.fetch(region=region):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pysam/libcalignmentfile.pyx", line 1092, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-1)
[Wed Nov  6 17:06:38 2024]
Error in rule extract_barcodes:
    jobid: 562
    output: mtglink_tmp/read_subsampling_pre
    log: mtglink_tmp/read_subsampling_pre.log (check log file(s) for error message)
    conda-env: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/conda/8a51ab27e52012a0ea3e4fb4b2c72341

RuleException:
CalledProcessError in line 274 of /scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/blr/rules/call_svs.smk:
Command 'source /sw/eb/sw/Mamba/23.1.0-4/bin/activate '/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/conda/8a51ab27e52012a0ea3e4fb4b2c72341'; set -euo pipefail;  python /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/scripts/tmp7q2frrz6.extract_barcodes.py' returned non-zero exit status 1.
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2380, in run_wrapper
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/blr/rules/call_svs.smk", line 274, in __rule_extract_barcodes
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 574, in _callback
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 560, in cached_or_run
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2392, in run_wrapper
Removing output files of failed job extract_barcodes since they might be corrupted:
mtglink_tmp/read_subsampling_pre
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/log/2024-10-25T135845.709893.snakemake.log
Traceback (most recent call last):
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/weakref.py", line 624, in _exitfunc
    f()
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/weakref.py", line 548, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/tempfile.py", line 799, in _cleanup
    _shutil.rmtree(name)
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/shutil.py", line 477, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/shutil.py", line 475, in rmtree
    orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/job.11828134/tmpsfr14eqpsnakemake-runtime-source-cache'
SETTINGS FOR: run (version: 0.5.dev21+gc0d6996)
 cores: 48
 anew: False
 no_use_conda: False
 snakemake_args: []
Error in snakemake invocation: Command '['snakemake', '-s', '/scratch/user/ajd/.conda/envs/blr_ktz1_v3/lib/python3.6/site-packages/blr/Snakefile', '--cores', '48', '--printshellcmds', '--use-conda']' returned non-zero exit status 1.
pontushojer linked a pull request on Nov 20, 2024 that will close this issue
@pontushojer
Collaborator

Great to hear that you had success with the cotton sample at least!

This seems to be the relevant error: a negative start position was supplied for the region, which caused the ValueError.

  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V3/Outputs/.snakemake/scripts/tmp7q2frrz6.extract_barcodes.py", line 194, in extractBarcodesWithPysam
    for read in reader.fetch(region=region):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pysam/libcalignmentfile.pyx", line 1092, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-1)

This script (src/blr/scripts/extract_barcodes.py) was largely adapted from https://github.com/anne-gcd/MTG-Link, so unfortunately I am less familiar with it. I have implemented a fix in #86; you are welcome to download that branch and see if it solves your issue.
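
For context, pysam region strings follow the 1-based samtools convention, so a region whose start is 0 or negative can end up with a negative 0-based start once it is parsed. The sketch below is not the change made in #86; it only illustrates one possible way to keep coordinates in range before they reach `fetch`. The helper name `clamp_region` and its parameters are hypothetical and do not exist in BLR, MTG-Link, or pysam.

```python
from typing import Optional


def clamp_region(contig: str, start: int, end: int,
                 contig_length: Optional[int] = None) -> str:
    """Build a 1-based, samtools-style region string with in-range coordinates.

    Hypothetical helper for illustration only.
    """
    # A flank that extends past the contig start would otherwise give start <= 0.
    start = max(1, start)
    if contig_length is not None:
        # Likewise, keep the end inside the contig when its length is known.
        end = min(end, contig_length)
    if end < start:
        raise ValueError(f"empty region on {contig}: {start}-{end}")
    return f"{contig}:{start}-{end}"


# Example: a 10 kb flank around position 5000 on a 12 kb contig.
print(clamp_region("chr01", 5000 - 10000, 5000 + 10000, contig_length=12000))
```

The resulting string could then be passed as `reader.fetch(region=...)`, the call that appears in the traceback above.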

You also mentioned something about getting a SyntaxWarning: invalid escape sequence '\d'? Could you provide the full error message?

@ajdesalvio
Author

Thanks for looking into the issue! I'll rerun the analysis with the branch.

I'm attaching the entire error log from the failed potato sample. The only mentions of SyntaxWarning: invalid escape sequence '\d' occur on lines 8378 and 8380.

In the successfully completed cotton sample, the same SyntaxWarning occurs on lines 13424 and 13426. Here's the relevant excerpt from that log:

python -c "import sys; print('.'.join(map(str, sys.version_info[:2])))"
Activating conda environment: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/TomF1_V5/Outputs/.snakemake/conda/f1fbce1dfae34c47da6473e0986e3b4a
python /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/TomF1_V5/Outputs/.snakemake/scripts/tmp45nuq4pj.extract_barcodes.py
Activating conda environment: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/TomF1_V5/Outputs/.snakemake/conda/f1fbce1dfae34c47da6473e0986e3b4a
/scratch/user/ajd/.conda/envs/blr_tomf1_v4/lib/python3.6/site-packages/snakemake/workflow.py:1186: SyntaxWarning: invalid escape sequence '\d'
  self.global_wildcard_constraints(scatteritem="\d+-of-\d+")
/scratch/user/ajd/.conda/envs/blr_tomf1_v4/lib/python3.6/site-packages/snakemake/sourcecache.py:22: SyntaxWarning: invalid escape sequence '\d'
  "https://raw.githubusercontent.com/snakemake/snakemake-wrappers/\d+\.\d+.\d+"
Updating job aggregate_extracts.
Updating job mtglink.
[Wed Oct  9 21:31:20 2024]
Finished job 898.
903 of 1438 steps (63%) done
Complete log: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/TomF1_V5/Outputs/.snakemake/log/2024-10-05T130445.635304.snakemake.log
SETTINGS FOR: run (version: 0.5.dev21+gc0d6996)
 cores: 48
 anew: False
 no_use_conda: False
 snakemake_args: []
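
A note on the warning itself: both SyntaxWarning lines point at snakemake's own source files (workflow.py and sourcecache.py), not at extract_barcodes.py. Since Python 3.12, an unrecognized escape such as `\d` inside a normal string literal is reported as a SyntaxWarning (earlier versions used a DeprecationWarning), and the string value is unchanged, so the warning is probably unrelated to the ValueError. A minimal illustration of the raw-string form that avoids the warning, using the pattern shown in the log:

```python
import re

# Raw string: the backslash reaches the regex engine unchanged, so Python
# does not try to interpret "\d" as a string escape.
scatteritem = re.compile(r"\d+-of-\d+")

print(bool(scatteritem.fullmatch("3-of-16")))    # matches the scatter pattern
print(bool(scatteritem.fullmatch("3-of-many")))  # does not match
```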

potato_err.txt
cotton_err.txt

@ajdesalvio
Author

Hi Pontus,

I reran the analysis using version 0.5.dev22+g8205ff7 (branch fix-85). The same ValueError: start out of range (-1) occurred again (excerpt pasted below). If you have any thoughts on how to resolve this, I'd really appreciate it. Thanks again.

tabix -p vcf final.manta_large_insertions.vcf.gz
[Fri Dec 13 12:03:55 2024]
Finished job 565.
565 of 573 steps (99%) done
[Fri Dec 13 12:13:43 2024]
Finished job 557.
566 of 573 steps (99%) done
Select jobs to execute...

[Fri Dec 13 12:13:43 2024]
checkpoint extract_barcodes:
    input: mtglink.gfa, final.barcodes.bam, final.barcodes.bam.bai
    output: mtglink_tmp/read_subsampling_pre
    log: mtglink_tmp/read_subsampling_pre.log
    jobid: 562
Downstream jobs will be updated after completion.

python -c "import sys; print('.'.join(map(str, sys.version_info[:2])))"
Activating conda environment: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/conda/b57916cc3680e142770b06ca0f0bdc88
python /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/scripts/tmplyre3hh0.extract_barcodes.py
Activating conda environment: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/conda/b57916cc3680e142770b06ca0f0bdc88
/scratch/user/ajd/.conda/envs/blr_ktz1_v4/lib/python3.6/site-packages/snakemake/workflow.py:1186: SyntaxWarning: invalid escape sequence '\d'
  self.global_wildcard_constraints(scatteritem="\d+-of-\d+")
/scratch/user/ajd/.conda/envs/blr_ktz1_v4/lib/python3.6/site-packages/snakemake/sourcecache.py:22: SyntaxWarning: invalid escape sequence '\d'
  "https://raw.githubusercontent.com/snakemake/snakemake-wrappers/\d+\.\d+.\d+"
Traceback (most recent call last):
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/scripts/tmplyre3hh0.extract_barcodes.py", line 367, in <module>
    main(outdir, gfa, bam, flanksize, minbarcocc)
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/scripts/tmplyre3hh0.extract_barcodes.py", line 326, in main
    extractBarcodesFromChunkRegions(
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/scripts/tmplyre3hh0.extract_barcodes.py", line 276, in extractBarcodesFromChunkRegions
    extractBarcodesWithPysam(
  File "/scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/scripts/tmplyre3hh0.extract_barcodes.py", line 195, in extractBarcodesWithPysam
    for read in reader.fetch(region=region):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pysam/libcalignmentfile.pyx", line 1092, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-1)
[Fri Dec 13 12:16:23 2024]
Error in rule extract_barcodes:
    jobid: 562
    output: mtglink_tmp/read_subsampling_pre
    log: mtglink_tmp/read_subsampling_pre.log (check log file(s) for error message)
    conda-env: /scratch/user/ajd/Serina/TellSeq2/Outputs_V3/KTZ_1_V4/Outputs/.snakemake/conda/b57916cc3680e142770b06ca0f0bdc88
