Variants not inserted from input_variants (VCF) file provided #117
Hello, sorry about the bugs you encountered. We're in the process of releasing 4.2.2 with several fixes that I think may address this. We're planning to finalize the release today or tomorrow. You can wait for the release, or you can try the develop branch in the meantime and see if it works for you. If it does not work, please let us know and we will dig into the issue.
-Josh
From: Alyssa Briggs ***@***.***>
Sent: Thursday, June 27, 2024 11:18 AM
To: ncsa/NEAT ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [ncsa/NEAT] Variants not inserted from input_variants (VCF) file provided (Issue #117)
Describe the bug
When trying to simulate the desired insertions from a VCF file, the insertions are not present in the resulting reads. The golden VCF file is completely empty after a successful run (no mutations are added, including the directed insertions). This work is being done with @SnehaGummadi.
To Reproduce
Using version 4.2.1 (main branch).
Running on a Linux HPC cluster using the conda environment provided in the install documentation.
Command: neat --log-level DEBUG read-simulator -c testing.yml -o asdf
* Debug log: 1719503380.1076086_NEAT 1.log
* The log never reports "Reading input_VCF: path to vcf", so the VCF never appears to be read.
* Config YAML (full paths were provided for reference and input_variants but have been shortened here for privacy):
reference: chr18_smallest.fa
read_len: 101
coverage: 3
error_model: .
avg_seq_error: 0.0
rescale_qualities: .
quality_offset: .
ploidy: 1
input_variants: repeats_testing.vcf
target_bed: .
off_target_scalar: .
discard_bed: .
mutation_model: .
mutation_rate: 0.00
mutation_bed: .
paired_ended: True
fragment_model: .
fragment_mean: 300
fragment_st_dev: 30
produce_bam: .
produce_vcf: True
produce_fastq: True
no_coverage_bias: .
rng_seed: 1
min_mutations: .
overwrite_output: .
* The logged run was stopped before it finished covering the dataset; it did not fail at that point.
* The same behavior is seen when the run is allowed to complete.
* No error message is reported, but the insertions are not simulated.
Expected behavior
We're expecting to be able to see the inserted variants (without additional mutations) in the simulated reads. For example, a string of 10 A's might be identifiable in some reads. We also expect to see the insertions from our provided VCF appear in the golden VCF.
Additional context
* An attempt was made to hardcode the input_VCF path into options.py, which resulted in a slew of other errors that can be addressed if needed.
* This was done in hopes of forcing the program to read in the VCF.
* Based on the code in options.py, it seems that self.include_vcf remains None rather than being set to the path of the VCF file, which might explain why the log never shows the VCF being read.
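To illustrate the suspected failure mode, here is a minimal Python sketch. It is not NEAT's actual options.py: the `Options` class, `KNOWN_KEYS`, and `load` are hypothetical, and treating "." as "keep the default" is an assumption based on the config above. It shows how a loader that copies only recognized keys would leave `include_vcf` as `None` when the YAML uses `input_variants`, and how warning on unknown keys would surface the mismatch instead of failing silently.

```python
import logging

# Hypothetical, abbreviated set of accepted keys; NEAT's real list is longer.
KNOWN_KEYS = {"reference", "read_len", "coverage", "ploidy", "include_vcf"}


class Options:
    """Simplified stand-in for a config-backed options object (not NEAT's class)."""

    def __init__(self) -> None:
        # Every option starts unset; include_vcf stays None unless the config sets it.
        self.reference = None
        self.read_len = None
        self.coverage = None
        self.ploidy = None
        self.include_vcf = None

    def load(self, config: dict) -> list[str]:
        """Copy recognized keys onto self and return any unrecognized keys."""
        unknown = []
        for key, value in config.items():
            if key not in KNOWN_KEYS:
                # Warning here turns a silent no-op into a visible message.
                logging.warning("Unrecognized config key %r was ignored", key)
                unknown.append(key)
                continue
            if value == ".":
                # Assumption: "." means "keep the default", so nothing is set.
                continue
            setattr(self, key, value)
        return unknown


opts = Options()
ignored = opts.load({"reference": "chr18_smallest.fa", "input_variants": "repeats_testing.vcf"})
print(opts.include_vcf)  # None: the VCF path was never stored
print(ignored)           # ['input_variants']
```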
Please check the latest release to ensure your issues are resolved. You can reopen this ticket if they persist.
I think the issue might be in your
You're right. I must have changed that at some point and forgot to update the configuration file. I will do that!
It was incorrect in one version of the configuration, and correct in the other.
@alyssa-ab, please try changing the variable in the config to "include_vcf" and see if that works. I'll update the template file.
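For anyone with several existing config files, a one-off Python helper along these lines could rename the key. It is a hypothetical sketch (not part of NEAT) that edits the YAML as plain text with a regular expression rather than a YAML library, so it only touches an unindented `input_variants:` key at the start of a line and leaves the value and everything else unchanged.

```python
import re

# Matches a top-level "input_variants:" key at the start of a line (not indented).
_OLD_KEY = re.compile(r"^input_variants([ \t]*):", flags=re.MULTILINE)


def migrate_config_text(text: str) -> str:
    """Return the YAML text with the input_variants key renamed to include_vcf."""
    return _OLD_KEY.sub(r"include_vcf\1:", text)


old = "reference: chr18_smallest.fa\ninput_variants: repeats_testing.vcf\n"
print(migrate_config_text(old), end="")
```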
Thanks for your help! We will try this, as well as the new version, and let you know how it goes.
I made the change in the YAML file as mentioned. The VCF is now being read, but I still get the error message below. The repeats_testing.vcf file contains 1 large insertion.
I also tried a VCF file with 2 large insertions, but apparently only 1 of the 2 variants is detected. It also failed again with the same error message as above.
Okay, that looks like a bug in how it stores the metadata. I will work on this. Did the original VCF have a REF and ALT? It looks to me like the code didn't detect the REF properly. Would it be possible to share the insertions so I can try them directly, or at least an example line with dummy data? I'm wondering if there's a file-format issue here that I didn't take into account.
Reference FASTA: https://raw.githubusercontent.com/SnehaGummadi/NEAT-chimeric/4.2_dev_chimeric_reads/reference_files/chr18_smallest.fa
I do want to note that when the REF allele was incorrect for the second insertion, the program threw an error. This VCF file should have the fixed version.
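Since the discussion turns on whether REF matches the reference, here is a minimal Python sketch of that check. Everything in it is hypothetical: `check_vcf_line` is not a NEAT function, the reference is an in-memory dict rather than a parsed FASTA, and the VCF line is made-up dummy data (a G followed by ten inserted A's). It assumes one ALT allele per line and 1-based VCF positions, as in the VCF specification.

```python
def check_vcf_line(line: str, reference: dict[str, str]) -> str:
    """Verify REF against the reference and classify a single-ALT VCF record.

    `reference` maps contig name to sequence; VCF POS is 1-based.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    ref, alt = fields[3].upper(), fields[4].upper()
    # Reference bases at POS (convert the 1-based POS to a 0-based slice).
    observed = reference[chrom][pos - 1 : pos - 1 + len(ref)].upper()
    if observed != ref:
        raise ValueError(f"REF mismatch at {chrom}:{pos}: VCF has {ref}, reference has {observed}")
    # VCF indels share a leading (anchor) base between REF and ALT.
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(alt) < len(ref) and ref.startswith(alt):
        return "deletion"
    if len(ref) == len(alt) == 1:
        return "snv"
    return "other"


# Dummy data for illustration only: a 10-base contig and a G -> G + 10 A's insertion.
dummy_reference = {"chr18": "ACGTACGTAC"}
dummy_line = "chr18\t3\t.\tG\tG" + "A" * 10 + "\t.\tPASS\t."
print(check_vcf_line(dummy_line, dummy_reference))  # insertion
```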
All right, so the bug I found was in how it counted the variants it found. Once I fixed that and cleared the "reference Mismatch" error, it read both variants properly. I'm not sure, however, that this variant will work correctly with NEAT as is. I didn't really consider variants longer than the read length. It may get inserted, at least in part, into reads that overlap its start position, but the rest of it will probably not appear anywhere. This is on our list of things to work on in future development.
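To make the read-length limitation concrete, here is a simplified Python model. It is an assumption, not how NEAT actually places reads: reads are anchored at reference coordinates, the inserted sequence sits immediately after one reference base, and each read takes `read_len` consecutive bases of the edited sequence. The function names are hypothetical. Under this model no read can contain more than `read_len - 1` inserted bases, so the tail of a longer insertion never appears in any read.

```python
def inserted_bases_in_read(read_start: int, read_len: int, ins_pos: int, ins_len: int) -> int:
    """Count inserted bases in one read under the simplified model.

    The insertion of ins_len bases follows 0-based reference position ins_pos;
    the read starts at 0-based reference position read_start.
    """
    if read_start > ins_pos:
        return 0  # the read begins after the insertion point on the reference
    # Reference bases the read consumes before reaching the inserted sequence.
    ref_bases_before = ins_pos - read_start + 1
    return max(0, min(ins_len, read_len - ref_bases_before))


def max_visible_insertion(read_len: int, ins_len: int) -> int:
    """Most inserted bases any single read can hold: the read starting at the anchor base."""
    return inserted_bases_in_read(read_start=0, read_len=read_len, ins_pos=0, ins_len=ins_len)


print(max_visible_insertion(read_len=101, ins_len=500))  # 100
print(inserted_bases_in_read(read_start=0, read_len=101, ins_pos=95, ins_len=500))  # 5
```

For example, with `read_len: 101` from the config above, a hypothetical 500 bp insertion would show at most 100 inserted bases in any single read under this model; the remaining 400 bases would not appear in any read.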
I will push the messaging changes and hopefully open a new PR in the next day or two.
Alright, thank you! |