Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getfasta index corruption checks for .fai files #228

Closed
dshivak opened this issue May 4, 2015 · 3 comments
Closed

getfasta index corruption checks for .fai files #228

dshivak opened this issue May 4, 2015 · 3 comments

Comments

@dshivak
Copy link

dshivak commented May 4, 2015

If the underlying fasta file is changed and the .fai index file is left in place, getfasta will retrieve wrong results. I had a fasta file replaced by a co-worker (who shall remain nameless). This led to getfasta generating incorrect sequence due to differing sequence offsets (headers differed, sequence was the same). This is obviously an edge case but has the potential to corrupt all sequence from that point on.

Perhaps at the minimum an mtime comparison could be done to see if the fasta file is newer than it's index file? Anything more would involve including sanity check offsets in the .fai file which is a samtools spec issue. Checking for illegal characters in the output is unlikely to work since the query coordinates would have to be near a header line to output illegal sequence characters.

To reproduce the error case (tested with 2.23.0):

host:~/Downloads/bedtools2/test/getfasta$ cat t.fa
>chr1
cggggggggg
>chr2
AAATTTTTTTTTT
host:~/Downloads/bedtools2/test/getfasta$ echo -e "chr2\t2\t10" | ../../bin/bedtools getfasta -fi t.fa -bed - -fo -
index file t.fa.fai not found, generating...
>chr2:2-10
ATTTTTTT
# Change header in source, leave .fai in place
host:~/Downloads/bedtools2/test/getfasta$ cat t.fa
>chr1 ver 2.2.2.2.2
cggggggggg
>chr2
AAATTTTTTTTTT
host:~/Downloads/bedtools2/test/getfasta$ echo -e "chr2\t2\t10" | ../../bin/bedtools getfasta -fi t.fa -bed - -fo -
>chr2:2-10
ggggg>c
@gtamazian
Copy link
Contributor

Added a warning message indicating that an index file is older than the corresponding FASTA file; PR #238.

@arq5x
Copy link
Owner

arq5x commented May 25, 2015

Could you possibly create a pull request with just the getfasta changes? I would like to look more closely at the changes to the other files before merging, but this seems clean and a nice improvement!

@gtamazian
Copy link
Contributor

@arq5x I created a separate pull request with getfasta fixes, please check PR #242.

@arq5x arq5x closed this as completed in 0b73675 May 26, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants