Line limit of bedtools getfasta and other sub-commands? #1107

Open
YuanfengZhang opened this issue Nov 11, 2024 · 2 comments
Comments

@YuanfengZhang

When I pass a sorted BED file containing more than 20 billion lines of genomic regions to getfasta, I only get results for chr1, chr10, and chr11; every row after those is ignored. Memory should not be the bottleneck, since the server has 1 TB of RAM (3200 MHz).

I searched the closed issues and the bedtools documentation but didn't find a clear answer. I know I can work around this by splitting the single file into multiple smaller files, but I'm curious: does bedtools have a line limit that isn't described in the docs, or is this more likely a buffer/tmp file issue on my Ubuntu server?

Thanks for your help in advance.

@ghuls
Contributor

ghuls commented Jan 16, 2025

This is likely because your file is too big for BEDTools to handle: it currently uses int32 instead of int64 in places, for example:

    char *seq = fai_fetch(index->faidx, seqname.c_str(), &len); // TODO: update to fai_fetch64 when htslib is updated.

If/when BEDTools updates htslib to a recent version, 64-bit variants such as fai_fetch64 should become available and solve your problem.
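To illustrate why a 32-bit integer is a problem at this scale, here is a small standalone sketch. It does not reproduce BEDTools internals, and which counter or offset actually overflows in this case is not established in this thread; `fits_in_int32` and `wrap_to_int32` are hypothetical helper names used only for the demonstration.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// True if the value can be stored in a signed 32-bit integer
// (range -2147483648 .. 2147483647).
bool fits_in_int32(std::int64_t value) {
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}

// Shows what is left when only the low 32 bits of a 64-bit value are kept
// and reinterpreted as a signed 32-bit integer.
std::int32_t wrap_to_int32(std::int64_t value) {
    // Conversion to an unsigned type is modular, so this is well-defined.
    const std::uint32_t low = static_cast<std::uint32_t>(value);
    std::int32_t result;
    std::memcpy(&result, &low, sizeof result);  // reinterpret the bit pattern
    return result;
}
```

With these helpers, `fits_in_int32(20000000000LL)` is false, and `wrap_to_int32(20000000000LL)` yields -1474836480, so a quantity on the order of 20 billion cannot be represented once it passes through a 32-bit variable.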

@YuanfengZhang
Author

Thank you so much👍
