Issues with indexing concatenated gtf file #98
Comments
Thanks for the detailed report, Will! We'll look into this ASAP (cc @DongzeHE). Off the top of my head, there is no obvious reason it shouldn't work with properly formatted fasta and GTF files. As a start to the investigation, I tried to process it with
So it seems there are several things
Best,
Ok, small update. With the
So, somehow there are several issues with the input (concatenated) GTF that
Ok @wmacnair, progress. This is kind of hilarious ;). The problem is with the EBV GTF file. In addition to the warnings that you see above (which are structural but maybe not serious?), the major problem is that the GTF file is ill-formed. Each line ends with a
and then concatenated as above. Let me know if this resolves things for you.
Best,
Hey, thanks for looking through all of this in time for my morning :) So it's just a dumb formatting bug in the EBV gtf file, with an extra space or tab at the end that shouldn't be there? A nice easy fix, at least! With this fix, it ran fine 🎉
Thanks so much!
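For anyone landing here with the same symptom, here is a minimal Python sketch of the cleanup described above. It assumes the stray character at the end of each line is whitespace (a space or tab, as described in this thread). The function names and the demo record are made up for illustration, and this is not the command the maintainers used.

```python
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Return GTF text with trailing spaces, tabs and CRs removed from every line."""
    cleaned = [line.rstrip(" \t\r") for line in text.splitlines()]
    return ("\n".join(cleaned) + "\n") if cleaned else ""


def concatenate_gtfs(paths, out_path):
    """Clean each GTF file (hypothetical paths) and write them back to back."""
    with open(out_path, "w") as out:
        for path in paths:
            out.write(strip_trailing_whitespace(Path(path).read_text()))


if __name__ == "__main__":
    # Made-up record that ends in a stray tab, mimicking the ill-formed lines.
    dirty = 'chrEBV\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\t\n'
    print(repr(strip_trailing_whitespace(dirty)))
```

Cleaning each file before concatenating (rather than the combined file afterwards) keeps the original inputs untouched and makes it easy to see which file needed fixing.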
Hello :)
I am trying to map to a combined human and EBV genome, and I'm running into issues... Hopefully this problem is both (a) kind of within scope of what you do / what I can reasonably ask here, and (b) in the appropriate repo. Please tell me if either is off!
What I have done:
I first concatenated them naively
then ran simpleaf index, but got errors:
The gtf files have different attribute lists, so I then tried removing everything except gene_id:
and, yay, new error message!
I then tried including just gene_id and transcript_id, and I think I got this to work. However, transcript_id is completely missing for some entries in the human gtf, so I got the same error message.
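To see where gene_id or transcript_id is actually missing, a rough Python sketch like the following can tally the gaps per feature type. The helper names and the demo records are hypothetical, and the parser only understands quoted `key "value";` attributes. It only reports counts; whether a given gap matters to simpleaf is not something this sketch can tell you.

```python
import re
from collections import Counter

# Matches attribute pairs of the form: key "value";
# Unquoted values are ignored, which is fine for gene_id / transcript_id.
ATTRIBUTE_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(field: str) -> dict:
    """Parse the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_RE.findall(field))


def count_missing(lines, required=("gene_id", "transcript_id")):
    """Count records lacking each required attribute, keyed by (feature, attribute)."""
    missing = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            missing[("<malformed>", "columns")] += 1
            continue
        attributes = parse_attributes(columns[8])
        for name in required:
            if name not in attributes:
                missing[(columns[2], name)] += 1
    return missing


if __name__ == "__main__":
    # Made-up records: a gene row without transcript_id and a complete exon row.
    demo = [
        "#!made-up header\n",
        'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n',
        'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    ]
    for (feature, name), n in sorted(count_missing(demo).items()):
        print(f"{feature}\t{name}\t{n}")
```

Grouping by feature type makes it quick to tell whether the missing attribute is confined to one kind of row or scattered across the file.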
Things I'm not sure about:
simpleaf/alevin-fry/roers
Any thoughts? I've included the ebv.gtf and the first 1k lines of the human.gtf in the zip file attached.
Please do gently suggest I go somewhere else if this is out of scope :)
Cheers
Will
human_ebv_gtf_files.zip