Benchmarking #46
Comments
Other than measuring throughput on the different htsget-rs layers, we should keep a close eye on this common htsget problem and metric: googlegenomics/htsget#7
Discussing this over GA4GH's Slack... I’d argue that the best course of action is to create a benches subdirectory for each htsget-rs crate instead of creating a dedicated “htsget-benchmarks” crate within the workspace (as the I’d love the benchmarks for the server to be as end-to-end as possible, to measure real-world throughput. /cc @victorskl thoughts? ;)
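To make the "end-to-end throughput" idea concrete, here is a minimal sketch using only the standard library. The names `throughput_mb_per_s` and `measure` are hypothetical, and the closure passed to `measure` stands in for "send the htsget request and download every returned URL"; a real bench in a crate's benches subdirectory would likely use a proper benchmarking harness instead.

```rust
use std::time::{Duration, Instant};

/// Converts a byte count and an elapsed time into megabytes per second
/// (1 MB = 1_000_000 bytes). Returns `None` for a zero duration.
pub fn throughput_mb_per_s(bytes: u64, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        return None;
    }
    Some(bytes as f64 / 1_000_000.0 / secs)
}

/// Runs `fetch` once and times it. `fetch` is a hypothetical stand-in for
/// the whole client round trip and must return the number of bytes received.
pub fn measure<F: FnOnce() -> u64>(fetch: F) -> (u64, Duration) {
    let start = Instant::now();
    let bytes = fetch();
    (bytes, start.elapsed())
}
```

A bench would call `measure` with a closure that performs the requests, then pass the result to `throughput_mb_per_s`; timing the full round trip rather than a single layer is what keeps the number close to what a real client would see.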
Yah, benchmarks for the server sound good. Off the top of my head
By doing so, you could then come up with more methodology around it... This needs to be done for both the reference implementation and the Rust implementation here. You can start with a local setup for this arrangement; once you reach that point, it will be clearer what comes next, I reckon. Please also make sure of implementation "correctness" first, i.e. that server responses are correct according to the spec and/or the same as the reference implementation's. If anything differs, note it.
Thanks a ton, Victor, for the thoughts and guidelines! There's another bit I just recalled: the number of MB this implementation returns for a given query. I'm referring to this concerning issue with htsget: googlegenomics/htsget#7
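One way to track that metric in a benchmark is to record, per query, how many bytes came back and compare them against a baseline. This is only a sketch: `QuerySize` and its fields are hypothetical names, and what counts as the baseline (for example, the bytes another implementation returned for the same query) is an assumption left to the benchmark author.

```rust
/// Size comparison for a single htsget query (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QuerySize {
    /// Total bytes downloaded after following every URL in the response.
    pub returned_bytes: u64,
    /// Bytes of the chosen baseline for the same query (an assumption:
    /// e.g. what another implementation returned).
    pub baseline_bytes: u64,
}

impl QuerySize {
    /// Bytes returned beyond the baseline; zero if the server returned less.
    pub fn extra_bytes(&self) -> u64 {
        self.returned_bytes.saturating_sub(self.baseline_bytes)
    }

    /// Returned-to-baseline ratio, or `None` when the baseline is empty.
    pub fn ratio(&self) -> Option<f64> {
        if self.baseline_bytes == 0 {
            None
        } else {
            Some(self.returned_bytes as f64 / self.baseline_bytes as f64)
        }
    }
}
```

Reporting both the absolute difference and the ratio keeps small and large queries comparable in the same table.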
Well, I still need to tinker a bit to see which problems I will find, but Victor's guidelines seem to make a lot of sense for now.
I'm having problems with the reference implementation 😭 After following the instructions in the README for building and running the server, I'm getting this error:
I didn't find anything like this in the issues. Should I add a new one? The project seems somewhat abandoned, with the most recent issue/PR being from 2019 😥
Oh, please don't try to deploy the googlegenomics htsget, it's indeed quite Google-tailored. I pointed at that particular issue because I wanted to know how many extra reads the GA4GH reference implementation returned, not Google's one, sorry for not being clear there :-S
Seems like I got some things mixed up. Where can I find the reference implementation?
Sorry, when I say reference implementation, I usually refer to GA4GH's one here: https://github.com/ga4gh/htsget-refserver ... perhaps sooner or later ours (htsget-rs) will be the GA4GH "reference implementation", we'll see after the benchmarking ;)
That would be awesome! 💯
I think I have hit a blocker :(. I've been using reqwest to perform the client requests in the benchmarks. Until now, I had only tried to get the initial response from the servers, but now I also tried to download the files from the URLs the servers provided, and I found a problem with our implementation. The URLs it gives use the "file://" scheme, which reqwest doesn't support. This already needed to be changed, because the htsget spec doesn't allow it, but we might have to deal with it sooner than expected.
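A benchmark client could flag such URLs before trying to download them. The sketch below uses only the standard library; `url_scheme` and `is_fetchable` are hypothetical helpers, and the accepted set (`http`, `https`, `data`) is an assumption for a local benchmark setup, not a statement of what the spec mandates.

```rust
/// Returns the lowercased scheme of `url` (the text before the first ':'),
/// or `None` if the text before the colon is not a plausible scheme.
pub fn url_scheme(url: &str) -> Option<String> {
    let (scheme, _rest) = url.split_once(':')?;
    let mut chars = scheme.chars();
    // A scheme starts with a letter...
    if !chars.next()?.is_ascii_alphabetic() {
        return None;
    }
    // ...followed by letters, digits, '+', '-' or '.'.
    if !chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.')) {
        return None;
    }
    Some(scheme.to_ascii_lowercase())
}

/// True if the benchmark client can obtain the data: over HTTP(S), or
/// inline from a data URI. The accepted set is an assumption of this sketch.
pub fn is_fetchable(url: &str) -> bool {
    matches!(
        url_scheme(url).as_deref(),
        Some("http") | Some("https") | Some("data")
    )
}
```

Running `is_fetchable` over every URL in a server response and reporting the rejects would surface the "file://" problem as a correctness failure instead of a client error in the middle of a timing run.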
Yes, I reckon this should be fixed. It's good that we find these as we go; we still have time to iron those things out before the end of GSoC :)
@andrewpatto You were right, only the |
As mentioned in the original GSoC proposals, this project is not complete until we validate that it performs well :)
For this, I've been cooking this repo: https://github.com/umccr/aws-benchmarks
As an example, it uses AWS S3 client libraries as subjects to test both CPU and I/O throughput... but the idea is to apply those benches over here.
This task depends on #45 being finished.