Add benchmarks #59

Merged
merged 62 commits into from
May 31, 2022
a75351b
POC for a simple benchmark
CastilloDel Aug 1, 2021
235029f
Add config for the htsget-refserver
CastilloDel Aug 3, 2021
9fcb2ea
Add a little script to start the htsget-refserver
CastilloDel Aug 3, 2021
02075ea
Add a simple benchmark
CastilloDel Aug 4, 2021
650bbfd
Comparison between total download size
CastilloDel Aug 5, 2021
8adfd7a
Change url types
CastilloDel Aug 6, 2021
cbf58e0
Add static files to the server
CastilloDel Aug 6, 2021
4f28931
Show the benchmark download size properly
CastilloDel Aug 7, 2021
02a6418
Don't force to use '/data' as an extension for htsget-search urls
CastilloDel Aug 8, 2021
203abf9
Add another benchmark
CastilloDel Aug 8, 2021
f9f8b30
Refactor and adjustments
CastilloDel Aug 9, 2021
87ff1b3
Add a more complex benchmark
CastilloDel Aug 9, 2021
7c4924d
Add htsget-search benchmark
CastilloDel Aug 9, 2021
5e7bf21
Add more htsget-search tests
CastilloDel Aug 10, 2021
46baaac
Add VCF test
CastilloDel Aug 12, 2021
4d39271
Add a test with a big file
CastilloDel Aug 12, 2021
5b1f429
Add GitHub Action
CastilloDel Aug 12, 2021
d6d85b5
Use constants for the benchmark configuration
CastilloDel Aug 12, 2021
9874e0a
Fix CI
CastilloDel Aug 13, 2021
4674ddc
Fix CI
CastilloDel Aug 13, 2021
13f609d
Give exec bit to shell scripts, add html-reports to criterion to supr…
brainstorm Aug 16, 2021
8fb4a85
Try latest version of criterion-compare
brainstorm Aug 16, 2021
2875e2a
Add instruction on how to run the benchmarks in the README
CastilloDel Aug 16, 2021
7d0da7f
Fix typo
CastilloDel Aug 21, 2021
378e638
Fix dead code warnings for async/blocking versions
brainstorm Sep 17, 2021
0fbbefc
Merge branch 'main' into benchmarks
brainstorm Sep 17, 2021
27ff9ee
Fix bad first pass merge
brainstorm Sep 17, 2021
e98d828
Merge branch 'dead_code_warning_async_blocking' into benchmarks
brainstorm Sep 17, 2021
70652bc
Unused actix files, must check how @CastilloDel was approaching this …
brainstorm Sep 17, 2021
d4cdf58
Only get/post requests (with_range) failing on benchmarks branch as a…
brainstorm Sep 17, 2021
46ea97f
Further separating the blocking/async versions via #cfg directives, n…
brainstorm Sep 17, 2021
e0d0ab6
fmt that
brainstorm Sep 17, 2021
8fe539d
Arc only for async
brainstorm Sep 23, 2021
698aae2
Test both default --all-features and --no--default-features
brainstorm Sep 23, 2021
dfb26db
Argh, this should be in args, not command... also add cargo cache fro…
brainstorm Sep 23, 2021
8824d82
The blocking side has to have the handlers on 'pub async fn' for acti…
brainstorm Sep 27, 2021
72b8ed5
Merge branch 'dead_code_warning_async_blocking' into benchmarks
brainstorm Sep 27, 2021
998026e
Add missing actix_files::Files
brainstorm Sep 27, 2021
2af6bde
Fix several errors related to the static file server
CastilloDel Sep 28, 2021
838f78e
Fix benchmark so it can run as blocking
CastilloDel Sep 28, 2021
86be4dc
Merge branch 'main' of https://github.com/umccr/htsget-rs into benchm…
mmalenic May 22, 2022
9b40653
Merge branch 'main' of https://github.com/umccr/htsget-rs into benchm…
mmalenic May 24, 2022
50d78de
Refactor benchmarks, fix a few errors, rearrange some tests.
mmalenic May 24, 2022
5eca71a
Merge branch 'main' into benchmarks
brainstorm May 24, 2022
e7217e0
Refactor rcgen in axum server.
mmalenic May 25, 2022
65dbfd6
Merge branch 'main' of https://github.com/umccr/htsget-rs into benchm…
mmalenic May 25, 2022
8f89303
Merge branch 'benchmarks' of https://github.com/umccr/htsget-rs into …
mmalenic May 25, 2022
85a2d95
Implement script files into rust code.
mmalenic May 25, 2022
aaea724
Merge branch 'benchmarks' of https://github.com/umccr/htsget-rs into …
mmalenic May 25, 2022
4fdf781
Merge branch 'main' of https://github.com/umccr/htsget-rs into benchm…
mmalenic May 25, 2022
c89fac3
Fix localstorage path (#86)
mmalenic May 27, 2022
f44c52a
Merge branch 'benchmarks' of https://github.com/umccr/htsget-rs into …
mmalenic May 27, 2022
b252096
Fix certificate errors by using rustls-tls.
mmalenic May 27, 2022
25e364d
Http byte ranges are inclusive for ending range, fix this and all aff…
mmalenic May 30, 2022
020a9ad
Add light and heavy benchmarks, update README.md.
mmalenic May 30, 2022
a528f65
Update actions, remove files, update README.md.
mmalenic May 30, 2022
97d1875
Update action
mmalenic May 30, 2022
2a77de4
Resolve default/s3-storage feature clashes.
mmalenic May 31, 2022
da79a0f
Update action.yml.
mmalenic May 31, 2022
28e9347
Update action.yml.
mmalenic May 31, 2022
253f97f
Update action
mmalenic May 31, 2022
b13c410
Reduce number of samples.
mmalenic May 31, 2022
29 changes: 18 additions & 11 deletions .github/workflows/action.yml
@@ -2,10 +2,6 @@ name: tests

on: [push]

env:
AWS_ACCESS_KEY_ID: "FOO"
AWS_SECRET_ACCESS_KEY: "BAR"

jobs:
test:
runs-on: ${{ matrix.os }}
@@ -14,6 +10,7 @@ jobs:
rust: [stable]
os: [ubuntu-latest]
steps:
# Run tests
- uses: actions/checkout@v2
- name: Install Rust
uses: actions-rs/toolchain@v1
@@ -22,12 +19,6 @@ jobs:
override: true
components: rustfmt, clippy

- name: Install cargo-lambda
uses: actions-rs/cargo@v1
with:
command: install
args: cargo-lambda

- uses: Swatinem/rust-cache@v1
- name: Run cargo fmt
uses: actions-rs/cargo@v1
@@ -62,4 +53,20 @@ jobs:
uses: actions-rs/cargo@v1
with:
command: test
args: --all-targets --all-features
args: --all-features

# Run benchmarks
- name: Run benchmarks
run: cargo bench --bench search-benchmarks --bench request-benchmarks -- LIGHT --output-format bencher | tee output.txt
- name: Download previous benchmark data
uses: actions/cache@v1
with:
path: ./cache
key: ${{ runner.os }}-benchmark
- name: Store benchmark result
uses: benchmark-action/github-action-benchmark@v1
with:
tool: 'cargo'
output-file-path: output.txt
external-data-json-path: ./cache/benchmark-data.json
fail-on-alert: false
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,7 +1,8 @@
target
Cargo.lock
*.code-workspace
*.vcf.gz
.vscode
.idea
package-lock.json
deploy/.build
deploy/.build
2 changes: 1 addition & 1 deletion data/events/event_get.json
@@ -1,6 +1,6 @@
{
"httpMethod": "GET",
"path": "/variants/data/vcf/sample1-bcbio-cancer",
"path": "/variants/vcf/sample1-bcbio-cancer",
"body": null,

"resource": "/{proxy+}",
2 changes: 1 addition & 1 deletion data/events/event_parameterized_get.json
@@ -1,6 +1,6 @@
{
"httpMethod": "GET",
"path": "/variants/data/vcf/sample1-bcbio-cancer",
"path": "/variants/vcf/sample1-bcbio-cancer",
"body": null,

"resource": "/{proxy+}",
2 changes: 1 addition & 1 deletion data/events/event_parameterized_post.json
@@ -1,6 +1,6 @@
{
"httpMethod": "POST",
"path": "/variants/data/vcf/sample1-bcbio-cancer",
"path": "/variants/vcf/sample1-bcbio-cancer",
"body": "{\"format\": \"VCF\", \"regions\": [{\"referenceName\": \"chrM\"}]}",

"resource": "/{proxy+}",
2 changes: 1 addition & 1 deletion data/events/event_parameterized_post_class_header.json
@@ -1,6 +1,6 @@
{
"httpMethod": "POST",
"path": "/variants/data/vcf/sample1-bcbio-cancer",
"path": "/variants/vcf/sample1-bcbio-cancer",
"body": "{\"format\": \"VCF\", \"class\": \"header\", \"regions\": [{\"referenceName\": \"chrM\"}]}",

"resource": "/{proxy+}",
2 changes: 1 addition & 1 deletion data/events/event_post.json
@@ -1,6 +1,6 @@
{
"httpMethod": "POST",
"path": "/variants/data/vcf/sample1-bcbio-cancer",
"path": "/variants/vcf/sample1-bcbio-cancer",
"body": null,

"resource": "/{proxy+}",
Binary file added data/vcf/internationalgenomesample.vcf.gz.tbi
Binary file not shown.
6 changes: 3 additions & 3 deletions htsget-config/src/config.rs
@@ -9,13 +9,13 @@ use crate::config::StorageType::LocalStorage;
use crate::regex_resolver::RegexResolver;

pub const USAGE: &str = r#"
This executable doesn't use command line arguments, but there are some environment variables that can be set to configure the HtsGet server:
* HTSGET_ADDR: The socket address to use for the server which creates response tickets. Default: "127.0.0.1:8080".
The HtsGet server executables don't use command line arguments, but there are some environment variables that can be set to configure them:
* HTSGET_ADDR: The socket address for the server which creates response tickets. Default: "127.0.0.1:8080".
* HTSGET_PATH: The path to the directory where the server should be started. Default: ".". Unused if HTSGET_STORAGE_TYPE is "AwsS3Storage".
* HTSGET_REGEX: The regular expression that should match an ID. Default: ".*".
For more information about the regex options look in the documentation of the regex crate(https://docs.rs/regex/).
* HTSGET_SUBSTITUTION_STRING: The replacement expression. Default: "$0".
* HTSGET_STORAGE_TYPE: Either LocalStorage or AwsS3Storage. Default: "LocalStorage".
* HTSGET_STORAGE_TYPE: Either "LocalStorage" or "AwsS3Storage", representing which storage type to use. Default: "LocalStorage".

The following options are used for the ticket server.
* HTSGET_TICKET_SERVER_ADDR: The socket address to use for the server which responds to tickets. Default: "127.0.0.1:8081". Unused if HTSGET_STORAGE_TYPE is not "LocalStorage".
2 changes: 1 addition & 1 deletion htsget-devtools/Cargo.toml
@@ -9,4 +9,4 @@ edition = "2021"
serde = { version = "~1.0", features = ["derive"] }
serde_yaml = "~0.8"

noodles = { version = "0.18.0", features = ["bam", "bcf", "bgzf", "cram", "csi", "sam", "tabix", "vcf"] }
noodles = { version = "0.18.0", features = ["bam", "bcf", "bgzf", "cram", "csi", "sam", "tabix", "vcf"] }
14 changes: 12 additions & 2 deletions htsget-http-actix/Cargo.toml
@@ -20,7 +20,17 @@ futures = { version = "0.3" }
tokio = { version = "1.17", features = ["full"] }
tracing-actix-web = "0.5"
tracing = "0.1"
tracing-subscriber = "0.3"

[dev-dependencies]
htsget-test-utils = { path = "../htsget-test-utils", default-features = false }
async-trait = "0.1"
htsget-test-utils = { path = "../htsget-test-utils", features = ["server-tests"], default-features = false }
async-trait = "0.1"

criterion = { version = "0.3", features = ["async_tokio"] }
reqwest = { version = "0.11", features = ["json", "blocking", "rustls-tls"] }
tempfile = "3.3"

[[bench]]
name = "request-benchmarks"
harness = false
path = "benches/request_benchmarks.rs"
84 changes: 56 additions & 28 deletions htsget-http-actix/README.md
@@ -3,45 +3,53 @@ This crate should allow to setup an [htsget](http://samtools.github.io/hts-specs

## Quickstart

These are some examples with [curl](https://github.com/curl/curl). **For the curl examples shown below to work, we assume that the server is being started from the root of the [htsget-rs project](https://github.com/umccr/htsget-rs)**, so we can use the example files inside the `data` directory.
These are some examples with [curl](https://github.com/curl/curl). **For the curl examples shown
below to work, we assume that the server is being started from the root of
the [htsget-rs project](https://github.com/umccr/htsget-rs)**, and `HTSGET_PATH="data/"`.

To test them you can run:
The htsget-http-actix server also requires PEM-formatted X.509 certificates to access the response tickets.

For example, to generate self-signed certificates, run:

```shell
$ cargo run -p htsget-http-actix
$ openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj '/CN=localhost'
```

From **the top of the project**. Alternatively, the `HTSGET_PATH` environment variable can be set accordingly if the current working directory is `htsget-http-actix`, i.e:
To test the curl example below, run:

```shell
$ HTSGET_PATH=../ cargo run
$ cargo run -p htsget-http-actix
```

Otherwise we could have problems as [directory traversal](https://en.wikipedia.org/wiki/Directory_traversal_attack) isn't allowed.


## Environment variables

There are reasonable defaults to allow the user to spin up the server as fast as possible, but all of them are configurable via environment variables.

Since this service can be used in serverless environments, no `dotenv` configuration is needed; [adjusting the environment variables below prevents accidental leakage of settings and sensitive information](https://medium.com/@softprops/configuration-envy-a09584386705).

| Variable | Description | Default |
|---|---|---|
| HTSGET_IP| IP address | 127.0.0.1 |
| HTSGET_PORT| TCP Port | 8080 |
| HTSGET_PATH| The path to the directory where the server starts | `$PWD` |
| HTSGET_REGEX| The regular expression an ID should match. | ".*" |
| HTSGET_REPLACEMENT| The replacement expression, to produce a key from an ID. | "$0" |
| HTSGET_ID| ID of the service. | "" |
| HTSGET_NAME| Name of the service. | HtsGet service |
| HTSGET_VERSION | Version of the service | ""
| HTSGET_ORGANIZATION_NAME| Name of the organization | Snake oil
| HTSGET_ORGANIZATION_URL| URL of the organization | https://en.wikipedia.org/wiki/Snake_oil |
| HTSGET_CONTACT_URL | URL to provide contact to the users | "" |
| HTSGET_DOCUMENTATION_URL| Link to documentation | https://github.com/umccr/htsget-rs/tree/main/htsget-http-actix |
| HTSGET_CREATED_AT | Date of the creation of the service. | "" |
| HTSGET_UPDATED_AT | Date of the last update of the service. | "" |
| HTSGET_ENVIRONMENT | Environment in which the service is running. | Testing |
| Variable | Description | Default |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------------|
| HTSGET_ADDR | The socket address for the server which creates response tickets. | "127.0.0.1:8080" |
| HTSGET_PATH | The path to the directory where the server starts | "." |
| HTSGET_REGEX | The regular expression an ID should match. | ".*" |
| HTSGET_SUBSTITUTION_STRING | The replacement expression, to produce a key from an ID. | "$0" |
| HTSGET_STORAGE_TYPE | Either "LocalStorage" or "AwsS3Storage", representing which storage type to use. | "LocalStorage" |
| HTSGET_TICKET_SERVER_ADDR | The socket address to use for the server which responds to tickets. Unused if HTSGET_STORAGE_TYPE is not "LocalStorage". | "127.0.0.1:8081" |
| HTSGET_TICKET_SERVER_KEY | The path to the PEM formatted X.509 private key used by the ticket response server. Unused if HTSGET_STORAGE_TYPE is not "LocalStorage". | "key.pem" |
| HTSGET_TICKET_SERVER_CERT | The path to the PEM formatted X.509 certificate used by the ticket response server. Unused if HTSGET_STORAGE_TYPE is not "LocalStorage". | "cert.pem" |
| HTSGET_S3_BUCKET | The name of the AWS S3 bucket. Unused if HTSGET_STORAGE_TYPE is not "AwsS3Storage". | "" |
| HTSGET_ID | ID of the service. | "None" |
| HTSGET_NAME | Name of the service. | "None" |
| HTSGET_VERSION | Version of the service. | "None" |
| HTSGET_ORGANIZATION_NAME | Name of the organization. | "None" |
| HTSGET_ORGANIZATION_URL | URL of the organization. | "None" |
| HTSGET_CONTACT_URL | URL to provide contact to the users. | "None" |
| HTSGET_DOCUMENTATION_URL | Link to documentation. | "None" |
| HTSGET_CREATED_AT | Date of the creation of the service. | "None" |
| HTSGET_UPDATED_AT | Date of the last update of the service. | "None" |
| HTSGET_ENVIRONMENT | Environment in which the service is running. | "None" |
For more information about the regex options look in the [documentation of the regex crate](https://docs.rs/regex/).
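
As an illustration of how a default from the table above can be applied, here is a minimal sketch that resolves `HTSGET_ADDR` with the standard library only. The helper names `resolve_addr` and `addr_from_env` are hypothetical and this is not the crate's actual configuration code; it only assumes the documented default of "127.0.0.1:8080".

```rust
use std::env;
use std::net::{AddrParseError, SocketAddr};

/// Default taken from the environment variable table.
const DEFAULT_ADDR: &str = "127.0.0.1:8080";

/// Hypothetical helper: parse a socket address, falling back to the
/// documented default when no value is supplied.
fn resolve_addr(value: Option<&str>) -> Result<SocketAddr, AddrParseError> {
    value.unwrap_or(DEFAULT_ADDR).parse()
}

/// Hypothetical helper: read HTSGET_ADDR from the process environment.
/// An unset (or non-UTF-8) variable is treated as "use the default".
fn addr_from_env() -> Result<SocketAddr, AddrParseError> {
    let value = env::var("HTSGET_ADDR").ok();
    resolve_addr(value.as_deref())
}
```

Keeping the parsing in `resolve_addr` separate from the environment lookup makes the fallback behaviour easy to test without mutating process-wide state.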

## Example cURL requests
@@ -51,25 +59,25 @@ As mentioned above, please keep in mind that the server will take the path where
### GET

```shell
$ curl '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
$ curl '127.0.0.1:8080/variants/vcf/sample1-bcbio-cancer'
```

### POST

```shell
$ curl --header "Content-Type: application/json" -d '{}' '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
$ curl --header "Content-Type: application/json" -d '{}' '127.0.0.1:8080/variants/vcf/sample1-bcbio-cancer'
```

### Parametrised GET

```shell
$ curl '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer?format=VCF&class=header'
$ curl '127.0.0.1:8080/variants/vcf/sample1-bcbio-cancer?format=VCF&class=header'
```

### Parametrised POST

```shell
$ curl --header "Content-Type: application/json" -d '{"format": "VCF", "regions": [{"referenceName": "chrM"}]}' '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
$ curl --header "Content-Type: application/json" -d '{"format": "VCF", "regions": [{"referenceName": "chrM"}]}' '127.0.0.1:8080/variants/vcf/sample1-bcbio-cancer'
```

### Service-info
@@ -78,9 +86,29 @@ $ curl --header "Content-Type: application/json" -d '{"format": "VCF", "regions"
$ curl 127.0.0.1:8080/variants/service-info
```

## Running the benchmarks
There are benchmarks for the htsget-search crate and for the htsget-http-actix crate. The former work like normal benchmarks, while the latter compare the performance of this implementation with the [reference implementation](https://github.com/ga4gh/htsget-refserver).
There is a set of light benchmarks and one heavy benchmark. The light benchmarks can be run with:

```
cargo bench -p htsget-http-actix -- LIGHT
```

To run the heavy benchmark, an additional VCF file must be downloaded and placed in the `data/vcf` directory:

```
curl ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr14.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz > data/vcf/internationalgenomesample.vcf.gz
```

Then to run the heavy benchmark:

```
cargo bench -p htsget-http-actix -- HEAVY
```

## Example Regular expressions
In this example 'data/' is added after the first '/'.
```shell
$ HTSGET_REGEX='(?P<group1>.*?)/(?P<group2>.*)' HTSGET_REPLACEMENT='$group1/data/$group2' cargo run --release -p htsget-http-actix
$ HTSGET_REGEX='(?P<group1>.*?)/(?P<group2>.*)' HTSGET_SUBSTITUTION_STRING='$group1/data/$group2' cargo run --release -p htsget-http-actix
```
For more information about the regex options look in the [documentation of the regex crate](https://docs.rs/regex/).
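
To make the effect of the substitution above concrete, here is a minimal sketch that mimics it with the standard library only. It does not use the regex crate and is not the server's implementation; `insert_data_dir` is a hypothetical name. It relies on the lazy `group1` capturing everything before the first '/' and `group2` capturing the rest of the ID.

```rust
/// Hypothetical helper mirroring the pattern `(?P<group1>.*?)/(?P<group2>.*)`
/// with the substitution `$group1/data/$group2`, using only std.
fn insert_data_dir(id: &str) -> String {
    match id.split_once('/') {
        // group1 is the text before the first '/', group2 is everything after it.
        Some((group1, group2)) => format!("{}/data/{}", group1, group2),
        // Without a '/', the pattern does not match and the ID is left unchanged.
        None => id.to_string(),
    }
}
```

With this mapping, a request for `vcf/sample1-bcbio-cancer` would be resolved against the key `vcf/data/sample1-bcbio-cancer`.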
28 changes: 28 additions & 0 deletions htsget-http-actix/benches/htsget-refserver-config.json
@@ -0,0 +1,28 @@
{
"htsgetConfig": {
"props": {
"port": 8082,
"host": "http://localhost:8082/"
},
"reads": {
"dataSourceRegistry": {
"sources": [
{
"pattern": "^(?P<id>.*)$",
"path": "/data/bam/{id}.bam"
}
]
}
},
"variants": {
"dataSourceRegistry": {
"sources": [
{
"pattern": "^(?P<id>.*)$",
"path": "/data/vcf/{id}.vcf.gz"
}
]
}
}
}
}
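
The `pattern`/`path` pairs in this configuration map a requested ID to a file: the named group `id` captured by `^(?P<id>.*)$` fills the `{id}` placeholder in `path`. Below is a minimal sketch of that mapping, assuming the whole ID is captured (as this catch-all pattern does). `resolve_path` is a hypothetical name, and this is not the reference server's code.

```rust
/// Hypothetical helper: fill the `{id}` placeholder of a `path` template
/// with the requested ID.
fn resolve_path(template: &str, id: &str) -> String {
    template.replace("{id}", id)
}
```

For example, under these assumptions a variants request for `sample1-bcbio-cancer` would resolve to `/data/vcf/sample1-bcbio-cancer.vcf.gz`.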