Add a regular expression benchmark #224

abrown · 2023-03-08T18:29:49Z

This regular expression benchmark measures the time taken to match e-mails, URIs, and IP addresses in an input text. It is written in Rust and significantly adapted from the regex-benchmark project. Both the input text (6.6MB) and benchmark.wasm (2.8MB) are somewhat large; it may be worthwhile at some point to run some of these benchmark files through wasm-opt to reduce their size.

One iteration of this benchmark appears to take around half a second across all three phases, with the compilation phase taking about an order of magnitude longer than the execution phase. I looked into creating a smaller version of the input text, but that would not significantly change the overall benchmark time (since it is concentrated in compilation) and, more importantly, I could not find a way to specify different stdout and stderr expectations for different workloads. With a smaller workload the output will be different, and there is currently no way to express that.
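As a rough illustration of why a smaller input would not help much, here is a minimal sketch of the arithmetic. It assumes that execution time scales linearly with input size, that compilation time is independent of the input, and that the remaining phase is negligible; the function name and the 10:1 ratio are placeholders for "about an order of magnitude", not measured values.

```rust
/// Fraction of total time saved when the input is scaled by `input_scale`
/// (0.5 means half the input). Assumes execution time is linear in the
/// input size and compilation time does not depend on the input at all.
fn total_time_saved(compile: f64, execute: f64, input_scale: f64) -> f64 {
    let before = compile + execute;
    let after = compile + execute * input_scale;
    (before - after) / before
}

fn main() {
    // Hypothetical units: compilation takes 10x as long as execution.
    let saved = total_time_saved(10.0, 1.0, 0.5);
    println!("halving the input saves {:.1}% of total time", saved * 100.0);
}
```

Under these assumptions, halving the input trims only about 4.5% from the total, which is why shrinking the workload was not pursued further.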

Closes #216.

regex-benchmark: https://github.com/mariomka/regex-benchmark

abrown · 2023-03-08T18:32:43Z

If people are interested, this workload might be a good one to add to default.suite; it seems like interesting code to benchmark by default.

abrown · 2023-03-08T22:25:14Z

cc: @fitzgen, @alexcrichton, @cfallin, @jameysharp, @elliottt, @jlb6740. Can someone take a look and +1?

fitzgen

LGTM!

jameysharp · 2023-03-09T22:06:44Z

Awesome, I've wanted a regex benchmark to study in Sightglass. Thanks!

I've opened #226 with a bunch of experiments that somebody could try. They should affect compile time, run time, or both, and in some cases may hit very different code generation paths in Cranelift.

fix: move Rust code back to sub-directory

3a1a3c3

fitzgen approved these changes Mar 8, 2023

fitzgen merged commit f77897d into bytecodealliance:main Mar 8, 2023

abrown deleted the regex-benchmark branch March 8, 2023 22:51

jameysharp mentioned this pull request Mar 9, 2023

Benchmark more regex variations #226

Open
