
Add a regular expression benchmark #224

Merged: 2 commits, Mar 8, 2023


@abrown (Collaborator) commented Mar 8, 2023

This regular expression benchmark measures the time taken to match e-mail addresses, URIs, and IP addresses in an input text. It is written in Rust and significantly adapted from the [regex-benchmark] project. The input text is fairly large (6.6 MB), as is `benchmark.wasm` (2.8 MB); at some point it may be worthwhile to run some of these benchmark files through `wasm-opt` to reduce their size.

One iteration of this benchmark appears to take around half a second across all three phases (compilation, instantiation, and execution), and compilation takes about an order of magnitude longer than execution. I looked into creating a smaller version of the input text, but this would not significantly change the overall benchmark time (since most of it is spent in compilation) and, more importantly, I could not find a way to specify different `stdout` and `stderr` expectations for different workloads. With a smaller workload, the output _will_ be different, and currently there is no way to express that.

Closes #216.

[regex-benchmark]: https://github.com/mariomka/regex-benchmark

@abrown (Collaborator, Author) commented Mar 8, 2023

If people are interested, this workload might be a good candidate for `default.suite`; it seems like interesting code to benchmark by default.

@abrown (Collaborator, Author) commented Mar 8, 2023

cc: @fitzgen, @alexcrichton, @cfallin, @jameysharp, @elliottt, @jlb6740. Can someone take a look and +1?

@fitzgen (Member) left a comment

LGTM!

@fitzgen merged commit f77897d into bytecodealliance:main on Mar 8, 2023
@abrown deleted the regex-benchmark branch on Mar 8, 2023, 22:51
@jameysharp (Contributor) commented

Awesome, I've wanted a regex benchmark to study in Sightglass. Thanks!

I've opened #226 with a list of experiments that somebody could try; they should affect compile time, run time, or both, and some may hit very different code-generation paths in Cranelift.

Successfully merging this pull request may close the issue "Add a regex benchmark" (#216).

3 participants