Branch-less alphanumeric underscore's replacement #1294

franz1981 · 2025-01-21T07:04:44Z

This follows the same idea as quarkusio/quarkus#45546, i.e. using lookup tables.
If the source Strings rarely contain Latin-1 characters beyond US_ASCII, i.e. in [128, 255], this can be further simplified by using a 128-entry lookup table.
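
For readers unfamiliar with the approach, here is a minimal sketch of a table-based replacement. It is not the code from this PR: the class and method names are hypothetical, it assumes only ASCII letters and digits are preserved, and it leaves out any special handling the real method may have (for example of a trailing double quote). Characters outside Latin-1 still go through one range check, so the loop is only branch-free for the Latin-1 range.

```java
import java.util.Arrays;

// Hypothetical sketch, not the PR's implementation.
public final class UnderscoreTableSketch {
    // One entry per Latin-1 code point: ASCII alphanumerics map to
    // themselves, every other entry maps to '_'.
    private static final char[] TABLE = new char[256];

    static {
        Arrays.fill(TABLE, '_');
        for (char c = '0'; c <= '9'; c++) TABLE[c] = c;
        for (char c = 'a'; c <= 'z'; c++) TABLE[c] = c;
        for (char c = 'A'; c <= 'Z'; c++) TABLE[c] = c;
    }

    static String replaceNonAlphanumeric(String name) {
        int length = name.length();
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            char c = name.charAt(i);
            // Assumption: anything outside Latin-1 counts as non-alphanumeric.
            // Inside Latin-1 the result is a plain array load, with no
            // per-character-class comparison.
            out[i] = c < 256 ? TABLE[c] : '_';
        }
        return new String(out);
    }
}
```

With a 128-entry table the same shape applies; only the range check moves from 256 down to 128.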

I haven't looked at its performance yet. As suggested at mkouba/qute-benchmarks#1, we should have a proper micro-benchmark that can also stress the branch predictor (or not - this should be configurable) - unless the non-alphanumeric case is expected to be very predictable in reality as well.
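
One possible way to make predictability configurable is to generate the inputs up front with a fixed seed: a small set of samples cycled repeatedly is easy for the branch predictor to learn, while a large set is not. The generator below is only a sketch under that assumption; the class name, parameter names and character pools are hypothetical and not taken from any existing benchmark.

```java
import java.util.Random;

// Hypothetical input generator for a micro-benchmark; plain Java, no JMH.
public final class SampleGeneratorSketch {
    private static final String ALPHANUMERIC =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final String NON_ALPHANUMERIC = ".-[]\"/:";

    /**
     * @param samples number of distinct input strings to generate
     * @param size length of each string
     * @param nonAlphanumericProbability percentage (0-100) of characters
     *        drawn from the non-alphanumeric pool
     * @param seed fixed seed, so runs are reproducible
     */
    static String[] generate(int samples, int size,
                             int nonAlphanumericProbability, long seed) {
        Random random = new Random(seed);
        String[] result = new String[samples];
        for (int s = 0; s < samples; s++) {
            char[] chars = new char[size];
            for (int i = 0; i < size; i++) {
                String pool = random.nextInt(100) < nonAlphanumericProbability
                        ? NON_ALPHANUMERIC
                        : ALPHANUMERIC;
                chars[i] = pool.charAt(random.nextInt(pool.length()));
            }
            result[s] = new String(chars);
        }
        return result;
    }
}
```

A benchmark method would then walk this array in a loop, one sample per invocation, so that the number of samples controls how much history the predictor can exploit.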

franz1981 · 2025-01-21T08:33:07Z

I probably have a further suggestion here

franz1981 · 2025-01-21T10:49:08Z

I've created a benchmark at franz1981/java-puzzles@8fb833c, which gives these numbers

with highly predictable samples:

Benchmark                                                (finalDoubleQuoteProbability)  (nonAlphanumericProbability)  (samples)  (size)  Mode  Cnt   Score   Error  Units
ReplaceWithUnderscores.replaceWithUnderscoresSwitch                                 10                            10        100      33  avgt   10  94.525 ± 3.802  ns/op
ReplaceWithUnderscores.replaceWithUnderscoresTableArray                             10                            10        100      33  avgt   10  20.951 ± 0.339  ns/op
ReplaceWithUnderscores.replaceWithUnderscoresTableSb                                10                            10        100      33  avgt   10  77.077 ± 1.501  ns/op

and with less predictable ones (10K samples):

Benchmark                                                     (finalDoubleQuoteProbability)  (nonAlphanumericProbability)  (samples)  (size)  Mode  Cnt    Score   Error  Units
ReplaceWithUnderscores.replaceWithUnderscoresSwitch                                      10                            10      10000      33  avgt   10  231.371 ± 9.276  ns/op
ReplaceWithUnderscores.replaceWithUnderscoresTableArray                                  10                            10      10000      33  avgt   10   32.910 ± 2.189  ns/op
ReplaceWithUnderscores.replaceWithUnderscoresTableSb                                     10                            10      10000      33  avgt   10  102.949 ± 4.079  ns/op

In both cases the TableArray version (the same as this PR) is the fastest, while TableSb (using the builder) is slower, though still faster than the switch version.

I haven't accounted for the code duplication yet, but since native image is not great at handling StringBuilders and Strings, moving to "lighter" abstractions should pay off (@galderz can verify it :P). For JIT, I would like to test with only C1 running, or with "cold" code as well.

franz1981 · 2025-01-22T07:49:35Z

Sadly, if I understand correctly what C1MaxInlineSize does, C1 doesn't have inlining superpowers, so reusing code could be relatively costly if the callee is not trivially inlinable, which seems to be the case here.

Branch-less alphanumeric underscore's replacement

4d44a72

franz1981 force-pushed the branchless_replacement branch from 29b76da to 4d44a72 Compare January 21, 2025 07:06

Adding raw byte[] variants

96c14bd

franz1981 force-pushed the branchless_replacement branch from 0c78604 to 96c14bd Compare January 21, 2025 10:36

Refactoring to reuse some code for the byte[] case

9a0a872

franz1981 force-pushed the branchless_replacement branch from 7fc70d3 to 9a0a872 Compare January 21, 2025 20:01

Duplicate code due to C1MaxInlineSize

08d783d

franz1981 mentioned this pull request Jan 23, 2025

Makes EnvName mutable to avoid allocating it for lookups #1295

Open
