Optimise string trimming on JavaScript #736

richard-viney · 2024-11-12T16:48:44Z

Optimises the regexes used for string trimming on JavaScript be to 1.5-3x faster, depending on the input and trimming function used.

lpil

I'm surprised that regex doesn't iterate faster than a JS loop.

Is it faster due to not using regex or is it because string slicing is being used instead of string replacement?

richard-viney · 2024-11-20T12:12:33Z

On Node.js 22.11 the speedup comes from both not using a regex, and also from using string slicing instead of string replacement:

Approach	Time (N = 500 million)
`Regexp.replace()` (current approach)	19.2 s
`RegExp.exec()` then `String.substring()`	14.6 s
Manual trimming (this PR)	2.7 s

However, turns out it this also very runtime dependent. While the initial code on this PR was an order of magnitude faster on Node/Deno, it's actually slower on Bun. This seems to be because Bun handles a regex replace better than it can JIT optimise the hand-written trimming code.

I've therefore changed this PR to the following:

Remove g from the trim regexes as it's unnecessary and slows things down quite a bit on all runtimes.
Use a single unified regex for trim().
Use regex based trimming on the Bun runtime.
Use manual JS trimming code on the Node/Deno runtimes.
The exception is trim_start(), because in this case Bun runs the manual JS trimming code faster than it runs a regex.

This gives the best performance on all runtimes.

If taking different code paths based on the JS runtime is undesirable then what we could do instead is only implement points 1, 2, and potentially 5 as listed above. This would be simpler, still give a meaningful speedup on all runtimes (at least 2-3x), but does leave a fair amount of performance on the table.

lpil

I don't want to have any runtime specific checks in the stdlib, especially since we'd need to have checks for the browsers (which are the main places Gleam JS use used) so let's stick with the optimised regex for all platforms.

richard-viney · 2024-11-20T19:38:38Z

Have changed to only apply the regex-based improvements.

If in future anyone comes across this thread searching for a boost to trimming performance on JS and mainly cares about V8-based runtimes, try the following code to get trimming performance that can be many times faster:

export function trim(string) {
  const start_index = find_non_whitespace_char(string);

  let end_index = rfind_non_whitespace_char(string) + 1;
  if (end_index < start_index) {
    end_index = start_index;
  }

  return string.substring(start_index, end_index);
}

export function trim_start(string) {
  return string.substring(find_non_whitespace_char(string));
}

export function trim_end(string) {
  return string.substring(0, rfind_non_whitespace_char(string) + 1);
}

function isUnicodeWhitespace(c) {
  return (
    c === "\u0020" || // Space
    c === "\u0009" || // Horizontal tab
    c === "\u000A" || // Line feed
    c === "\u000B" || // Vertical tab
    c === "\u000C" || // Form feed
    c === "\u000D" || // Carriage return
    c === "\u0085" || // Next line
    c === "\u2028" || // Line separator
    c === "\u2029" // Paragraph separator
  );
}

function find_non_whitespace_char(string) {
  let i = 0;

  for (; i < string.length; i++) {
    if (!isUnicodeWhitespace(string[i])) {
      break;
    }
  }

  return i;
}

function rfind_non_whitespace_char(string) {
  let i = string.length - 1;

  for (; i >= 0; i--) {
    if (!isUnicodeWhitespace(string[i])) {
      break;
    }
  }

  return i;
}

lpil

Thank you!!

richard-viney force-pushed the js-optimise-string-trim branch 2 times, most recently from b510efa to 5303d38 Compare November 12, 2024 17:06

richard-viney marked this pull request as ready for review November 12, 2024 17:17

lpil reviewed Nov 15, 2024

View reviewed changes

richard-viney force-pushed the js-optimise-string-trim branch 2 times, most recently from 2952fc8 to da7f50e Compare November 20, 2024 11:50

lpil reviewed Nov 20, 2024

View reviewed changes

richard-viney force-pushed the js-optimise-string-trim branch from da7f50e to 5ba0cc9 Compare November 20, 2024 18:50

lpil and others added 2 commits November 25, 2024 19:01

v0.44.0

8349133

Optimise string trimming on JavaScript

5e9c98b

richard-viney force-pushed the js-optimise-string-trim branch from 5ba0cc9 to 5e9c98b Compare November 25, 2024 22:23

lpil approved these changes Nov 26, 2024

View reviewed changes

lpil merged commit b0afb8f into gleam-lang:main Nov 26, 2024
7 checks passed

richard-viney deleted the js-optimise-string-trim branch November 26, 2024 20:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimise string trimming on JavaScript #736

Optimise string trimming on JavaScript #736

richard-viney commented Nov 12, 2024 •

edited

Loading

lpil left a comment •

edited

Loading

richard-viney commented Nov 20, 2024

lpil left a comment

richard-viney commented Nov 20, 2024

lpil left a comment

Optimise string trimming on JavaScript #736

Optimise string trimming on JavaScript #736

Conversation

richard-viney commented Nov 12, 2024 • edited Loading

lpil left a comment • edited Loading

Choose a reason for hiding this comment

richard-viney commented Nov 20, 2024

lpil left a comment

Choose a reason for hiding this comment

richard-viney commented Nov 20, 2024

lpil left a comment

Choose a reason for hiding this comment

richard-viney commented Nov 12, 2024 •

edited

Loading

lpil left a comment •

edited

Loading