-
-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise string trimming on JavaScript #736
Conversation
b510efa
to
5303d38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that regex doesn't iterate faster than a JS loop.
Is it faster due to not using regex or is it because string slicing is being used instead of string replacement?
2952fc8
to
da7f50e
Compare
On Node.js 22.11 the speedup comes from both not using a regex, and also from using string slicing instead of string replacement:
However, turns out it this also very runtime dependent. While the initial code on this PR was an order of magnitude faster on Node/Deno, it's actually slower on Bun. This seems to be because Bun handles a regex replace better than it can JIT optimise the hand-written trimming code. I've therefore changed this PR to the following:
This gives the best performance on all runtimes. If taking different code paths based on the JS runtime is undesirable then what we could do instead is only implement points 1, 2, and potentially 5 as listed above. This would be simpler, still give a meaningful speedup on all runtimes (at least 2-3x), but does leave a fair amount of performance on the table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to have any runtime specific checks in the stdlib, especially since we'd need to have checks for the browsers (which are the main places Gleam JS use used) so let's stick with the optimised regex for all platforms.
da7f50e
to
5ba0cc9
Compare
Have changed to only apply the regex-based improvements. If in future anyone comes across this thread searching for a boost to trimming performance on JS and mainly cares about V8-based runtimes, try the following code to get trimming performance that can be many times faster: export function trim(string) {
const start_index = find_non_whitespace_char(string);
let end_index = rfind_non_whitespace_char(string) + 1;
if (end_index < start_index) {
end_index = start_index;
}
return string.substring(start_index, end_index);
}
export function trim_start(string) {
return string.substring(find_non_whitespace_char(string));
}
export function trim_end(string) {
return string.substring(0, rfind_non_whitespace_char(string) + 1);
}
function isUnicodeWhitespace(c) {
return (
c === "\u0020" || // Space
c === "\u0009" || // Horizontal tab
c === "\u000A" || // Line feed
c === "\u000B" || // Vertical tab
c === "\u000C" || // Form feed
c === "\u000D" || // Carriage return
c === "\u0085" || // Next line
c === "\u2028" || // Line separator
c === "\u2029" // Paragraph separator
);
}
function find_non_whitespace_char(string) {
let i = 0;
for (; i < string.length; i++) {
if (!isUnicodeWhitespace(string[i])) {
break;
}
}
return i;
}
function rfind_non_whitespace_char(string) {
let i = string.length - 1;
for (; i >= 0; i--) {
if (!isUnicodeWhitespace(string[i])) {
break;
}
}
return i;
} |
5ba0cc9
to
5e9c98b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!!
Optimises the regexes used for string trimming on JavaScript be to 1.5-3x faster, depending on the input and trimming function used.