Improve algorithm to count digits in Long #413
base: develop
@@ -135,6 +107,34 @@ public fun Sink.writeDecimalLong(long: Long) {
    }
}

private fun countDigitsIn(v: Long): Int {
    val guess = ((64 - v.countLeadingZeroBits()) * 10) ushr 5
    return guess + (if (v > DigitCountToLargestValue[guess]) 1 else 0)
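For reference, here is a self-contained sketch of how this estimate-and-correct approach fits together. The hunk above does not show the contents of DigitCountToLargestValue, so the table below is an assumption reconstructed from its name (index n holds the largest value with n digits). The names `largestValueByDigitCount` and `countDigits` are hypothetical, and the sketch assumes a non-negative input.

```kotlin
// Assumed table: index n holds the largest value that has n decimal digits.
// Index 0 holds -1 so that every non-negative value counts as at least 1 digit.
val largestValueByDigitCount: LongArray = LongArray(20).also { table ->
    table[0] = -1L
    var power = 1L
    for (digits in 1..18) {
        power *= 10                  // 10^digits, fits in a Long up to 10^18
        table[digits] = power - 1    // 9, 99, 999, ...
    }
    table[19] = Long.MAX_VALUE       // 10^19 - 1 does not fit in a Long
}

fun countDigits(v: Long): Int {
    require(v >= 0) { "this sketch only handles non-negative values" }
    // Number of significant bits in v (0 for v == 0).
    val bits = 64 - v.countLeadingZeroBits()
    // bits * 10 / 32 approximates bits * log10(2);
    // the result is either the digit count or one less.
    val guess = (bits * 10) ushr 5
    // Correct the guess by comparing against the largest value with `guess` digits.
    return guess + (if (v > largestValueByDigitCount[guess]) 1 else 0)
}
```

With this table, `countDigits(0L)` returns 1 and `countDigits(Long.MAX_VALUE)` returns 19. A negative input would produce a guess of 20 and index past the end of the table, which is why the sketch rejects it up front.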
IIRC from the time I read Romain's blog post, by extending DigitCountToLargestValue's length to the next power of two (32 in this case) and replacing DigitCountToLargestValue[guess] with DigitCountToLargestValue[guess.and(0x1f)], you can win a few extra percent of performance on the JVM (as it should optimize out the bounds checks performed on array access).
DigitCountToLargestValue is actually slightly different from the table used in the blog post:
private val PowersOfTen = longArrayOf(
0,
10,
100,
1000,
10000,
100000,
1000000,
10000000,
100000000,
1000000000,
10000000000,
100000000000,
1000000000000,
10000000000000,
100000000000000,
1000000000000000,
10000000000000000,
100000000000000000,
1000000000000000000
)
The main reason is that the original table doesn't work when the input is Long.MAX_VALUE, as it's bigger than 10^18 (the last value in the array), but 10^19 is outside of the Long range.
I wonder if the one in the PR performs better? Worth benchmarking them against each other?
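The range problem is easy to check. A minimal sketch follows; the names `TEN_TO_18`, `tenTo19FitsInLong` and `referenceDigitCount` are hypothetical and exist only for this illustration.

```kotlin
// 10^18, the last entry of the PowersOfTen table quoted above.
const val TEN_TO_18 = 1_000_000_000_000_000_000L

// True if 10^19 could be stored in a Long without overflowing.
fun tenTo19FitsInLong(): Boolean = TEN_TO_18 <= Long.MAX_VALUE / 10

// Decimal digit count via string conversion, used here only as a slow reference.
fun referenceDigitCount(v: Long): Int = v.toString().length
```

Here `tenTo19FitsInLong()` is false while `referenceDigitCount(Long.MAX_VALUE)` is 19, so a powers-of-ten table has no representable entry to compare 19-digit values against. A table of "largest value with n digits" can sidestep this by ending with Long.MAX_VALUE.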
What I meant is that loads from the DigitCountToLargestValue table are compiled into code that checks whether the index is within the array's bounds before performing the load. However, if the compiler can prove that indices are always in bounds, it will not generate the check. By expanding the table to a power-of-two length (and filling the meaningless cells with, let's say, -1) and then explicitly truncating the index's most significant bits (i.e., dividing the index by the table's length and taking the remainder), we can hint to the compiler that the index is always in bounds, and it will generate faster code: https://gist.github.com/fzhinkin/42997a2cfc18a437f88e9c31bef969c9
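The idea can be sketched as follows. This is a minimal illustration, not the code from the linked gist: the names `paddedLargestValueByDigitCount` and `countDigitsMasked` are hypothetical, the table contents are assumed from the table's name, and whether the bounds check is actually eliminated depends on the runtime and its compiler.

```kotlin
// Assumed table, padded from 20 to 32 entries (the next power of two).
// Indices 20..31 are never reached for valid input; they hold -1 as filler.
val paddedLargestValueByDigitCount: LongArray = LongArray(32) { -1L }.also { table ->
    var power = 1L
    for (digits in 1..18) {
        power *= 10                  // 10^digits, fits in a Long up to 10^18
        table[digits] = power - 1    // largest value with `digits` digits
    }
    table[19] = Long.MAX_VALUE       // largest value with 19 digits
}

fun countDigitsMasked(v: Long): Int {
    require(v >= 0) { "this sketch only handles non-negative values" }
    val guess = ((64 - v.countLeadingZeroBits()) * 10) ushr 5
    // Masking with 0x1f keeps the index in 0..31, which matches the table length,
    // so a compiler that tracks value ranges may be able to drop the bounds check.
    val index = guess and 0x1f
    return guess + (if (v > paddedLargestValueByDigitCount[index]) 1 else 0)
}
```

For non-negative input the guess never exceeds 19, so the mask does not change the result; it only makes the in-bounds property visible in the code itself.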
BTW I checked and on Android the power-of-two array + truncation doesn't remove the bounds check. It just adds an extra instruction. See https://godbolt.org/z/jdTzMcxbf
@Egorand thanks for opening the PR!
We have a benchmark on So I drafted a benchmark that writes a pack of different values:
For some reason, code using the old implementation (from the
It's worth checking what's causing the regression.
That's interesting! @romainguy, I wonder if you could share your benchmarks for comparison, and whether you have thoughts on what could be causing these results. I'll find some time to dig deeper and investigate!
I don't have the original benchmark, and it was run on Android rather than on the JVM, so both the runtime and the hardware were different. However, I used a dataset with a Zipf distribution to be somewhat realistic and to avoid favoring well-predicted branches. @fzhinkin's trick is something I've used in the past (it works great in C++, but for other reasons), and it's definitely worth a try.
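For anyone who wants to build a comparable dataset, here is one possible sketch. It is an assumption about what such input could look like, not a reconstruction of the original benchmark: it draws the digit count from a Zipf-like distribution (weight 1/k^exponent for k digits) and then picks a uniform value with that many digits. The names `pow10` and `zipfDistributedLongs`, the default exponent, and the fixed seed are all hypothetical choices.

```kotlin
import kotlin.math.pow
import kotlin.random.Random

// 10^exp for exp in 0..18 (10^19 would overflow a Long).
fun pow10(exp: Int): Long {
    require(exp in 0..18)
    var result = 1L
    repeat(exp) { result *= 10 }
    return result
}

// Generates non-negative values whose digit counts follow a Zipf-like distribution,
// so short numbers are common and long ones are rare.
fun zipfDistributedLongs(count: Int, exponent: Double = 1.0, random: Random = Random(42)): LongArray {
    val maxDigits = 19
    // Cumulative weights: cumulative[k - 1] = sum of 1 / i^exponent for i in 1..k.
    val cumulative = DoubleArray(maxDigits)
    var total = 0.0
    for (digits in 1..maxDigits) {
        total += 1.0 / digits.toDouble().pow(exponent)
        cumulative[digits - 1] = total
    }
    return LongArray(count) {
        // Pick the smallest digit count whose cumulative weight reaches the target.
        val target = random.nextDouble() * total
        var digits = 1
        while (digits < maxDigits && cumulative[digits - 1] < target) digits++
        // Smallest and largest values with exactly `digits` digits.
        val low = if (digits == 1) 0L else pow10(digits - 1)
        val high = if (digits == maxDigits) Long.MAX_VALUE else pow10(digits) - 1
        low + random.nextLong(high - low + 1)
    }
}
```

Because the digit count varies from value to value, the correction branch in the digit-counting code is harder to predict than it would be with uniformly large inputs.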
Worth mentioning: on an Android device, the version from this PR performs slightly better.
Copies the PR merged into Okio: square/okio#1548.
The algorithm is based on "Down Another Rabbit Hole" by Romain Guy.
TLDR: this algorithm improves the performance of calculating the number of digits in a Long number by 40%, based on Romain's benchmarks.