-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust #66
base: master
Are you sure you want to change the base?
Rust #66
Conversation
Since the ❯ Rust/target/release/harness Towers 5 50
Starting Towers benchmark ...
Towers: iterations=1 runtime: 2634us
Towers: iterations=1 runtime: 2596us
Towers: iterations=1 runtime: 2608us
Towers: iterations=1 runtime: 2673us
Towers: iterations=1 runtime: 2647us
Towers: iterations=5 average:2632us total:13161us
Total Runtime: 13161us ❯ java -classpath Java/benchmarks.jar Harness Towers 5 50
Starting Towers benchmark ...
Towers: iterations=1 runtime: 11004us
Towers: iterations=1 runtime: 9299us
Towers: iterations=1 runtime: 3781us
Towers: iterations=1 runtime: 3015us
Towers: iterations=1 runtime: 3159us
Towers: iterations=5 average: 6051us total: 30258us
Total Runtime: 30258us ❯ node JavaScript/harness.js Towers 5 50
Starting Towers benchmark ...
Towers: iterations=1 runtime: 10869us
Towers: iterations=1 runtime: 5546us
Towers: iterations=1 runtime: 5441us
Towers: iterations=1 runtime: 5470us
Towers: iterations=1 runtime: 5474us
Towers: iterations=5 average: 6560us total: 32800us
Total Runtime: 32800us ❯ python Python/harness.py Towers 5 50
Starting Towers benchmark ...
Towers: iterations=1 runtime: 539472us
Towers: iterations=1 runtime: 530249us
Towers: iterations=1 runtime: 525513us
Towers: iterations=1 runtime: 524154us
Towers: iterations=1 runtime: 515153us
Towers: iterations=5 average: 526908us total: 2634541us
Total Runtime: 2634541us |
Very interesting. Thanks! I think, Rust is going to be a challenge. So far, we have only have support for languages with garbage collection, i.e., automated memory management. While I think that many of the benchmarks will have a stack discipline and could map well to Rust, I am not sure all benchmarks will. I suspect, there will be need for some careful though of how to do things, and weighing different alternatives/options. |
Idiomatic rust projects typically one of those things as a substitute for garbage collection:
Reference counting is going to be slow but one could view arena allocation or a vector as "cheating". There are a couple attempts to get tracing garbage collection in rust. Aside from rust-gc they are almost all experimental. Rread here for a quick overview. (NOTE: Despite being reliable and mature, the For now I would recommend using Using Unfortunately conventional wisdom puts (pervasive) use of reference counting as far slower than generational GC. Idiomatic rust code tends to pervasive use of reference counts, so they don't run into this performance problem very much. If they do, they just switch to arena allocation or indices in a vector (as mentioned above) . Unfortunately all of this combines to make a direct comparison between reference counting and traditional GC very difficult to do. You're either going to end up with code that is not "idiomatic Rust" or you're going to end up with code that optimizes away all the work 😦 |
I think, we will need to work through each benchmark and see what makes sense. Something like NBody, with a fixed set of bodies seems different than for instance the Json benchmark, where an "unknown" JSON file gets parsed. |
That's why I kept it to a java-way in the Rust code, it's not idiomatic rust code at all, there's a ton of needless memory allocations and all (like for the towers you'd generally either pass the count of disks as a constant argument to use the stack or you'd allocate all of them and index them instead for pure dynamism, which would be a single allocation), though that's really only mostly a thing on the tower's one at those sizes because I was only testing it with size 13 (as the other code), which is tiny enough that memory allocation time dominates (GC's could theoretically do very well on these though as they are capable of hoisting such things to the stack internally where Rust couldn't do that, I'm actually rather surprised at how poorly java performed, I'm tempted to look in to that...). Another rust way that is closer to the java code but still not entirely idiomatic is just to make each But worst case scenario the rust code can just fall back to using Rc's/Weak's (smart pointers) to be like any corresponding code of the other styles, so not an issue to map things regardless (as long as you don't want to be rust idiomatic, which would keep the algorithm the same but could massively invert how data is stored as rust is more soa style of programming).
Yeah, that's why I'm just copying the java for now, if optimizations past that or if you want to make things Rust idiomatic layer then that can always be done. But keeping it java-like at least gives a more direct apples-to-apples performance comparison even if it is not how one would actually write rust code.
The smart pointers, with associated
You don't even need to go that fancy, the "usual" rust way is to just allocate a big vector of things and index in to it because Rust really, like really wants a single ownership point and then index in to that, consequently it works very well with that pattern.
Ah yep, basically that, it's trivial to do, no libraries needed, and yes a newtype for the index makes it all easy to not ever mess up.
Eh, not really as slow as you'd think, remember single-threaded shared pointers in rust are a single integer comparison on cloning and destruction, no other time, and multi-threaded shared pointers (are there any multi-threaded benchmarks in this? I haven't noticed any as of yet) are the same but just using an atomic instruction (or mutex on arch's that don't have atomics).
No, don't touch those, they break some of the rust invariants around safety, they are not safe to use (safe in a Rust way, it's still far safer than, say, anything in java).
Actually those are the ones I'm most curious in! If you have a specific recommendation then I'll focus on porting that next, I want to see how an Rc based allocation-heavy non-rust style pattern works in Rust, just for pure curiosity sake. :-)
Yes,
If you do a lot of cloning of the root pointers (the shared pointers) then yeah, it can have a touch of overhead, but it's only in an extra, easily branch predicted comparison and jump, which is basically no cost on x86. And even then most of the time you don't do that as you don't really change ownership of the root pointers often anyway so even that's not an issue in the vast amount of cases.
Hence why I'm going with the non-idiomatic rust, I'm more curious in an apples-to-apples comparison for now. Can always make a
Hmm, the json might be a good one for me to port next, it's going to have a ton of tiny allocations regardless, not idiomatic but it'll work fine. I might still use an enum instead of traits because rust isn't built to do down/upcasting like what that code does (it's not an OOP language) and to emulate that would need a LOT of functions to do specific returns for each known variant, which ends up just being an enum anyway but significantly more verbose. This is a benchmark that I could see java doing decently on, though not a benchmark that would really take advantage of its tracing JIT so unsure if it could be 'faster' though, but tons of tiny allocations java could definitely do better on than rust calling the comparatively slow system allocator directly for each one. |
But yeah, if you think a certain benchmark should be remade to be idiomatic instead of just copying what java does, please state, otherwise I'll just try to copy java as close as possible as feasible. |
Added the json benchmark, the timings: ❯ Rust/target/release/harness Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 36172us
Json: iterations=1 runtime: 35525us
Json: iterations=1 runtime: 35747us
Json: iterations=1 runtime: 35324us
Json: iterations=1 runtime: 35346us
Json: iterations=5 average:35623us total:178116us
Total Runtime: 178116us ❯ java -classpath Java/benchmarks.jar Harness Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 118819us
Json: iterations=1 runtime: 101561us
Json: iterations=1 runtime: 59921us
Json: iterations=1 runtime: 60682us
Json: iterations=1 runtime: 92477us
Json: iterations=5 average: 86692us total: 433460us
Total Runtime: 433460us ❯ node JavaScript/harness.js Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 115848us
Json: iterations=1 runtime: 95531us
Json: iterations=1 runtime: 90106us
Json: iterations=1 runtime: 93786us
Json: iterations=1 runtime: 92901us
Json: iterations=5 average: 97634us total: 488172us
Total Runtime: 488172us I'm consistently impressed by the good showing by node. I wonder how deno compares to node on this. Of course the Rust code could could be vastly optimized (at the very least for readability, a little in performance though), I'm doing a whole lot of things very inefficiently and in a very verbose way to more copy the java way of things, but still. EDIT: Performed a quick Any suggestions on the next benchmark to add when I get time? |
Just for out of pure curiosity I ran all the tests again with tcmalloc preloaded into the processes (though I doubt it will do anything for the GC languages, but for completion anyway): ❯ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 Rust/target/release/harness Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 25179us
Json: iterations=1 runtime: 24678us
Json: iterations=1 runtime: 24184us
Json: iterations=1 runtime: 24199us
Json: iterations=1 runtime: 24698us
Json: iterations=5 average:24588us total:122940us
Total Runtime: 122940us Prior rust time was ❯ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 java -classpath Java/benchmarks.jar Harness Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 117712us
Json: iterations=1 runtime: 101946us
Json: iterations=1 runtime: 60810us
Json: iterations=1 runtime: 61014us
Json: iterations=1 runtime: 90687us
Json: iterations=5 average: 86433us total: 432169us
Total Runtime: 432169us Prior java time was ❯ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 node JavaScript/harness.js Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 116426us
Json: iterations=1 runtime: 94598us
Json: iterations=1 runtime: 92799us
Json: iterations=1 runtime: 96691us
Json: iterations=1 runtime: 93219us
Json: iterations=5 average: 98747us total: 493733us
Total Runtime: 493733us Prior node time was |
And made clippy happy again, which involved one performance change (just allocating an object within an allocated value instead of just allocating the value with the inline object, which shrunk the value size even if caused more allocations, which reduced the size taken of every allocated value, and funny enough it makes it even closer to the java style as it is doing an allocation that it wasn't doing before but that java is doing as well), which cause timing changes, here it is with and without tcmalloc: ❯ Rust/target/release/harness Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 21499us
Json: iterations=1 runtime: 21008us
Json: iterations=1 runtime: 21326us
Json: iterations=1 runtime: 21070us
Json: iterations=1 runtime: 21160us
Json: iterations=5 average:21212us total:106064us
Total Runtime: 106064us And with tcmalloc: ❯ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 Rust/target/release/harness Json 5 50
Starting Json benchmark ...
Json: iterations=1 runtime: 14263us
Json: iterations=1 runtime: 13936us
Json: iterations=1 runtime: 14813us
Json: iterations=1 runtime: 15601us
Json: iterations=1 runtime: 16927us
Json: iterations=5 average:15108us total:75541us
Total Runtime: 75541us Quite the sizeable change for them both by shrinking the EDIT: And just to see, I tested compiling with the latest nightly version of rust instead of the latest stable and it is consistently about 500us faster even after a lot of runs but it's technically still within the error bar so could just be noise regardless of how consistent it is. Will keep using the latest stable for all reported values though, no beta or nightly. |
Went ahead and quickly added the NBody benchmark this morning, clippy is happy, results: ❯ Rust/target/release/harness NBody 5 250000
Starting NBody benchmark ...
NBody: iterations=1 runtime: 25846us
NBody: iterations=1 runtime: 26132us
NBody: iterations=1 runtime: 26073us
NBody: iterations=1 runtime: 26288us
NBody: iterations=1 runtime: 26278us
NBody: iterations=5 average:26124us total:130620us
Total Runtime: 130620us ❯ java -classpath Java/benchmarks.jar Harness NBody 5 250000
Starting NBody benchmark ...
NBody: iterations=1 runtime: 37444us
NBody: iterations=1 runtime: 28579us
NBody: iterations=1 runtime: 28294us
NBody: iterations=1 runtime: 28348us
NBody: iterations=1 runtime: 28375us
NBody: iterations=5 average: 30208us total: 151040us
Total Runtime: 151040us ❯ node JavaScript/harness.js NBody 5 250000
Starting NBody benchmark ...
NBody: iterations=1 runtime: 43579us
NBody: iterations=1 runtime: 36293us
NBody: iterations=1 runtime: 37564us
NBody: iterations=1 runtime: 31052us
NBody: iterations=1 runtime: 31581us
NBody: iterations=5 average: 36014us total: 180069us
Total Runtime: 180069us |
Just as an aside, I did acquire Deno but it looks like the existing javascript benchmark is calling node'isms instead of the standard calls of things, so it would need a little massaging (doesn't seem like it would take much though). If a Deno one is added then the javascript one should probably be rewritten to fully typed typescript for a new benchmark case as Deno is a typescript runner (made by the same people who made node, it's eventual goal is to replace node) and it optimizes that case. |
So... And done and clippy is happy, results: ❯ Rust/target/release/harness Storage 5 150
Starting Storage benchmark ...
Storage: iterations=1 runtime: 21716us
Storage: iterations=1 runtime: 21039us
Storage: iterations=1 runtime: 21181us
Storage: iterations=1 runtime: 21229us
Storage: iterations=1 runtime: 21347us
Storage: iterations=5 average:21302us total:106514us
Total Runtime: 106514us ❯ java -classpath Java/benchmarks.jar Harness Storage 5 150
Starting Storage benchmark ...
Storage: iterations=1 runtime: 35176us
Storage: iterations=1 runtime: 17203us
Storage: iterations=1 runtime: 24834us
Storage: iterations=1 runtime: 24967us
Storage: iterations=1 runtime: 24664us
Storage: iterations=5 average: 25368us total: 126844us
Total Runtime: 126844us ❯ node JavaScript/harness.js Storage 5 150
Starting Storage benchmark ...
Storage: iterations=1 runtime: 29177us
Storage: iterations=1 runtime: 22583us
Storage: iterations=1 runtime: 24492us
Storage: iterations=1 runtime: 22750us
Storage: iterations=1 runtime: 25908us
Storage: iterations=5 average: 24982us total: 124910us
Total Runtime: 124910us And Rust with tcmalloc just for good measure: ❯ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 Rust/target/release/harness Storage 5 150
Starting Storage benchmark ...
Storage: iterations=1 runtime: 14289us
Storage: iterations=1 runtime: 14073us
Storage: iterations=1 runtime: 14734us
Storage: iterations=1 runtime: 14186us
Storage: iterations=1 runtime: 14074us
Storage: iterations=5 average:14271us total:71358us
Total Runtime: 71358us This is utterly absolutely not idiomatic Rust, it's very much Java-In-Rust, lol, but should be apples to apples. I'm impressed by node here, it did better than Java! |
And added Sieve since it was also very tiny to do, no real memory allocations in it, it's all just 'work', clippy is happy, results: ❯ Rust/target/release/harness Sieve 5 500
Starting Sieve benchmark ...
Sieve: iterations=1 runtime: 8839us
Sieve: iterations=1 runtime: 8738us
Sieve: iterations=1 runtime: 8890us
Sieve: iterations=1 runtime: 8868us
Sieve: iterations=1 runtime: 8838us
Sieve: iterations=5 average:8835us total:44176us
Total Runtime: 44176us ❯ java -classpath Java/benchmarks.jar Harness Sieve 5 500
Starting Sieve benchmark ...
Sieve: iterations=1 runtime: 26986us
Sieve: iterations=1 runtime: 8576us
Sieve: iterations=1 runtime: 8414us
Sieve: iterations=1 runtime: 8138us
Sieve: iterations=1 runtime: 8657us
Sieve: iterations=5 average: 12154us total: 60771us
Total Runtime: 60771us ❯ node JavaScript/harness.js Sieve 5 500
Starting Sieve benchmark ...
Sieve: iterations=1 runtime: 38318us
Sieve: iterations=1 runtime: 33700us
Sieve: iterations=1 runtime: 33777us
Sieve: iterations=1 runtime: 33516us
Sieve: iterations=1 runtime: 33839us
Sieve: iterations=5 average: 34630us total: 173150us
Total Runtime: 173150us I'm honestly impressed at how poorly node did here... >.> |
Decided to tackle a huge one for this next one, so chose Richards. Oh... oh wow... this... is probably the non-idiomatic Rust so far, soooooo many shared pointers and lack of proper ownership and so many reference counts and oh the cloning, I see the horror of cloning everywhere, lol. But to sum it up, again programmed to copy the Java one, it is exceptionally not idiomatic and I really really resisted fixing a lot of the code to actually be ownership aware and fast, and on the plus side I found a bug in the java version of the code (that doesn't change the output, it's purely just a double check if something is null for some weird reason...). So, the results of this reference counted shared pointer horror: ❯ Rust/target/release/harness Richards 5 50
Starting Richards benchmark ...
Richards: iterations=1 runtime: 56736us
Richards: iterations=1 runtime: 56431us
Richards: iterations=1 runtime: 56884us
Richards: iterations=1 runtime: 55852us
Richards: iterations=1 runtime: 55851us
Richards: iterations=5 average: 56351us total:281756us
Total Runtime: 281756us ❯ java -classpath Java/benchmarks.jar Harness Richards 5 50
Starting Richards benchmark ...
Richards: iterations=1 runtime: 103699us
Richards: iterations=1 runtime: 78204us
Richards: iterations=1 runtime: 79203us
Richards: iterations=1 runtime: 79454us
Richards: iterations=1 runtime: 80801us
Richards: iterations=5 average: 84272us total: 421361us
Total Runtime: 421361us
❯ node JavaScript/harness.js Richards 5 50
Starting Richards benchmark ...
Richards: iterations=1 runtime: 155446us
Richards: iterations=1 runtime: 123969us
Richards: iterations=1 runtime: 124509us
Richards: iterations=1 runtime: 123459us
Richards: iterations=1 runtime: 123907us
Richards: iterations=5 average: 130258us total: 651290us
Total Runtime: 651290us |
And just for completion, tcmalloc didn't make any appreciable change (not really much allocating going on, mostly just shared pointer management): ❯ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 Rust/target/release/harness Richards 5 50
Starting Richards benchmark ...
Richards: iterations=1 runtime: 56808us
Richards: iterations=1 runtime: 56458us
Richards: iterations=1 runtime: 56142us
Richards: iterations=1 runtime: 56542us
Richards: iterations=1 runtime: 55691us
Richards: iterations=5 average: 56328us total:281643us
Total Runtime: 281643us A |
@OvermindDL1 thanks for putting all the effort into this. Please don't consider my silence and absence of interest. It's just end of term time here. So, pretty busy time of the year... |
@smarr Oh no worry, I'm doing this just for fun, it's a nice break from my usual work as I get some free time. :-) |
Went ahead and added the Queens benchmark as well as it was extremely simple, results: ❯ Rust/target/release/harness Queens 5 500
Starting Queens benchmark ...
Queens: iterations=1 runtime: 5644us
Queens: iterations=1 runtime: 5452us
Queens: iterations=1 runtime: 5574us
Queens: iterations=1 runtime: 5520us
Queens: iterations=1 runtime: 5431us
Queens: iterations=5 average: 5524us total:27623us
Total Runtime: 27623us ❯ java -classpath Java/benchmarks.jar Harness Queens 5 500
Starting Queens benchmark ...
Queens: iterations=1 runtime: 24963us
Queens: iterations=1 runtime: 14836us
Queens: iterations=1 runtime: 16239us
Queens: iterations=1 runtime: 15231us
Queens: iterations=1 runtime: 15090us
Queens: iterations=5 average: 17271us total: 86359us
Total Runtime: 86359us ❯ node JavaScript/harness.js Queens 5 500
Starting Queens benchmark ...
Queens: iterations=1 runtime: 43868us
Queens: iterations=1 runtime: 36680us
Queens: iterations=1 runtime: 36755us
Queens: iterations=1 runtime: 36444us
Queens: iterations=1 runtime: 35992us
Queens: iterations=5 average: 37948us total: 189739us
Total Runtime: 189739us Again, I'd figure V8/node's JIT would optimize simple math better for things like this... |
And added Permute, another super simple one, the results: ❯ Rust/target/release/harness Permute 5 500
Starting Permute benchmark ...
Permute: iterations=1 runtime: 15935us
Permute: iterations=1 runtime: 15860us
Permute: iterations=1 runtime: 15960us
Permute: iterations=1 runtime: 16043us
Permute: iterations=1 runtime: 15945us
Permute: iterations=5 average: 15949us total:79745us
Total Runtime: 79745us ❯ java -classpath Java/benchmarks.jar Harness Permute 5 500
Starting Permute benchmark ...
Permute: iterations=1 runtime: 25438us
Permute: iterations=1 runtime: 18439us
Permute: iterations=1 runtime: 18734us
Permute: iterations=1 runtime: 18085us
Permute: iterations=1 runtime: 18539us
Permute: iterations=5 average: 19847us total: 99235us
Total Runtime: 99235us ❯ node JavaScript/harness.js Permute 5 500
Starting Permute benchmark ...
Permute: iterations=1 runtime: 34046us
Permute: iterations=1 runtime: 31025us
Permute: iterations=1 runtime: 31057us
Permute: iterations=1 runtime: 31045us
Permute: iterations=1 runtime: 30197us
Permute: iterations=5 average: 31474us total: 157370us
Total Runtime: 157370us
|
And adding Mandelbrot, I'd expect all to perform pretty equally here, it's just all easily optimized math: ❯ Rust/target/release/harness Mandelbrot 5 750
Starting Mandelbrot benchmark ...
Mandelbrot: iterations=1 runtime: 114109us
Mandelbrot: iterations=1 runtime: 113950us
Mandelbrot: iterations=1 runtime: 114004us
Mandelbrot: iterations=1 runtime: 114195us
Mandelbrot: iterations=1 runtime: 113769us
Mandelbrot: iterations=5 average: 114005us total:570029us
Total Runtime: 570029us ❯ java -classpath Java/benchmarks.jar Harness Mandelbrot 5 750
Starting Mandelbrot benchmark ...
Mandelbrot: iterations=1 runtime: 184574us
Mandelbrot: iterations=1 runtime: 179949us
Mandelbrot: iterations=1 runtime: 130062us
Mandelbrot: iterations=1 runtime: 179491us
Mandelbrot: iterations=1 runtime: 179937us
Mandelbrot: iterations=5 average: 170802us total: 854013us
Total Runtime: 854013us ❯ node JavaScript/harness.js Mandelbrot 5 750
Starting Mandelbrot benchmark ...
Mandelbrot: iterations=1 runtime: 116836us
Mandelbrot: iterations=1 runtime: 112234us
Mandelbrot: iterations=1 runtime: 114767us
Mandelbrot: iterations=1 runtime: 112838us
Mandelbrot: iterations=1 runtime: 114834us
Mandelbrot: iterations=5 average: 114302us total: 571509us
Total Runtime: 571509us Node and Rust are about equal, but even after trying many runs of the java one (I always run many of each and take the best, but I ran the java one about 10 times more than that) it seemed stuck at that surprisingly higher value, java 17.0.1 still... Grabbing java 18 just to see... no still same values, how odd... |
And now doing the ...ow, this hurts me to write, this is... not just not idiomatic rust, this is so inefficient, lol... All the needless memory allocations and shared pointers and all... ^.^; Well, the results of this amazingly reference-counted shared-pointer multiple-ownership horror, lol: ❯ Rust/target/release/harness List 5 750
Starting List benchmark ...
List: iterations=1 runtime: 19536us
List: iterations=1 runtime: 19455us
List: iterations=1 runtime: 19477us
List: iterations=1 runtime: 19410us
List: iterations=1 runtime: 19580us
List: iterations=5 average: 19492us total:97461us
Total Runtime: 97461us (Still using Java 18 for note since I updated it in the last message) ❯ java -classpath Java/benchmarks.jar Harness List 5 750
Starting List benchmark ...
List: iterations=1 runtime: 28507us
List: iterations=1 runtime: 21232us
List: iterations=1 runtime: 21681us
List: iterations=1 runtime: 21281us
List: iterations=1 runtime: 21227us
List: iterations=5 average: 22785us total: 113928us
Total Runtime: 113928us ❯ node JavaScript/harness.js List 5 750
Starting List benchmark ...
List: iterations=1 runtime: 37219us
List: iterations=1 runtime: 33721us
List: iterations=1 runtime: 32754us
List: iterations=1 runtime: 32395us
List: iterations=1 runtime: 32519us
List: iterations=5 average: 33722us total: 168608us
Total Runtime: 168608us |
And the last short one (3 big ones left after this) as I'm just about out of time for today is ❯ Rust/target/release/harness Bounce 5 750
Starting Bounce benchmark ...
Bounce: iterations=1 runtime: 14314us
Bounce: iterations=1 runtime: 14745us
Bounce: iterations=1 runtime: 14758us
Bounce: iterations=1 runtime: 14743us
Bounce: iterations=1 runtime: 17703us
Bounce: iterations=5 average: 15253us total:76265us
Total Runtime: 76265us ❯ java -classpath Java/benchmarks.jar Harness Bounce 5 750
Starting Bounce benchmark ...
Bounce: iterations=1 runtime: 33581us
Bounce: iterations=1 runtime: 16207us
Bounce: iterations=1 runtime: 17237us
Bounce: iterations=1 runtime: 16462us
Bounce: iterations=1 runtime: 21529us
Bounce: iterations=5 average: 21003us total: 105016us
Total Runtime: 105016us ❯ node JavaScript/harness.js Bounce 5 750
Starting Bounce benchmark ...
Bounce: iterations=1 runtime: 37600us
Bounce: iterations=1 runtime: 23069us
Bounce: iterations=1 runtime: 23056us
Bounce: iterations=1 runtime: 23465us
Bounce: iterations=1 runtime: 22913us
Bounce: iterations=5 average: 26021us total: 130103us
Total Runtime: 130103us Huh, that's a much larger variance than I anticipated... I wonder why... The odd bouncing timings of the java one was odd too but I ran that one a few dozen times as well and that same pattern held for all of them, the above was the best time of those as always. |
I tried to work on Havlak a bit yesterday but, uh... I think there are a couple of bugs with it and rust doesn't like them enough that it refuses to compile it, so going to have to restructure it a touch before I can compile it (Rust really really doesn't like alias bugs), and although the Java code it is based on 'works' for the benchmark, it's broken enough that if it were used for anything more than the simple benchmark then it wouldn't really work as it is. So this will be the most 'different' one from java but it will still be extremely non-idiomatic rust with plenty of needless allocations and questionably buggy java-like constructs (which if I were to do it idiomatically then I'd grab the petgraph library and implement the loop detector on it, but still...). I might get time today to work more on it. As an aside, since I had to implement more of the |
Anything specific that might be worth fixing in the other languages, too?
Yes, indeed they are. If someone has an idea of how to improve things in a portable way, without starting to rely on non-deterministic things, i.e., we can't rely on anything like hash values derived from memory locations etc, I am very open to improvements :) |
The main issues are aliasing issues. The benchmarking is using it in 'just' such a way that it's not an issue, but if it were an actual library for production use it would be easy to accidentally put it into a bad state. Not sure it's worth worrying about right now since fixing that would involve a little bit of extra code, which would slow it down (barely, but still), Rust can't do that though, it must be safe and correct so it has some more checks (though my checks are more of just kill the program if something bad happens since I have to do "something", it's sufficient for the benchmark since those aren't hit for it).
Uh.... yeah in Havlak here in both Java and Rust the
Honestly I'm quite good with using the standard collections included in languages, they are what's most often used after all for each language so it is would people would expect. However, for consistency probably just preallocating big arrays ahead of time (since we know the max possible runtime size of these things) seems much better, in addition to the occasional helper function to do something like get a bucket for a set/dictionary in such an array or so. At the very least the If you care about hashing consistency though I'd be surprised if every language didn't have a way to use custom hash functions, I know both java and rust do at least in their standard libraries, and for the few languages that don't I guess they could define a custom hashmap instead? Using a custom hasher in Rust is pretty common as the standard library one is a cryptographically seeded hasher for randomness to prevent DOS attacks via known bad hash keys, but most uses of the hashmap doesn't need such security so they generally select a different hasher than the default cryptographically seeded one. |
Just as a test, I changed the rust one here from having |
And I changed my local java as well to also use Dictionary inside the Set (and fixed the CustomHash implementation, that was veeeery non-standard java and such was impossible to use for uncontrolled types, fixed here), and it had a good drop in times too. With using a Dictionary as the internal Set implementation now the times for the Rust and Java ones are vastly more logical, and thus are: ❯ Rust/target/release/harness Havlak 5 15000
Starting Havlak benchmark ...
Havlak: iterations=1 runtime: 287109us
Havlak: iterations=1 runtime: 287246us
Havlak: iterations=1 runtime: 278570us
Havlak: iterations=1 runtime: 278709us
Havlak: iterations=1 runtime: 277514us
Havlak: iterations=5 average: 281830us total: 1409150us
Total Runtime: 1409150us ❯ java -classpath Java/benchmarks.jar Harness Havlak 5 15000
Starting Havlak benchmark ...
Havlak: iterations=1 runtime: 520327us
Havlak: iterations=1 runtime: 270325us
Havlak: iterations=1 runtime: 172282us
Havlak: iterations=1 runtime: 210959us
Havlak: iterations=1 runtime: 237777us
Havlak: iterations=5 average: 282334us total: 1411670us
Total Runtime: 1411670us Essentially identical, which is about what I would expect considering the work being done. That Not committing this though because I changed things that are not Rust, even though I do think it is changes that needs to be done. Here's the patch for the changed java code, unsure if it's "complete" but it was enough to compile everything and run the benchmarks faster (probably needs a formatter pass ran over it, I just quick edited it in vim): diff --git a/benchmarks/Java/src/deltablue/Strength.java b/benchmarks/Java/src/deltablue/Strength.java
index 0599392..7953a04 100644
--- a/benchmarks/Java/src/deltablue/Strength.java
+++ b/benchmarks/Java/src/deltablue/Strength.java
@@ -19,7 +19,7 @@ import som.IdentityDictionary;
*/
public final class Strength {
- public static class Sym implements CustomHash {
+ public static class Sym {
private final int hash;
@@ -27,7 +27,6 @@ public final class Strength {
this.hash = hash;
}
- @Override
public int customHash() {
return hash;
}
@@ -80,7 +79,7 @@ public final class Strength {
}
private static IdentityDictionary<Sym, Integer> createStrengthTable() {
- IdentityDictionary<Sym, Integer> strengthTable = new IdentityDictionary<>();
+ IdentityDictionary<Sym, Integer> strengthTable = new IdentityDictionary<>(obj -> obj.customHash());
strengthTable.atPut(ABSOLUTE_STRONGEST, -10000);
strengthTable.atPut(REQUIRED, -800);
strengthTable.atPut(STRONG_PREFERRED, -600);
@@ -93,7 +92,7 @@ public final class Strength {
}
private static IdentityDictionary<Sym, Strength> createStrengthConstants() {
- IdentityDictionary<Sym, Strength> strengthConstant = new IdentityDictionary<>();
+ IdentityDictionary<Sym, Strength> strengthConstant = new IdentityDictionary<>(obj -> obj.customHash());
strengthTable.getKeys().forEach(key ->
strengthConstant.atPut(key, new Strength(key))
);
diff --git a/benchmarks/Java/src/havlak/BasicBlock.java b/benchmarks/Java/src/havlak/BasicBlock.java
index 2154af1..7d01847 100644
--- a/benchmarks/Java/src/havlak/BasicBlock.java
+++ b/benchmarks/Java/src/havlak/BasicBlock.java
@@ -24,7 +24,7 @@ import som.Vector;
*
* @author rhundt
*/
-final class BasicBlock implements CustomHash {
+final class BasicBlock {
private final Vector<BasicBlock> inEdges;
private final Vector<BasicBlock> outEdges;
@@ -56,7 +56,6 @@ final class BasicBlock implements CustomHash {
inEdges.append(from);
}
- @Override
public int customHash() {
return name;
}
diff --git a/benchmarks/Java/src/havlak/HavlakLoopFinder.java b/benchmarks/Java/src/havlak/HavlakLoopFinder.java
index 2eea9a2..98c1514 100644
--- a/benchmarks/Java/src/havlak/HavlakLoopFinder.java
+++ b/benchmarks/Java/src/havlak/HavlakLoopFinder.java
@@ -37,7 +37,7 @@ final class HavlakLoopFinder {
private final Vector<Set<Integer>> nonBackPreds = new Vector<Set<Integer>>();
private final Vector<Vector<Integer>> backPreds = new Vector<>();
- private final IdentityDictionary<BasicBlock, Integer> number = new IdentityDictionary<>();
+ private final IdentityDictionary<BasicBlock, Integer> number = new IdentityDictionary<>(obj -> obj.customHash());
private int maxSize = 0;
private int[] header;
private BasicBlockClass[] type;
@@ -181,7 +181,7 @@ final class HavlakLoopFinder {
}
for (int i = 0; i < size; ++i) {
- nonBackPreds.append(new Set<>());
+ nonBackPreds.append(new Set<>(obj -> obj));
backPreds.append(new Vector<>());
nodes[i] = new UnionFindNode();
}
diff --git a/benchmarks/Java/src/havlak/SimpleLoop.java b/benchmarks/Java/src/havlak/SimpleLoop.java
index ddb3ffd..03a536c 100644
--- a/benchmarks/Java/src/havlak/SimpleLoop.java
+++ b/benchmarks/Java/src/havlak/SimpleLoop.java
@@ -37,6 +37,8 @@ import som.IdentitySet;
* a candidate for transformations, and what not.
*/
final class SimpleLoop {
+ private static int uid = 0;
+ private final int id;
private final IdentitySet<BasicBlock> basicBlocks;
private final IdentitySet<SimpleLoop> children;
@@ -53,13 +55,15 @@ final class SimpleLoop {
SimpleLoop(final BasicBlock bb, final boolean isReducible) {
+ id = uid;
+ uid++;
this.isReducible = isReducible;
parent = null;
isRoot = false;
nestingLevel = 0;
depthLevel = 0;
- basicBlocks = new IdentitySet<>();
- children = new IdentitySet<>();
+ basicBlocks = new IdentitySet<>(obj -> obj.customHash());
+ children = new IdentitySet<>(obj -> obj.id);
if (bb != null) {
basicBlocks.add(bb);
diff --git a/benchmarks/Java/src/som/Dictionary.java b/benchmarks/Java/src/som/Dictionary.java
index 01bebd2..7d6ebcc 100644
--- a/benchmarks/Java/src/som/Dictionary.java
+++ b/benchmarks/Java/src/som/Dictionary.java
@@ -25,16 +25,17 @@ package som;
import som.Dictionary.CustomHash;
-public class Dictionary<K extends CustomHash, V> {
+public class Dictionary<K, V> {
- public interface CustomHash {
- int customHash();
+ public interface CustomHash<T> {
+ int customHash(T obj);
}
protected static final int INITIAL_CAPACITY = 16;
private Entry<K, V>[] buckets;
- private int size;
+ private int size;
+ private CustomHash<K> hasher;
static class Entry<K, V> {
@@ -56,19 +57,20 @@ public class Dictionary<K extends CustomHash, V> {
}
@SuppressWarnings("unchecked")
- public Dictionary(final int size) {
+ public Dictionary(final int size, CustomHash<K> hasher) {
this.buckets = new Entry[size];
+ this.hasher = hasher;
}
- public Dictionary() {
- this(INITIAL_CAPACITY);
+ public Dictionary(CustomHash<K> hasher) {
+ this(INITIAL_CAPACITY, hasher);
}
- private static <K extends CustomHash> int hash(final K key) {
+ private int hash(final K key) {
if (key == null) {
return 0;
}
- int hash = key.customHash();
+ int hash = hasher.customHash(key);
return hash ^ hash >>> 16;
}
@@ -244,4 +246,14 @@ public class Dictionary<K extends CustomHash, V> {
}
return values;
}
+
+ public void forEachKey(final ForEachInterface<K> fn) {
+ for (int i = 0; i < buckets.length; ++i) {
+ Entry<K, V> current = buckets[i];
+ while (current != null) {
+ fn.apply(current.key);
+ current = current.next;
+ }
+ }
+ }
}
diff --git a/benchmarks/Java/src/som/IdentityDictionary.java b/benchmarks/Java/src/som/IdentityDictionary.java
index 7d49b50..7f0d5d6 100644
--- a/benchmarks/Java/src/som/IdentityDictionary.java
+++ b/benchmarks/Java/src/som/IdentityDictionary.java
@@ -3,7 +3,7 @@ package som;
import som.Dictionary.CustomHash;
-public class IdentityDictionary<K extends CustomHash, V> extends Dictionary<K, V> {
+public class IdentityDictionary<K, V> extends Dictionary<K, V> {
static class IdEntry<K, V> extends Entry<K, V> {
IdEntry(final int hash, final K key, final V value, final Entry<K, V> next) {
@@ -16,12 +16,12 @@ public class IdentityDictionary<K extends CustomHash, V> extends Dictionary<K, V
}
}
- public IdentityDictionary(final int size) {
- super(size);
+ public IdentityDictionary(final int size, Dictionary.CustomHash<K> hasher) {
+ super(size, hasher);
}
- public IdentityDictionary() {
- super(INITIAL_CAPACITY);
+ public IdentityDictionary(Dictionary.CustomHash<K> hasher) {
+ super(INITIAL_CAPACITY, hasher);
}
@Override
diff --git a/benchmarks/Java/src/som/IdentitySet.java b/benchmarks/Java/src/som/IdentitySet.java
index a699d4a..7ab0872 100644
--- a/benchmarks/Java/src/som/IdentitySet.java
+++ b/benchmarks/Java/src/som/IdentitySet.java
@@ -25,16 +25,16 @@ package som;
public final class IdentitySet<E> extends Set<E> {
- public IdentitySet() {
- super();
+ public IdentitySet(Dictionary.CustomHash<E> hasher) {
+ super(hasher);
}
- public IdentitySet(final int size) {
- super(size);
+ public IdentitySet(final int size, Dictionary.CustomHash<E> hasher) {
+ super(size, hasher);
}
- @Override
- public boolean contains(final E obj) {
- return hasSome(e -> { return e == obj; });
- }
+// @Override
+// public boolean contains(final E obj) {
+// return hasSome(e -> { return e == obj; });
+// }
}
diff --git a/benchmarks/Java/src/som/Set.java b/benchmarks/Java/src/som/Set.java
index 8e52468..78ead9d 100644
--- a/benchmarks/Java/src/som/Set.java
+++ b/benchmarks/Java/src/som/Set.java
@@ -23,14 +23,14 @@
package som;
public class Set<E> {
- private final Vector<E> items;
+ private final Dictionary<E, Void> items;
- public Set() {
- this(Constants.INITIAL_SIZE);
+ public Set(Dictionary.CustomHash<E> hasher) {
+ this(Constants.INITIAL_SIZE, hasher);
}
- public Set(final int size) {
- items = new Vector<E>(size);
+ public Set(final int size, Dictionary.CustomHash<E> hasher) {
+ items = new Dictionary<E, Void>(size, hasher);
}
public int size() {
@@ -38,21 +38,11 @@ public class Set<E> {
}
public void forEach(final ForEachInterface<E> fn) {
- items.forEach(fn);
- }
-
- public boolean hasSome(final TestInterface<E> fn) {
- return items.hasSome(fn);
- }
-
- public E getOne(final TestInterface<E> fn) {
- return items.getOne(fn);
+ items.forEachKey(fn);
}
public void add(final E obj) {
- if (!contains(obj)) {
- items.append(obj);
- }
+ items.atPut(obj, null);
}
public <T> Vector<T> collect(final CollectInterface<E, T> fn) {
@@ -65,7 +55,7 @@ public class Set<E> {
}
public boolean contains(final E obj) {
- return hasSome(e -> { return e.equals(obj); });
+ return items.containsKey(obj);
}
public void removeAll() { |
Mostly just adding some rust code as a time sink so far so unsure how quickly it will be completed (feel free to PR updates to my fork if anyone wants?).
Completed so far is just
Towers
.Primarily I'm just sending this draft PR for now to make sure that I'm following the same pattern and for any other obvious issues, though it is not idiomatic Rust whatsoever, especially with the huge amount of needless memory allocation I've done so far in an attempt to precisely duplicate the Java code as closely as possible (all this memory allocation is so painful to me, ^.^).
I'll add more benchmarks as I get occasional free time (or others PR things to my fork).
A couple things of note, the main file is named
bin/harness.rs
instead of the usual main, to follow the pattern of the Java code and contribution recommendations, not standard rust though and there is a .gitignore to only include that which is actually wanted (no build artifacts) in the rust section itself. It does of course need the recent stable rust compile version to compile (which you can get via rustup), and I'm using no dependencies thus far (much as the usual few are very tempting...).Clippy, the official rust linter, is set to
pedantic
mode and the code is clean of all warnings, as per the recommended contribution suggestions. The code is also formatted with the official rust formatter as well. Both are part of cargo (the official rust build system).To compile it just go to
benchmarks/Rust
and (with latest stable rust installed via rustup or so) runcargo build --release
to build a release version and then run it withtarget/release/harness
, or do it all in one step withcargo run --release -- <argshere>
.