-
Notifications
You must be signed in to change notification settings - Fork 25
UncleJim vs. PCollections
PCollections competes only with UncleJim's first bullet point: immutable collections, so I'm only going to talk about that in this document.
Clojure's vector (list) and hashMap/hashSet have O(log32 n) performance, which scales better than the O(log2 n) binary tree structures PCollections uses for its vector.
This graph shows how many operations each lookup requires (on the y-axis) for a given number of items in the collection (on the x-axis). The red line is the fast log32, the blue is the slower log2. Daniel Spiewak explains all the ramifications of this better than I ever could: https://www.youtube.com/watch?v=pNhBQJN44YQ
#Benchmarks Benchmarks below from a quad-core i7-4790K CPU @ 4.00GHz with hyper-threading disabled. So far this compares just the immutable vector (non-linked list) in UncleJim vs. PCollections. Both vector implementations are based on different powers of 2. I used powers of 10 for the vector sizes because I thought they would be unlikely to correspond to any particularly good or bad performance points in either Vector implementation.
10 items: 1.4x faster
100: 2.8x faster
1000: 4.4x faster
10000: 5.9x faster
100000: 7.7x faster
The numbers show that UncleJim's Vector from Clojure is faster and it scales better than PCollections Vector. The performance gap increases by a logarithmic factor of a base near 4, which is roughly what was predicted in the Big O analysis above. I tried running some tests on accessing each item by index and by iterator, but because of the way I ran them, all I could tell is that building the collection took about 3x longer than accessing the data in it once it was built. This proportion of build time to access time was about equal across the board.
I built the vector from Strings using the ordinal(int n)
function which takes a little time to produce, but I don't think this makes much difference.
java -jar target/benchmarks.jar -wi 5 -i 7
Benchmark Mode Cnt Score Error Units
MyBenchmark.buildImList10 thrpt 70 3690204.388 ± 7137.675 ops/s
MyBenchmark.buildImList100 thrpt 70 340778.855 ± 3838.734 ops/s
MyBenchmark.buildImList1000 thrpt 70 30860.163 ± 245.794 ops/s
MyBenchmark.buildImList10000 thrpt 70 2869.233 ± 8.490 ops/s
MyBenchmark.buildImList100000 thrpt 70 271.891 ± 0.642 ops/s
MyBenchmark.buildTreePVector10 thrpt 70 2580167.387 ± 12669.661 ops/s
MyBenchmark.buildTreePVector100 thrpt 70 120573.558 ± 316.766 ops/s
MyBenchmark.buildTreePVector1000 thrpt 70 6939.288 ± 12.626 ops/s
MyBenchmark.buildTreePVector10000 thrpt 70 488.121 ± 1.626 ops/s
MyBenchmark.buildTreePVector100000 thrpt 70 35.501 ± 0.070 ops/s
##Tests:
public ImList<String> buildImList(ImList<String> empty, int maxIdx) {
for (int i = 0; i < maxIdx; i++) {
empty = empty.append(ordinal(i));
}
return empty;
}
public PVector<String> buildPVector(PVector<String> empty, int maxIdx) {
for (int i = 0; i < maxIdx; i++) {
empty = empty.plus(ordinal(i));
}
return empty;
}
@Benchmark public void buildImList10() {
ImList<String> vec = buildImList(PersistentVector.empty(), 10);
String last = vec.get(9);
assert("9th".equals(last));
}
@Benchmark public void buildTreePVector10() {
PVector<String> vec = buildPVector(TreePVector.empty(), 10);
String last = vec.get(9);
assert("9th".equals(last));
}
// And so on using lots of cut and pasted code for 100, 1000, 10000, etc.
#Further Thoughts The Clojure collections also walk the sibling nodes in the internal data trees of these structures to provide iterators, which I thought might be an advantage. PCollections starts from the top of the tree doing an index lookup for each item, then increments the index and goes back to the top to look up the next (at least for the list implementation).
Clojure's (and Java's) sorted/tree map/set implementations are O(log2 n), so PCollections is likely about as fast for those two collections. If someone does performance testing to verify these theories, please let me know so I can link to it here. I plan to come back to this with more tests and data later...