Skip to content

UncleJim vs. PCollections

Glen K. Peterson edited this page Oct 13, 2015 · 19 revisions

PCollections competes only with UncleJim's first bullet point: immutable collections, so I'm only going to talk about that in this document.

Clojure's vector (list) and hashMap/hashSet have O(log32 n) performance, which theoretically scales better than the O(log2 n) binary tree structures it looks like PCollections uses.

Graph of Log base 32 (red) vs. Log base 2 (blue)

This graph shows how many operations each lookup requires (y) for a given number of items in the collection (x). The red line is the fast log32, the blue is the slower log2. Daniel Spiewak explains all the ramifications of this better than I ever could: https://www.youtube.com/watch?v=pNhBQJN44YQ

The Clojure collections also walk the sibling nodes in the internal data trees of these structures to provide iterators, which is pretty cool performance-wise. PCollections starts from the top of the tree doing an index lookup for each item, then increments the index and goes back to the top to look up the next (at least for the list implementation).

Clojure's (and Java's) sorted/tree map/set implementations are O(log2 n), so PCollections could theoretically be as fast or faster for those two collections. If someone does performance testing to verify these theories, please let me know so I can link to it here.