Support uncompressed or fully in-memory vocabularies #1740
base: master
Signed-off-by: Johannes Kalmbach <[email protected]>
Codecov Report (coverage diff, master vs. #1740):
| | master | #1740 | +/- |
|---|---|---|---|
| Coverage | 90.01% | 90.03% | +0.01% |
| Files | 395 | 398 | +3 |
| Lines | 37793 | 37954 | +161 |
| Branches | 4257 | 4262 | +5 |
| Hits | 34019 | 34171 | +152 |
| Misses | 2478 | 2482 | +4 |
| Partials | 1296 | 1301 | +5 |
TODO: Make the vocabulary implementation selectable from CMake
…to allow-different-vocabularies
Quick 1-1 with Johannes, this looks great and it works!
Conformance check passed ✅ No test result changes.
So far, QLever's vocabulary (the set of distinct IRIs and literals) is compressed (on the level of individual IRIs and literals) and resides on disk by default. It can be configured so that certain IRIs or literals reside in RAM, but determining whether a given IRI or literal resides in memory or on disk incurs some overhead. With this change, there is a free choice between the following four kinds of vocabularies: on disk and compressed, on disk and uncompressed, in memory and compressed, and in memory and uncompressed.
Performance tests for exporting a TSV with 100M triples and a total size of 8.8 GB yielded the following results on a Ryzen 9 9950X with 4 x WD_BLACK SN850X NVMe RAID0 (column 2) and a Ryzen 5900X with 1 x TODO NVMe (column 3):
| Vocabulary | Ryzen 9 9950X, 4 x NVMe RAID0 | Ryzen 5900X, 1 x NVMe |
|---|---|---|
| on disk, compressed | 65s | 177s |
| on disk, uncompressed | 58s | 151s |
| in memory, compressed | 51s | 110s |
| in memory, uncompressed | 50s | 109s |
When measuring the performance multiple times in a row, so that the relevant parts of the disk-based vocabulary are cached by the operating system, the measurements from the first two rows decrease slightly (this effect is more pronounced when exporting smaller results).
BOTTOM LINE: An in-memory vocabulary is 40 - 60% faster for a single SSD, but only 5 - 15% faster for four parallel SSDs. Using compression is around 10% slower on disk and comes at almost no cost in memory.
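For readers who want to recompute relative differences from the table, here is a minimal Python sketch. It is not part of this PR or of QLever; all names (`EXPORT_SECONDS`, `speedup_percent`, `time_saved_percent`, `compare_disk_to_memory`) are made up for illustration. It only uses the single-run times from the table, and it reports two common definitions of "faster" side by side, so the values need not coincide with the rounded ranges in the bottom line, which may also reflect repeated, cache-warm runs.

```python
# Export times in seconds, copied from the table in this description.
# Each value is (4 x NVMe RAID0 machine, single-NVMe machine).
EXPORT_SECONDS = {
    ("on disk", "compressed"): (65, 177),
    ("on disk", "uncompressed"): (58, 151),
    ("in memory", "compressed"): (51, 110),
    ("in memory", "uncompressed"): (50, 109),
}
MACHINES = ("4 x NVMe RAID0", "1 x NVMe")


def speedup_percent(slow: float, fast: float) -> float:
    """How much more throughput the faster run achieves, in percent."""
    return 100.0 * (slow / fast - 1.0)


def time_saved_percent(slow: float, fast: float) -> float:
    """How much wall-clock time the faster run saves, in percent."""
    return 100.0 * (slow - fast) / slow


def compare_disk_to_memory():
    """Return one (machine, compression, speedup, time saved) row per pair."""
    rows = []
    for col, machine in enumerate(MACHINES):
        for compression in ("compressed", "uncompressed"):
            disk = EXPORT_SECONDS[("on disk", compression)][col]
            memory = EXPORT_SECONDS[("in memory", compression)][col]
            rows.append((machine, compression,
                         speedup_percent(disk, memory),
                         time_saved_percent(disk, memory)))
    return rows


if __name__ == "__main__":
    for machine, compression, speedup, saved in compare_disk_to_memory():
        print(f"{machine:15s} {compression:13s} "
              f"speedup {speedup:5.1f}%  time saved {saved:5.1f}%")
```

The same two helper functions can be applied to the compressed and uncompressed rows of one storage kind to estimate the cost of compression.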