Support uncompressed or fully in-memory vocabularies #1740

joka921 · 2025-01-31T08:41:08Z

So far, QLever's vocabulary (the set of distinct IRIs and literals) has been compressed (at the level of individual IRIs and literals) and has resided on disk by default. It could be configured so that certain IRIs or literals reside in RAM. However, this incurred some overhead for determining whether a given IRI or literal resides in memory or on disk. This change allows a free choice between the following four kinds of vocabularies:

1. On disk and compressed (the default so far)
2. On disk and uncompressed (costs more space but saves time for decompression)
3. Fully in memory and compressed (faster than disk but memory is expensive)
4. Fully in memory and uncompressed (maximally fast but most memory hungry)
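
The four options are the cross product of two independent choices: where the vocabulary lives and whether it is compressed. A minimal sketch of how such a choice could be modelled and parsed from a configuration string is given here. The names `VocabularyKind`, `parseVocabularyKind`, `selfCheck`, and the string spellings are hypothetical and are not taken from QLever's code.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical names for illustration only; not QLever's actual API.
enum class Storage { OnDisk, InMemory };
enum class Encoding { Compressed, Uncompressed };

// One of the four vocabulary kinds: a storage location plus an encoding.
struct VocabularyKind {
  Storage storage;
  Encoding encoding;
  friend constexpr bool operator==(const VocabularyKind& a,
                                   const VocabularyKind& b) {
    return a.storage == b.storage && a.encoding == b.encoding;
  }
};

// The default so far, according to the description: on disk and compressed.
inline constexpr VocabularyKind kDefaultVocabularyKind{Storage::OnDisk,
                                                       Encoding::Compressed};

// Parse one of four (assumed) spellings; anything else yields std::nullopt.
constexpr std::optional<VocabularyKind> parseVocabularyKind(
    std::string_view s) {
  if (s == "on-disk-compressed")
    return VocabularyKind{Storage::OnDisk, Encoding::Compressed};
  if (s == "on-disk-uncompressed")
    return VocabularyKind{Storage::OnDisk, Encoding::Uncompressed};
  if (s == "in-memory-compressed")
    return VocabularyKind{Storage::InMemory, Encoding::Compressed};
  if (s == "in-memory-uncompressed")
    return VocabularyKind{Storage::InMemory, Encoding::Uncompressed};
  return std::nullopt;
}

// A few sanity checks that can be called from a test or from main().
inline void selfCheck() {
  assert(parseVocabularyKind("on-disk-compressed") == kDefaultVocabularyKind);
  assert(!parseVocabularyKind("unknown").has_value());
}
```

Keeping storage and encoding as separate fields makes it explicit that the two decisions are independent, which is also how the benchmark table in the description is organized.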

Performance tests for exporting a TSV with 100M triples and a total size of 8.8 GB yielded the following results on a Ryzen 9 9950X with 4 x WD_BLACK SN850X NVMe RAID0 (column 2) and a Ryzen 5900X with 1 x TODO NVMe (column 3):

| Vocabulary | Ryzen 9 9950X, 4 x NVMe RAID0 | Ryzen 5900X, 1 x NVMe |
| --- | --- | --- |
| on disk, compressed | 65s | 177s |
| on disk, uncompressed | 58s | 151s |
| in memory, compressed | 51s | 110s |
| in memory, uncompressed | 50s | 109s |

When measuring the performance multiple times in a row, so that the relevant parts of the disk-based vocabulary are cached by the operating system, the measurements from the first two rows decrease slightly (this effect is more pronounced when exporting smaller results).

BOTTOM LINE: An in-memory vocabulary is 40 - 60% faster for a single SSD, but only 5 - 15% faster for four parallel SSDs. Using compression is around 10% slower on disk and comes at almost no cost in memory.
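
A percentage such as "X% faster" can be read in two ways: as the share of wall-clock time saved, or as the gain in throughput. The description does not say which reading the summary uses, so the following sketch provides both for recomputing figures from the table. The function names are hypothetical helpers, not part of QLever.

```cpp
#include <cassert>

// Share of wall-clock time saved (in percent) when the runtime drops from
// `baselineSeconds` to `improvedSeconds`.
inline double timeSavedPercent(double baselineSeconds, double improvedSeconds) {
  assert(baselineSeconds > 0.0);
  return 100.0 * (baselineSeconds - improvedSeconds) / baselineSeconds;
}

// Gain in throughput (in percent) for the same amount of work when the
// runtime drops from `baselineSeconds` to `improvedSeconds`.
inline double throughputGainPercent(double baselineSeconds,
                                    double improvedSeconds) {
  assert(improvedSeconds > 0.0);
  return 100.0 * (baselineSeconds / improvedSeconds - 1.0);
}
```

For the compressed vocabulary on the single-NVMe machine (177s on disk, 110s in memory), `timeSavedPercent(177, 110)` is roughly 38, while `throughputGainPercent(177, 110)` is roughly 61, so the choice of definition matters when comparing against the summary.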

Signed-off-by: Johannes Kalmbach <[email protected]>

codecov · 2025-01-31T09:19:19Z

Codecov Report

Attention: Patch coverage is 99.43182% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.03%. Comparing base (098b79c) to head (d8080b3).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/index/Vocabulary.cpp | 90.90% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1740      +/-   ##
==========================================
+ Coverage   90.01%   90.03%   +0.01%     
==========================================
  Files         395      398       +3     
  Lines       37793    37954     +161     
  Branches     4257     4262       +5     
==========================================
+ Hits        34019    34171     +152     
- Misses       2478     2482       +4     
- Partials     1296     1301       +5

hannahbast

Quick 1-1 with Johannes, this looks great and it works!

sparql-conformance · 2025-02-06T16:57:32Z

Conformance check passed ✅

No test result changes.

Details: https://qlever.cs.uni-freiburg.de/sparql-conformance-ui?cur=d8080b30f9914a89e3ed3dcda9c9ccf85a880795&prev=a307781592842f82bdfef78fc47fc832fd37368d

sonarqubecloud · 2025-02-06T18:28:50Z

Quality Gate failed

Failed conditions
E Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Make the In-Memory-Vocabulary compatible with the RDFVocabulary

e9e8dfd

Signed-off-by: Johannes Kalmbach <[email protected]>

joka921 added 5 commits January 31, 2025 11:22

Refactor things.

79a11b6

TODO: Make the vocabulary implementation be choosable from CMake

Signed-off-by: Johannes Kalmbach <[email protected]>

Making the vocab configuration configurable at runtime.

e406fa4

Signed-off-by: Johannes Kalmbach <[email protected]>

An intermediate commit before switching branches.

49445e5

Signed-off-by: Johannes Kalmbach <[email protected]>

This seems to work, but the IDE has crashed, so we just restart:)

6d11c3b

Signed-off-by: Johannes Kalmbach <[email protected]>

Several refactorings.

3e7f494

Signed-off-by: Johannes Kalmbach <[email protected]>

joka921 marked this pull request as ready for review February 5, 2025 17:54

joka921 and others added 5 commits February 5, 2025 19:09

Some additional fixes and comments.

825f8bf

Signed-off-by: Johannes Kalmbach <[email protected]>

Merge remote-tracking branch 'origin/master' into allow-different-voc…

d0465da

…abularies

Refactoring there and back again.

066ddf6

Signed-off-by: Johannes Kalmbach <[email protected]>

Merge remote-tracking branch 'origin/allow-different-vocabularies' in…

3a4e223

…to allow-different-vocabularies

Fix compilation.

b9948ff

Signed-off-by: Johannes Kalmbach <[email protected]>

hannahbast reviewed Feb 5, 2025

hannahbast changed the title ~~Also support uncompressed or in Memory vocabularies~~ Also support uncompressed or in-memory vocabularies Feb 5, 2025

hannahbast changed the title ~~Also support uncompressed or in-memory vocabularies~~ Support uncompressed or fully in-memory vocabularies Feb 6, 2025

joka921 added 3 commits February 6, 2025 12:15

Feed this to the tools...

b1b884e

Signed-off-by: Johannes Kalmbach <[email protected]>

Fix for MacOS...

5f2ec6c

Signed-off-by: Johannes Kalmbach <[email protected]>

Many more improvements for the tests and for the tools.

d8080b3

Signed-off-by: Johannes Kalmbach <[email protected]>
