test using lmdb as the kv store #114
Conversation
@prajnoha this isn't ready for review (doesn't work) - but I thought you might want to have a glance to see if you think the approach has potential. I think it might be something we could work out. The database requires name/value pairs, so we'd need to rework the KV_STORE_VALUE_REF uses. We'd need to switch to storing the values in contiguous memory so they can go into lmdb. The keys work fine, but I'm still working out storing/decoding the values. I think the child processes could make updates to the KV pairs and the parent could refresh to get the updates. I added a couple of tests for the kv_store. Might be worth you having a look to make sure the tests make sense and test how you intended types F and E to work. The tests are failing because the Jenkins nodes don't have lmdb-devel installed. I thought I'd check to make sure you think this is a possibility before updating the test systems.
Thanks, I'll have a look...
I see... I mean, the sequence here is this: [...]

So the [...] Right now, we don't make use of [...]. What we could simply do here is to state that we do not support [...]. The [...]
Hmm, a few questions come to my mind here... Looking at the patch, I noticed you always allocate a new key/value when returning it from [...].

Another thing I'm thinking about are the transactions. I assume the db is shared among processes, so the other process opens the database of the same name and, when a transaction is started, the "snapshot" is created (somehow internally, using the mmap I suppose). But in the patch, we create a separate transaction for each [...]. If that's the case, then we probably need to change the [...]:

Line 3568 in ae05502
Hope my comments make sense... I believe we can use lmdb as a backend, we just need to find a good way to map the functionality we need (...getting close to what we have with the hash backend and fork for the snapshot and a transaction with more than one set/get call inside...). I'm still thinking about this, so these are just my preliminary thoughts/questions related to this... We'll surely discuss this a little bit more.
Yup, the tests look like they test the essence of those types...
Sure...
(...while looking at the kv code, I realized we could do a minor cleanup here: #115)
Yeah, here is a small snip of the lmdb documentation about returned memory. It looks like they'll release or reuse any returned memory when the transaction is completed.
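For concreteness, here's roughly what that lifetime rule means in practice - just a sketch against the stock LMDB C API, not code from the patch; the helper name and the copy-out approach are mine:

```c
/* Minimal sketch (not SID code): the pointer returned by mdb_get() is owned
 * by LMDB and only guaranteed valid until the transaction ends, so any value
 * we want to keep past mdb_txn_abort()/mdb_txn_commit() has to be copied. */
#include <lmdb.h>
#include <stdlib.h>
#include <string.h>

static void *get_value_copy(MDB_env *env, MDB_dbi dbi, const char *key_str, size_t *size)
{
	MDB_txn *txn;
	MDB_val  key, data;
	void    *copy = NULL;

	if (mdb_txn_begin(env, NULL, MDB_RDONLY, &txn) != MDB_SUCCESS)
		return NULL;

	key.mv_data = (void *) key_str;
	key.mv_size = strlen(key_str) + 1;

	if (mdb_get(txn, dbi, &key, &data) == MDB_SUCCESS) {
		/* data.mv_data points into the mmap'd db - valid only inside this txn */
		if ((copy = malloc(data.mv_size))) {
			memcpy(copy, data.mv_data, data.mv_size);
			*size = data.mv_size;
		}
	}

	mdb_txn_abort(txn); /* read-only txn: abort is the cheap way to end it */
	return copy;        /* after this point, only the copy is safe to use */
}
```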
I see. I hadn't considered we might want to support both the hash table and lmdb as possible back ends. For SID there is value in having the prior boot device mapping available to the modules, right? In the case of the hash table back end, it wouldn't be available? Or would we also implement save-to-disk for the hash table?
Thanks. Once you merge it, I'll rebase.
...so as long as we're under a transaction, we should be OK to use the returned value directly without creating a copy. Now, what if we started the transaction at the very beginning of processing in the worker and ended/committed it at the end of the processing (...around the code in [...])?

I wonder if it's OK with lmdb to do such longer-running transactions, but I assume it is - all the transactions should be working on top of a snapshot in LMDB. But that's something we need to make sure of... I haven't looked in more detail here whether it poses any restrictions and/or limitations for the "main db" while the transaction/snapshot is active.

Then supporting the vector values, where we might append or remove items from an existing vector, would be another thing to check - whether we can map it onto LMDB somehow... This is actually the part I'm not yet sure how to map onto lmdb, because with the hash, we use [...]
I'm considering the [...]. I mean, we can still have the kv-store support both hash and lmdb backends (or even another backend if we ever needed one), just the hash wouldn't support the persistence/flushing-to-disk part. The lmdb-backed kv-store would be used for the main SID db because we want that persistence, while the hash-backed kv-store could still be used for quick lookups, caches etc. in other parts of the code/modules, for temporary kv-stores...
Yes - records are not written to the database until mdb_txn_commit().
Right. We can easily expand the scope of the transactions to include more updates.
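Something like this, say - a minimal sketch (not the actual kv-store code; the record struct and function name are made up) of batching several updates under one write transaction, so nothing is visible until the final mdb_txn_commit():

```c
/* Illustrative sketch only: several puts grouped under a single write
 * transaction; nothing hits the database until mdb_txn_commit(), and an
 * error path can throw everything away with mdb_txn_abort(). */
#include <lmdb.h>
#include <string.h>

struct record {
	const char *key;
	const char *value;
};

static int store_batch(MDB_env *env, MDB_dbi dbi, const struct record *recs, size_t count)
{
	MDB_txn *txn;
	MDB_val  key, data;
	int      r;

	if ((r = mdb_txn_begin(env, NULL, 0, &txn)) != MDB_SUCCESS)
		return r;

	for (size_t i = 0; i < count; i++) {
		key.mv_data  = (void *) recs[i].key;
		key.mv_size  = strlen(recs[i].key) + 1;
		data.mv_data = (void *) recs[i].value;
		data.mv_size = strlen(recs[i].value) + 1;

		if ((r = mdb_put(txn, dbi, &key, &data, 0)) != MDB_SUCCESS) {
			mdb_txn_abort(txn); /* none of the previous puts are persisted */
			return r;
		}
	}

	return mdb_txn_commit(txn); /* all-or-nothing: records become visible here */
}
```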
Yes, I think I understand. Do you think we should add a kv_database.c and put the interface there? I've been attempting to restore the hashtable back end to the kv_store along with adding lmdb, and it quickly gets complicated. I'm thinking it will get more complicated when we start using the functionality of the database to sync key/value pairs across processes. Maybe I'm missing something? I can push my attempts to GitHub on another branch if that is helpful.
You mean creating an internal interface? For example, like we have for buffers, where there's either a linear or a vector buffer with an internal interface on top, and then specific functions are called based on type? That way we have the code in separate files to handle each type (...in the case of the kv-store, that would be a separate file for each backend).
I see, it might get a bit complicated if we kept the code to handle all backends in one place, so maybe introducing that "internal interface" would help, I hope.

When it comes to syncing key/value pairs across processes with the hash backend, we probably won't support the "transaction begin" and "commit" directly there, because that functionality (currently) depends on the fork (for creating the snapshot) and our own syncing mechanism, and this functionality is simply layered on top of [...]. Practically, it means that calling "transaction begin" and "commit" would return something like "not supported" ([...]). But if we had another backend with direct support for transactions (like lmdb), we wouldn't need to bother with adding transactions to the hash, we'd just use the lmdb backend instead. And the hash backend would stay there for another purpose - for creating a simple look-aside db with kv pairs if needed.

The advantage is that we're using the same interface for all the backends. Just certain more advanced functionality (like those transactions) doesn't necessarily need to be supported in each backend. But we could still store vectors, mark values either as copies or references, merge values in a vector before storing, automatically free up the value if it is marked that way etc. with the [...]. With this, if we could find a way to make use of lmdb as a backend, the code in [...]
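To make the "internal interface" idea a bit more concrete, here's a very rough sketch of what such a backend dispatch could look like - all names here are hypothetical, not the actual SID symbols:

```c
/* A rough sketch of the "internal interface" idea, mirroring how the buffer
 * code dispatches to linear/vector implementations. All names are made up. */
#include <stddef.h>

struct kv_store_backend_ops {
	int   (*init)(void *backend_data, const void *params);
	void *(*set)(void *backend_data, const char *key, void *value, size_t size);
	void *(*get)(void *backend_data, const char *key, size_t *size);
	int   (*unset)(void *backend_data, const char *key);
	int   (*transaction_begin)(void *backend_data);  /* may return "not supported" */
	int   (*transaction_commit)(void *backend_data);
	void  (*destroy)(void *backend_data);
};

struct kv_store {
	const struct kv_store_backend_ops *ops;          /* e.g. kv-store-hash.c or kv-store-lmdb.c */
	void                              *backend_data; /* backend-private state */
};

/* The public kv_store_* calls would then just dispatch: */
static inline void *kv_store_get_value_sketch(struct kv_store *store, const char *key, size_t *size)
{
	return store->ops->get(store->backend_data, key, size);
}
```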
@prajnoha I'm looking at writing a test for the 'kv-store'. Edit: I now see that the resource is only coupled with SID if the .init field is set to _init_sid. I'd like to have a test for the hashtable back end to the kv-store, so I can quickly test the lmdb code to make sure I haven't broken either interface. I'm having some trouble using the following... but still walking through the debugger to make sure a kv-store is initialized properly. Maybe I should add a _init_kvstore?
(Sorry, was away for the last 2 days.) With the snippet of code you pasted above, you'd actually be defining a completely new resource type ("class"). But what we need is to create an instance ("object") of [...]:

Lines 4434 to 4443 in f2f0456
Then you'd get the resource instance which you can pass to all the [...] functions. The 5th arg that is passed there ([...]):

sid/src/include/resource/kv-store.h Lines 57 to 62 in f2f0456
...so stating what backend the kv-store should use:

sid/src/include/resource/kv-store.h Lines 53 to 55 in f2f0456
...but we can add more params for other backends if needed. In case we created a new lmdb backend in addition to the existing hash backend, we would probably end up with:
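(Purely illustrative - a guess at the shape, not the actual contents of kv-store.h; the field and enum names are made up:)

```c
/* Hypothetical sketch of creation params extended with a second backend. */
typedef enum {
	KV_STORE_BACKEND_HASH,
	KV_STORE_BACKEND_LMDB,
} kv_store_backend_t;

struct sid_kv_store_resource_params {
	kv_store_backend_t backend;
	union {
		struct {
			size_t initial_size;   /* existing hash backend option */
		} hash;
		struct {
			const char *path;      /* where the lmdb environment lives */
			size_t      map_size;  /* value for mdb_env_set_mapsize() */
		} lmdb;
	};
};
```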
(Of course the internals of [...].)
(...and yes, then for testing, you only need to create the kv-store instance as a unit; we don't need all the other parts like the daemon itself or the other resources which, when glued together, make up the whole daemon. This was actually one of the goals for "resources" - to have the code clearly separated so we can test it separately too... and, of course, also to be able to reuse such units in the code.)
When it comes to kv-store use in the SID daemon itself, there are 3 layers: [...]
I started to look at using a thread pool rather than forking the workers. Is that something you'd consider? I think you've said "no" to that, but I'm having a hard time finding your response.
That is a bit inconvenient, but unless there's a tangible performance degradation, I think it's still feasible...
When it comes to ordering, there's one thing we already need to count in - the uevents are first processed in udev, and it does that in its own worker processes where the udev rules are evaluated (where one of the rules is the [...]).

Now, these udevd workers are running in parallel, BUT udevd already implies some restrictions: [...]
Which means that for a single device, we will never receive reordered uevents - we need to finish processing the current one before starting to process another for the same device (...and [...]).

Looking at SID and its records in the db, we have 4 namespaces: [...]
I've been trying to direct the access to the db in a way (by restricting the API fns) so that we only add/remove/update records that are related to the device for which we're just processing the event. That means we shouldn't be allowed to change records which are completely unrelated. Whether a device is related to other device(s) is given by placing the devices in a group. Then on each uevent, we could only: [...]
So what I mean to say is that instead of using locks for the db, I'm trying to direct (and restrict) the db usage so that it's controlled and tries to avoid conflicts from concurrent writes. And if there are cases where the conflicts are unavoidable, we still have the possibility to hook in a module for a solution - we have the callback on db updates where we can possibly call out to a module to decide which record takes precedence or whether any other action should be taken (e.g. we get different metadata info from different PVs - like with the out-of-sync VG seqnos - then we can call out to LVM to resolve the conflict coming from the inconsistency before we actually store any records in the SID db).
Yeah, that's a pain. I don't know either. Right now, we have three ways: [...]
I think it was here, but very short: #33 (comment)

Simply, besides making use of COW-based memory after fork for practically creating a snapshot of the in-memory db, it also seems to me much safer to have the module code running in a separate process. I think we'll have more control over the resources used that way (e.g. we can very easily kill the process if we detect timeouts/long-running ones, set various limits, etc...). Also, with threads, we'd need to be very careful when accessing the structures ("resources") as they would all be shared, and it would be harder to confine the module code. I'm not saying threads are not a possibility, it just seems more complex to me, hence more prone to bugs...
@prajnoha I think it would make sense for you to have a high-level look at the code. The tests work - but it is a lot of change. Any feedback is welcome. Just want to make sure I'm headed in the right direction.
Sure, I'll have a look. Thanks! |
I think we need to consider using longer-running transactions. Right now, we begin and end the lmdb transaction right in the [...]:

Lines 3406 to 3421 in 449e21d
Now, it looks like lmdb supports as many concurrent readers as we need, as well as concurrent readers and one writer. The tricky part is that while processing the phases, usually both the core and modules need to write/update/remove records from the db as results of scans and processing. That practically means we'd need to take a "write" transaction for the whole processing. Considering that we could have more than one worker running in parallel, there could certainly be more than one request for a "write" transaction during a period of time - lmdb would serialize that, I suppose. The second and all the other requests would block/queue. So, if possible, we need to find out how to deal with this in a way that lmdb use is still feasible for us.

With the hash table, we're using a snapshot of the db because of the COW-based memory nature after forking a new worker. Then we allow additions/edits/removals in the snapshot, we're actually collecting changes from the snapshot along the way, and at the end of processing, we send those changes over to the main process to synchronize them with the main db. So that can be considered a kind of "write batch". When synchronizing the changes with the main db, we are able to do additional checks before we actually update the records (that is the [...]).

With lmdb, we still need to find a way how to: [...]

This is something that I suppose lmdb doesn't support directly, so we need to find our own way here on top of it. It might be that we would only take "read" transactions in the workers - we could still read the db in a consistent way and we would allow for several snapshots in parallel (because "read-only" transactions/snapshots are not blocked). Then any changes would need to be written aside somehow (either a new db instance or anything else). But this feels a bit complicated - we'd have two databases in a worker, one with the frozen snapshot ("read-only" db transaction) and one with the "changes" - then if we wanted to look for a value while doing further processing in the worker, we'd need to look in the "changes" first and, if not found there, in the "read-only" db. But it looks really fragile, maybe there's a better way...

That said, I understand why lmdb (as well as other databases) does not allow more than one writer at a time. The thing is that our use case is narrowed down when it comes to database updates - as I described above, we have those 4 namespaces, where the same record in the UDEV and DEVICE namespace will never be updated at the same time - hence we avoid conflicts from parallel writes. The MODULE namespace is in the hands of the module itself, so if really needed, it can use its own locks (or we can provide an API for the module to do so). So we need to specialize the db for our needs here a bit.
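Just to sketch the "changes first, then the frozen snapshot" lookup I mean (very rough; the changes_lookup helper and the worker_db struct are hypothetical, and removals aren't handled here):

```c
/* Sketch of looking up a value during worker processing: consult the
 * worker-local change set first, fall back to the consistent snapshot
 * provided by a long-lived read-only LMDB transaction. */
#include <lmdb.h>
#include <string.h>

struct worker_db {
	MDB_txn *ro_txn;  /* long-lived MDB_RDONLY txn = consistent snapshot */
	MDB_dbi  dbi;
	void    *changes; /* worker-local store of not-yet-synced updates */
};

/* hypothetical lookup into the worker-local change set */
extern void *changes_lookup(void *changes, const char *key, size_t *size);

static void *worker_db_get(struct worker_db *db, const char *key_str, size_t *size)
{
	MDB_val key, data;
	void   *value;

	/* 1) anything this worker already added/updated wins */
	if ((value = changes_lookup(db->changes, key_str, size)))
		return value;

	/* 2) otherwise read from the snapshot taken at the start of processing */
	key.mv_data = (void *) key_str;
	key.mv_size = strlen(key_str) + 1;

	if (mdb_get(db->ro_txn, db->dbi, &key, &data) != MDB_SUCCESS)
		return NULL;

	*size = data.mv_size;
	return data.mv_data; /* valid only while db->ro_txn stays open */
}
```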
As for the code, just a few things I noticed: [...]
Todd, in the meantime, before we sculpt [...]. The [...]. There's a downside that [...]
@prajnoha Ok. Thanks for letting me know. Do you think we could get away with a single back end for the kv-store for v1? We could put the lmdb branch on hold until we need it? lmdb adds a fair number of dependencies and isn't ideally suited to the fork/COW design. In the Caveats section of the lmdb docs it warns "Use an MDB_env* in the process which opened it, without fork()ing." http://www.lmdb.tech/doc/

I had thought we might consider a multi-thread model, but I understand your decision to avoid it. lmdb would be better suited if we went in that direction.

I'm still concerned about how we are going to unwind the systemd call stack in the fork'd processes. Maybe we could discuss it in the next meeting? My inability to figure out a clean solution was what inspired my thoughts about moving towards threads rather than processes.

Are you thinking we will switch to a C++ compiler - but keep a lot of stuff in plain C? I looked at the docs/examples for tkrzw. I need to look more closely at the details for transactions/recovery - but at first glance it looks like it provides a lot of what lmdb does, right? https://dbmx.net/tkrzw/#tips_acid
OK, we can park that for now and revisit later then...
Yes, this is something that troubles me a bit too. We can collect ideas and see what is possible. One of them being the replacement of the systemd event loop, if there's a better alternative out there. Or even our own simple event loop, but it would be better if there's anything we could just reuse, of course...
You mean related to the possible use of [...]?
Well, yes, it's a simple key-value DB with support for a few backends on its own (hash, skip list, b-tree... either in-memory only or also backed by a file). I'm thinking about making use of the [...].

From the transactions, we'd make use of the atomicity - so taking a transaction when we receive a request to sync the db in the main process, then applying all the changes/diffs, then committing the transaction. If anything goes wrong in the middle, we'd discard the transaction (...and probably restart the operation that caused the db changes/diffs). Currently, we don't have this kind of atomicity when syncing changes/diffs from snapshots with the main db with just a simple hash.

I've also noticed that [...]