Reading RDB files #1

Closed
felixp8 opened this issue May 25, 2023 · 7 comments

Comments

@felixp8
Contributor

felixp8 commented May 25, 2023

From what I can tell there are two ways to read data from RDB files:

  1. Actually run a Redis server with that file as the database. The server needs to be started outside of Python (though of course we could launch it from Python with something like os.system()), but once it's running, we can query it through a Python package Redis provides. However, I can't get Redis running on my machine with their RDB file because I run out of RAM - it seems to need more than 12 GB. Presumably they are able to run it though, so that might not be an issue?
  2. Parse the raw RDB file directly. There is one Python package for this but it's no longer maintained and does not support the newest RDB file version (which is the file version we have). So, we'd probably have to make the necessary modifications for that ourselves, and maintain it as Redis continues to update its file format.

If this is a tool they want to use themselves, then probably best to go with option 1. In that case, best step might be to ask them for a smaller RDB file?
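A minimal sketch of option 1, assuming `redis-server` is on the PATH and the `redis` package (redis-py) is installed. The function names are hypothetical, and `--save ""` is an added precaution (not something discussed above) so the server does not rewrite the dump it was started from:

```python
import subprocess
from pathlib import Path


def redis_server_command(rdb_path, port=6379):
    """Build a redis-server command that loads the given RDB file on startup."""
    rdb_path = Path(rdb_path).resolve()
    return [
        "redis-server",
        "--dir", str(rdb_path.parent),    # directory that contains the dump
        "--dbfilename", rdb_path.name,    # file name of the dump inside --dir
        "--port", str(port),
        "--save", "",                     # assumption: disable snapshotting
    ]


def start_server(rdb_path, port=6379):
    """Launch redis-server as a child process; the caller must terminate it."""
    return subprocess.Popen(redis_server_command(rdb_path, port))


def connect(port=6379):
    """Return a redis-py client for the local server."""
    import redis  # third-party; imported lazily so this sketch loads without it

    return redis.Redis(host="localhost", port=port)


# Hypothetical usage (the path is a placeholder):
#   proc = start_server("/path/to/dump.rdb")
#   client = connect()
#   client.ping()        # may raise a loading error until the dump is in memory
#   proc.terminate()
```

With a large dump the server can take a while to load, so a real script would probably need to retry `ping()` until it succeeds.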

@CodyCBakerPhD
Member

> However, I can't get Redis running on my machine with their RDB file because I run out of RAM - it seems to need more than 12 GB. Presumably they are able to run it though, so that might not be an issue?

For better computational resources, check out the DANDI Hub. See if you can spin up the server there

However, option 2 is greatly preferred if at all possible. That way we can ensure completely lazy access at all times

Wow, the repo has 700+ forks lol

Did you check out this PR by someone's fork that extended the version? sripathikrishnan/redis-rdb-tools#186

@CodyCBakerPhD
Member

Also curious if https://pypi.org/project/rdb-cli/ allows you to extract every data stream and map it to JSON (either as a pre-step to get a JSON version of the file, or if there are utility functions for parsing contents via an API?)

@felixp8
Contributor Author

felixp8 commented May 25, 2023

> Did you check out this PR by someone's fork that extended the version? sripathikrishnan/redis-rdb-tools#186

They only alter the version checking requirement but don't actually make any changes to the parser. I tried using that fork and ran into an error with one of the data entries. Haven't had a chance to dig into that error though.

> Also curious if https://pypi.org/project/rdb-cli/ allows you to extract every data stream and map it to JSON (either as a pre-step to get a JSON version of the file, or if there are utility functions for parsing contents via an API?)

According to the package description, this is forked from redis-rdb-tools, but I can't find that fork itself - the description just links back to redis-rdb-tools. I assume it has the same functionality as redis-rdb-tools w.r.t. mapping to JSON, etc., but I'll dig through the source. If it works with the newer file version, we should be able to use its internals to avoid the JSON step.

@felixp8
Contributor Author

felixp8 commented May 26, 2023

Quick update:

  1. DANDI Hub worked for running the Redis server with the RDB they provide, and I can access the data, etc.
  2. rdb-cli has implemented some of the changes needed to support RDB file version 10, so it's in better shape than redis-rdb-tools, but it still can't open the file we received. Specifically, it's failing on a data type they're calling REDIS_RDB_TYPE_STREAM_LISTPACKS_2. Have to dig more deeply into the code to see what kind of work addressing these issues would entail.
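For reference, the RDB version a file declares can be checked without any parser: a dump starts with the ASCII magic string `REDIS` followed by a four-digit version number. A small sketch, with a hypothetical function name and an in-memory stand-in instead of the real file:

```python
import io


def read_rdb_version(stream):
    """Return the RDB format version declared in the first 9 bytes of a dump."""
    header = stream.read(9)
    if len(header) != 9 or header[:5] != b"REDIS":
        raise ValueError("not an RDB file: missing 'REDIS' magic string")
    version = header[5:]
    if not version.isdigit():
        raise ValueError(f"malformed RDB version field: {version!r}")
    return int(version.decode("ascii"))


# In-memory stand-in for the start of a version 10 dump.
fake_dump = io.BytesIO(b"REDIS0010" + b"\x00" * 4)
print(read_rdb_version(fake_dump))  # 10
```

For a real file, the same function would be called on a handle opened with `open(path, "rb")`. As noted above, passing the version check is not enough on its own; the parser still has to understand every value type stored in the file.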

@CodyCBakerPhD
Member

CodyCBakerPhD commented May 26, 2023

OK, (1) can work then if we can confirm that is how they are used to running the system; can you check with them on that?

@felixp8
Contributor Author

felixp8 commented May 26, 2023

Yeah sure! Could you (or Ben) loop me in on the email thread?

@felixp8
Contributor Author

felixp8 commented May 29, 2023

Ok, so we'll move ahead with running a Redis server and querying through their official Python interface, since the Stavisky lab should have the server running (or can easily get it running) whenever they use our conversion scripts. Closing this for now...
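As a rough illustration of how the server route can still keep memory use bounded on the Python side, here is a sketch that pages through a stream with redis-py's `xrange` instead of fetching it in one call. The helper name and batch size are hypothetical, and the exclusive `(` lower bound assumes a Redis server of version 6.2 or newer:

```python
def iter_stream(client, name, batch_size=1000):
    """Yield (entry_id, fields) pairs from a Redis stream, one batch at a time."""
    start = "-"  # "-" is the smallest possible stream ID
    while True:
        batch = client.xrange(name, min=start, max="+", count=batch_size)
        if not batch:
            return
        yield from batch
        last_id = batch[-1][0]
        if isinstance(last_id, bytes):  # redis-py returns bytes by default
            last_id = last_id.decode()
        start = "(" + last_id  # resume strictly after the last entry seen
```

Here `client` would be a `redis.Redis` instance connected to the running server, and `name` the key of one of the streams in the database.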

@felixp8 felixp8 closed this as completed May 29, 2023