CDB file size is 3-5 times original XML! #1

Open
duckAsteroid opened this issue Oct 31, 2013 · 2 comments

@duckAsteroid
Owner

The main culprit is that the full XML path keys are massive and repeat a lot!

  • Could we ZIP the results?
  • Could we keep a table mapping each path to a long ID and then use the long as the key? (This would mean a two-phase lookup.)
@duckAsteroid
Owner Author

ZIP works nicely (because the keys repeat) and gets the size back down close to the original XML.

However, the ZIP would need unpacking for read access (since we are using a random access file).
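A rough way to see the effect of repeated keys is to compress some made-up records that share the same long path prefix. This is only a sketch: it uses Python's zlib (DEFLATE, the algorithm ZIP normally uses) rather than a real ZIP archive, and the record layout and the `build_records` helper are assumptions for illustration, not the actual CDB format.

```python
import zlib


def build_records(n_cities):
    """Build hypothetical 'full path key = value' records, one per line."""
    lines = []
    for i in range(n_cities):
        prefix = f"/world[0]/continent[0]/country[0]/city[{i}]"
        lines.append(f"{prefix}@id={i}")
        lines.append(f"{prefix}/name[0]=city-{i}")
    return "\n".join(lines).encode("utf-8")


raw = build_records(1000)
packed = zlib.compress(raw)

# The repeated path prefixes are what make the stream compress well.
print(f"raw={len(raw)} bytes, packed={len(packed)} bytes")

# Reading any single record means inflating the stream first,
# which is the random-access cost mentioned above.
restored = zlib.decompress(packed)
```

The last step shows the trade-off: the compressed bytes cannot be seeked into directly, so a lookup has to decompress before it can read.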

@ghost ghost assigned duckAsteroid Oct 31, 2013
@duckAsteroid
Owner Author

Storing the path of the keys separately with an id for each would save space...

As an example, we could store /world[0]/continent[0]/country[0]/city[0]=1234 and then 1234@id='1234'. The downside is that parent/child traversals would require an additional lookup for the key ID (e.g. what is the ID of /world[0]/continent[0]/country[0]/city[0]/name[0]?).
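The path-table idea could look something like the sketch below. It is in-memory only, and the class name `PathIndexedStore`, the `#text` field name and the sample values are all hypothetical; nothing here reflects how CDB actually lays out its file.

```python
class PathIndexedStore:
    """Sketch: store each element path once, key the values by a small ID."""

    def __init__(self):
        self._path_ids = {}  # full element path -> integer ID
        self._values = {}    # (path ID, field name) -> value

    def put(self, element_path, field, value):
        # Assign the next free ID the first time a path is seen.
        path_id = self._path_ids.setdefault(element_path, len(self._path_ids))
        self._values[(path_id, field)] = value

    def get(self, element_path, field):
        # Phase 1: resolve the long path to its ID.
        path_id = self._path_ids.get(element_path)
        if path_id is None:
            return None
        # Phase 2: read the value under the short key.
        return self._values.get((path_id, field))


store = PathIndexedStore()
city = "/world[0]/continent[0]/country[0]/city[0]"
store.put(city, "@id", "1234")
store.put(city + "/name[0]", "#text", "example-city")

# Moving from the city to its name child needs the child's own path
# resolved first, which is the extra lookup described above.
child_name = store.get(city + "/name[0]", "#text")
```

Each long path is written once in the table, and every value entry carries only the integer ID, so the saving grows with the number of fields stored per path.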
