Multi dimensional arrays? #7

Open
jw3126 opened this issue Dec 3, 2021 · 4 comments

jw3126 commented Dec 3, 2021

First of all, thanks a lot for creating this package. I think it is awesome to have a less magical alternative to BSON.jl.
The following snippet gives an error:

import LightBSON
arr = rand(2,2)
path = "hello.bson"
LightBSON.bson_write(path, arr)
LightBSON.bson_read(typeof(arr), path)
MethodError: no method matching Matrix{Float64}()

I think it would be awesome if loading and saving a Base.Array conveniently worked out of the box.

  • Is that in scope of the package?
  • If so would it be hard to implement?

ancapdev (Owner) commented Dec 3, 2021

BSON itself doesn't have a concept of multi-dimensional arrays, so that's why I haven't added it. It would require encoding some metadata about the array, and then it's no longer plain BSON, which is what this package is targeted at.

BSON array support is in general not very strong (they're actually encoded as objects with numeric keys for each element, see spec). What I've used myself for array data is zfp compression via ZfpCompression.jl stored as BSON binary objects. This takes care of dimensionality and type encoding (for the supported numeric types), along with compression if data has some correlation to it. What might be useful would be if we could standardize on some BSON binary subtypes across the ecosystem.
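
To make the "objects with numeric keys" remark concrete, here is a small conceptual sketch in plain Julia. It does not use LightBSON; `bson_array_pairs` is a hypothetical helper that only lists the key/value pairs a BSON array carries, with keys being the decimal element indices starting at "0".

```julia
# Conceptual view only: a BSON array is stored like a document whose keys are
# the element indices written as strings, in ascending order from "0".
# `bson_array_pairs` is a hypothetical name, not a LightBSON function.
bson_array_pairs(v::AbstractVector) = [string(i - 1) => x for (i, x) in enumerate(v)]

pairs_for_vector = bson_array_pairs([1.5, 2.5, 3.5])
# Each element is carried under its own string key: "0", "1", "2".
```

Because every element is stored with its own key, the format has no place to record a shape, which is why a matrix cannot be expressed without extra metadata.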

jw3126 (Author) commented Dec 3, 2021

Ok thanks a lot for the clarifications. These are reasonable design decisions. Thanks also for the ZfpCompression.jl recommendation. In my case, arrays contain structs that can again contain other arrays. So I think ZfpCompression is not quite applicable.

> What might be useful would be if we could standardize on some BSON binary subtypes across the ecosystem.

What does this mean? A couple of conventions to serialize, say, an array as (size=size(arr), data=vec(arr))?
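
A minimal sketch of the (size, data) convention mentioned in the question, in plain Julia. The helper names `flatten_array` and `restore_array` are hypothetical and not part of LightBSON, and the sketch does not check whether any particular BSON library round-trips the resulting record.

```julia
# Hypothetical helpers illustrating the convention; not part of LightBSON.

# Store the shape as a plain vector of integers and the elements as a flat,
# column-major vector. Both are one-dimensional.
flatten_array(arr::AbstractArray) =
    (size = collect(size(arr)), data = collect(vec(arr)))

# Rebuild the original shape from the stored record.
restore_array(rec) = reshape(rec.data, Tuple(rec.size))

arr = [1.0 2.0; 3.0 4.0]
rec = flatten_array(arr)
restored = restore_array(rec)
```

Since `data` is an ordinary vector, its elements could in principle be structs that contain further arrays, each flattened the same way. The cost is that the file is then a convention layered on BSON rather than plain BSON.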

ancapdev (Owner) commented Dec 4, 2021

> > What might be useful would be if we could standardize on some BSON binary subtypes across the ecosystem.

> What does this mean? A couple of conventions to serialize say array as (size=size(arr), data=vec(arr))?

Binary values in BSON carry a subtype tag: some subtypes are defined in the spec, some are reserved, and some are open for user extension. There's been some discussion on MongoDB support tickets and other forums about adding new spec-defined subtypes as a means to support, e.g., typed arrays. That doesn't sound like it would be useful for your use case, but in general it would make storing large arrays of primitive types much more efficient.
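
As a rough illustration of the subtype idea, the sketch below packs a `Vector{Float64}` into the byte layout of a BSON binary value: a little-endian int32 payload length, one subtype byte, then the payload. It is plain Julia and does not call LightBSON. The tag `TYPED_F64_SUBTYPE` and all function names are assumptions made up for this example, with the tag picked from the user-defined range (0x80 to 0xFF); no such subtype is standardized. The floats are written in host byte order, which is also an assumption.

```julia
# Hypothetical subtype tag from the user-defined range; not a standardized value.
const TYPED_F64_SUBTYPE = 0x80

# Layout of a BSON binary value: int32 length (little-endian), subtype byte, payload.
function encode_binary(payload::Vector{UInt8}, subtype::UInt8)
    io = IOBuffer()
    write(io, htol(Int32(length(payload))))
    write(io, subtype)
    write(io, payload)
    return take!(io)
end

function decode_binary(bytes::Vector{UInt8})
    len = ltoh(reinterpret(Int32, bytes[1:4])[1])
    subtype = bytes[5]
    payload = bytes[6:5 + len]
    return subtype, payload
end

# Pack the raw bytes of the floats (host byte order assumed).
pack_f64(v::Vector{Float64}) =
    encode_binary(collect(reinterpret(UInt8, v)), TYPED_F64_SUBTYPE)

function unpack_f64(bytes::Vector{UInt8})
    subtype, payload = decode_binary(bytes)
    subtype == TYPED_F64_SUBTYPE || error("unexpected binary subtype")
    return collect(reinterpret(Float64, payload))
end

packed = pack_f64([1.0, 2.0, 3.0])
unpacked = unpack_f64(packed)
```

The efficiency gain comes from the fixed overhead: here it is 5 bytes for the whole vector, whereas a regular BSON array spends a type byte and a key string on every element in addition to the 8 bytes of the double itself.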

jw3126 (Author) commented Dec 4, 2021

Ok, I see. Thanks @ancapdev!
