Schema-full formats? #208

Open
colinator opened this issue Oct 8, 2024 · 6 comments

Comments

@colinator

colinator commented Oct 8, 2024

Would it be possible (and if so how difficult) to add 'schema-full' serialization formats? For instance protobuf, flatbuffers, cap'n proto, dds, etc.

Obviously it would require providing the schema, or maybe the compiled artifacts (as in protobuf for instance) at compile-time. And I realize it doesn't really map to this library. But something that could easily translate between, say, protobuf and flex buffers, or dds and json, would be really useful.

@liuzicheng1987
Contributor

liuzicheng1987 commented Nov 5, 2024

I have taken a look at protobuf, flatbuffers and cap'n proto and concluded that each of these formats has issues that make this difficult. I have even been in touch with Kenton Varda, the main author of cap'n proto, and he has confirmed that I would need a significant amount of reverse-engineering to make this work.

However, I think that Apache Avro (https://avro.apache.org/docs/1.12.0/) looks very promising and I will open an issue for it.

@tsurumi-yizhou

The protobuf format isn't so complex. See struct_pb which implements a simple system to serialize into protobuf.

@liuzicheng1987
Contributor

Hi @colinator and @tsurumi-yizhou,

so, I've got some updates on this issue. I have devised a system on how we can use the reflection API for schemaful formats and the first format I have chosen to support is Apache Avro (https://avro.apache.org), just because its C API lends itself to doing something like this (https://avro.apache.org/docs/1.11.1/api/c/).

The next one I have in mind is cap'n proto, which actually has an API specifically for this purpose (https://capnproto.org/cxx.html#dynamic-reflection).

But I will certainly take a look at the library @tsurumi-yizhou suggested and how they approach this problem. If we could also support protobuf, that would be great.
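
To make the schemaful idea above a bit more concrete, here is a minimal sketch of deriving an Avro record schema (which Avro expresses as JSON) from a list of fields. This is not reflect-cpp's implementation: in the real library the field names and types would presumably come from the reflection API, whereas here they are listed by hand. `FieldInfo` and `make_avro_record_schema` are hypothetical names invented for this illustration, and names are assumed not to need JSON escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical field description: a field name plus an Avro primitive
// type name such as "string", "int" or "double".
struct FieldInfo {
    std::string name;
    std::string avro_type;
};

// Builds the JSON text of an Avro record schema from a record name and
// its fields. Assumption: no name contains characters that need escaping.
std::string make_avro_record_schema(const std::string& record_name,
                                    const std::vector<FieldInfo>& fields) {
    std::string out =
        "{\"type\":\"record\",\"name\":\"" + record_name + "\",\"fields\":[";
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0) out += ",";
        out += "{\"name\":\"" + fields[i].name + "\",\"type\":\"" +
               fields[i].avro_type + "\"}";
    }
    out += "]}";
    return out;
}
```

Calling `make_avro_record_schema("Person", {{"first_name", "string"}, {"last_name", "string"}})` yields a record schema with two string fields, which is the kind of artifact a schemaful writer needs before it can serialize any value.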

@colinator
Author

This is good news - I've long wanted a c++ library that can serialize both to flex and flat buffers, for example - or any other format.

The neat thing about serialization formats such as flatbuffers (or cap'n proto) is that they support "0-copy": if your data contains a large object (such as an image tensor), no expensive memory allocation or copy is needed (at least at the user level) in order to read it. I'm concerned that reflect-cpp cannot support 0-copy, which would negate one of the big benefits of some of these formats. See my issue #207. I'd be happy to chat more about this.
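
The difference between a copying read and a 0-copy read can be sketched with the standard library alone. The buffer layout below (a 4-byte header followed by a payload) is invented for this illustration and is not the layout of flatbuffers or cap'n proto; `read_payload_copy` and `read_payload_view` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Assumed toy layout: a fixed 4-byte header followed by the payload.
constexpr std::size_t kHeaderSize = 4;

// Copying read: allocates a new string and copies the payload into it.
// Throws std::out_of_range if the buffer is shorter than the header.
std::string read_payload_copy(const std::string& buffer) {
    return buffer.substr(kHeaderSize);
}

// 0-copy read: returns a non-owning view that points into `buffer`.
// The view is valid only while `buffer` is alive and unmodified.
// Throws std::out_of_range if the buffer is shorter than the header.
std::string_view read_payload_view(const std::string& buffer) {
    return std::string_view(buffer).substr(kHeaderSize);
}
```

The view costs a pointer and a length regardless of payload size, which is why it matters for large objects; the price is that the caller has to keep the underlying buffer alive.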

@liuzicheng1987
Contributor

@colinator, it might be possible to support 0-copy operations for some of these formats.

So, instead of writing this:

struct Person{
    std::string first_name;
    std::string last_name;
};

You could write this:

struct Person{
    capnp::Text first_name;
    capnp::Text last_name;
};

But it is harder to see what we could do about vectors... At the end of the day, it is the philosophy of reflect-cpp to closely integrate with the C++ standard library.

@liuzicheng1987
Contributor

That being said, protobuf actually has a reflection API that looks very promising:

https://protobuf.dev/reference/cpp/api-docs/google.protobuf.message/

And it seems we can also support flatbuffers if we are able to reverse-engineer the algorithm they use to calculate the offsets (which shouldn't be hard):

https://dbaileychess.github.io/flatbuffers/flatbuffers_internals.html
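
As a rough illustration of the offset lookup involved, the sketch below reads one int32 field from the root table of a hand-built buffer. It reflects my understanding of the layout described on the internals page linked above (a root offset, then a table that points back to a vtable of per-field offsets), simplified to a single scalar field and with no bounds checking. It is not the flatbuffers library's code, and `read_root_int32_field` and `make_example_buffer` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Little-endian readers (flatbuffers stores scalars little-endian).
inline std::uint16_t read_u16(const Bytes& b, std::size_t pos) {
    return static_cast<std::uint16_t>(b[pos] | (b[pos + 1] << 8));
}
inline std::uint32_t read_u32(const Bytes& b, std::size_t pos) {
    return static_cast<std::uint32_t>(b[pos]) |
           (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(b[pos + 3]) << 24);
}

// Returns the int32 stored in field `field_index` of the root table,
// or std::nullopt if the field is absent. Sketch only: no validation.
std::optional<std::int32_t> read_root_int32_field(const Bytes& buf,
                                                  std::size_t field_index) {
    // The first 4 bytes hold the offset of the root table.
    const std::int64_t table = read_u32(buf, 0);
    // The table starts with a signed offset; the vtable is at table - offset.
    const auto soffset = static_cast<std::int32_t>(read_u32(buf, table));
    const auto vtable = static_cast<std::size_t>(table - soffset);
    // The vtable starts with two uint16 entries (vtable size, table size),
    // followed by one uint16 offset per field.
    const std::size_t vtable_size = read_u16(buf, vtable);
    const std::size_t slot = 4 + 2 * field_index;
    if (slot >= vtable_size) return std::nullopt;  // field not in this vtable
    const std::uint16_t field_offset = read_u16(buf, vtable + slot);
    if (field_offset == 0) return std::nullopt;    // field was not written
    return static_cast<std::int32_t>(
        read_u32(buf, static_cast<std::size_t>(table) + field_offset));
}

// Hand-built 20-byte example: one table with a single int32 field = 42.
Bytes make_example_buffer() {
    return {
        12, 0, 0, 0,  // [0]  root table is at offset 12
        6, 0,         // [4]  vtable size in bytes
        8, 0,         // [6]  table size in bytes
        4, 0,         // [8]  field 0 lives at table + 4
        0, 0,         // [10] padding
        8, 0, 0, 0,   // [12] table: vtable is at 12 - 8 = 4
        42, 0, 0, 0,  // [16] field 0 value
    };
}
```

Reading field 0 of the example buffer follows the chain root offset, table, vtable, field offset, value; asking for field 1 returns an empty optional because the vtable has no slot for it.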
