Inefficient payload encoding #289
I seem to recall that base64 was chosen regardless of the string type partly because it helps avoid deserialization attacks and probably also to avoid having to escape content like this. However, I do wonder if it would help to specify an encoding that is … I seem to remember we may have done some experimentation on this internally?
Side note: does this belong in https://github.com/secure-systems-lab/dsse, apart from any changes to in-toto's media type as a consequence of a DSSE change?
@adityasaky: if we are changing the base64 encoding, then it is more of a DSSE task; if we are not changing that, but are compressing serialized statements, then I think it is an in-toto task.
I suspect …
So, generally, folks are open to a solution here. We'd probably be looking for a PR that defines the proposal along with code that implements it.
This issue should be closed in favor of secure-systems-lab/dsse#63. All changes need to happen there, since this is a DSSE issue.
Also mentioned in secure-systems-lab/dsse#63: as another alternative, the bundle format selected by in-toto, JSON Lines, also offers a compression mode. If we were to compress the bundle, would that make it an in-toto issue again? :P Compressing the bundle might yield better data reduction, since a bundle will likely contain multiple attestations, and if these attestations are for the same set of artifacts, the "subject" fields will be repeated multiple times. Bundle-level compression can exploit that redundancy; statement-level compression cannot.
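To illustrate the redundancy argument, here is a small sketch that compresses ten hypothetical statements sharing the same `subject` list, once individually and once as a single JSON Lines bundle. `zlib` is only a stand-in; the bundle compression mode mentioned above may use a different algorithm, and the `example.com` predicate types and sequential digests are placeholders.

```python
import json
import zlib

# Hypothetical: 20 artifacts, all referenced by every statement in the bundle.
subjects = [
    {"name": f"dist/pkg-{i}.whl", "digest": {"sha256": f"{i:064x}"}}
    for i in range(20)
]

# Ten statements that differ only in predicateType and predicate.
statements = [
    json.dumps(
        {
            "_type": "https://in-toto.io/Statement/v1",
            "subject": subjects,
            "predicateType": f"https://example.com/predicate/{n}",
            "predicate": {"index": n},
        },
        separators=(",", ":"),
    )
    for n in range(10)
]

# JSON Lines bundle: one statement per line.
bundle = "\n".join(statements).encode("utf-8") + b"\n"

# Statement-level: each statement compressed on its own, so shared subjects are re-encoded every time.
per_statement = sum(len(zlib.compress(s.encode("utf-8"))) for s in statements)
# Bundle-level: the compressor can back-reference subjects seen in earlier lines.
whole_bundle = len(zlib.compress(bundle))

print(per_statement, whole_bundle)
```

Real subject digests are effectively random, so absolute ratios will differ, but the repeated `subject` block is what the bundle-level compressor gets to reuse.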
The CoRIM draft proposes a COSE_Sign1 envelope around a CBOR-serialized object for compact representation, avoiding JSON altogether. It seems really similar to in-toto, except that it has some notion of predefined predicates and limited extensibility... I'm still trying to suss out where we can remove redundancy across the two efforts.
Related: #361
Discussed at today's attestation maintainers meeting. We're open to both of these approaches. Our main concern is interoperability: having multiple ways to encode and represent attestations could significantly hinder adoption. One way to address this might be a 'generic' converter that converts newer encodings to a canonical JSON encoding as needed. We'd be happy to review PRs from folks who are highly motivated to submit them.
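A 'generic' converter could be as simple as a table mapping `payloadType` to a decoder, followed by re-serialization into one canonical form. This sketch is hypothetical throughout: the `+lzma` type is only a proposal in this thread, and "canonical" here just means sorted keys and compact separators, which is not a full canonicalization scheme such as RFC 8785 (JCS).

```python
import base64
import json
import lzma
from typing import Callable, Dict

# Hypothetical registry: payloadType -> decoder from raw payload bytes to a statement dict.
DECODERS: Dict[str, Callable[[bytes], dict]] = {
    "application/vnd.in-toto+json": lambda b: json.loads(b),
    # Proposed in this thread only; not a registered or specified media type.
    "application/vnd.in-toto+json+lzma": lambda b: json.loads(lzma.decompress(b)),
}


def to_canonical_json(envelope: dict) -> bytes:
    """Decode a DSSE-style envelope's payload and re-serialize it in one fixed JSON form."""
    payload_type = envelope["payloadType"]
    try:
        decode = DECODERS[payload_type]
    except KeyError:
        raise ValueError(f"unsupported payloadType: {payload_type}") from None
    statement = decode(base64.b64decode(envelope["payload"]))
    # Sorted keys + compact separators: a simple stand-in for a real canonical JSON scheme.
    return json.dumps(
        statement, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# The same statement carried in two different encodings.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [],
    "predicateType": "https://example.com/predicate",
    "predicate": {"b": 1, "a": 2},
}
plain = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(json.dumps(statement, indent=2).encode("utf-8")).decode("ascii"),
}
compressed = {
    "payloadType": "application/vnd.in-toto+json+lzma",
    "payload": base64.b64encode(lzma.compress(json.dumps(statement).encode("utf-8"))).decode("ascii"),
}

print(to_canonical_json(plain) == to_canonical_json(compressed))  # → True
```

Because the canonical output is new bytes, the original DSSE signature does not cover it; a converter like this should only run after the original envelope has been verified.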
The in-toto specs use DSSE to contain the statement data and carry signatures.
The base64 encoding step incurs about 33% overhead, which has no real benefit, because the serialized JSON statement is already a valid string. This overhead will make adoption harder for resource-constrained CI/CD systems and is generally wasteful (e.g. some complex artifacts may generate 1 GB of provenance data; 33% of that is 330 MB to store and transfer for every build).
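Standard base64 (RFC 4648) turns every 3 input bytes into 4 output characters, padding the final group, so an n-byte statement becomes 4 * ceil(n / 3) bytes. A quick Python sketch that checks this formula against the standard library and reproduces the 1 GB estimate; the repeated `_type` string is placeholder data.

```python
import base64


def base64_encoded_length(n: int) -> int:
    """Size in bytes of padded standard base64 output for n input bytes."""
    return (n + 2) // 3 * 4  # integer form of 4 * ceil(n / 3)


def overhead_ratio(n: int) -> float:
    """Fractional size increase caused by base64-encoding n bytes."""
    return base64_encoded_length(n) / n - 1


# Check the formula against the standard library encoder.
payload = b'{"_type": "https://in-toto.io/Statement/v1"}' * 1000
assert len(base64.b64encode(payload)) == base64_encoded_length(len(payload))

one_gb = 10**9
print(base64_encoded_length(one_gb) - one_gb)  # → 333333336
```

For large payloads the overhead converges to one third; for very small payloads padding makes it proportionally larger.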
Could we have some more efficient solutions?
For example, the DSSE spec could be updated so that not every payload must be base64-encoded; the MIME type "application/vnd.in-toto+json" would indicate that the payload is in text format and can be consumed directly.
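For context on what changes if the payload is not base64-encoded: in DSSE, the signature is computed over the Pre-Authentication Encoding (PAE) of the payload type and the raw payload bytes, so base64 only affects how the payload travels inside the JSON envelope, not what gets signed. The sketch implements PAE as described in the DSSE protocol and compares a base64 envelope with a hypothetical envelope that embeds the statement as a JSON string. The `raw_envelope` layout is an assumption for illustration, not part of any spec, and the empty `signatures` lists stand in for real signature entries.

```python
import base64
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 Pre-Authentication Encoding:
    "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    where LEN is the byte length written in ASCII decimal."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


payload_type = "application/vnd.in-toto+json"
statement = json.dumps(
    {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": "app.tar.gz", "digest": {"sha256": "ab" * 32}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {},
    }
).encode("utf-8")

# The bytes a signer signs (and a verifier checks) are the same in both cases.
to_sign = pae(payload_type, statement)

# Current DSSE envelope: payload carried as base64.
b64_envelope = {
    "payload": base64.b64encode(statement).decode("ascii"),
    "payloadType": payload_type,
    "signatures": [],  # placeholder for real signature entries
}

# Hypothetical variant: payload carried as a plain JSON string.
# json.dumps must escape every embedded double quote, so this is not free either.
raw_envelope = {
    "payload": statement.decode("utf-8"),
    "payloadType": payload_type,
    "signatures": [],
}

print(len(json.dumps(b64_envelope)), len(json.dumps(raw_envelope)))
```

With a raw string payload, verifiers would also need to recover the exact signed bytes from the JSON string rather than from a re-serialized copy of the statement.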
Alternatively, if we must use base64 for the DSSE payload, could we introduce something like "application/vnd.in-toto+json+lzma", indicating that the serialized statement is compressed before encoding?
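A minimal sketch of this second option, assuming the hypothetical `application/vnd.in-toto+json+lzma` type means "LZMA-compress the serialized statement, then base64 it as DSSE requires today". It uses Python's `lzma` module (xz container). Compression reintroduces a parsing risk (decompression bombs), so decoding caps the inflated size; the 64 MiB limit is an arbitrary assumption. The statement is an illustrative, abbreviated provenance-like document, not a complete SLSA predicate.

```python
import base64
import json
import lzma

# Hypothetical media type taken from the proposal; no spec defines it.
LZMA_PAYLOAD_TYPE = "application/vnd.in-toto+json+lzma"


def encode_payload(statement):
    """Serialize, LZMA-compress, then base64 for the DSSE envelope."""
    raw = json.dumps(statement, separators=(",", ":")).encode("utf-8")
    return LZMA_PAYLOAD_TYPE, base64.b64encode(lzma.compress(raw)).decode("ascii")


def decode_payload(payload_type, payload_b64, max_size=64 * 1024 * 1024):
    """Reverse encode_payload, refusing to inflate beyond max_size bytes."""
    if payload_type != LZMA_PAYLOAD_TYPE:
        raise ValueError(f"unexpected payloadType: {payload_type}")
    decompressor = lzma.LZMADecompressor()
    raw = decompressor.decompress(base64.b64decode(payload_b64), max_size)
    if not decompressor.eof:
        # Either the output hit max_size or the compressed stream was truncated.
        raise ValueError("payload too large or truncated")
    return json.loads(raw)


# Illustrative statement with a long, repetitive dependency list.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "app.tar.gz", "digest": {"sha256": "ab" * 32}}],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {
            "resolvedDependencies": [
                {"uri": f"pkg:pypi/dep-{i}@1.0.0", "digest": {"sha256": f"{i:064x}"}}
                for i in range(500)
            ]
        }
    },
}

raw_len = len(json.dumps(statement, separators=(",", ":")).encode("utf-8"))
payload_type, payload_b64 = encode_payload(statement)
# Sizes: raw JSON, plain base64 of it, and base64 of the LZMA-compressed JSON.
print(raw_len, 4 * ((raw_len + 2) // 3), len(payload_b64))
```

Actual savings depend heavily on how repetitive the statement is; real digests are effectively random and compress poorly, unlike the sequential placeholders used here.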