Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add miden-package crate with Package type to represent a compiled Miden program/library. #1544

Open
wants to merge 29 commits into
base: next
Choose a base branch
from

Conversation

greenhat
Copy link
Contributor

@greenhat greenhat commented Oct 23, 2024

This PR adds miden-package crate with Package type ported from the compiler's ad-hoc implementation to represent a compiled Miden program/library.

Please consider reviewing this PR commit-by-commit. I crafted the commits to be focused on a single change and show the evolution of the implementation.

The reasons for the new crate are explained in 0xPolygonMiden/compiler#376

The compiler's PR that uses miden-package from this PR is 0xPolygonMiden/compiler#349

@greenhat greenhat changed the title Ass miden-package crate with Package type to represent a compiled Miden program/library. Add miden-package crate with Package type to represent a compiled Miden program/library. Oct 23, 2024
@greenhat greenhat force-pushed the greenhat/miden-package branch 4 times, most recently from 76a4b7e to 9ff342e Compare October 30, 2024 09:44
@greenhat greenhat force-pushed the greenhat/miden-package branch from 9ff342e to 7c9b95b Compare October 30, 2024 10:14
@greenhat greenhat changed the base branch from next to greenhat/library-deser-uncheck-exports October 30, 2024 10:15
@greenhat greenhat force-pushed the greenhat/library-deser-uncheck-exports branch from 1ca8581 to 3581727 Compare November 4, 2024 17:37
Base automatically changed from greenhat/library-deser-uncheck-exports to next November 4, 2024 21:37
@greenhat greenhat force-pushed the greenhat/miden-package branch 7 times, most recently from 41b5f5d to c5ff6f0 Compare November 12, 2024 14:07
@greenhat greenhat force-pushed the greenhat/miden-package branch 6 times, most recently from 211f34c to 84ea102 Compare November 19, 2024 08:49
@greenhat greenhat changed the base branch from next to greenhat/i1547-mastforest-add-advicemap November 19, 2024 08:49
@greenhat greenhat changed the title Add miden-package crate with Package type to represent a compiled Miden program/library. [2/2] Add miden-package crate with Package type to represent a compiled Miden program/library. Nov 19, 2024
@greenhat greenhat force-pushed the greenhat/i1547-mastforest-add-advicemap branch from eda4a01 to 006e03a Compare November 19, 2024 14:40
@greenhat greenhat force-pushed the greenhat/miden-package branch 2 times, most recently from 5ec902e to d6419ca Compare November 19, 2024 15:00
@greenhat greenhat force-pushed the greenhat/i1547-mastforest-add-advicemap branch from 006e03a to 92f3309 Compare November 20, 2024 07:05
Comment on lines 346 to 347
let proc_name =
ProcedureName::new_unchecked(Ident::new_unchecked(Span::unknown(Arc::from(str))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would allow deserialization of arbitrary strings as procedure names, right? (e.g., strings with control or unprintable characters etc.) Would this not be a problem?

Should we enforce at least some kind of validation (e.g., maybe against the alphabet used on line 371 below)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Since it's an unchecked user input, I suppose there is a chance to use it for some sort of attack. I cannot come up with any, but I'd feel safer if we have at least some validation here. This should be a different (more relaxed) validation from the one we use when parsing the MASM source code. @bitwalker What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I relaxed the validation rules in the Ident (used in LibraryPath) and ProcedureName in new constructors and used them in the deserialization instead of the new_unchecked ones.
The validation on MASM source code parsing is implemented in the lexer and does not use the checked constructors (it calls new_unchecked). So the existing checked constructors are the different (from lexer's) kind of validation to be used everywhere else.

Comment on lines 512 to 525
Ok((ns, rest)) => {
let module_id = Ident::new_unchecked(Span::new(
SourceSpan::default(),
Arc::from(rest.to_string()),
));
LibraryPath::new_from_components(ns, [module_id])
},
Err(_) => {
let module_id = Ident::new_unchecked(Span::new(
SourceSpan::default(),
Arc::from(path.to_string()),
));
LibraryPath::new_from_components(LibraryNamespace::Anon, [module_id])
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above. I think we discussed this previously, but I don't remember why we are skipping validation rather than updating validation rules to support the newly required formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, the reason was that validation that we have now is for the MASM source code parsing, and technically, we can allow any characters in the identifiers in the compiled code. @bitwalker correct me if I'm wrong.
But from the "validate any user input" point of view, it makes sense to create and use another, more relaxed type of validation here and in ProcedureName (see my comment above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. See my comment above - #1544 (comment)

package/README.md Outdated Show resolved Hide resolved
greenhat and others added 8 commits January 16, 2025 12:53
Co-authored-by: Philippe Laferrière <[email protected]>
Co-authored-by: Philippe Laferrière <[email protected]>
Co-authored-by: Philippe Laferrière <[email protected]>
Co-authored-by: Philippe Laferrière <[email protected]>
Co-authored-by: Philippe Laferrière <[email protected]>
Co-authored-by: Philippe Laferrière <[email protected]>
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left some comments inline. The main ones are about:

  • Potentially adding some validation to library path deserialization.
  • Using Serializable/Deserializable instead of serde.
  • Not including functionality that assumes downstream crates (i.e., miden-base).

package/Cargo.toml Outdated Show resolved Hide resolved
package/Cargo.toml Outdated Show resolved Hide resolved
package/README.md Show resolved Hide resolved
@@ -0,0 +1,31 @@
[package]
name = "miden-package"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should make it a bit more specific. Maybe miden-vm-package or miden-code-package (though, not sure if I like these better than just miden-package).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got so much used to miden-package so I'm biased here. :) In general, if we plan to have another "packages" (account?) giving it a more specific name makes sense.

package/src/dep/mod.rs Outdated Show resolved Hide resolved
package/src/dep/mod.rs Outdated Show resolved Hide resolved
@greenhat
Copy link
Contributor Author

@bobbinth @plafer Thank you for the review! I've addressed all notes, implemented easy ones, and I am converting this PR to draft to work on:

  • Switch PackageExport::name to have QualifiedProcedureName type and propagate Arbitrary implementation for proptest;
  • Remove bitcode and serde based serialization in favor of Serializable/Deserializable;
  • New validation for ids (LibraryPath, ProcedureName) on deserialization;

@greenhat greenhat marked this pull request as draft January 17, 2025 11:17
Implement `Serializable/Deserializable` and `Arbitrary`(proptest)
for `QualifiedProcedureName`. Add serialization roundtrip tests for
`LibraryPath` and `ProcedureName`.
@greenhat greenhat force-pushed the greenhat/miden-package branch from 915c093 to a93f3a2 Compare January 20, 2025 16:07
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left some more comments/questions inline.

Besides these, I think the main thing remaining here is potentially adding some validation to library path deserialization - right?

Comment on lines +16 to +25
pub use self::{
dep::{
resolver::{
DependencyResolver, LocalResolvedDependency, MemDependencyResolverByDigest,
ResolvedDependency,
},
Dependency, DependencyName,
},
package::{MastArtifact, Package, PackageExport, PackageManifest},
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: should we re-export such things as Digest, Program, MastForest etc. from here is that this crate is more "self-contained"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "yes". I can imagine miden-package crate being used for some sort of package analysis without relaying on the rest of VM crates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 116 to 146
///// Parses a package from the provided bytes
//pub fn read_from_bytes<B>(bytes: B) -> Result<Self, Report>
//where
// B: AsRef<[u8]>,
//{
// use alloc::borrow::Cow;
//
// let bytes = bytes.as_ref();
//
// let bytes = bytes
// .strip_prefix(Self::MAGIC)
// .ok_or_else(|| Report::msg("invalid package: missing header"))?;
// let bytes = bytes.strip_prefix(Self::FORMAT_VERSION).ok_or_else(|| {
// Report::msg(format!(
// "invalid package: incorrect version, expected '1.0', got '{}'",
// bytes.get(0..4).map(String::from_utf8_lossy).unwrap_or(Cow::Borrowed("")),
// ))
// })?;
//
// bitcode::deserialize(bytes).map_err(Report::msg)
//}
//
///// Serializes the package into a byte array
//pub fn write_to_bytes(&self) -> Result<Vec<u8>, Report> {
// let mut bytes = Vec::new();
// bytes.extend_from_slice(Self::MAGIC);
// bytes.extend_from_slice(Self::FORMAT_VERSION);
// let mut data = bitcode::serialize(self).map_err(Report::msg)?;
// bytes.append(&mut data);
// Ok(bytes)
//}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be commented-out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed it. Thanks for spotting this!
In my defense, I planned to give this PR a "polish" before marking it ready and asking for another round of reviews. 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 148 to 150
pub fn digest(&self) -> Digest {
self.mast.digest()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's add a doc comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test be moved under src/package/serialization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file be moved to src/package/mod.rs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 102 to 110
pub struct Package {
/// Name of the package
pub name: String,
/// The MAST artifact ([Program] or [Library]) of the package
pub mast: MastArtifact,
/// The package manifest, containing the set of exported procedures and their signatures,
/// if known.
pub manifest: PackageManifest,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not for this PR, but where would we put custom metadata? Here, I'm thinking about the storage layout metadata we'd need to instantiate account components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking, either directly in Package or in PackageManifest.

@greenhat greenhat force-pushed the greenhat/miden-package branch from 223c95f to 37fe048 Compare January 22, 2025 05:17
@greenhat
Copy link
Contributor Author

@bobbinth @plafer I implemented everything and addressed all your notes. Please do another round.

Copy link
Contributor

@plafer plafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants