Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge InputIter, InputTakeAtPosition, InputLength and InputTake #1612

Merged
merged 5 commits into from
Jan 7, 2023

Conversation

Geal
Copy link
Collaborator

@Geal Geal commented Jan 5, 2023

The new Input trait assembles methods from these 4 traits

@coveralls
Copy link

coveralls commented Jan 5, 2023

Pull Request Test Coverage Report for Build 3859246340

  • 199 of 265 (75.09%) changed or added relevant lines in 11 files are covered.
  • 3 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.8%) to 62.47%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/bits/mod.rs 2 3 66.67%
src/character/streaming.rs 14 15 93.33%
src/combinator/mod.rs 3 4 75.0%
src/character/complete.rs 16 20 80.0%
src/number/complete.rs 6 11 54.55%
src/bytes/streaming.rs 22 28 78.57%
src/number/streaming.rs 5 11 45.45%
src/bytes/complete.rs 29 36 80.56%
src/traits.rs 100 135 74.07%
Files with Coverage Reduction New Missed Lines %
src/character/complete.rs 1 81.12%
src/number/complete.rs 1 57.5%
src/number/streaming.rs 1 55.6%
Totals Coverage Status
Change from base Build 3849317942: 0.8%
Covered Lines: 1558
Relevant Lines: 2494

💛 - Coveralls

@@ -15,6 +15,208 @@ use crate::lib::std::string::String;
#[cfg(feature = "alloc")]
use crate::lib::std::vec::Vec;

/// Parser input types must implement this trait
pub trait Input: Clone + Sized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me understand what are the needs of custom Input authors?

Before, every parser declared the capabilities that they needed, theoretically allowing for a wide variety of Inputs. In practice, I'm assuming only slices of tokens are practically supported and that that is what this change represents.

I guess the other advantage of the old approach was that you could add new capabilities without a breaking change. Now, if we need a new function, there either has to be a safe inherent implementation or we create a weird offshoot trait until the next breaking release.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's as you said. Having the multiple traits allowed adding more features without too much impact, at the cost of more complex type signatures. Now they have not moved much for some time, and there are some redundant parts, like InputTake VS Slice, so I am merging them together. An existing input type implementation would mainly have to copy old method implementations to the new trait and add the new take_from method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else this PR doesn't acknowledge that would be good to document is what traits were not included (e.g. Offset) and why, or why InputLength still exists despite being merged

/// The current input type is a sequence of that `Item` type.
///
/// Example: `u8` for `&[u8]` or `char` for `&str`
type Item;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo as a user, calling this Token would make the intent clearer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that would be the right term. The input type produces raw data, tokens is what would be parsed from this data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the level the parser exists. Some parsers are producing tokens, some are consuming tokens to produce an AST.

Looking around, I see combine calls it Token. I'm not seeing an equivalent trait in chumsky yet.

Geal added 3 commits January 6, 2023 22:51
the Input trait now has everything needed with the take, take_from and
take_split methods
@Geal Geal force-pushed the merge-input-traits branch from 71bd90c to dcb252b Compare January 6, 2023 23:21
@Geal Geal marked this pull request as ready for review January 6, 2023 23:46
@Geal Geal merged commit 6923bac into main Jan 7, 2023
@Geal Geal deleted the merge-input-traits branch January 7, 2023 17:15
}
}

impl<'a> Input for &'a [u8] {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this implementation be generalised for all &'a [T]? I can't see anything in the trait implementation that is u8 specific and this would close #1482

epage added a commit to epage/winnow that referenced this pull request Feb 1, 2023
This is modeled off of rust-bakery/nom#1612 with a mix of `combine::Stream.
- Chose `Token`, like `combine`, as that is the term I've been generally
  standardizing on and is clearer in intent than `Item`
- Baking in `Slice` support as that is important for wrapper types.  I chose
  `Slice` over `combine`s `Range` as this is more focused on Rust integration,
  rather than grammar integration, and we are generally using `slice`.
- `next_token` is really what some of nom's existing traits are trying to do
  but awkwardly. `combine` does this through `StreamOnce::uncons`
- `iter_offsets` instead of `iter_tokens` as the only time you should care
  about the tokens, you should be looking at the offsets unless using something
  like `AsBytes`.
- `next_slice` is a parallel to `next_token` but the return type differs as the
  lookup is done in a different function call.  I'm tempted to add an
  `unchecked` variant but I want to benchmark first.

I considered following `combine` and doing separate `InputToken` and
`InputSlice` traits but figured we'd see how this goes first.
epage added a commit to epage/winnow that referenced this pull request Feb 1, 2023
This is modeled off of rust-bakery/nom#1612 with a mix of `combine::Stream.
- Chose `Token`, like `combine`, as that is the term I've been generally
  standardizing on and is clearer in intent than `Item`
- Baking in `Slice` support as that is important for wrapper types.  I
  chose `Slice` over `combine`s `Range` as this is more focused on Rust
  integration, rather than grammar integration, and we are generally using
  `slice`.
- `next_token` is really what some of nom's existing traits are trying to do
  but awkwardly. `combine` does this through `StreamOnce::uncons`
- `iter_offsets` instead of `iter_tokens` as the only time you should care
  about the tokens, you should be looking at the offsets unless using
  something like `AsBytes`.
- `next_slice` is a parallel to `next_token` but the return type differs
  as the lookup is done in a different function call.  I'm tempted to
  add an `unchecked` variant but I want to benchmark first.

I considered following `combine` and doing separate `InputToken` and
`InputSlice` traits but figured we'd see how this goes first.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants