New Combinator: discard_until / drop_until #1594
Comments
Isn't this equivalent to discarding the output of let (s, _) = take_while(p)(s)?; |
I don't fully understand how. Let's say this is our input:
drop_until(tag("HELLO"))(input) returns: OK(("lkasjdLLadO","HELLO")) I suppose you could use map(
pair(
take_while(not(tag("HELLO"))),
tag("HELLO")
),
|(_, v)| v
)(input) but is that better? Maybe... though it seems like this matches HELLO twice because of the |
I see that Maybe defining your format in terms of the complement of something is more typical when parsing binary file formats, or when extracting something that is embedded within what is otherwise considered junk, e.g. codes inside Markdown, CSV, or such, entirely skipping the embedding format. For specifying language grammars, it makes more sense to positively define the thing you're skipping (comments, whitespace, etc.) even if you're just going to discard it. It was with this frame of mind that I assessed the usefulness of |
Note that this somewhat parallels the conversation in #1223 / #1566 regarding how much should |
Yeah, I can see that. I think I'd address this less by the opportunity to tweak performance (as it seems like if your parser isn't allocating, but just returning a slice of the input, there's no performance hit) and more by an appeal to providing a gentle introduction to Nom. Many users come to nom as a means to replace Regex (fully or in part), as Regex can quickly become unmaintainable as the complexity rises. Generally, regex was never a serious consideration when parsing language grammars, for example. Conceptually, regex is often used to match some embedded pattern of tokens in a larger context: a way to pull desired information from an otherwise noisy document. Having a few combinators that are a 1-1 match to this domain makes the first tepid steps into Nom so much easier for those specific users to take. What I don't know is just how common this case really is. My intuition is that there are a lot of developers who are familiar with regex who may just want to toy with Nom for curiosity's sake. If that's true, they're likely going to try...
With clap, a common problem I find is the larger the API is, the more likely people are to not find functionality they need. I feel like nom is on the cusp of that and would hope that nom limits the convenience variants of parsers to help new users with nom.
For some reason I don't see how this helps with aligning with regex. Maybe enumerating regex features and how you feel they line up with existing or potential parsers would help. That could also be a useful piece of documentation for nom. |
Accommodates Rails's default tagged logger. Nom doesn't support dropping the input[^1], resulting in a useless vector being allocated for the duration of the parse. [^1]: rust-bakery/nom#1594
Accommodates Rails's default tagged logger. Nom doesn't support just dropping the garbage[^1], it's accumulated in a newly allocated vector. [^1]: rust-bakery/nom#1594
I'm interested in seeing this implemented. I think there's a gap here, as mentioned by @epage - many of the combinators in |
When solving this year's Advent of Code I decided to use So I ended up using a terrible solution: iterate over all consecutive substrings one char at a time (parsers were non-overlapping). It worked here, but it wouldn't work in general, and (obviously) it's not optimal because bytes that were already parsed should be skipped.
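The scan described above, with consumed input skipped after each hit, could look roughly like the following. This is a std-only sketch, not nom code: `PResult`, `find_all`, and `mul_pair` are hypothetical names introduced for illustration, and `PResult` only loosely mirrors the shape of nom's `IResult`.

```rust
// Hypothetical result shape (an assumption, loosely modelled on nom's IResult):
// Ok((remaining_input, output)) on success, Err(()) on failure.
type PResult<'a, O> = Result<(&'a str, O), ()>;

/// Collect every match of `parser` in `input`.
/// After a hit, scanning resumes at the parser's remaining input, so bytes
/// that were already parsed are never examined again.
fn find_all<'a, O>(parser: impl Fn(&'a str) -> PResult<'a, O>, input: &'a str) -> Vec<O> {
    let mut out = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        match parser(rest) {
            // Only accept matches that consumed something, to guarantee progress.
            Ok((next, value)) if next.len() < rest.len() => {
                out.push(value);
                rest = next;
            }
            // No (consuming) match here: drop one character and retry.
            _ => {
                let mut chars = rest.chars();
                chars.next();
                rest = chars.as_str();
            }
        }
    }
    out
}

/// Hypothetical example parser: recognises `mul(a,b)` at the start of the input.
fn mul_pair(input: &str) -> PResult<'_, (u32, u32)> {
    let body = input.strip_prefix("mul(").ok_or(())?;
    let close = body.find(')').ok_or(())?;
    let (a, b) = body[..close].split_once(',').ok_or(())?;
    let a = a.parse::<u32>().map_err(|_| ())?;
    let b = b.parse::<u32>().map_err(|_| ())?;
    Ok((&body[close + 1..], (a, b)))
}
```

Calling `find_all(mul_pair, "xmul(2,4)%&mul[3,7]!mul(5,5)")` yields the pairs `(2, 4)` and `(5, 5)`; the malformed `mul[3,7]` is skipped one character at a time.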
I mean I could and tried to use |
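Funny, I stumbled across your comment because of the exact same "problem" ;-)

// Written against nom 7.x, where combinators return closures that are called as functions.
use nom::{
    bytes::complete::{tag, take},
    character::complete::{char, one_of},
    combinator::{map, recognize},
    multi::{many0, many1, many_till},
    sequence::{delimited, separated_pair, terminated},
    IResult, Parser,
};

#[derive(Debug, PartialEq)]
struct Operands {
    a: u32,
    b: u32,
}

// Recognizes a run of decimal digits, optionally separated by underscores.
fn decimal(input: &str) -> IResult<&str, &str> {
    recognize(many1(terminated(one_of("0123456789"), many0(char('_'))))).parse(input)
}

// Parses the `a,b` part of a `mul(a,b)` expression.
fn parse_operand(input: &str) -> IResult<&str, (&str, &str)> {
    separated_pair(decimal, tag(","), decimal)(input)
}

// Finds every `mul(a,b)` in the input, skipping one character at a time
// (via `many_till`) until the next one matches.
fn parse_mul(input: &str) -> IResult<&str, Vec<Operands>> {
    map(
        many1(map(
            many_till(
                take(1_usize),
                delimited(tag("mul("), parse_operand, tag(")")),
            ),
            // Discard the skipped characters, keep the parsed operands.
            |(_, result)| result,
        )),
        |operands| {
            operands
                .iter()
                .map(|oper| {
                    let (left, right) = *oper;
                    let a = left.parse::<u32>().unwrap();
                    let b = right.parse::<u32>().unwrap();
                    Operands { a, b }
                })
                .collect()
        },
    )(input)
}

Feel free to use. MIT licensed.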
This is a combinator I use all the time, might be useful to see something like it in this crate.
It drops a byte at a time until the given parser matches, then returns the result.
I don't do parsing in any really performance-sensitive contexts, so this can probably be implemented better. This impl demonstrates the idea.
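A minimal sketch of the behaviour described above, for readers who want something concrete. It is not the original implementation and not nom's API: it is std-only, `PResult`, `literal`, and `drop_until` are hypothetical names, and `PResult` only approximates the shape of nom's `IResult`. Because it works on `&str`, it drops one character (rather than one byte) per step so that it never splits a UTF-8 sequence.

```rust
// Hypothetical result shape (an assumption, loosely modelled on nom's IResult):
// Ok((remaining_input, output)) on success, Err(()) on failure.
type PResult<'a, O> = Result<(&'a str, O), ()>;

/// Matches a fixed string at the start of the input.
/// A stand-in for nom's `tag`, written here so the sketch is self-contained.
fn literal<'a>(lit: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(lit) {
        Some(rest) => Ok((rest, &input[..lit.len()])),
        None => Err(()),
    }
}

/// Tries `parser` at the current position; on failure, discards one character
/// and tries again. Returns the first success, or fails if the input runs out.
fn drop_until<'a, O>(
    parser: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| {
        let mut rest = input;
        loop {
            if let Ok(found) = parser(rest) {
                return Ok(found);
            }
            // Advance by one whole character; stop once nothing is left to drop.
            let mut chars = rest.chars();
            if chars.next().is_none() {
                return Err(());
            }
            rest = chars.as_str();
        }
    }
}
```

With these definitions, `drop_until(literal("HELLO"))("lkasjdHELLOrest")` returns `Ok(("rest", "HELLO"))`: nothing is allocated for the skipped prefix, which is the point of the proposal.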