Add endid support to re_strings. #464

Merged
merged 6 commits into from
May 8, 2024

silentbicycle
Collaborator

Change the re_strings API to optionally define an endid for each string.

  • This stores the set of endids in a state_set. End IDs are the same size as state IDs, so it doesn't seem worth defining a distinct ADT.
  • re_strings_add_raw and re_strings_add_str now take an extra argument, a pointer to an endid to set (if non-NULL).
  • Add two tests: one that produces a trie of [abc]{3}, and another showing that duplicated strings collect all their endids, not just one.
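
As a rough illustration of the behaviour described above, here is a self-contained sketch of a byte trie whose terminal nodes collect a set of end IDs. It does not use libfsm: every `demo_` name is hypothetical, `demo_end_id` is only a stand-in for `fsm_end_id_t`, and the sorted array is a stand-in for the `state_set`. The point is just that the endid pointer is optional and that adding the same string twice keeps both IDs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for fsm_end_id_t; not the libfsm definition. */
typedef unsigned demo_end_id;

/* One trie node: 256-way fan-out plus a small sorted set of endids. */
struct demo_node {
    struct demo_node *child[256];
    int is_end;           /* some added string terminates here */
    demo_end_id *ids;     /* sorted, de-duplicated endids */
    size_t id_count;
};

struct demo_node *
demo_node_new(void)
{
    return calloc(1, sizeof(struct demo_node));
}

/* Insert id into the node's sorted set. Returns 1 on success, 0 on OOM. */
int
demo_ids_add(struct demo_node *n, demo_end_id id)
{
    size_t i;
    demo_end_id *tmp;

    for (i = 0; i < n->id_count && n->ids[i] < id; i++)
        ;
    if (i < n->id_count && n->ids[i] == id) {
        return 1; /* already present */
    }

    tmp = realloc(n->ids, (n->id_count + 1) * sizeof *tmp);
    if (tmp == NULL) {
        return 0;
    }
    n->ids = tmp;
    memmove(&n->ids[i + 1], &n->ids[i], (n->id_count - i) * sizeof *tmp);
    n->ids[i] = id;
    n->id_count++;
    return 1;
}

/* Add a string. endid may be NULL, meaning "no endid for this string". */
int
demo_add_str(struct demo_node *root, const char *s, const demo_end_id *endid)
{
    struct demo_node *n = root;

    assert(root != NULL && s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char) *s;
        if (n->child[c] == NULL) {
            n->child[c] = demo_node_new();
            if (n->child[c] == NULL) {
                return 0;
            }
        }
        n = n->child[c];
    }

    n->is_end = 1;
    return endid == NULL ? 1 : demo_ids_add(n, *endid);
}

/* Exact lookup: returns the node if a string ends there, else NULL. */
const struct demo_node *
demo_find(const struct demo_node *root, const char *s)
{
    const struct demo_node *n = root;

    for (; n != NULL && *s != '\0'; s++) {
        n = n->child[(unsigned char) *s];
    }
    return (n != NULL && n->is_end) ? n : NULL;
}
```

Calling `demo_add_str(root, "abc", &a)` and then `demo_add_str(root, "abc", &b)` leaves both IDs on the node for "abc", while a string added with a NULL pointer ends up with an empty set. Nodes are never freed here, to keep the sketch short.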

This adds an extra parameter to `re_strings_add_str` and
`re_strings_add_raw` that (if non-NULL) will associate a single endid
with the string being added. When `re_strings_build` constructs the
DFA it will produce a separate end state for each end.

This needs further testing with multiple overlapping patterns. When
multiple literal strings appear in the input, only the latest match
will be reported.

This uses a `struct state_set` since sizeof(fsm_state) ==
sizeof(fsm_end_id_t), and it's probably not worth making a separate ADT
just for these.

The second test checks that duplicated strings get all their endids set.
The previous implementation (a single endid, or ENDID_NONE) dropped all
but the last endid defined.
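
To make the difference concrete, here is a small sketch contrasting a single-slot endid with a collected set. It is not the libfsm code: the `demo_` types are hypothetical, and `DEMO_ENDID_NONE` is an assumed sentinel standing in for `ENDID_NONE`, whose real value is not given here.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned demo_end_id;                    /* hypothetical end ID type */
#define DEMO_ENDID_NONE ((demo_end_id) -1)       /* assumed "no endid" sentinel */
#define DEMO_MAX_IDS 8

/* Single-slot shape: each assignment overwrites the previous endid. */
struct demo_single {
    demo_end_id id;
};

void
demo_single_set(struct demo_single *s, demo_end_id id)
{
    s->id = id;
}

/* Set shape: every distinct endid is retained. */
struct demo_multi {
    demo_end_id ids[DEMO_MAX_IDS];
    size_t count;
};

/* Returns 1 on success (including duplicates), 0 if the set is full. */
int
demo_multi_add(struct demo_multi *m, demo_end_id id)
{
    size_t i;

    for (i = 0; i < m->count; i++) {
        if (m->ids[i] == id) {
            return 1;
        }
    }
    if (m->count == DEMO_MAX_IDS) {
        return 0;
    }
    m->ids[m->count++] = id;
    return 1;
}
```

Feeding the endids 10, 20, 30 for the same string into both shapes leaves the single slot holding only 30, while the set holds all three.
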
@silentbicycle silentbicycle marked this pull request as ready for review April 29, 2024 15:50
@silentbicycle silentbicycle requested a review from katef April 29, 2024 16:01
src/libre/ac.c (outdated, resolved)
src/libre/ac.c (resolved)
@katef katef merged commit f945fda into main May 8, 2024
322 checks passed
@katef katef deleted the sv/re_strings-endids branch May 8, 2024 13:32