-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move character lists data before the byte code in a pattern #540
Conversation
Btw the original problem was that repeating character classes can break alignment rules unfortunately. It would not be bad if |
d29e3ec
to
8faa069
Compare
I tried various variations for this code, but the engine always have a feature which blocked them. In the end the character list data is stored before the byte code. The cost is an extra argument for xclass, and an extra member in the real code. Computing the byte code start is simplified. |
Context: a fuzzer got a comparing uninitialized memory error, when a subpattern is repeated. The reason is that space for alignemnt was not initialized. Then it turned out that repeating character lists may break alignment, so they need to be moved out from the byte code and stored elsewhere, and only a reference to them is stored in the byte code, which can be repeated. So this patch fixes a complicated and serious error in the new code. |
Interesting. I would have thought the simplest thing would simply be to avoid adding alignment padding entirely, and simply have READ8, READ16, READ32 macros to do the unaligned reads using bit-ops. Just let the uint32 values lie wherever they fall naturally in the bytecode, and read them in a safe way. Compilers are also super-good at optimising this pattern: |
It's extra complexity to move data out of the main sequence of bytecode. And it's extra complexity to introduce any weird requirements on the bytecode, that would prevent it being memmove'd around. The bytecode is a PCRE2_SPTR, so its contents should be read and used, simply assuming that it has the PCRE2_SPTR base alignment. Adding padding so that certain slices inside it have stricter alignment then means that the code can't be copied and moved, which is unfortunate. |
The bytecode header structure always needed at least 32 bit alignment, so you cannot move the byte code to any location. Unaligned accesses might be costly. |
src/pcre2_compile.c
Outdated
/* Char lists size is an even number, | ||
because all items are 16 or 32 bit values. */ | ||
/* Char lists size is an even number, because all items are 16 or 32 | ||
bit values. The character list data is always aligned 32 bytes. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/bytes/bits/
src/pcre2_compile.c
Outdated
CU2BYTES((PCRE2_SIZE)cb.names_found * (PCRE2_SIZE)cb.name_entry_size); | ||
|
||
#if defined SUPPORT_WIDE_CHARS | ||
if (cb.char_lists_size > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really making a difference but I think cb.char_lists_size != 0
is clearer
8faa069
to
2ae56b2
Compare
I have updated the patch |
This will probably make a merge conflict with my PR #523, so it's a race to see which is merged first! |
@PhilipHazel will decide. I have no problem with doing the merge. |
src/pcre2_compile.c
Outdated
|
||
cb->char_lists_size += char_lists_size; | ||
|
||
memcpy((uint8_t*)cb->start_code - cb->char_lists_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, are the character lists laid out in backwards order? The first ones encountered in the pattern are just before the start of the code, and the character class data at the end of the pattern is placed at the start?
Or am I misunderstanding what this does? (It seems to be packing the character data into the same alloced buffer as the code, but stored in front of it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This way we can reuse the code start value (part of the match block) as the base (list end) value. I could reverse the order (start from max to 0), but I don't see any gain from it. This way the generator is a bit less complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Maybe a comment on the layout would help.
src/pcre2_compile.c
Outdated
|
||
cb->char_lists_size += char_lists_size; | ||
|
||
memcpy((uint8_t*)cb->start_code - cb->char_lists_size, | ||
(uint8_t*)(cranges + 1) + cranges->char_lists_start, | ||
char_lists_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could memset the padding bytes to 0xff or something. It appears you're just leaving whatever junk is in there from malloc?
If you do initialize... then you'll get predictable runtime behaviour (if something reads from it by mistake). If you don't initialize... then you'll get nice valgrind warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make tools happy. Ok, I will check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best-of-both would be a memset to 0xff and also an ifdef SUPPORT_VALGRIND ... VALGRIND_MAKE_MEM_NOACCESS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer doing only the VALGRIND_MAKE_MEM_NOACCESS, so any mistaken reads can be detected, after all those valgrind warnings are a feature!, the memset is just a workaround IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a ((uint16_t*)data)[-1] = 0xffff;
to set all bits. Btw valgrind only complains if you check the memory. Just copying it is not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check I understand - the data is 16-bit and/or 32-bit packed values, but we want it to be 32-bit aligned. So the data
variable is always 16-bit aligned (at least), and if there is padding, it will be exactly 2 bytes. So the cast (uint16_t*)data
is doing a correctly-aligned write of the two padding bytes.
That all looks great, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The character list data is a stream of 16 bit values followed by a stream of 32 bit values. Both stream can be empty. Regardless, the final data is considered to be a stream of 32 bit values, and there might be an unused 16 bit value in the beginning sometimes.
f80bd56
to
bf0e3e9
Compare
src/pcre2_compile.c
Outdated
char_buffer += 2; | ||
} | ||
cb->char_lists_size += 2; | ||
/* Make tools happy by setting memory data. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/memory data/memory padding data/
bf0e3e9
to
876589f
Compare
I changed the extra 16 bit setting to debug only, and added some asserts. |
a7592b4
to
ee034d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, this look like it should work well
src/pcre2_compile.c
Outdated
@@ -10206,13 +10206,36 @@ if (length > MAX_PATTERN_SIZE) | |||
goto HAD_CB_ERROR; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block here could be in an "#else" block for the code you added.
Sorry, really minor nit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow. I wanted to be in the safe side here, since MAX_PATTERN_SIZE - length >= 0
after the check and never underflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, you want to avoid overflow. OK, that makes sense. Your latest version merges the two blocks, which looks tidy. Thanks!
This ensures aligned data store even when the range is repeated. Furthermore character lists are stored once regerdless of repeats.
ee034d2
to
0c67546
Compare
This patch moves the character lists to the end of the pattern, and only a reference is stored. This way repeating, repeat detection, and serializing works without modifications. Also, for repeating character sets inside brackets can save a lot of space. The disadvantage is that an extra variable is needed to be maintained, which has a very low overhead, but still an overhead. My first idea was storing pointers, but that is not serialization friendly, although not that hard to implement.