WIP: slightly improve substitutions #562
base: master
Avoid at least one crash introduced with recent changes to substitute code as well as clarify what the expected offset value should be when overflowing the provided buffer. While at it, make sure that the returned string is always NUL terminated, and do some minor cleanup.
Could you update this patch?
Could you also split out the unrelated changes into their own PRs? It should be quick to do, and would let us merge the cosmetic changes and behaviour-altering changes in their own commits.
@@ -1113,7 +1112,7 @@ in the decoded tables. */

 if ((code->flags & PCRE2_DEREF_TABLES) != 0)
   {
-  ref_count = (PCRE2_SIZE *)(code->tables + TABLES_LENGTH);
+  PCRE2_SIZE *ref_count = (PCRE2_SIZE *)(code->tables + TABLES_LENGTH);
Personally, I'm very happy with these changes.
I know Philip likes the old style of defining variables high up, at the top of a scope, and with a blank line after variable definitions.
But I don't see any benefit to having variables available for use, but not yet initialised. Much better to define & initialise at the same time (safer).
The compiler will hoist all the variables up to the top anyway (it will bump the stack pointer just once at the start of a block, rather than bump the stack pointer multiple times, when it sees a new variable).
Partly it's because I'm a dinosaur from the age when one had to define variables like that, but partly also I find it makes it easier when looking back up some code to find where a variable is defined. However, I am not going to try to impose my own preferences on the future. I can certainly see the advantage of always initializing at definition time. So please don't worry about me too much.
The funny thing is that this change is still valid C89 code. The main motivation wasn't to go against Philip's advice of defining variables at the beginning of blocks, but just to reduce the scope of this variable to where it is actually needed and used.
Since we have at least one CI job with -Wshadow and I wanted to minimize churn, I didn't rename the variable to reflect its "temp" holder status (it might even be optimized out).
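To illustrate the point about scope, here is a minimal sketch of a declaration placed at the start of an inner block, which C89 allows. The names `demo_code`, `demo_release`, `DEREF_TABLES` and `TABLES_LEN` are hypothetical stand-ins for illustration only; this is not the PCRE2 code.

```c
#include <stddef.h>

#define DEREF_TABLES 0x1u  /* hypothetical flag, stands in for PCRE2_DEREF_TABLES */
#define TABLES_LEN   8     /* hypothetical table size in bytes */

typedef struct {
  unsigned flags;
  unsigned char *tables;   /* TABLES_LEN bytes followed by a size_t counter */
} demo_code;

/* Decrement the reference count stored after the tables, if there is one.
   Returns the remaining count, or 0 when the tables are not shared. */
static size_t demo_release(demo_code *code)
{
  if ((code->flags & DEREF_TABLES) != 0)
    {
    /* Declared at the start of the inner block: still valid C89,
       initialised immediately, and not visible outside this branch. */
    size_t *ref_count = (size_t *)(code->tables + TABLES_LEN);
    if (*ref_count > 0) (*ref_count)--;
    return *ref_count;
    }
  return 0;
}
```

Because the pointer only exists inside the branch, nothing outside it can read the variable before it has been initialised.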
extra_needed++;
lengthleft = 0;
}
if (!overflowed || lengthleft == 0) buffer[buff_offset] = 0; else extra_needed++;
Could you explain why you need to inline the CHECKMEMCPY here for the trailing NUL?
What was wrong before? Do you want the returned string to be NUL-terminated, even if the function returns an error?
> Do you want the returned string to be NUL-terminated, even if the function returns an error?
Correct. I found the way this function behaves strange, and the fact that it returns non-NUL-terminated strings on overflow potentially risky.
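As a rough illustration of the contract being discussed (terminate the output even when reporting an overflow, and say how much space would have been needed), here is a minimal sketch. `demo_copy` is a hypothetical helper written for this example; it is not part of PCRE2, and truncating the output is just one possible choice, not necessarily what this patch does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2: copy src into buf (capacity cap,
   in bytes). Returns 0 on success or -1 if src did not fit. On return,
   *needed holds the size required for the whole string plus its NUL. */
static int demo_copy(char *buf, size_t cap, const char *src, size_t *needed)
{
  size_t len = strlen(src);
  size_t ncopy;

  *needed = len + 1;
  if (cap == 0) return -1;           /* no room even for the terminator */

  ncopy = (len < cap) ? len : cap - 1;
  memcpy(buf, src, ncopy);
  buf[ncopy] = '\0';                 /* terminate on success and on overflow */
  return (len < cap) ? 0 : -1;
}
```

With this shape, a caller that ignores the error code still gets a valid C string, and a caller that checks it can reallocate to `*needed` bytes and retry.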
123abc123\=substitute_overflow_length,replace=[9]XYZ
123abc123\=substitute_overflow_length,replace=[6]XYZ
123abc123\=substitute_overflow_length,replace=[1]XYZ
123abc123\=substitute_overflow_length,replace=[0]XYZ
I'd be curious to run these new tests against the old code, just to see which (if any) of the test outputs have changed.
None do, but I could split the tests into a "setup" patch of their own so that it is obvious.
NOTE: at least the truncation is wrong, so I'm posting this mainly as an FYI in the hope that someone else might give it some love.