At the moment, there's a pre-compilation pass of chars into METAs.
Then, there are two passes over the METAs, once to find the length of the buffer, then secondly to write into the allocated buffer.
Can we just combine those? We'd need to have a buffer allocation strategy:
Some heuristics for initial size (which could be guessed reasonably accurately from the META pass)
Then if we need more buffer space, we can realloc to 1.5× (for example). A bit like appending to a dynamic container: resize with geometric growth
Finally, if there's any wastage at the end (e.g. we ended up allocating 350 bytes but only used 300), we could realloc down to release the unused space. Or just accept it as OK if the overestimate was small.
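To make the strategy above concrete, here is a minimal sketch in C using plain `malloc`/`realloc`. The `outbuf` type and its functions are hypothetical names invented for illustration and are not PCRE2 code; the 1.5× factor and the `slack` threshold are just the example values from the list, and overflow of `used + n` is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; not part of PCRE2. */
typedef struct {
    unsigned char *data;
    size_t used;   /* bytes written so far */
    size_t cap;    /* bytes allocated */
} outbuf;

/* Allocate using an initial size guess (e.g. derived from the META pass). */
static int outbuf_init(outbuf *b, size_t guess) {
    if (guess == 0) guess = 16;
    b->data = malloc(guess);
    b->used = 0;
    b->cap = b->data ? guess : 0;
    return b->data != NULL;
}

/* Append n bytes, growing by roughly 1.5x when space runs out. */
static int outbuf_append(outbuf *b, const void *src, size_t n) {
    if (n > b->cap - b->used) {
        size_t need = b->used + n;
        size_t newcap = b->cap + b->cap / 2 + 1;
        if (newcap < need) newcap = need;
        unsigned char *p = realloc(b->data, newcap);
        if (p == NULL) return 0;   /* the old buffer is still valid */
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->used, src, n);
    b->used += n;
    return 1;
}

/* Release the unused tail, but only if it exceeds `slack` bytes. */
static void outbuf_trim(outbuf *b, size_t slack) {
    if (b->used > 0 && b->cap - b->used > slack) {
        unsigned char *p = realloc(b->data, b->used);
        if (p != NULL) { b->data = p; b->cap = b->used; }
    }
}
```

With a good initial guess, the `realloc` branch in `outbuf_append` would rarely run, and `outbuf_trim` would be called once at the end of compilation.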
The upside would be: simpler code! And faster for most users (since only one pass needed). This is assuming that the change from 1×malloc + two pass compilation → 1×malloc + 1×realloc + single pass is actually an improvement.
The downside would be marginally higher memory usage for users with many many regexes, but realloc'ing down to the correct size at the end should solve that.
Getting rid of all the lengthptr != NULL code would be really quite nice.
I never know whether reallocating is good or bad; it probably depends on the allocator. Another option is more caching. For example, character ranges are cached during the lengthptr==null phase. In practice, caching is just "pushing": since we walk the data twice in the same order, no "searching" is needed, because the reading order follows the creation order.
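The "caching is just pushing" idea could look roughly like the sketch below: values computed in the first pass are pushed onto a queue, and the second pass reads them back in the same order with a cursor, so nothing is ever looked up by key. The `pass_cache` type, its fixed capacity, and the use of `int` as the cached item are assumptions made for illustration, not how PCRE2 stores its cached ranges.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 64   /* arbitrary capacity for this sketch */

/* Hypothetical in-order cache shared by two passes over the same data. */
typedef struct {
    int items[CACHE_MAX];
    size_t count;  /* number of items pushed by the first pass */
    size_t next;   /* read cursor used by the second pass */
} pass_cache;

/* First pass: store a computed value. Returns 0 when the cache is full. */
static int cache_push(pass_cache *c, int value) {
    if (c->count == CACHE_MAX) return 0;
    c->items[c->count++] = value;
    return 1;
}

/* Second pass: fetch the next value in creation order. Returns 0 when empty. */
static int cache_pop(pass_cache *c, int *value) {
    if (c->next == c->count) return 0;
    *value = c->items[c->next++];
    return 1;
}
```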
Overall, experimenting with other methods and proving that they are better is a resource-consuming process.
Unfortunately, I screwed up when I designed the PCRE2 API in that the custom allocator interface has only alloc and free entries. There is no support for re-alloc. In any case, I would hope that considerations of this sort might be postponed till we manage to get 10.45 (and possibly 46, 47, ... because no doubt there will be issues after all the big changes) out of the door.
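For illustration, with only alloc and free entries available, growing a buffer has to be emulated by allocating a new block, copying, and freeing the old one. The sketch below assumes a hypothetical `allocator` struct with an alloc/free pair and a user-data pointer; it is not the PCRE2 context API, and the names are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface with alloc and free only (no realloc). */
typedef struct {
    void *(*alloc)(size_t size, void *user);
    void  (*release)(void *ptr, void *user);
    void  *user;
} allocator;

/* Emulate realloc: allocate newcap bytes, copy the first `used` bytes of
   the old block, then free the old block. Returns NULL on failure, in
   which case the old block is left untouched. */
static void *grow_block(const allocator *a, void *old, size_t used, size_t newcap) {
    void *p;
    if (newcap < used) return NULL;
    p = a->alloc(newcap, a->user);
    if (p == NULL) return NULL;
    if (old != NULL) {
        memcpy(p, old, used);
        a->release(old, a->user);
    }
    return p;
}

/* Default callbacks backed by the C library, ignoring the user pointer. */
static void *std_alloc(size_t size, void *user) { (void)user; return malloc(size); }
static void std_release(void *ptr, void *user) { (void)user; free(ptr); }
```

Unlike a real `realloc`, this always copies and briefly holds both blocks, so each growth step (and any final shrink) costs more than it would with an allocator that can extend a block in place.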