Single compilation pass over METAs (wild idea) #587

NWilson · 2024-12-02T09:14:33Z

At the moment, there's a pre-compilation pass of chars into METAs.

Then, there are two passes over the METAs, once to find the length of the buffer, then secondly to write into the allocated buffer.
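For readers unfamiliar with the pattern, the two-pass shape described above can be sketched roughly as follows. This is a toy illustration, not PCRE2's actual code: `emit_bytes`, `compile_items`, and `compile_two_pass` are hypothetical names, and the "compilation" here just writes a tag byte before each input byte. The only thing borrowed from the discussion is the idea of a `lengthptr` that, when non-NULL, switches the same code path from writing to counting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical emitter: when lengthptr is non-NULL we only count bytes,
   otherwise we write them and advance the output pointer. */
static void emit_bytes(unsigned char **codeptr, size_t *lengthptr,
                       const void *src, size_t n) {
  if (lengthptr != NULL) {
    *lengthptr += n;
  } else {
    memcpy(*codeptr, src, n);
    *codeptr += n;
  }
}

/* Toy "compile": each item becomes a one-byte tag followed by the item. */
static void compile_items(const unsigned char *items, size_t count,
                          unsigned char **codeptr, size_t *lengthptr) {
  for (size_t i = 0; i < count; i++) {
    unsigned char op[2] = { 0x01, items[i] };
    emit_bytes(codeptr, lengthptr, op, sizeof op);
  }
}

/* Pass 1 measures, then a single allocation, then pass 2 writes. */
static unsigned char *compile_two_pass(const unsigned char *items,
                                       size_t count, size_t *out_len) {
  size_t length = 0;
  compile_items(items, count, NULL, &length);   /* pass 1: measure only */

  unsigned char *buf = malloc(length ? length : 1);
  if (buf == NULL) return NULL;

  unsigned char *p = buf;
  compile_items(items, count, &p, NULL);        /* pass 2: write */
  assert((size_t)(p - buf) == length);          /* both passes must agree */

  *out_len = length;
  return buf;
}
```

Every emitting routine has to carry the counting branch, which is the duplication the proposal is aiming to remove.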

Can we just combine those? We'd need to have a buffer allocation strategy:

Some heuristics for initial size (which could be guessed reasonably accurately from the META pass)
Then if we need more buffer space, we can realloc to 1.5× (for example). A bit like appending to a dynamic container: resize with geometric growth
Finally, if there's any wastage at the end (e.g. we ended up allocating 350 bytes but only used 300) then we could realloc down to release the unused space. Or just accept it as OK if the overestimate was small.
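The three steps above could be sketched like this. It is a minimal sketch using the standard `malloc`/`realloc`, not a proposal for the actual PCRE2 code: `code_buf` and its functions are hypothetical names, the 1.5× factor is the example figure from the list, and the `tolerance` parameter is an assumed way of expressing "accept it if the overestimate was small".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; not a PCRE2 type. */
typedef struct {
  unsigned char *data;
  size_t used;   /* bytes written so far */
  size_t cap;    /* bytes allocated */
} code_buf;

/* Step 1: allocate using an initial guess (e.g. derived from the META pass). */
static int code_buf_init(code_buf *b, size_t guess) {
  if (guess == 0) guess = 16;
  b->data = malloc(guess);
  b->used = 0;
  b->cap = (b->data != NULL) ? guess : 0;
  return b->data != NULL;
}

/* Step 2: append n bytes, growing geometrically (1.5x) when out of space.
   Returns 1 on success, 0 on failure (the buffer is left unchanged). */
static int code_buf_append(code_buf *b, const void *src, size_t n) {
  assert(b->used <= b->cap);
  if (n > b->cap - b->used) {
    if (n > SIZE_MAX - b->used) return 0;        /* size would overflow */
    size_t need = b->used + n;
    size_t newcap = b->cap + b->cap / 2;
    if (newcap < need) newcap = need;            /* also covers wrap-around */
    unsigned char *p = realloc(b->data, newcap);
    if (p == NULL) return 0;
    b->data = p;
    b->cap = newcap;
  }
  memcpy(b->data + b->used, src, n);
  b->used += n;
  return 1;
}

/* Step 3: give back the slack, but only if it exceeds the tolerance. */
static void code_buf_trim(code_buf *b, size_t tolerance) {
  if (b->used > 0 && b->cap - b->used > tolerance) {
    unsigned char *p = realloc(b->data, b->used);
    if (p != NULL) {
      b->data = p;
      b->cap = b->used;
    }
  }
}
```

Whether the single pass plus occasional `realloc` beats two passes would still need measuring; the sketch only shows that the bookkeeping itself is small.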

The upside would be: simpler code! And faster for most users (since only one pass needed). This is assuming that the change from 1×malloc + two pass compilation → 1×malloc + 1×realloc + single pass is actually an improvement.

The downside would be marginally higher memory usage for users with many many regexes, but realloc'ing down to the correct size at the end should solve that.

Getting rid of all the lengthptr != NULL code would be really quite nice.


zherczeg · 2024-12-02T10:08:26Z

I never know whether reallocating is good or bad. It probably depends on the allocator. Another option is more caching. For example, character ranges are cached during the lengthptr == NULL phase. In practice, caching is just "pushing": since we walk the data twice in the same order, no "searching" is needed, because the reading order follows the creation order.

Overall, experimenting with other methods and proving they are better is a resource-consuming process.
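The "pushing" idea can be illustrated with a simple queue: one pass appends results, the other reads them back with a cursor, so no lookup key is needed. This is only a sketch of the general technique under assumed names (`range_cache`, `range_cache_push`, `range_cache_next`) and a fixed capacity chosen for brevity; it does not reflect how PCRE2 actually stores its cached ranges.

```c
#include <assert.h>
#include <stddef.h>

#define RANGE_CACHE_MAX 64   /* arbitrary capacity for the sketch */

typedef struct {
  unsigned int lo;
  unsigned int hi;
} char_range;

/* Hypothetical cache: entries are consumed in the order they were pushed. */
typedef struct {
  char_range items[RANGE_CACHE_MAX];
  size_t count;   /* number of entries pushed by the producing pass */
  size_t next;    /* read cursor used by the consuming pass */
} range_cache;

/* Producing pass: append a computed range. Returns 0 if the cache is full. */
static int range_cache_push(range_cache *c, unsigned int lo, unsigned int hi) {
  if (c->count >= RANGE_CACHE_MAX) return 0;
  c->items[c->count].lo = lo;
  c->items[c->count].hi = hi;
  c->count++;
  return 1;
}

/* Consuming pass: fetch the next range in creation order, no searching.
   Returns 0 when every cached entry has already been read. */
static int range_cache_next(range_cache *c, char_range *out) {
  if (c->next >= c->count) return 0;
  *out = c->items[c->next++];
  return 1;
}
```

Because both passes visit the pattern in the same order, the cursor always lines up with the item currently being compiled.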

PhilipHazel · 2024-12-02T15:35:46Z

Unfortunately, I screwed up when I designed the PCRE2 API in that the custom allocator interface has only alloc and free entries. There is no support for re-alloc. In any case, I would hope that considerations of this sort might be postponed till we manage to get 10.45 (and possibly 46, 47, ... because no doubt there will be issues after all the big changes) out of the door.
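With only alloc and free entries available, growing a buffer would have to be emulated as allocate, copy, free. A minimal sketch of that fallback follows; `alloc_pair` and `grow_with_alloc_free` are hypothetical names, and the callback shapes (a size or pointer plus a user-data pointer) are modelled loosely on a custom allocator interface rather than copied from PCRE2's headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator pair with alloc and free entries only. */
typedef struct {
  void *(*alloc)(size_t size, void *user_data);
  void (*release)(void *ptr, void *user_data);
  void *user_data;
} alloc_pair;

/* Emulated "realloc": allocate a new block, copy the used bytes, free the
   old block. On failure returns NULL and the old block remains valid. */
static void *grow_with_alloc_free(const alloc_pair *a, void *old,
                                  size_t old_used, size_t new_size) {
  assert(new_size >= old_used);
  void *p = a->alloc(new_size, a->user_data);
  if (p == NULL) return NULL;
  if (old != NULL) {
    memcpy(p, old, old_used);
    a->release(old, a->user_data);
  }
  return p;
}

/* Default callbacks that simply forward to the C library. */
static void *default_alloc(size_t size, void *user_data) {
  (void)user_data;
  return malloc(size);
}

static void default_release(void *ptr, void *user_data) {
  (void)user_data;
  free(ptr);
}
```

Unlike a real `realloc`, which may be able to extend a block in place, this version always copies and briefly holds both blocks, so any speed-up from a single pass would have to outweigh that cost.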

NWilson · 2024-12-02T15:46:58Z

Yes of course! No rush!

zherczeg · 2024-12-03T14:00:20Z

Usually these are the kinds of tasks that we do at the University for our partners.

NWilson added the "untidiness" label (Not exactly a bug, but could do better) on Dec 9, 2024

NWilson added this to the Future milestone on Jan 8, 2025
