Make backfill batch selection exclude rows inserted or updated after backfill start #634
Implement one possible solution to limit backfill to touch only rows that existed in the table prior to backfill start and ignore rows inserted or updated after backfill start.
I believe the solution presented here is flawed and should not be merged as is; the PR is open for discussion to see if the technique can be made to work or if we need to take a different approach.
In general, it is fine from a correctness POV if backfill updates a row that was already updated/inserted by a transaction that committed after backfill started. It is not OK from a performance POV, however: a backfill running at the same time as a high rate of INSERTs into the table will never terminate (the issue is described in #583).
Proposed solution
Backfill works by touching rows in batches (the batch size is configurable). 'Touch' in this context means setting the row's PK to itself, which causes the already-installed backfill trigger to fire for that row.
As of this PR, the per-batch query looks like:
The first CTE (WITH batch AS...) is the relevant part here. The purpose of this CTE is to select the next batch of rows to be updated (and lock those rows for update).
The relevant part of the first CTE is this bit:
This is where we attempt to filter out tuples that were created/updated after the backfill process started. '1234' represents the xid at which the backfill process started. The first part checks whether the tuple is older than the xid at which the backfill started. If so, the tuple should be part of the batch.
b_follows_a implements an xid-wraparound-safe comparison of xids. The transaction id space (0 to 2^32-1) is treated as a circle: anything in the forward half of the circle is considered ahead of a given xid, and anything else is behind it. See the Postgres Internals book, Chapter 7, Freezing, for a description.
Using this calculation alone to determine relative ages between transaction ids will fail for very old rows (older than 2^31 transactions since backfill start), which will appear to be in the future.
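The circular comparison and its blind spot for very old rows can be sketched in a few lines of Python. This is an illustration only, not the SQL used in the PR: the argument order of b_follows_a is an assumption, the special reserved xids are ignored, and the sample xids other than 1234 are made up.

```python
XID_SPACE = 2**32   # size of the 32-bit transaction id space
HALF_SPACE = 2**31  # the "forward half" of the circle


def b_follows_a(a: int, b: int) -> bool:
    """Return True if xid b lies in the forward half of the circle from xid a.

    Hypothetical signature: the argument order in the PR may differ.
    """
    distance = (b - a) % XID_SPACE
    return 0 < distance < HALF_SPACE


backfill_xid = 1234  # xid at backfill start, as in the text above

# Tuple created shortly before backfill start: backfill xid is ahead of it.
created_before = b_follows_a(1000, backfill_xid)               # True
# Tuple created after backfill start: backfill xid is behind it.
created_after = b_follows_a(2000, backfill_xid)                # False
# Tuple created just before the counter wrapped around: still handled.
created_before_wrap = b_follows_a(XID_SPACE - 10, backfill_xid)  # True
# Tuple created more than 2^31 transactions before backfill start:
# the comparison flips and the tuple appears to be in the future.
very_old_xid = (backfill_xid - (HALF_SPACE + 5)) % XID_SPACE
very_old = b_follows_a(very_old_xid, backfill_xid)             # False
```

The last case is the one that a separate frozen-tuple check has to catch.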
The frozen_xid function tests whether the transaction id that created the tuple comes before the oldest unfrozen transaction id in the table (pg_class.relfrozenxid). If so, the tuple is frozen and should be included in the batch even if the visibility check would regard it as in the future.
Problem
What happens if a tuple was frozen many billions of transactions ago (i.e. several xid wraparounds ago)? The 32-bit pg_class.relfrozenxid won't be able to tell us accurately whether the xid of this extremely old tuple should be considered frozen or not: relfrozenxid doesn't contain epoch information about which wraparound cycle it refers to.
The ultimate truth of whether a tuple is frozen is contained in the tuple header: frozen tuples have their HEAP_XMIN_FROZEN bits set in t_infomask.
But the only way to access the tuple header is via the pageinspect extension. Prior to Postgres 9.4, frozen tuples had their xmin replaced with a special value to indicate that the row was frozen, which made it easy to identify frozen tuples from SQL. This is no longer the case; the tuple's infomask is used instead.
Summary
Without a reliable way to check from SQL whether a tuple is frozen, I don't think this approach is robust. Reliably checking whether a tuple is frozen looks like it requires access to the tuple header, which is not possible from plain SQL, only by using extensions.
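For reference, the header check itself is a simple bit test. The sketch below uses the flag values from the Postgres source (htup_details.h) and assumes the t_infomask integer has already been obtained somehow, for example from the pageinspect extension; the helper name xmin_is_frozen and the sample mask values are hypothetical.

```python
# Flag values as defined in the Postgres source (htup_details.h).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
# A frozen xmin is marked by having both bits set at once.
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID


def xmin_is_frozen(t_infomask: int) -> bool:
    """Return True if both HEAP_XMIN_FROZEN bits are set in t_infomask.

    Hypothetical helper: t_infomask is assumed to come from the tuple header.
    """
    return (t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN


# Made-up mask values for illustration.
frozen_mask = 0x0B00          # both xmin bits set, plus an unrelated bit
committed_only_mask = 0x0900  # xmin committed but not frozen
```

Here xmin_is_frozen(frozen_mask) holds and xmin_is_frozen(committed_only_mask) does not. The hard part is not the test but getting at t_infomask from SQL in the first place.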
Using pageinspect to determine if a tuple is frozen may be possible, but would introduce a dependency on that extension; currently pgroll does not require any extensions.
Without robust checks for frozen tuples, the backfill process could exclude tuples that should be backfilled, potentially resulting in data loss.
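The failure mode can be reproduced with plain integer arithmetic. The sketch below is an illustration under assumptions, not the frozen_xid function from this PR: it models each xid as a hypothetical 64-bit counter that includes the epoch, truncates it to 32 bits as the stored values are, and then applies a circular "precedes" comparison. All names and sample values are made up.

```python
XID_SPACE = 2**32
HALF_SPACE = 2**31


def truncate(full_xid: int) -> int:
    """Drop the epoch: keep only the low 32 bits of a 64-bit xid counter."""
    return full_xid % XID_SPACE


def xid_precedes(a: int, b: int) -> bool:
    """Circular 32-bit comparison: True if a is in the backward half from b."""
    return (a - b) % XID_SPACE >= HALF_SPACE


def looks_frozen(xmin: int, relfrozenxid: int) -> bool:
    """Hypothetical stand-in for a relfrozenxid-based frozen check."""
    return xid_precedes(xmin, relfrozenxid)


full_xmin = 5_000  # tuple created early in the table's life

# relfrozenxid is one million transactions past the tuple: check works.
recent = looks_frozen(truncate(full_xmin),
                      truncate(full_xmin + 1_000_000))

# relfrozenxid is three billion transactions past the tuple: without the
# epoch, the truly older (and frozen) tuple appears to follow relfrozenxid.
ancient = looks_frozen(truncate(full_xmin),
                       truncate(full_xmin + 3_000_000_000))
```

In this sketch recent is True while ancient is False, so the ancient tuple would be treated as unfrozen and could be left out of the batch.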
References