Make backfill batch selection exclude rows inserted or updated after backfill start #634

Draft · wants to merge 7 commits into main from backfill-old-tuples-only
Conversation

andrew-farries (Collaborator)
Implement one possible solution that limits the backfill to touching only rows that existed in the table prior to backfill start, ignoring rows inserted or updated after the backfill started.

I believe the solution presented here is flawed and should not be merged as is; the PR is open for discussion to see if the technique can be made to work or if we need to take a different approach.

In general, it is fine from a correctness point of view if the backfill updates a row that was already updated or inserted by a transaction that committed after the backfill started. It is not OK from a performance point of view, however: a backfill running concurrently with a high rate of INSERTs into the table will never terminate (the issue is described in #583).

Proposed solution

Backfill works by touching rows in batches (the batch size is configurable). 'Touch' in this context means setting the row's PK to itself, causing the already-installed backfill trigger to fire for that row.

As of this PR, the per-batch query looks like:

WITH batch AS
(
  SELECT "id", "zip"
  FROM "table_name"
  WHERE (
    "pgroll".b_follows_a(xmin::text::bigint, 1234) OR
    "pgroll".b_follows_a(xmin::text::bigint, "pgroll".frozen_xid('public', 'table_name')::text::bigint)
  )
  AND ("id", "zip") > ('1', '1234')
  ORDER BY "id", "zip"
  LIMIT 10
  FOR NO KEY UPDATE
),
"update" AS
(
  UPDATE "table_name"
  SET "id" = "table_name"."id", "zip" = "table_name"."zip"
  FROM batch
  WHERE "table_name"."id" = batch."id" AND "table_name"."zip" = batch."zip"
  RETURNING "table_name"."id", "table_name"."zip"
)
SELECT LAST_VALUE("id") OVER(), LAST_VALUE("zip") OVER()
FROM "update"

The first CTE (WITH batch AS...) is the relevant part here. The purpose of this CTE is to select the next batch of rows to be updated (and lock those rows for update).

The relevant part of the first CTE is this bit:

WHERE (
    "pgroll".b_follows_a(xmin::text::bigint, 1234) OR
    "pgroll".b_follows_a(xmin::text::bigint, "pgroll".frozen_xid('public', 'table_name')::text::bigint)
)

This is where we attempt to filter out tuples that were created or updated after the backfill process started. '1234' stands in for the xid at which the backfill process started. The first part:

    "pgroll".b_follows_a(xmin::text::bigint, 1234)

checks whether the tuple is older than the xid at which the backfill started; if so, the tuple should be part of the batch. b_follows_a implements an xid-wraparound-safe comparison of xids: the transaction id space (0 to 2^32-1) is treated as a circle, and anything in the forward half of the circle relative to a given xid is considered ahead of it; everything else is behind. See the Postgres Internals book, Chapter 7 (Freezing), for a description:

[Image: the 32-bit transaction id space drawn as a circle; the half ahead of a given xid is the "future", the other half the "past"]

Using this calculation alone to determine the relative ages of transaction ids fails for very old rows (more than 2^31 transactions older than the backfill start), which will appear to be in the future.
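The failure mode is easy to demonstrate with ordinary modular arithmetic. The following is a Python model of the comparison (the real b_follows_a is a pgroll SQL function whose definition is not shown here; the signature and semantics below are assumptions for illustration only):

```python
# Model of an xid-wraparound-safe comparison: b_follows_a(a, b) is True
# when xid b lies in the forward half of the 2^32 circle relative to a.
# This mirrors the semantics described above; the actual pgroll
# implementation may differ.
XID_SPACE = 2**32

def b_follows_a(a: int, b: int) -> bool:
    forward = (b - a) % XID_SPACE        # distance travelled forward from a to b
    return 0 < forward < XID_SPACE // 2  # less than half the circle => "ahead"

BACKFILL_XID = 1234

# A tuple written just before the backfill started passes the check:
assert b_follows_a(1233, BACKFILL_XID)
# A tuple written just after the backfill started is excluded:
assert not b_follows_a(1235, BACKFILL_XID)

# The failure mode: a tuple more than 2^31 transactions in the past wraps
# into the "future" half of the circle, so the backfill xid no longer
# appears to follow it and the row would be wrongly excluded.
very_old = (BACKFILL_XID - 2**31 - 10) % XID_SPACE
assert not b_follows_a(very_old, BACKFILL_XID)
```

This is exactly why a second check is needed for very old (frozen) tuples.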

[Image: very old transaction ids wrapping into the "future" half of the circle]

Here, the first snowflake transaction at the top of the circle appears to be in the future relative to T1, but is really in the past. Postgres solves this problem by 'freezing' old tuples once they are guaranteed to be visible to all transactions (exactly when Postgres freezes old tuples is configurable, but not relevant here). So we also need a second check to always include a row in the batch if it is frozen:
    "pgroll".b_follows_a(xmin::text::bigint, "pgroll".frozen_xid('public', 'table_name')::text::bigint)

The frozen_xid function is defined:

-- Find the xid for a table before which all transaction ids in the table
-- are frozen
CREATE OR REPLACE FUNCTION pgroll.frozen_xid(schema_name name, table_name name)
RETURNS xid
LANGUAGE SQL
AS $$
  SELECT relfrozenxid
  FROM pg_class
  WHERE relnamespace::regnamespace::name = schema_name
    AND relname = table_name
    AND relkind = 'r'
$$;

The test checks whether the transaction id that created the tuple comes before the point below which all of the table's tuples are frozen (pg_class.relfrozenxid). If so, the tuple is frozen and should be included in the batch even if the first check would regard it as being in the future.
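Combining both conditions, the WHERE clause's inclusion logic can be sketched as follows. This is again a Python model under the same assumed b_follows_a semantics, with RELFROZENXID standing in for the value returned by pgroll.frozen_xid; it is an illustration, not pgroll's actual code:

```python
XID_SPACE = 2**32

def b_follows_a(a: int, b: int) -> bool:
    # Wraparound-safe: True when b is in the forward half-circle from a.
    return 0 < (b - a) % XID_SPACE < XID_SPACE // 2

def include_in_batch(xmin: int, backfill_xid: int, relfrozenxid: int) -> bool:
    # Mirrors the WHERE clause: batch the tuple if it predates the
    # backfill, or if it predates the table's relfrozenxid (i.e. frozen).
    return b_follows_a(xmin, backfill_xid) or b_follows_a(xmin, relfrozenxid)

BACKFILL_XID, RELFROZENXID = 1234, 1000

assert include_in_batch(1233, BACKFILL_XID, RELFROZENXID)      # pre-backfill row
assert not include_in_batch(1235, BACKFILL_XID, RELFROZENXID)  # post-backfill row

# A very old, frozen tuple fails the first check but passes the second:
very_old = (RELFROZENXID - 2**31 + 100) % XID_SPACE
assert not b_follows_a(very_old, BACKFILL_XID)
assert include_in_batch(very_old, BACKFILL_XID, RELFROZENXID)
```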

Problem

What happens if a tuple was frozen many billions of transactions ago (i.e. several xid wraparounds ago)? The 32-bit pg_class.relfrozenxid cannot tell us accurately whether the xid of such an extremely old tuple should be considered frozen: relfrozenxid carries no epoch information about which wraparound cycle it refers to.
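A small numeric sketch of the ambiguity: two (hypothetical) 64-bit "absolute" transaction ids exactly one wraparound apart collapse to the same 32-bit value, so a bare 32-bit xid cannot say which epoch it belongs to:

```python
XID_SPACE = 2**32  # xids on disk are 32-bit and wrap around

# Hypothetical 64-bit absolute transaction ids, one full epoch apart:
old_epoch_xact = 5_000_000_000
new_epoch_xact = old_epoch_xact + XID_SPACE

# Truncated to 32 bits (as stored in tuple headers and relfrozenxid),
# the two are indistinguishable:
assert old_epoch_xact % XID_SPACE == new_epoch_xact % XID_SPACE
```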

The ultimate truth of whether a tuple is frozen is contained in the tuple header: frozen tuples have the HEAP_XMIN_FROZEN bits set in t_infomask:

static inline bool
HeapTupleHeaderXminFrozen(const HeapTupleHeaderData *tup)
{
	return (tup->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}

But the only way to access the tuple header is via the pageinspect extension. Prior to Postgres 9.4, frozen tuples had their xmin replaced with a special value to indicate that the row was frozen, which made identifying frozen tuples from SQL easy; this is no longer the case, and the tuple infomask is used instead.
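For illustration, the C check above translates directly into a test on the raw t_infomask value that pageinspect's heap_page_items exposes. Below is a Python model using the bit values defined in Postgres's src/include/access/htup_details.h:

```python
# Infomask bits from Postgres (src/include/access/htup_details.h).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID  # 0x0300

def xmin_frozen(t_infomask: int) -> bool:
    # Same test as the C HeapTupleHeaderXminFrozen above: a tuple is
    # frozen only when BOTH bits are set.
    return (t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN

assert xmin_frozen(0x0300)      # frozen
assert not xmin_frozen(0x0100)  # merely committed
assert not xmin_frozen(0x0200)  # xmin invalid, not frozen
```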

Summary

Without a reliable way to check from SQL whether a tuple is frozen, I don't think this approach is robust. Reliably checking whether a tuple is frozen appears to require access to the tuple header, which is not possible from plain SQL, only via extensions.

Using pageinspect to determine if a tuple is frozen may be possible, but would introduce a dependency on that extension; currently pgroll does not require any extensions.

Without robust checks for frozen tuples, the backfill process could exclude tuples that should be backfilled potentially resulting in data loss.

@andrew-farries added the `help wanted` (Extra attention is needed) label on Jan 29, 2025
Commit messages:

- The `WithStateSchema` option allows specifying the schema in which the `pgroll` internal state is stored. This is useful when the internal state is stored in a schema other than the default `pgroll`.
- Ensure that only rows whose `xmin` logically precedes the `xid` of the transaction that started the backfill are included in the batch.
- Add a `WithSchema` option to `backfill.Backfill` to allow specifying the schema in which the backfill will operate.
- Ensure rows logically preceding the frozen transaction id for the table are included in the batch of rows to be backfilled.
@andrew-farries force-pushed the backfill-old-tuples-only branch from dbe8bdf to 4bed314 on January 29, 2025 09:47
@andrew-farries requested review from urso and gulcin on January 29, 2025 11:36