Make backfill batch selection exclude rows inserted or updated after backfill start #634

Draft · wants to merge 7 commits into main from backfill-old-tuples-only
Conversation

andrew-farries (Collaborator)
Implement one possible solution that limits the backfill to touching only rows that existed in the table prior to backfill start, ignoring rows inserted or updated after the backfill started.

I believe the solution presented here is flawed and should not be merged as is; the PR is open for discussion to see if the technique can be made to work or if we need to take a different approach.

In general, it is fine from a correctness point of view if the backfill updates a row that was already updated or inserted by a transaction that committed after the backfill started. It is not OK from a performance point of view, however: a backfill running concurrently with a high rate of INSERTs into the table will never terminate (the issue is described in #583).

Proposed solution

Backfill works by touching rows in batches (the batch size is configurable). 'Touch' in this context means setting the row's PK to itself, causing the already-installed backfill trigger to fire for that row.

As of this PR, the per-batch query looks like:

WITH batch AS
(
  SELECT "id", "zip"
  FROM "table_name"
  WHERE (
    "pgroll".b_follows_a(xmin::text::bigint, 1234) OR
    "pgroll".b_follows_a(xmin::text::bigint, "pgroll".frozen_xid('public', 'table_name')::text::bigint)
  )
  AND ("id", "zip") > ('1', '1234')
  ORDER BY "id", "zip"
  LIMIT 10
  FOR NO KEY UPDATE
),
"update" AS
(
  UPDATE "table_name"
  SET "id" = "table_name"."id", "zip" = "table_name"."zip"
  FROM batch
  WHERE "table_name"."id" = batch."id" AND "table_name"."zip" = batch."zip"
  RETURNING "table_name"."id", "table_name"."zip"
)
SELECT LAST_VALUE("id") OVER(), LAST_VALUE("zip") OVER()
FROM "update"

The first CTE (WITH batch AS...) is the relevant part here. The purpose of this CTE is to select the next batch of rows to be updated (and lock those rows for update).

The relevant part of the first CTE is this bit:

WHERE (
    "pgroll".b_follows_a(xmin::text::bigint, 1234) OR
    "pgroll".b_follows_a(xmin::text::bigint, "pgroll".frozen_xid('public', 'table_name')::text::bigint)
)

This is where we attempt to filter out tuples that were created or updated after the backfill process started. '1234' stands in for the xid at which the backfill process started. The first part:

    "pgroll".b_follows_a(xmin::text::bigint, 1234)

checks whether the tuple is older than the xid at which the backfill started; if so, the tuple should be part of the batch. b_follows_a implements an xid-wraparound-safe comparison of xids: the transaction id space (0 to 2^32-1) is treated as a circle, and anything in the forward half of the circle relative to a given xid is considered ahead of it; everything else is behind. See the Postgres Internals book, Chapter 7 (Freezing), for a description:

[Image: the 32-bit transaction id space drawn as a circle; the half ahead of a given xid is the "future", the other half the "past"]

Using this calculation alone to determine the relative ages of transaction ids fails for very old rows (more than 2^31 transactions older than the backfill start), which will appear to be in the future.
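The failure mode is easy to demonstrate with ordinary modular arithmetic. The following is a Python model of the comparison (the real b_follows_a is a pgroll SQL function whose definition is not shown here; the signature and semantics below are assumptions for illustration only):

```python
# Model of an xid-wraparound-safe comparison: b_follows_a(a, b) is True
# when xid b lies in the forward half of the 2^32 circle relative to a.
# This mirrors the semantics described above; the actual pgroll
# implementation may differ.
XID_SPACE = 2**32

def b_follows_a(a: int, b: int) -> bool:
    forward = (b - a) % XID_SPACE        # distance travelled forward from a to b
    return 0 < forward < XID_SPACE // 2  # less than half the circle => "ahead"

BACKFILL_XID = 1234

# A tuple written just before the backfill started passes the check:
assert b_follows_a(1233, BACKFILL_XID)
# A tuple written just after the backfill started is excluded:
assert not b_follows_a(1235, BACKFILL_XID)

# The failure mode: a tuple more than 2^31 transactions in the past wraps
# into the "future" half of the circle, so the backfill xid no longer
# appears to follow it and the row would be wrongly excluded.
very_old = (BACKFILL_XID - 2**31 - 10) % XID_SPACE
assert not b_follows_a(very_old, BACKFILL_XID)
```

This is exactly why a second check is needed for very old (frozen) tuples.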

[Image: very old transaction ids wrapping into the "future" half of the circle]

Here, the first snowflake transaction at the top of the circle appears to be in the future relative to T1, but is really in the past. Postgres solves this problem by 'freezing' old tuples once they are guaranteed to be visible to all transactions (exactly when Postgres freezes old tuples is configurable, but not relevant here). So we also need a second check to always include a row in the batch if it is frozen:
    "pgroll".b_follows_a(xmin::text::bigint, "pgroll".frozen_xid('public', 'table_name')::text::bigint)

The frozen_xid function is defined:

-- Find the xid for a table before which all transaction ids in the table
-- are frozen
CREATE OR REPLACE FUNCTION pgroll.frozen_xid(schema_name name, table_name name)
RETURNS xid
LANGUAGE SQL
AS $$
  SELECT relfrozenxid
  FROM pg_class
  WHERE relnamespace::regnamespace::name = schema_name
    AND relname = table_name
    AND relkind = 'r'
$$;

The test checks whether the transaction id that created the tuple comes before the point below which all of the table's tuples are frozen (pg_class.relfrozenxid). If so, the tuple is frozen and should be included in the batch even if the first check would regard it as being in the future.
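Combining both conditions, the WHERE clause's inclusion logic can be sketched as follows. This is again a Python model under the same assumed b_follows_a semantics, with RELFROZENXID standing in for the value returned by pgroll.frozen_xid; it is an illustration, not pgroll's actual code:

```python
XID_SPACE = 2**32

def b_follows_a(a: int, b: int) -> bool:
    # Wraparound-safe: True when b is in the forward half-circle from a.
    return 0 < (b - a) % XID_SPACE < XID_SPACE // 2

def include_in_batch(xmin: int, backfill_xid: int, relfrozenxid: int) -> bool:
    # Mirrors the WHERE clause: batch the tuple if it predates the
    # backfill, or if it predates the table's relfrozenxid (i.e. frozen).
    return b_follows_a(xmin, backfill_xid) or b_follows_a(xmin, relfrozenxid)

BACKFILL_XID, RELFROZENXID = 1234, 1000

assert include_in_batch(1233, BACKFILL_XID, RELFROZENXID)      # pre-backfill row
assert not include_in_batch(1235, BACKFILL_XID, RELFROZENXID)  # post-backfill row

# A very old, frozen tuple fails the first check but passes the second:
very_old = (RELFROZENXID - 2**31 + 100) % XID_SPACE
assert not b_follows_a(very_old, BACKFILL_XID)
assert include_in_batch(very_old, BACKFILL_XID, RELFROZENXID)
```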

Problem

What happens if a tuple was frozen many billions of transactions ago (i.e. several xid wraparounds ago)? The 32-bit pg_class.relfrozenxid cannot tell us accurately whether the xid of such an extremely old tuple should be considered frozen: relfrozenxid carries no epoch information about which wraparound cycle it refers to.
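A small numeric sketch of the ambiguity: two (hypothetical) 64-bit "absolute" transaction ids exactly one wraparound apart collapse to the same 32-bit value, so a bare 32-bit xid cannot say which epoch it belongs to:

```python
XID_SPACE = 2**32  # xids on disk are 32-bit and wrap around

# Hypothetical 64-bit absolute transaction ids, one full epoch apart:
old_epoch_xact = 5_000_000_000
new_epoch_xact = old_epoch_xact + XID_SPACE

# Truncated to 32 bits (as stored in tuple headers and relfrozenxid),
# the two are indistinguishable:
assert old_epoch_xact % XID_SPACE == new_epoch_xact % XID_SPACE
```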

The ultimate truth of whether a tuple is frozen is contained in the tuple header: frozen tuples have the HEAP_XMIN_FROZEN bits set in t_infomask:

static inline bool
HeapTupleHeaderXminFrozen(const HeapTupleHeaderData *tup)
{
	return (tup->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}

But the only way to access the tuple header is via the pageinspect extension. Prior to Postgres 9.4, frozen tuples had their xmin replaced with a special value to indicate that the row was frozen, which made identifying frozen tuples from SQL easy; this is no longer the case, and the tuple infomask is used instead.
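For illustration, the C check above translates directly into a test on the raw t_infomask value that pageinspect's heap_page_items exposes. Below is a Python model using the bit values defined in Postgres's src/include/access/htup_details.h:

```python
# Infomask bits from Postgres (src/include/access/htup_details.h).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID  # 0x0300

def xmin_frozen(t_infomask: int) -> bool:
    # Same test as the C HeapTupleHeaderXminFrozen above: a tuple is
    # frozen only when BOTH bits are set.
    return (t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN

assert xmin_frozen(0x0300)      # frozen
assert not xmin_frozen(0x0100)  # merely committed
assert not xmin_frozen(0x0200)  # xmin invalid, not frozen
```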

Summary

Without a reliable way to check from SQL whether a tuple is frozen, I don't think this approach is robust. Reliably checking whether a tuple is frozen appears to require access to the tuple header, which is not possible from plain SQL, only via extensions.

Using pageinspect to determine if a tuple is frozen may be possible, but would introduce a dependency on that extension; currently pgroll does not require any extensions.

Without robust checks for frozen tuples, the backfill process could exclude tuples that should be backfilled potentially resulting in data loss.

@andrew-farries added the `help wanted` (Extra attention is needed) label on Jan 29, 2025
Commit messages:

- The `WithStateSchema` option allows specifying the schema in which the `pgroll` internal state is stored. This is useful when the internal state is stored in a schema other than the default `pgroll`.
- Ensure that only rows whose `xmin` logically precedes the `xid` of the transaction that started the backfill are included in the batch.
- Add a `WithSchema` option to `backfill.Backfill` to allow specifying the schema in which the backfill will operate.
- Ensure rows logically preceding the frozen transaction id for the table are included in the batch of rows to be backfilled.
@andrew-farries force-pushed the backfill-old-tuples-only branch from dbe8bdf to 4bed314 on January 29, 2025 09:47
@andrew-farries requested review from urso and gulcin on January 29, 2025 11:36