Make backfill batch selection exclude rows inserted or updated after backfill start #652

Draft: wants to merge 8 commits into main

andrew-farries
Collaborator

@andrew-farries andrew-farries commented Feb 4, 2025

Backfill only rows present at backfill start. This is the third approach to solving #583. The previous two are:

This is the most direct approach to solving the problem. At the same time as the up/down triggers are created to perform a backfill, a _pgroll_needs_backfill column is also created on the table to be backfilled. The column has a DEFAULT of true; because the default is a constant, the column can be added quickly, without holding an ACCESS EXCLUSIVE lock for a long time. The column is removed when the operation is rolled back or completed.
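To make the column lifecycle concrete, here is a minimal sketch of the two statements involved, written as small Python helpers that build PostgreSQL DDL strings. This is not pgroll's actual code: the helper names (`quote_ident`, `add_flag_column_sql`, `drop_flag_column_sql`) are hypothetical, and only the column name and its `true` default come from the description above.

```python
# Hypothetical helpers, for illustration only; not taken from pgroll.
FLAG_COLUMN = "_pgroll_needs_backfill"


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def add_flag_column_sql(table: str) -> str:
    """DDL run at backfill start: add the flag column with a constant default."""
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"ADD COLUMN {quote_ident(FLAG_COLUMN)} boolean DEFAULT true"
    )


def drop_flag_column_sql(table: str) -> str:
    """DDL run on completion or rollback: remove the flag column again."""
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"DROP COLUMN IF EXISTS {quote_ident(FLAG_COLUMN)}"
    )


if __name__ == "__main__":
    print(add_flag_column_sql("users"))
    print(drop_flag_column_sql("users"))
```

The `IF EXISTS` on the drop is an assumption that makes the cleanup safe to run twice; whether the real implementation does the same is not stated here.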

The up/down triggers are modified to set _pgroll_needs_backfill to false whenever they update a row.

The backfill itself is updated to select only rows having _pgroll_needs_backfill set to true - this ensures that the backfill updates only rows that existed before the triggers were installed and have not been written to since. The backfill process still needs to read every row in the table, including those inserted/updated after backfill start, but it skips updating them.
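The mechanism described above can be sketched end to end with a small, self-contained simulation. It uses Python's built-in `sqlite3` module rather than PostgreSQL so that it runs without a server, and it is not pgroll's implementation: the `users` table, the `name_upper` column, the trigger names and the batch size are all assumptions made for illustration. The triggers stand in for the up trigger and clear the flag on every row they touch; the backfill then batches only over rows whose flag is still set.

```python
import sqlite3

FLAG = "_pgroll_needs_backfill"


def run_demo(batch_size: int = 2):
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: name_upper plays the role of the new column.
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, name_upper TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [(f"user{i}",) for i in range(1, 6)],
    )

    # Backfill start: add the flag column with a constant default,
    # then install triggers that clear the flag on every row they touch.
    conn.execute(f"ALTER TABLE users ADD COLUMN {FLAG} BOOLEAN DEFAULT 1")
    for name, event in (("ins", "INSERT"), ("upd", "UPDATE OF name")):
        conn.execute(
            f"""
            CREATE TRIGGER users_up_{name} AFTER {event} ON users
            BEGIN
                UPDATE users
                SET name_upper = upper(NEW.name), {FLAG} = 0
                WHERE id = NEW.id;
            END
            """
        )

    # Writes that arrive after backfill start are handled by the triggers.
    conn.execute("INSERT INTO users (name) VALUES ('user6')")
    conn.execute("UPDATE users SET name = 'renamed' WHERE id = 2")

    # Backfill: walk the table in primary-key order, selecting only rows
    # that still need a backfill, and touch them so the trigger fires.
    updated, last_id = [], 0
    while True:
        rows = conn.execute(
            f"SELECT id FROM users WHERE id > ? AND {FLAG} = 1 "
            "ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE users SET name = name WHERE id = ?", [(i,) for i in ids]
        )
        updated.extend(ids)
        last_id = ids[-1]

    remaining = conn.execute(
        f"SELECT count(*) FROM users WHERE {FLAG} = 1"
    ).fetchone()[0]
    missing = conn.execute(
        "SELECT count(*) FROM users WHERE name_upper IS NULL"
    ).fetchone()[0]
    conn.close()
    return updated, remaining, missing


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the backfill touches rows 1, 3, 4 and 5 only: row 2 was updated and row 6 was inserted after the triggers were installed, so their flags were already cleared and they are skipped.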

The main disadvantage of this approach is that backfill now requires an extra column to be created on the target table.

NOTE: We'd need to update some docs (especially the tutorial) to mention this new column if we go with this solution.

Base automatically changed from remove-trigger-else-expr to main February 4, 2025 11:35
For all rows backfilled by the up or down triggers, set the
`_pgroll_needs_backfill` column to `false` for the updated row.
Make the `TableMustBeCleanedUp` assertion check that the
`_pgroll_needs_backfill` column is not present in the table.
Include only those rows in each batch having `_pgroll_needs_backfill`
set to `true`.
Rows are returned in a different order.
Extend operations to remove the `_pgroll_needs_backfill` column on
completion and rollback.
@andrew-farries andrew-farries force-pushed the backfill-old-tuples-only3 branch from 511a84f to de9cc8b Compare February 4, 2025 11:36