Recover from crash using undo log
This commit records the file write position of the last successful transaction in
an undo file. After a crash, on restart, we use this undo file to discard any partial work left behind by the last unfinished transaction.
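
The mechanism described above can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: the helper names `undo_record_position`, `undo_recover`, and `undo_remove` are hypothetical, and the undo file format (a single byte offset stored as decimal text) is an assumption, since the bodies of `recoverFromUndoLog` and `removeUndoLog` are not shown here.

```c
#define _POSIX_C_SOURCE 200809L   /* for truncate() and unlink() */

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: remember where the last complete transaction ended.
 * Assumed format: the byte offset written as decimal text.
 */
static bool
undo_record_position(const char *undoPath, long pos)
{
	assert(undoPath != NULL && pos >= 0);

	FILE *f = fopen(undoPath, "w");

	if (f == NULL)
	{
		return false;
	}

	bool written = fprintf(f, "%ld\n", pos) > 0;
	bool closed = fclose(f) == 0;

	return written && closed;
}

/*
 * Hypothetical helper: cut dataPath back to the recorded offset, which
 * discards whatever an unfinished transaction wrote after that point.
 * A missing undo file means there is nothing to recover.
 */
static bool
undo_recover(const char *undoPath, const char *dataPath)
{
	assert(undoPath != NULL && dataPath != NULL);

	FILE *f = fopen(undoPath, "r");

	if (f == NULL)
	{
		return errno == ENOENT;
	}

	long pos = -1;
	bool parsed = fscanf(f, "%ld", &pos) == 1 && pos >= 0;

	fclose(f);

	if (!parsed)
	{
		return false;
	}

	return truncate(dataPath, (off_t) pos) == 0;
}

/* Hypothetical helper: drop the undo file; an already missing file is fine. */
static bool
undo_remove(const char *undoPath)
{
	assert(undoPath != NULL);

	return unlink(undoPath) == 0 || errno == ENOENT;
}
```

In this sketch a caller would run `undo_recover` and then `undo_remove` once at startup, before the main loop, and call `undo_record_position` each time a transaction has been completely written.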

TODO: Can we use the undo log mechanism even for a graceful exit?
This would simplify the code and make it uniform for both graceful
and ungraceful exits/crashes. The idea is to undo the changes made
by the last incomplete transaction by reading the undo log file, and
then write other messages like ENDPOS.

Signed-off-by: Arunprasad Rajkumar <[email protected]>
arajkumar committed Nov 22, 2023
1 parent c341810 commit e877bb6
Showing 16 changed files with 512 additions and 22 deletions.
1 change: 1 addition & 0 deletions .github/workflows/run-tests.yml
@@ -35,6 +35,7 @@ jobs:
- follow-9.6
- follow-data-only
- endpos-in-multi-wal-txn
- recover-from-crash
steps:
- name: Checkout repository
uses: actions/checkout@v3
16 changes: 16 additions & 0 deletions src/bin/pgcopydb/cli_stream.c
@@ -1168,6 +1168,22 @@ stream_start_in_mode(LogicalStreamMode mode)
exit(EXIT_CODE_INTERNAL_ERROR);
}

/*
* In case of a crash, execute recovery actions before starting the
* main loop.
*/
if (!recoverFromUndoLog(&specs.paths))
{
/* errors have already been logged */
exit(EXIT_CODE_INTERNAL_ERROR);
}

if (!removeUndoLog(&specs.paths))
{
/* errors have already been logged */
exit(EXIT_CODE_INTERNAL_ERROR);
}

switch (specs.mode)
{
case STREAM_MODE_RECEIVE:
4 changes: 4 additions & 0 deletions src/bin/pgcopydb/copydb.c
@@ -572,6 +572,10 @@ copydb_prepare_filepaths(CopyFilePaths *cfPaths,
"%s/lsn.json",
cfPaths->cdc.dir);

sformat(cfPaths->cdc.undofile, MAXPGPATH,
"%s/undo",
cfPaths->cdc.dir);

/*
* Now prepare the "compare" files we need to compare schema and data
* between the source and target instance.
1 change: 1 addition & 0 deletions src/bin/pgcopydb/copydb.h
@@ -26,6 +26,7 @@
{ "extra_float_digits", "3" }, \
{ "statement_timeout", "0" }, \
{ "default_transaction_read_only", "off" }

/*
* These parameters are added to the connection strings, unless the user has
* added them, allowing user-defined values to be taken into account.
1 change: 1 addition & 0 deletions src/bin/pgcopydb/copydb_paths.h
@@ -54,6 +54,7 @@ typedef struct CDCPaths
char tlifile[MAXPGPATH]; /* /tmp/pgcopydb/cdc/tli */
char tlihistfile[MAXPGPATH]; /* /tmp/pgcopydb/cdc/tli.history */
char lsntrackingfile[MAXPGPATH]; /* /tmp/pgcopydb/cdc/lsn.json */
char undofile[MAXPGPATH]; /* /tmp/pgcopydb/cdc/undo */
} CDCPaths;


16 changes: 16 additions & 0 deletions src/bin/pgcopydb/follow.c
@@ -257,6 +257,22 @@ follow_get_sentinel(StreamSpecs *specs, CopyDBSentinel *sentinel, bool verbose)
bool
follow_main_loop(CopyDataSpec *copySpecs, StreamSpecs *streamSpecs)
{
/*
* In case of a crash, execute recovery actions before starting the
* main loop.
*/
if (!recoverFromUndoLog(&streamSpecs->paths))
{
log_error("Failed to recover from undo log, see above for details");
return false;
}

if (!removeUndoLog(&streamSpecs->paths))
{
log_error("Failed to remove undo log, see above for details");
return false;
}

/*
* Remove the possibly still existing stream context files from
* previous round of operations (--resume, etc). We want to make