Zaloha is a small and simple directory synchronizer:
* Zaloha is a BASH script that uses only FIND, SORT and AWK. All you need
is the Zaloha2.sh file. This documentation is contained in Zaloha2.sh too.
* Cyber-secure: No new binary code, no new open ports, no interaction with
the Internet, easily reviewable.
* Three operation modes are available: Local Mode, Remote Source Mode and
Remote Backup Mode
* Local Mode: Both <sourceDir> and <backupDir> are available locally
(local HDD/SSD, flash drive, mounted Samba or NFS volume).
* Remote Source Mode: <sourceDir> is on a remote source host that can be
reached via SSH/SCP, <backupDir> is available locally.
* Remote Backup Mode: <sourceDir> is available locally, <backupDir> is on a
remote backup host that can be reached via SSH/SCP.
* Zaloha does not lock files while copying them. No writing on either directory
may occur while Zaloha runs.
* Zaloha always copies whole files via the operating system's CP command
or the SCP command (= no delta-transfer like in RSYNC).
* Zaloha is not limited by memory (metadata is processed as CSV files,
no limits for huge directory trees).
* Zaloha has optional reverse-synchronization features (details below).
* Zaloha can optionally compare the contents of files (details below).
* Zaloha prepares scripts for the case of an eventual restore (this can
optionally be switched off to shorten the analysis phase, details below).
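For illustration, hypothetical invocations of the three operation modes might look like this (the directory paths and user@host values are examples, not prescriptions):

```shell
# Local Mode: both directories are reachable through the local filesystem
./Zaloha2.sh --sourceDir='/home/user/data' --backupDir='/mnt/backup/data'

# Remote Source Mode: <sourceDir> is on a host reachable via SSH/SCP
./Zaloha2.sh --sourceDir='data' --sourceUserHost='user@source.example.com' \
             --backupDir='/mnt/backup/data'

# Remote Backup Mode: <backupDir> is on a host reachable via SSH/SCP
./Zaloha2.sh --sourceDir='/home/user/data' \
             --backupDir='data' --backupUserHost='user@backup.example.com'
```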
To detect which files need synchronization, Zaloha compares file sizes and
modification times. It is clear that such detection is not 100% waterproof.
A waterproof solution requires comparing file contents, e.g. via "byte by byte"
comparison or via SHA-256 hashes. However, such comparing increases the
processing time by orders of magnitude. Therefore, it is not enabled by default.
Section Advanced Use of Zaloha describes two alternative ways to enable it.
Zaloha asks to confirm actions before they are executed, i.e. prepared actions
can be skipped, exceptional cases manually resolved, and Zaloha re-run.
For automatic operations, use the --noExec option to tell Zaloha to not ask
and to not execute the actions (but still prepare the scripts).
<sourceDir> and <backupDir> can be on different filesystem types, provided that
no filesystem limitations are hit. Such limitations (e.g. in case of
ext4 -> FAT) include: disallowed characters in filenames, filename uppercase
conversions, file size limits, etc.
No writing on either directory may occur while Zaloha runs (no file locking is
implemented). In high-availability IT operations, a higher class of backup
solution should be deployed, based on taking filesystem snapshots at times when
writing processes are stopped for a short instant (i.e. functionality that must
be supported by the underlying OS). If either directory contains data files
of running databases, then they must be excluded from backups on file level.
Databases have their own logic of backups, replications and failovers, usually
based on transactional logs, and it is plainly wrong to intervene with generic
tools that operate on files and directories. Dedicated tools provided by the
database vendor shall be used.
Handling of "weird" characters in filenames was a special focus during
development of Zaloha (details below).
On Linux/Unix, Zaloha runs natively. On Windows, Cygwin is needed.
Repository: https://github.com/Fitus/Zaloha2.sh
An add-on script that creates hardlink-based snapshots of the backup directory
is available, enabling "Time Machine"-like backup solutions:
Repository of add-on script: https://github.com/Fitus/Zaloha2_Snapshot.sh
MORE DETAILED DESCRIPTION
The operation of Zaloha can be partitioned into five steps, in which the
following actions are performed:
Exec1: unavoidable removals from <backupDir> (objects of conflicting types
which occupy needed namespace)
-----------------------------------
RMDIR regular remove directory from <backupDir>
REMOVE regular remove file from <backupDir>
REMOVE.! remove file from <backupDir> which is newer than the
last run of Zaloha
REMOVE.l remove symbolic link from <backupDir>
REMOVE.x remove other object from <backupDir>, x = object type (p/s/c/b/D)
Exec2: copy files/directories to <backupDir> which exist only in <sourceDir>,
or files which are newer in <sourceDir>
-----------------------------------
MKDIR regular create new directory in <backupDir>
NEW regular create new file in <backupDir>
UPDATE regular update file in <backupDir>
UPDATE.! update file in <backupDir> which is newer than the last run of Zaloha
UPDATE.? update file in <backupDir> by a file in <sourceDir> which is not newer
(or not newer by 3600 secs if option --ok3600s is given, plus
a possible 2 secs FAT tolerance)
unl.UP unlink file in <backupDir> + UPDATE (can be switched off via the
--noUnlink option, see below)
unl.UP.! unlink file in <backupDir> + UPDATE.! (can be switched off via the
--noUnlink option, see below)
unl.UP.? unlink file in <backupDir> + UPDATE.? (can be switched off via the
--noUnlink option, see below)
SLINK.n create new symbolic link in <backupDir> (if synchronization of
symbolic links is activated via the --syncSLinks option)
SLINK.u update (= unlink+create) a symbolic link in <backupDir> (if
synchronization of symbolic links is activated via the
--syncSLinks option)
ATTR:ugmT update only attributes in <backupDir> (u=user ownership,
g=group ownership, m=mode, T=modification time)
(optional features, see below)
Exec3: reverse-synchronization from <backupDir> to <sourceDir> (optional
feature, can be activated via the --revNew (or --revNewAll)
and --revUp options)
-----------------------------------
REV.MKDI reverse-create parent directory in <sourceDir> due to REV.NEW
REV.NEW reverse-create file in <sourceDir> (if a standalone file in
<backupDir> is newer than the last run of Zaloha (in case
of the --revNewAll option irrespective of whether it is newer))
REV.UP reverse-update file in <sourceDir> (if the file in <backupDir>
is newer than the file in <sourceDir>)
REV.UP.! reverse-update file in <sourceDir> which is newer
than the last run of Zaloha (or newer than the last run of Zaloha
minus 3600 secs if option --ok3600s is given)
Exec4: remaining removals of obsolete files/directories from <backupDir>
(can be optionally switched off via the --noRemove option)
-----------------------------------
RMDIR regular remove directory from <backupDir>
REMOVE regular remove file from <backupDir>
REMOVE.! remove file from <backupDir> which is newer than the
last run of Zaloha
REMOVE.l remove symbolic link from <backupDir>
REMOVE.x remove other object from <backupDir>, x = object type (p/s/c/b/D)
Exec5: updates resulting from optional comparing contents of files
(optional feature, can be activated via the --byteByByte or
--sha256 options)
-----------------------------------
UPDATE.b update file in <backupDir> because its contents are not identical
unl.UP.b unlink file in <backupDir> + UPDATE.b (can be switched off via the
--noUnlink option, see below)
(internal use, for completeness only)
-----------------------------------
OK object without needed action in <sourceDir> (either files or
directories already synchronized with <backupDir>, or other objects
not to be synchronized to <backupDir>). These records are necessary
for preparation of shellscripts for the case of restore.
OK.b file proven identical byte by byte (in CSV metadata file 555)
KEEP object to be kept only in <backupDir>
uRMDIR unavoidable RMDIR which goes into Exec1 (in CSV files 380 and 390)
uREMOVE unavoidable REMOVE which goes into Exec1 (in CSV files 380 and 390)
INDIVIDUAL STEPS IN FULL DETAIL
Exec1:
------
Unavoidable removals from <backupDir> (objects of conflicting types which occupy
needed namespace). This must be the first step, because objects of conflicting
types in <backupDir> would prevent synchronization (e.g. a file cannot overwrite
a directory).
Unavoidable removals are prepared regardless of the --noRemove option.
Exec2:
------
Files and directories which exist only in <sourceDir> are copied to <backupDir>
(action codes NEW and MKDIR).
Further, Zaloha "updates" files in <backupDir> (action code UPDATE) if files
exist under same paths in both <sourceDir> and <backupDir> and the comparisons
of file sizes and modification times result in needed synchronization of the
files. If the files in <backupDir> are multiply linked (hardlinked), Zaloha
removes (unlinks) them first (action code unl.UP), to prevent "updating"
multiply linked files, which could lead to follow-up effects. This unlinking
can be switched off via the --noUnlink option.
Optionally, Zaloha can also synchronize attributes (u=user ownerships,
g=group ownerships, m=modes (permission bits)). This functionality can be
activated by the options --pUser, --pGroup and --pMode. The selected
attributes are then preserved during each MKDIR, NEW, UPDATE and unl.UP
action. Additionally, if these attributes differ on files and directories
for which no action is prepared, special action codes ATTR:ugm are prepared to
synchronize (only) the differing attributes.
Synchronization of attributes is an optional feature, because:
(1) the filesystem of <backupDir> might not be capable of storing these
attributes, or (2) it may be wanted that all files and directories in
<backupDir> are owned by the user who runs Zaloha.
Regardless of whether attributes are synchronized or not, an eventual restore
of <sourceDir> from <backupDir> including these attributes is possible thanks
to the restore scripts which Zaloha prepares in its Metadata directory
(see below).
Zaloha contains an optional feature to detect multiply linked (hardlinked) files
in <sourceDir>. If this feature is switched on (via the --detectHLinksS
option), Zaloha internally flags the second, third, etc. links to same file as
"hardlinks", and synchronizes to <backupDir> only the first link (the "file").
The "hardlinks" are not synchronized to <backupDir>, but Zaloha prepares a
restore script for them (file 830). If this feature is switched off
(no --detectHLinksS option), then each link to a multiply linked file is
treated as a separate regular file.
The detection of hardlinks brings two risks: Zaloha might not detect that a file
is in fact a hardlink, or Zaloha might falsely detect a hardlink while the file
is in fact a unique file. The second risk is more severe, because the contents
of the unique file will not be synchronized to <backupDir> in such case.
For that reason, Zaloha contains additional checks against falsely detected
hardlinks (see code of AWKHLINKS). Generally, use this feature only after proper
testing on your filesystems. Be cautious as inode-related issues exist on some
filesystems and network-mounted filesystems.
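The inode-based information that hardlink detection relies on can be inspected with standard tools (a sketch using GNU stat/find on a throw-away directory; filenames are examples):

```shell
# Two directory entries sharing one inode = a multiply linked file.
tmp=$(mktemp -d)
echo 'payload' > "$tmp/file"
ln "$tmp/file" "$tmp/hardlink"     # create a second link to the same inode

# Link count is 2 and the inode number is identical for both entries:
stat -c '%h %i %n' "$tmp/file" "$tmp/hardlink"

# FIND can report the same information, which is the kind of data the
# AWKHLINKS checks work with:
find "$tmp" -type f -printf '%n %i %p\n'

rm -rf "$tmp"
```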
Symbolic links in <sourceDir>: There are two dimensions: The first dimension is
whether to follow them or not (the --followSLinksS option). If follow, then
the referenced files and directories are synchronized to <backupDir> and only
the broken symbolic links stay as symbolic links. If not follow, then all
symbolic links stay as symbolic links. See section Following Symbolic Links for
details. Now comes the second dimension: What to do with the symbolic links that
stay as symbolic links: They are always kept in the metadata and Zaloha prepares
a restore script for them (file 820). Additionally, if the option --syncSLinks
is given, Zaloha will indeed synchronize them to <backupDir> (action codes
SLINK.n or SLINK.u).
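The distinction between followed and broken symbolic links can be illustrated with FIND's -L option (a sketch on a throw-away directory; filenames are examples):

```shell
# Broken vs. valid symbolic links.
tmp=$(mktemp -d)
echo 'data' > "$tmp/target"
ln -s "$tmp/target" "$tmp/valid_link"
ln -s "$tmp/missing" "$tmp/broken_link"

# With -L (follow), FIND sees valid_link as the referenced file (-type f),
# while only the broken link is still reported as a link (-type l):
find -L "$tmp" -type l

rm -rf "$tmp"
```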
Zaloha does not synchronize other types of objects in <sourceDir> (named pipes,
sockets, special devices, etc). These objects are considered to be part of the
operating system or parts of applications, and dedicated scripts for their
(re-)creation should exist.
It was a conscious decision to synchronize, by default, only files and
directories to <backupDir> and to keep other objects in metadata only.
This gives more freedom in the choice of filesystem type for <backupDir>,
because every filesystem type is able to store files and directories,
but not necessarily the other objects.
Exec3:
------
This step is optional and can be activated via the --revNew (or --revNewAll)
and --revUp options.
Why is this feature useful? Imagine you use a Windows notebook while working in
the field. At home, you have a Linux server to which you regularly
synchronize your data. However, sometimes you work directly on the Linux server.
That work should be "reverse-synchronized" from the Linux server (<backupDir>)
back to the Windows notebook (<sourceDir>) (assuming, of course, that there is
no conflict between the work on the notebook and the work on the server).
REV.NEW: If standalone files in <backupDir> are newer than the last run of
Zaloha, and the --revNew option is given, then Zaloha reverse-copies those
files to <sourceDir> (action code REV.NEW). This might require creating the
missing structure of parent directories (REV.MKDI).
If the --revNewAll option is given, then REV.NEW actions occur irrespective of
whether the standalone files in <backupDir> are newer than the last run
of Zaloha.
REV.UP: If files exist under same paths in both <sourceDir> and <backupDir>,
and the files in <backupDir> are newer, and the --revUp option is given,
then Zaloha uses those files to reverse-update the older files in <sourceDir>
(action code REV.UP).
Optionally, to preserve attributes during the REV.MKDI, REV.NEW and REV.UP
actions: use options --pRevUser, --pRevGroup and --pRevMode.
If reverse-synchronization is not active: If neither --revNew nor
--revNewAll option is given, then each standalone file in <backupDir> is
considered obsolete (and removed, unless the --noRemove option is given).
If no --revUp option is given, then files in <sourceDir> always update
files in <backupDir> if their sizes and/or modification times differ.
Please note that the reverse-synchronization is NOT a full bi-directional
synchronization where <sourceDir> and <backupDir> would be equivalent.
Especially, there is no REV.REMOVE action. It was a conscious decision to not
implement it, as any removals from <sourceDir> would introduce unacceptable
risks.
Reverse-synchronization to <sourceDir> increases the overall complexity of the
solution. Use it only in the interactive regime of Zaloha, where human oversight
and confirmation of the prepared actions are in place.
Do not use it in automatic operations.
Exec4:
------
Zaloha removes all remaining obsolete files and directories from <backupDir>.
This function can be switched off via the --noRemove option.
Why are removals from <backupDir> split into two steps (Exec1 and Exec4)?
The unavoidable removals must unconditionally occur first, hence the separate
Exec1 step.
But what about the remaining (avoidable) removals: Imagine a scenario when a
directory is renamed in <sourceDir>: If all removals were executed in Exec1,
then <backupDir> would transition through a state (namely between Exec1 and
Exec2) where the backup copy of the directory is already removed (under the old
name), but not yet created (under the new name). To minimize the chance for such
transient states to occur, the avoidable removals are postponed to Exec4.
Advice on this topic: In case of bigger reorganizations of <sourceDir>, e.g.
when a directory with large contents is renamed, it is much better
to prepare a rename script (more generally speaking: a migration script) and
apply it to both <sourceDir> and <backupDir>, instead of letting Zaloha perform
massive copying followed by massive removing.
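A minimal sketch of such a migration script, applying one rename to both trees (directory names and paths are examples):

```shell
# Hypothetical migration script: apply the same rename to <sourceDir> and
# <backupDir>, so that the next Zaloha run finds them already consistent.
migrate() {
  local sourceDir=$1 backupDir=$2
  mv "$sourceDir/projects_old" "$sourceDir/projects_new" || return 1
  mv "$backupDir/projects_old" "$backupDir/projects_new" || return 1
}

# Demonstration on throw-away directories:
tmp=$(mktemp -d)
mkdir -p "$tmp/src/projects_old" "$tmp/bkp/projects_old"
migrate "$tmp/src" "$tmp/bkp"
ls "$tmp/src" "$tmp/bkp"     # both trees now contain projects_new
rm -rf "$tmp"
```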
Exec5:
------
Zaloha updates files in <backupDir> for which the optional comparisons of their
contents revealed that they are in fact not identical (despite appearing
identical by looking at their file sizes and modification times).
The action codes are UPDATE.b and unl.UP.b (the latter is update with prior
unlinking of multiply linked target file, as described under Exec2).
Please note that these actions might indicate deeper problems like storage
corruption (or even a cyber security issue), and should be treated as
surprises.
This step is optional and can be activated via the --byteByByte or --sha256
options.
Metadata directory of Zaloha
----------------------------
Zaloha creates a Metadata directory: <backupDir>/.Zaloha_metadata. Its location
can be changed via the --metaDir option.
The purposes of the individual files are described as comments in program code.
Briefly, they are:
* AWK program files (produced from "here documents" in Zaloha)
* Shellscripts to run FIND commands
* CSV metadata files
* Exec1/2/3/4/5 shellscripts
* Shellscripts for the case of restore
* Touchfile 999 marking execution of actions
Files persist in the Metadata directory until the next invocation of Zaloha.
To obtain information about what Zaloha did (counts of removed/copied files,
total counts, etc), do not parse the screen output: Query the CSV metadata files
instead. Query the CSV metadata files after AWKCLEANER. Do not query the raw
CSV outputs of the FIND commands (before AWKCLEANER) or the produced
shellscripts, because filenames may contain newlines, so those files may
contain multiple lines per "record".
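A sketch of such a query, using a made-up sample in place of a real CSV metadata file (the actual column layout is documented as comments in the program code; for illustration, tab-separated records with the action code in column 2 are assumed):

```shell
# Count prepared actions per action code from a (hypothetical) sample file.
sample=$(mktemp)
printf '1\tNEW\t./a.txt\n2\tUPDATE\t./b.txt\n3\tOK\t./c.txt\n4\tNEW\t./d.txt\n' > "$sample"

awk -F'\t' '{ count[$2]++ } END { for (c in count) print c, count[c] }' "$sample"

rm -f "$sample"
```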
In some situations, the existence of the Zaloha metadata directory is unwanted
after Zaloha finishes. In such cases, put a command to remove it into the wrapper
script that invokes Zaloha. At the same time, use the option --noLastRun to
prevent Zaloha from running FIND on file 999 in the Zaloha metadata directory
to obtain the time of the last run of Zaloha.
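A minimal sketch of such a wrapper, assuming Zaloha2.sh in the current directory and the default metadata location (paths are examples; only the --noLastRun usage and the cleanup come from this section):

```shell
# Hypothetical wrapper: run Zaloha state-lessly and discard its metadata
# directory afterwards.
run_backup() {
  local sourceDir=$1 backupDir=$2
  ./Zaloha2.sh --sourceDir="$sourceDir" \
               --backupDir="$backupDir" \
               --noLastRun || return 1
  rm -rf "$backupDir/.Zaloha_metadata"
}
```

The function is only defined here, not invoked; in real use the removal should probably happen only after a successful run (hence the `|| return 1`).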
Please note that by not keeping the Zaloha metadata directory, you sacrifice
some functionality (see the --noLastRun option below), you lose the CSV
metadata for an eventual analysis of problems, and you lose the shellscripts
for the case of restore (especially the scripts to restore the symbolic links
and hardlinks, which may be kept in metadata only).
Temporary Metadata directory of Zaloha
--------------------------------------
In the Remote Source Mode, Zaloha needs a temporary Metadata directory on the
remote source host for copying scripts there, executing them, and obtaining
the CSV output of the FIND scan of <sourceDir>.
In the Remote Backup Mode, Zaloha performs its main metadata processing in a
temporary Metadata directory on the local (= source) host and then copies only
select metadata files to the Metadata directory on the remote (= backup) host.
The default location of the temporary Metadata directory is
<sourceDir>/.Zaloha_metadata_temp and can be changed via the --metaDirTemp
option.
Shellscripts for case of restore
--------------------------------
Zaloha prepares shellscripts for the case of restore in its Metadata directory
(scripts 800 through 870). Each type of operation is contained in a separate
shellscript, to give maximum freedom (= for each script, decide whether to
apply it or not). Further, each shellscript has a header part where
key variables for whole script are defined (and can be adjusted as needed).
The production of the shellscripts for the case of restore may cause increased
processing time and/or storage space consumption. It can be switched off by the
--noRestore option.
In case of need, the shellscripts for the case of restore can also be prepared
manually by running the AWK program 700 on the CSV metadata file 505:
awk -f "<AWK program 700>" \
-v backupDir="<backupDir>" \
-v restoreDir="<restoreDir>" \
-v remoteBackup=<0 or 1> \
-v backupUserHost="<backupUserHost>" \
-v remoteRestore=<0 or 1> \
-v restoreUserHost="<restoreUserHost>" \
-v scpExecOpt="<scpExecOpt>" \
-v cpRestoreOpt="<cpRestoreOpt>" \
-v f800="<script 800 to be created>" \
-v f810="<script 810 to be created>" \
-v f820="<script 820 to be created>" \
-v f830="<script 830 to be created>" \
-v f840="<script 840 to be created>" \
-v f850="<script 850 to be created>" \
-v f860="<script 860 to be created>" \
-v f870="<script 870 to be created>" \
-v noR800Hdr=<0 or 1> \
-v noR810Hdr=<0 or 1> \
-v noR820Hdr=<0 or 1> \
-v noR830Hdr=<0 or 1> \
-v noR840Hdr=<0 or 1> \
-v noR850Hdr=<0 or 1> \
-v noR860Hdr=<0 or 1> \
-v noR870Hdr=<0 or 1> \
"<CSV metadata file 505>"
Note 1: All filenames/paths should begin with a "/" (if absolute) or with a "./"
(if relative), and <backupDir> and <restoreDir> must end with a terminating "/".
Note 2: If any of the filenames/paths passed into AWK as variables (<backupDir>,
<restoreDir> and the <scripts 8xx to be created>) contain backslashes as "weird
characters", replace them by ///b. The AWK program 700 will replace ///b back
to backslashes inside.
INVOCATION
Zaloha2.sh --sourceDir=<sourceDir> --backupDir=<backupDir> [ other options ... ]
--sourceDir=<sourceDir> is mandatory. <sourceDir> must exist, otherwise Zaloha
throws an error (except when the --noDirChecks option is given).
In Remote Source mode, this is the source directory on the remote source
host. If <sourceDir> is relative, then it is relative to the SSH login
directory of the user on the remote source host.
--backupDir=<backupDir> is mandatory. <backupDir> must exist, otherwise Zaloha
throws an error (except when the --noDirChecks option is given).
In Remote Backup mode, this is the backup directory on the remote backup
host. If <backupDir> is relative, then it is relative to the SSH login
directory of the user on the remote backup host.
--sourceUserHost=<sourceUserHost> indicates that <sourceDir> resides on a remote
source host to be reached via SSH/SCP. Format: user@host
--backupUserHost=<backupUserHost> indicates that <backupDir> resides on a remote
backup host to be reached via SSH/SCP. Format: user@host
--sshOptions=<sshOptions> are additional command-line options for the
SSH commands, separated by spaces. Typical usage is explained in section
Advanced Use of Zaloha - Remote Source and Remote Backup Modes.
--scpOptions=<scpOptions> are additional command-line options for the
SCP commands, separated by spaces. Typical usage is explained in section
Advanced Use of Zaloha - Remote Source and Remote Backup Modes.
--scpExecOpt=<scpExecOpt> can be used to override <scpOptions> specifically for
the SCP commands used during the execution phase.
--findSourceOps=<findSourceOps> are additional operands for the FIND command
that scans <sourceDir>, to be used to exclude files or subdirectories in
<sourceDir> from synchronization to <backupDir>. This is a complex topic,
described in full detail in section FIND operands to control FIND commands
invoked by Zaloha.
The --findSourceOps option can be passed in several times. In such case
the final <findSourceOps> will be the concatenation of the several
individual <findSourceOps> passed in with the options.
--findGeneralOps=<findGeneralOps> are additional operands for the FIND commands
that scan both <sourceDir> and <backupDir>, to be used to exclude "Trash"
subdirectories, independently of where they exist, from Zaloha's scope.
This is a complex topic, described in full detail in section FIND operands
to control FIND commands invoked by Zaloha.
The --findGeneralOps option can be passed in several times. In such case
the final <findGeneralOps> will be the concatenation of the several
individual <findGeneralOps> passed in with the options.
--findParallel ... in the Remote Source and Remote Backup Modes, run the FIND
scans of <sourceDir> and <backupDir> in parallel. As the FIND scans run on
different hosts in the remote modes, this will save time.
--noExec ... needed if Zaloha is invoked automatically: do not ask,
do not execute the actions, but still prepare the scripts. The prepared
scripts then will not contain shell tracing and the "set -e" instruction.
This means that the scripts will ignore individual failed commands and try
to do as much work as possible, which is a behavior different from the
interactive regime, where scripts are traced and halt on the first error.
--noRemove ... do not remove files, directories and symbolic links that
are standalone in <backupDir>. This option is useful when <backupDir> should
hold "current" plus "historical" data whereas <sourceDir> holds only
"current" data.
Please keep in mind that if objects of conflicting types in <backupDir>
prevent synchronization (e.g. a file cannot overwrite a directory),
removals are unavoidable and will be prepared regardless of this option.
In such case Zaloha displays a warning message in the interactive regime.
In automatic operations, the calling process should query the CSV metadata
file 510 to detect this case.
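For automatic operations, a calling process might perform such a check roughly as follows (a sketch; the exact name of the CSV metadata file 510 should be taken from the program code, the '510*' glob below is a placeholder):

```shell
# Hypothetical check: treat a non-empty CSV metadata file 510 as
# "unavoidable removals have been prepared".
warn_unavoidable_removals() {
  local metaDir=$1 f
  for f in "$metaDir"/510*; do
    if [ -s "$f" ]; then
      echo 'WARNING: unavoidable removals have been prepared' >&2
      return 1
    fi
  done
  return 0
}

# Demonstration on a throw-away directory:
tmp=$(mktemp -d)
: > "$tmp/510_warnings.csv"                          # empty file: no warning
warn_unavoidable_removals "$tmp" && echo 'no unavoidable removals'
rm -rf "$tmp"
```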
--revNew ... enable REV.NEW (= if standalone file in <backupDir> is
newer than the last run of Zaloha, reverse-copy it
to <sourceDir>)
--revNewAll ... enable REV.NEW irrespective of whether the standalone file
in <backupDir> is newer than the last run of Zaloha
--revUp ... enable REV.UP (= if file in <backupDir> is newer than
file in <sourceDir>, reverse-update the file in <sourceDir>)
--detectHLinksS ... perform hardlink detection (inode-deduplication)
on <sourceDir>
--ok2s ... tolerate +/- 2 seconds differences due to FAT rounding of
modification times to nearest 2 seconds (special case
[SCC_FAT_01] explained in Special Cases section below).
This option is necessary only if Zaloha is unable to
determine the FAT file system from the FIND output
(column 6).
--ok3600s ... additional tolerable offset of modification time differences
of exactly +/- 3600 seconds (special case [SCC_FAT_01]
explained in Special Cases section below)
--byteByByte ... compare "byte by byte" files that appear identical (more
precisely, files for which either "no action" (OK) or just
"update of attributes" (ATTR) has been prepared).
(Explained in the Advanced Use of Zaloha section below).
This comparison might dramatically slow down Zaloha.
If additional updates of files result from this comparison,
they will be executed in step Exec5. This option is
available only in the Local Mode.
--sha256 ... compare contents of files via SHA-256 hashes. It is almost
certain that files are identical if they have
equal sizes and SHA-256 hashes. Calculation of the hashes
might dramatically slow down Zaloha. If additional updates
of files result from this comparison, they will be executed
in step Exec5. Moreover, if files have equal sizes and
SHA-256 hashes but different modification times, copying of
such files will be prevented and only the modification times
will be aligned (ATTR:T). This option is available in all
three modes (Local, Remote Source and Remote Backup).
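The reasoning behind the size-plus-hash check can be seen with plain coreutils: two files of equal size are indistinguishable by size alone, but their SHA-256 hashes differ (a sketch; file names are examples):

```shell
# Equal sizes, different contents, different hashes.
tmp=$(mktemp -d)
printf 'abcd' > "$tmp/one"
printf 'abce' > "$tmp/two"

stat -c '%s %n' "$tmp/one" "$tmp/two"   # both files are 4 bytes
sha256sum "$tmp/one" "$tmp/two"         # two different hashes

rm -rf "$tmp"
```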
--noUnlink ... never unlink multiply linked files in <backupDir> before
writing to them
--extraTouch ... use cp + touch -m instead of cp --preserve=timestamps
(special case [SCC_OTHER_01] explained in Special Cases
section below). This has also a subtle impact on access
times (atime): cp --preserve=timestamps obtains mtime and
atime from the source file (before it reads it and changes
its atime) and applies the obtained mtime and atime to the
target file. On the contrary, cp keeps atime of the target
file intact and touch -m just sets the correct mtime on the
target file.
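On GNU coreutils, the equivalence of the two methods with respect to mtime can be demonstrated as follows (a sketch with example paths; touch -m -r is used here as one way of setting the mtime, and the atime difference described above is not shown):

```shell
# mtime behaviour of the two copying methods.
tmp=$(mktemp -d)
echo 'data' > "$tmp/src"
touch -m -t 202001010101 "$tmp/src"            # give the source a known mtime

cp --preserve=timestamps "$tmp/src" "$tmp/a"   # default method

cp "$tmp/src" "$tmp/b"                         # --extraTouch style:
touch -m -r "$tmp/src" "$tmp/b"                # set mtime explicitly afterwards

stat -c '%Y %n' "$tmp/src" "$tmp/a" "$tmp/b"   # all three mtimes identical
rm -rf "$tmp"
```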
--cpOptions=<cpOptions> can be used to override the default command-line options
for the CP commands used in the Local Mode (which are
"--preserve=timestamps" (or none if option --extraTouch
is given)).
This option can be used if the CP command needs a different
option(s) to preserve timestamps during copying, or e.g. to
instruct CP to preserve extended attributes during copying
as well, or the like:
--cpOptions='--preserve=timestamps,xattr'
--cpRestoreOpt=<cpRestoreOpt> can be used to override <cpOptions> specifically
for the CP commands used in the restore scripts.
--pUser ... preserve user ownerships, group ownerships and/or modes
--pGroup (permission bits) during MKDIR, NEW, UPDATE and unl.UP
--pMode actions. Additionally, if these attributes differ on files
and directories for which no action is prepared, synchronize
the differing attributes (action codes ATTR:ugm).
The options --pUser and --pGroup also apply to symbolic
links if their synchronization is active (--syncSLinks).
--pRevUser ... preserve user ownerships, group ownerships and/or modes
--pRevGroup (permission bits) during REV.MKDI, REV.NEW and REV.UP
--pRevMode actions
--followSLinksS ... follow symbolic links on <sourceDir>
--followSLinksB ... follow symbolic links on <backupDir>
Please see section Following Symbolic Links for details.
--syncSLinks ... synchronize symbolic links from <sourceDir> to <backupDir>
--noWarnSLinks ... suppress warnings related to symbolic links
--noRestore ... do not prepare scripts for the case of restore (= saves
processing time and disk space, see optimization note below). The scripts
for the case of restore can still be produced ex-post by manually running
the respective AWK program (700 file) on the source CSV file (505 file).
--optimCSV ... optimize space occupied by CSV metadata files by removing
intermediary CSV files after use (see optimization note below).
If intermediary CSV metadata files are removed, an ex-post analysis of
eventual problems may be impossible.
--metaDir=<metaDir> allows placing the Zaloha metadata directory in a different
location than the default (which is <backupDir>/.Zaloha_metadata).
The reasons for using this option might be:
a) non-writable <backupDir> (if Zaloha is used to perform comparison only
(i.e. with --noExec option))
b) a requirement to have Zaloha metadata on a separate storage
c) Zaloha is operated in the Local Mode, but <backupDir> is not available
locally (which means that the technical integration options described
under the section Advanced Use of Zaloha are utilized). In that case
it is necessary to place the Metadata directory to a location
accessible to Zaloha.
If <metaDir> is placed to a different location inside of <backupDir>, or
inside of <sourceDir> (in Local Mode), then it is necessary to explicitly
pass a FIND expression to exclude the Metadata directory from the respective
FIND scan via <findGeneralOps>.
If Zaloha is used for multiple synchronizations, then each such instance
of Zaloha must have its own separate Metadata directory.
In Remote Backup Mode, if <metaDir> is relative, then it is relative to the
SSH login directory of the user on the remote backup host.
--metaDirTemp=<metaDirTemp> may be used only in the Remote Source or Remote
Backup Modes, where Zaloha needs a temporary Metadata directory too. This
option allows placing it in a different location than the default
(which is <sourceDir>/.Zaloha_metadata_temp).
If <metaDirTemp> is placed to a different location inside of <sourceDir>,
then it is necessary to explicitly pass a FIND expression to exclude it
from the respective FIND scan via <findGeneralOps>.
If Zaloha is used for multiple synchronizations in the Remote Source or
Remote Backup Modes, then each such instance of Zaloha must have its own
separate temporary Metadata directory.
In Remote Source Mode, if <metaDirTemp> is relative, then it is relative to
the SSH login directory of the user on the remote source host.
--noDirChecks ... switch off the checks for existence of <sourceDir> and
<backupDir>. (Explained in the Advanced Use of Zaloha section below).
--noLastRun ... do not obtain time of the last run of Zaloha by running
FIND on file 999 in Zaloha metadata directory.
This makes Zaloha state-less, which might be a desired
property in certain situations, e.g. if you do not want to
keep the Zaloha metadata directory. However, this sacrifices
features based on the last run of Zaloha: REV.NEW and
distinction of actions on files newer than the last run
of Zaloha (e.g. distinction between UPDATE.! and UPDATE).
--noIdentCheck ... do not check if objects on identical paths in <sourceDir>
and <backupDir> are identical (= identical inodes). This
check brings to attention cases where objects in <sourceDir>
and corresponding objects in <backupDir> are in reality
the same objects (possibly via hardlinks), which violates
the logic of backup. Switching off this check might be
necessary in some special uses of Zaloha.
--noFindSource ... do not run FIND (script 210) to scan <sourceDir>
and use externally supplied CSV metadata file 310 instead
--noFindBackup ... do not run FIND (script 220) to scan <backupDir>
and use externally supplied CSV metadata file 320 instead
(Explained in the Advanced Use of Zaloha section below).
--no610Hdr ... do not write header to the shellscript 610 for Exec1
--no621Hdr ... do not write header to the shellscript 621 for Exec2
--no622Hdr ... do not write header to the shellscript 622 for Exec2
--no623Hdr ... do not write header to the shellscript 623 for Exec2
--no631Hdr ... do not write header to the shellscript 631 for Exec3
--no632Hdr ... do not write header to the shellscript 632 for Exec3
--no633Hdr ... do not write header to the shellscript 633 for Exec3
--no640Hdr ... do not write header to the shellscript 640 for Exec4
--no651Hdr ... do not write header to the shellscript 651 for Exec5
--no652Hdr ... do not write header to the shellscript 652 for Exec5
--no653Hdr ... do not write header to the shellscript 653 for Exec5
These options can be used only together with the --noExec option.
(Explained in the Advanced Use of Zaloha section below).
--noR800Hdr ... do not write header to the restore script 800
--noR810Hdr ... do not write header to the restore script 810
--noR820Hdr ... do not write header to the restore script 820
--noR830Hdr ... do not write header to the restore script 830
--noR840Hdr ... do not write header to the restore script 840
--noR850Hdr ... do not write header to the restore script 850
--noR860Hdr ... do not write header to the restore script 860
--noR870Hdr ... do not write header to the restore script 870
(Explained in the Advanced Use of Zaloha section below).
--noProgress ... suppress progress messages during the analysis phase (less
screen output). If --noProgress is used together with
--noExec, Zaloha does not produce any output on stdout
(traditional behavior of Unix tools).
--color ... use color highlighting (can be used on terminals which
support ANSI escape codes)
--mawk ... use mawk, the very fast AWK implementation based on a
bytecode interpreter. Without this option, awk is used,
which usually maps to GNU awk (but not always).
(Note: If you know that awk on your system maps to mawk,
use this option to make the mawk usage explicit, as this
option also turns off mawk's I/O buffering in places where
the progress of commands is displayed, i.e. in places where
I/O buffering causes confusion and is unwanted).
--lTest ... (do not use in real operations) support for lint-testing
of AWK programs
--help ... show Zaloha documentation (using the LESS program) and exit
Optimization note: If Zaloha operates on directories with huge numbers of files,
especially small ones, then the size of metadata plus the size of scripts for
the case of restore may exceed the size of the files themselves. If this leads
to problems, use options --noRestore and --optimCSV.
Zaloha must be run by a user with sufficient privileges to read <sourceDir> and
to write and perform other required actions on <backupDir>. In case of the REV
actions, privileges to write and perform other required actions on <sourceDir>
are required as well. Zaloha does not contain any internal checks as to whether
privileges are sufficient. Failures of commands run by Zaloha must be monitored
instead.
Zaloha does not contain protection against concurrent invocations with
conflicting <backupDir> (and for REV also conflicting <sourceDir>): this is
responsibility of the invoker, especially due to the fact that Zaloha may
conflict with other processes as well.
In case of failure: resolve the problem and re-run Zaloha with the same
parameters.
In the second run, Zaloha should not repeat the actions completed by the first
run: it should continue from the action on which the first run failed. If the
first run completed successfully, no actions should be performed in the second
run (this is an important test case, see below).
Typically, Zaloha is invoked from a wrapper script that performs the necessary
directory mounts, then runs Zaloha with the required parameters, and then
unmounts the directories.
FIND OPERANDS TO CONTROL FIND COMMANDS INVOKED BY ZALOHA
Zaloha obtains information about the files and directories via the FIND command.
Regarding the FIND command itself: it must support the -printf operand, as this
allows obtaining all needed information from a directory in one scan (= one
process), which is efficient. GNU find supports the -printf operand, but some
older FIND implementations don't, so they cannot be used with Zaloha.
The FIND scans of <sourceDir> and <backupDir> can be controlled by two options:
The option --findSourceOps provides additional operands for the FIND command
that scans <sourceDir> only, and the option --findGeneralOps provides
additional operands for both FIND commands (scans of both <sourceDir> and
<backupDir>).
Both options --findSourceOps and --findGeneralOps can be passed in several
times. This allows constructing the final <findSourceOps> and <findGeneralOps>
in Zaloha piece-wise, e.g. expression by expression.
Difference between <findSourceOps> and <findGeneralOps>
-------------------------------------------------------
<findSourceOps> applies only to <sourceDir>. If files in <sourceDir> are
excluded by <findSourceOps> and files exist in <backupDir> under the same
paths, then Zaloha evaluates the files in <backupDir> as obsolete (= removes
them, unless the --noRemove option is given, or possibly even attempts to
reverse-synchronize them (which leads to corner case [SCC_FIND_01]
(see the Corner Cases section))).
On the contrary, the files excluded by <findGeneralOps> are not visible to
Zaloha at all, neither in <sourceDir> nor in <backupDir>, so Zaloha will not
act on them.
The main use of <findSourceOps> is to exclude files or subdirectories in
<sourceDir> from synchronization to <backupDir>.
The main use of <findGeneralOps> is to exclude "Trash" subdirectories,
regardless of where they exist, from Zaloha's scope.
Rules and limitations
---------------------
Both <findSourceOps> and <findGeneralOps> must consist of one or more
FIND expressions in the form of an OR-connected chain:
expressionA -o expressionB -o ... expressionN -o
Adherence to this convention assures that Zaloha is able to correctly combine
<findSourceOps> with <findGeneralOps> and with its own FIND expressions.
The OR-connected chain works as follows: if an earlier expression in the chain
evaluates to TRUE, FIND does not evaluate the following expressions, i.e. it
will not reach the final -printf operand, so no output will be produced. In
other words, matching by any of the expressions leads to exclusion.
Further, the internal logic of Zaloha imposes the following limitations:
* Exclusion of files by the --findSourceOps option: No limitations exist
here, all expressions supported by FIND can be used (but make sure the
exclusion applies only to files). Example: exclude all files smaller than
1000 bytes:
--findSourceOps='( -type f -a -size -1000c ) -o'
* Exclusion of subdirectories by the --findSourceOps option: One limitation
must be obeyed: If a subdirectory is excluded, all its contents must be
excluded too. Why? If Zaloha sees the contents but not the subdirectory
itself, it will prepare commands to create the contents of the subdirectory,
but they will fail as the command to create the subdirectory itself (mkdir)
will not be prepared. Example: exclude all subdirectories owned by user fred
and all their contents:
--findSourceOps='( -type d -a -user fred ) -prune -o'
The -prune operand instructs FIND to not descend into directories matched
by the preceding expression.
* Exclusion of files by the --findGeneralOps option: As <findGeneralOps>
applies to both <sourceDir> and <backupDir>, and the objects in both
directories are "matched" by their paths, only expressions with the -path or
-name operands make sense. Why? If objects exist under the same paths in both
directories, Zaloha should either see both of them or none of them.
Both -path and -name expressions assure this, but not necessarily the
expressions based on other operands like -size, -user and so on.
Example: exclude core dumps (files named core) wherever they exist:
--findGeneralOps='( -type f -a -name core ) -o'
Note 1: GNU find supports the -ipath and -iname operands for case-insensitive
matching of paths and names. They fulfill the above described "both or none"
criterion as well and hence are allowed too. The same holds for the -regex
and -iregex operands supported by GNU find, as they act on paths as well.
Note 2: As <findGeneralOps> acts on both <sourceDir> and <backupDir> and the
paths differ in the start point directories, the placeholder ///d/ must be
used in the involved path patterns. This is described further below.
* Exclusion of subdirectories by the --findGeneralOps option: Both above
described limitations must be obeyed: Only expressions with -path or -name
operands are allowed, and if subdirectories are excluded, all their contents
must be excluded too. Notes 1 and 2 from previous bullet hold too.
Example: exclude subdirectories lost+found wherever they exist:
--findGeneralOps='( -type d -a -name lost+found ) -prune -o'
If you do not care if an object is a file or a directory, you can abbreviate:
--findGeneralOps='-name unwanted_name -prune -o'
--findGeneralOps='-path unwanted_path -prune -o'
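The -prune mechanics described above can be tried out standalone, outside of
Zaloha. A minimal sketch (directory and file names are made up for
illustration):

```shell
# Standalone demo of subdirectory exclusion with -prune (not Zaloha itself).
# Directory and file names below are made up for illustration.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/keep" "$demo_dir/skip"
touch "$demo_dir/keep/a.txt" "$demo_dir/skip/b.txt"

# The excluded subdirectory and all of its contents produce no output:
result=$(find "$demo_dir" '(' -type d -a -name skip ')' -prune -o -printf '%P\n' | sort)

rm -rf "$demo_dir"
printf '%s\n' "$result"
```

Note that the excluded subdirectory's contents never reach the -printf
operand, which is exactly the "exclude the subdirectory and all its contents"
requirement stated above.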
*** CAUTION <findSourceOps> AND <findGeneralOps>: Zaloha does not validate
whether the described rules and limitations are indeed obeyed. Wrong
<findSourceOps> and/or <findGeneralOps> can break Zaloha. On the other hand,
advanced use by knowledgeable users is not prevented. Some <findSourceOps>
and/or <findGeneralOps> errors might be detected in the directory hierarchy
check in AWKCHECKER.
Troubleshooting
---------------
If FIND operands do not work as expected, debug them using FIND alone.
Let's assume that this does not work as expected:
--findSourceOps='( -type f -a -name *.tmp ) -o'
The FIND command to debug this is:
find <sourceDir> '(' -type f -a -name '*.tmp' ')' -o -printf 'path: %P\n'
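Such a debugging session might look like this (a throwaway sketch with made-up
file names; the matched, i.e. excluded, objects produce no output):

```shell
# Throwaway debugging sketch: verify that an exclusion expression works.
# File names are made up for illustration.
debug_dir=$(mktemp -d)
touch "$debug_dir/report.tmp" "$debug_dir/report.txt"

# Matched (= excluded) objects produce no output; everything else is printed:
result=$(find "$debug_dir" '(' -type f -a -name '*.tmp' ')' -o -printf 'path: %P\n')

rm -rf "$debug_dir"
printf '%s\n' "$result"
```

If report.tmp appears in the output, the exclusion expression does not match
as intended (here, because the *.tmp pattern must be quoted against the shell).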
Beware of interpretation by your shell
--------------------------------------
Your shell might interpret certain special characters contained on the command
line. If these characters are to be passed to the called program (= Zaloha)
uninterpreted, they must be quoted or escaped.
The BASH shell does not interpret any characters in strings quoted by single
quotes. In strings quoted by double-quotes, the situation is more complex.
Please see the respective shell documentation for more details.
Parsing of FIND operands by Zaloha
----------------------------------
<findSourceOps> and <findGeneralOps> are passed into Zaloha as single strings.
Zaloha has to split these strings into individual operands (words) and pass them
to FIND, each operand as a separate command line argument. Zaloha has a special
parser (AWKPARSER) to do this.
The trivial case is when each (space-delimited) word is a separate FIND operand.
However, if a FIND operand contains spaces, it must be enclosed in double-quotes
(") to be treated as one operand. Moreover, if a FIND operand itself contains
double-quotes, then it too must be enclosed in double-quotes (")
and the original double-quotes must be escaped by doubling them ("").
Examples (for BASH for both single-quoted and double-quoted strings):
* exclude all objects named Windows Security
* exclude all objects named My "Secret" Things
--findSourceOps='-name "Windows Security" -prune -o'
--findSourceOps='-name "My ""Secret"" Things" -prune -o'
--findSourceOps="-name \"Windows Security\" -prune -o"
--findSourceOps="-name \"My \"\"Secret\"\" Things\" -prune -o"
Interpretation of special characters by FIND itself
---------------------------------------------------
In the patterns of the -path and -name expressions, FIND itself interprets
the following characters specially (see FIND documentation): *, ?, [, ], \.
If these characters are to be taken literally, they must be handed over to
FIND backslash-escaped.
Examples (for BASH for both single-quoted and double-quoted strings):
* exclude all objects whose names begin with abcd (i.e. FIND pattern abcd*)
* exclude all objects named exactly mnop* (literally including the asterisk)
--findSourceOps='-name abcd* -prune -o'
--findSourceOps='-name mnop\* -prune -o'
--findSourceOps="-name abcd* -prune -o"
--findSourceOps="-name mnop\\* -prune -o"
The placeholder ///d/ for the start point directories
-----------------------------------------------------
If expressions with the "-path" operand are used in <findSourceOps>, the
placeholder ///d/ should be used in place of <sourceDir>/ in their path
patterns.
If expressions with the "-path" operand are used in <findGeneralOps>, the
placeholder ///d/ must (not should) be used in place of <sourceDir>/ and
<backupDir>/ in their path patterns, unless, perhaps, the <sourceDir> and
<backupDir> parts of the paths are matched by a FIND wildcard.
Zaloha will replace ///d/ by the start point directory that is passed to FIND
in the given scan, with any FIND pattern special characters properly
escaped (which relieves you of doing this yourself).
Example: exclude <sourceDir>/.git
--findSourceOps="-path ///d/.git -prune -o"
Internally defined default for <findGeneralOps>
-----------------------------------------------
<findGeneralOps> has an internally defined default, used to exclude:
<sourceDir or backupDir>/$RECYCLE.BIN
... Windows Recycle Bin (assumed to exist directly under <sourceDir> or
<backupDir>)
<sourceDir or backupDir>/.Trash_<number>*
... Linux Trash (assumed to exist directly under <sourceDir> or
<backupDir>)
<sourceDir or backupDir>/lost+found
... Linux lost + found filesystem fragments (assumed to exist directly
under <sourceDir> or <backupDir>)
To replace this internal default with own <findGeneralOps>:
--findGeneralOps=<your replacement>
To switch off this internal default:
--findGeneralOps=
To extend (= combine, not replace) the internal default with your own extension
(note the plus (+) sign):
--findGeneralOps=+<your extension>
If several --findGeneralOps options are passed in, the plus (+) sign mentioned
above should be passed in only with the first instance, not with the second,
third (and so on) instances.
Known traps and problems
------------------------
Beware of matching the start point directories <sourceDir> or <backupDir>
themselves by the expressions and patterns.
In some FIND versions, the name patterns starting with the asterisk (*)
wildcard do not match objects whose names start with a dot (.).
FOLLOWING SYMBOLIC LINKS
Technically, the --followSLinksS and/or --followSLinksB options in Zaloha
"just" pass the -L option to the FIND commands that scan <sourceDir> and/or
<backupDir>. However, it takes a fair amount of text to describe the impacts:
If FIND is invoked with the -L option, it returns information about the objects
the symbolic links point to rather than the symbolic links themselves (unless
the symbolic links are broken). Moreover, if the symbolic links point to
directories, the FIND scans continue in those directories as if they were
subdirectories (= the symbolic links are followed).
In other words: If the directory structure of <sourceDir> is spanned by symbolic
links and symbolic links are followed due to the --followSLinksS option,
the FIND output will contain the whole structure spanned by the symbolic links,
BUT will not give any clue that FIND was going over the symbolic links.
The same sentence holds for <backupDir> and the --followSLinksB option.
Corollary 1: Regardless of whether <sourceDir> is a plain directory structure
or spanned by symbolic links, Zaloha will create a plain directory structure
in <backupDir>. If the structure of <backupDir> should be spanned by symbolic
links too (not necessarily identically to <sourceDir>), then the symbolic links
and the referenced objects must be prepared in advance and the --followSLinksB
option must be given to follow symbolic links on <backupDir> (otherwise Zaloha
would remove the prepared symbolic links on <backupDir> and create real files
and directories in their place).
Corollary 2: The restore scripts are not aware of the symbolic links that
spanned the original structure. They will restore a plain directory structure.
Again, if the structure of the restored directory should be spanned by symbolic
links, then the symbolic links and the referenced objects must be prepared
in advance. Please note that if the option --followSLinksS is given, the file
820_restore_sym_links.sh will contain only the broken symbolic links (as these
were the only symbolic links reported by FIND as symbolic links in that case).
The above is not very surprising given that symbolic links are frequently
used to place parts of directory structures on different storage media:
The different storage media must be mounted, directories on them must be
prepared and referenced by the symbolic links before any backup (or restore)
operations can begin.
Corner case synchronization of attributes (user ownerships, group ownerships,
modes (permission bits)) if symbolic links are followed: the attributes are
synchronized on the objects the symbolic links point to, not on the symbolic
links themselves.
Corner case removal actions: Possible removal actions in places where the
structure is held together by the symbolic links are problematic. Zaloha will
prepare the REMOVE (rm -f) or RMDIR (rmdir) actions due to the objects having
been reported to it as files or directories. However, if the objects are in
reality symbolic links, "rm -f" removes the symbolic links themselves, not the
referenced objects, and "rmdir" fails with the "Not a directory" error.
Corner case loops: Loops can occur if symbolic links are in play. Zaloha can
only rely on the FIND command to handle them (= prevent running forever).
GNU find, for example, contains an internal mechanism to handle loops.
Corner case multiple visits: Although loops are prevented by GNU find, multiple
visits to objects are not. These happen when objects can be reached both via the
regular path hierarchy and via symbolic links that point to those objects
(or to their parent directories).
Technical note for the case when the start point directories themselves are
symbolic links: Zaloha passes all start point directories to FIND with trailing
slashes, which instructs FIND to follow them if they are symbolic links.
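This trailing-slash behavior can be observed directly (a standalone demo with
made-up names):

```shell
# Demo: a trailing slash makes FIND follow a symbolic link start point.
# Names are made up for illustration.
sl_dir=$(mktemp -d)
mkdir "$sl_dir/real"
touch "$sl_dir/real/f"
ln -s "$sl_dir/real" "$sl_dir/link"

no_slash=$(find "$sl_dir/link" -printf '%y\n')            # the link itself: l
with_slash=$(find "$sl_dir/link/" -printf '%y\n' | sort)  # followed: d and f

rm -rf "$sl_dir"
printf '%s / %s\n' "$no_slash" "$with_slash"
```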
TESTING, DEPLOYMENT, INTEGRATION
First, test Zaloha on a small and noncritical set of your data. Although Zaloha
has been tested in several environments, it can happen that Zaloha malfunctions
in your environment due to different behavior of the operating system, BASH,
FIND, SORT, AWK and other utilities. Perform tests in the interactive regime
first. If Zaloha prepares wrong actions, abort it at the next prompt.
After the first synchronization, an important test is to run a second
synchronization, which should execute no actions, as the directories should
already be synchronized.
Test Zaloha under all scenarios which can occur on your environment. Test Zaloha
with filenames containing "weird" or national characters.
Verify that all your programs that write to <sourceDir> change modification
times of the files written, so that Zaloha does not miss changed files.
Simulate the loss of <sourceDir> and perform test of the recovery scenario using
the recovery scripts prepared by Zaloha.
Automatic operations
--------------------
Additional care must be taken when using Zaloha in automatic operations
(--noExec option):
Exit status and standard error of Zaloha and of the scripts prepared by Zaloha
must be monitored by a monitoring system used within your IT landscape.
Nonzero exit status and writes to standard error must be brought to attention
and investigated. If Zaloha itself fails, the process must be aborted.
The scripts prepared under the --noExec option do not halt on the first error,
nor does their zero exit status imply that there were no failed
individual commands.
Implement sanity checks to avoid data disasters like synchronizing <sourceDir>
to <backupDir> at a moment when <sourceDir> is unmounted, which would lead
to loss of backup data. Evaluate counts of actions prepared by Zaloha (count
records in CSV metadata files in Zaloha metadata directory). Abort the process
if the action counts exceed sanity thresholds defined by you, e.g. when Zaloha
prepares an unexpectedly high number of removals.
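Such a sanity check might look like this. This is a sketch only: the threshold,
the synthetic input file and the position of the action code are made up and
must be adapted to the actual CSV metadata layout:

```shell
# Sketch of a sanity check on prepared removals (illustrative only):
# the input file and its layout below are made up, not Zaloha's actual CSV.
max_removals=100

count_removals() {   # $1 = CSV metadata file with prepared actions
  grep -c 'REMOVE' "$1" || true   # grep -c prints 0 but exits 1 on no match
}

f=$(mktemp)
printf 'REMOVE\ta\nNEW\tb\nREMOVE\tc\n' > "$f"   # synthetic stand-in data
removals=$(count_removals "$f")
if [ "$removals" -le "$max_removals" ]; then verdict=ok; else verdict=abort; fi
rm -f "$f"
printf '%s removals -> %s\n' "$removals" "$verdict"
```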
The process which invokes Zaloha in automatic regime should function as follows
(pseudocode):
run Zaloha2.sh --noExec
in case of failure: abort process
perform sanity checks on prepared actions
if ( sanity checks OK ) then
execute script 610
execute scripts 621, 622, 623
execute scripts 631, 632, 633
execute script 640
execute scripts 651, 652, 653
monitor execution (writing to stderr)
if ( execution successful ) then
execute script 690 to touch file 999
end if
end if
SPECIAL AND CORNER CASES
Cases related to the use of FIND
--------------------------------
Ideally, the FIND scans return data about all objects in the directories.
However, the options --findSourceOps and --findGeneralOps may cause parts
of the reality to be hidden (masked) from Zaloha, leading to these cases:
[SCC_FIND_01]
Corner case --revNew (or --revNewAll) with --findSourceOps: If files exist
under the same paths in both <sourceDir> and <backupDir> and in <sourceDir>
they are masked by <findSourceOps>, then any REV.NEW actions would be wrong.
This is an error which Zaloha is unable to detect. Hence, the shellscripts
for Exec3 contain REV_EXISTS checks that throw errors in such situations.
[SCC_FIND_02]
Corner case RMDIR with --findGeneralOps: If objects exist under a given
subdirectory of <backupDir> and all of them are masked by <findGeneralOps>,
and Zaloha prepares a RMDIR on that subdirectory, then that RMDIR fails with
the "Directory not empty" error.
Cases related to the FAT filesystem
-----------------------------------
[SCC_FAT_01]
To detect which files need synchronization, Zaloha compares file sizes and
modification times. If the file sizes differ, synchronization is needed.
The modification times are more complex:
* If one of the filesystems is FAT (i.e. FAT16, VFAT, FAT32), Zaloha tolerates
differences of +/- 2 seconds. This is necessary because FAT rounds the
modification times to nearest 2 seconds, while no such rounding occurs on
other filesystems. (Note: Why is a +/- 1 second tolerance not sufficient:
In some situations, a "ceiling" to nearest 2 seconds was observed instead of
"rounding", making a +/- 2 seconds tolerance necessary).
* If Zaloha is unable to determine the FAT file system from the FIND output
(column 6), it is possible to enforce the +/- 2 seconds tolerance via the
--ok2s option.
* In some situations, offsets of exactly +/- 1 hour (+/- 3600 seconds)
must be tolerated as well. Typically, this is necessary when one of the
directories is on a filesystem type that stores modification times
in local time instead of in universal time (e.g. FAT), and the OS is not
able, for some reason, to correctly adjust for daylight saving time while
converting the local time.
* The additional tolerable offsets of +/- 3600 seconds can be activated via the
--ok3600s option. They are assumed to exist between files in <sourceDir>
and files in <backupDir>, but not between files in <backupDir> and the
999 file in <metaDir> (from which the time of the last run of Zaloha is
obtained). This last note is relevant especially if <metaDir> is located
outside of <backupDir> (which is achievable via the --metaDir option).
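The tolerance logic described above can be sketched as follows (illustrative
only, not Zaloha's actual code):

```shell
# Sketch of the modification time comparison with FAT tolerances
# (illustrative only, not Zaloha's actual code).
# Arguments: two modification times in epoch seconds.
tolerable() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    d = a - b
    if (d < 0) d = -d
    ok = (d <= 2)                          # +/- 2 s FAT rounding tolerance
    if (d >= 3598 && d <= 3602) ok = 1     # +/- 3600 s offset (--ok3600s)
    exit ok ? 0 : 1
  }'
}

tolerable 1000 1001 && r1=same || r1=differs     # within 2 s
tolerable 1000 4601 && r2=same || r2=differs     # 3601 s apart
tolerable 1000 1010 && r3=same || r3=differs     # 10 s apart
printf '%s %s %s\n' "$r1" "$r2" "$r3"
```

Note how the 3600-second window is itself widened by the 2-second FAT
tolerance on each side.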
[SCC_FAT_02]
Corner case REV.UP with --ok3600s: The --ok3600s option makes it harder
to determine which file is newer (decision UPDATE vs REV.UP). The implemented
solution for that case is that for REV.UP, the <backupDir> file must be newer
by more than 3600 seconds (plus a possible 2-second FAT tolerance).
[SCC_FAT_03]
Corner case FAT uppercase conversions: Explained by following example:
The source directory is on a Linux ext4 filesystem and contains the files
FILE.TXT, FILE.txt, file.TXT and file.txt in one of the subdirectories.
The backup directory is on a FAT-formatted USB flash drive. The synchronization
executes without visible problems, but in the backup directory, only FILE.TXT
exists after the synchronization.
What happened is that the OS/filesystem redirected all four copy actions
into FILE.TXT. Thus, after three overwrites, a backup of only one of the
four source files exists. Zaloha detects this situation on the next
synchronization and prepares new copy commands, but they again hit the same
problem.
The only effective solution seems to be the renaming of the source files to
avoid this type of name conflict.
Last note: A similar phenomenon has been observed in the Cygwin environment
running on Windows/NTFS too.
Cases related to hardlinked files
---------------------------------
[SCC_HLINK_01]
Corner case --detectHLinksS with new link(s) to the same file added or removed:
The assignment of which link will be kept as "file" (f) and which links will be
tagged as "hardlinks" (h) in CSV metadata after AWKHLINKS may change, leading
to NEW and REMOVE actions.
[SCC_HLINK_02]
Corner case REV.UP with --detectHLinksS: Zaloha supports reverse-update of
only the first links in <sourceDir> (the ones that stay tagged as "files" (f)
in CSV metadata after AWKHLINKS). See also [SCC_CONFL_02].
[SCC_HLINK_03]
Corner case UPDATE or REV.UP with hardlinked files: Updating a multiply linked
(hardlinked) file means that the new contents will appear under all other links,
and that may lead to follow-up effects.
[SCC_HLINK_04]
Corner case update of attributes with hardlinked files: Updated attributes on a
multiply linked (hardlinked) file will (with exceptions on some filesystem
types) appear under all other links, and that may lead to follow-up effects.
[SCC_HLINK_05]
Corner case if the same directory is passed in as <sourceDir> and <backupDir>:
Zaloha will issue a warning about identical objects. No actions will be prepared
due to both directories being identical, except when the directory contains
multiply-linked (hardlinked) files and the --detectHLinksS option is given.
In that case, Zaloha will prepare removals of the second, third, etc. links to
the same files. This interesting side-effect (or new use case) is explained as
follows: Zaloha will perform hardlink detection on <sourceDir> and for the
detected hardlinks (h) it prepares removals of the corresponding files in
<backupDir>, which is the same directory. The hardlinks can be restored by
restore script 830_restore_hardlinks.sh.
Cases related to conflicting object type combinations
-----------------------------------------------------
[SCC_CONFL_01]
Corner case REV.NEW where the namespace in <sourceDir> needed for REV.MKDI or
REV.NEW actions is occupied by objects of conflicting types: The files in
<backupDir> will not be reverse-copied to <sourceDir>, but removed. As these
files must be newer than the last run of Zaloha, the actions will be REMOVE.!.
[SCC_CONFL_02]
Corner case --detectHLinksS with objects in <backupDir> under the same paths as
the second, third etc. hardlinks in <sourceDir> (the ones that will be tagged
as "hardlinks" (h) in CSV metadata after AWKHLINKS): The objects in <backupDir>
will be (unavoidably) removed to prevent misleading situations in which, for a
hardlinked file in <sourceDir>, <backupDir> would contain a different object
(or possibly even a different file) under the same path.
[SCC_CONFL_03]
Corner case objects in <backupDir> under the same paths as symbolic links in
<sourceDir>: The objects in <backupDir> will be (unavoidably) removed to
prevent misleading situations in which, for a symbolic link in <sourceDir>,
a different type of object would exist in <backupDir> under the same path.
If the objects in <backupDir> are symbolic links too, they will be either
synchronized (if the --syncSLinks option is given) or kept (and not changed).
Please see section Following Symbolic Links on when symbolic links are
reported as symbolic links by FIND.
[SCC_CONFL_04]
Corner case objects in <backupDir> under the same paths as other objects
(p/s/c/b/D) in <sourceDir>: The objects in <backupDir> will be (unavoidably)
removed, except when they are other objects (p/s/c/b/D) too, in which case
they will be kept (but not changed).
Other cases
-----------
[SCC_OTHER_01]
In some situations (e.g. Linux Samba + Linux Samba client),
cp --preserve=timestamps does not preserve modification timestamps (except on
empty files). In that case, Zaloha should be instructed (via the --extraTouch
option) to use subsequent extra TOUCH commands instead, which is a more robust
solution. In the scripts for the case of restore, extra TOUCH commands are used
unconditionally.
[SCC_OTHER_02]
Corner case if the Metadata directory is in its default location (= no option
--metaDir is given) and <sourceDir>/.Zaloha_metadata exists as well (which
may be the case in chained backups (= backups of backups)): It will be excluded.
If a backup of that directory is needed as well, it should be solved separately.
Hint: if the secondary backup starts one directory higher, then this exclusion
will not occur anymore.
Why be concerned about backups of the Metadata directory of the primary backup:
keep in mind that Zaloha synchronizes to <backupDir> only files and directories
and keeps other objects in metadata (and the restore scripts) only.
[SCC_OTHER_03]
It is possible (but not recommended) for <backupDir> to be a subdirectory of
<sourceDir> and vice versa. In such cases, FIND expressions to avoid recursive
copying must be passed in via <findGeneralOps>.
HOW ZALOHA WORKS INTERNALLY
Handling and checking of input parameters should be self-explanatory.
The actual program logic is embodied in AWK programs, which are contained in
Zaloha as "here documents".
The AWK program AWKPARSER parses the FIND operands assembled from
<findSourceOps> and <findGeneralOps> and constructs the FIND commands.
The outputs of running these FIND commands are tab-separated CSV metadata files
that contain all information needed for following steps. These CSV metadata
files, however, must first be processed by AWKCLEANER to handle (escape)
any tabs and newlines in filenames + perform other required preparations.
The cleaned CSV metadata files are then checked by AWKCHECKER for unexpected
deviations (in which case an error is thrown and the processing stops).
The next (optional) step is to detect hardlinks: the CSV metadata file from
<sourceDir> will be sorted by device numbers + inode numbers. This means that
multiply-linked files will be in adjacent records. The AWK program AWKHLINKS
evaluates this situation: The type of the first link will be kept as "file" (f),
the types of the other links will be changed to "hardlinks" (h).
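The idea can be reproduced standalone. The following is a simplified sketch of
the AWKHLINKS principle, not the actual program (the real CSV metadata has
more columns, and names below are made up):

```shell
# Simplified sketch of hardlink detection (not the actual AWKHLINKS):
# sort by device + inode, keep the first link as 'f', tag further links 'h'.
hl_dir=$(mktemp -d)
echo data > "$hl_dir/a"
ln "$hl_dir/a" "$hl_dir/b"     # hardlink to the same file
echo solo > "$hl_dir/c"

tagged=$(find "$hl_dir" -type f -printf '%D\t%i\t%P\n' | sort |
  awk -F '\t' '{ key = $1 FS $2                   # device + inode
                 print ( key == prev ? "h" : "f" ) "\t" $3
                 prev = key }' | sort)

rm -rf "$hl_dir"
printf '%s\n' "$tagged"
```

Because the sort puts records with the same device + inode into adjacent
positions, a single sequential pass suffices to tag the links.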
Then comes the core function of Zaloha. The CSV metadata files from <sourceDir>
and <backupDir> will be united and sorted by the files' paths and the
Source/Backup indicators. This means that objects existing in both directories
will be in adjacent records, with the <backupDir> record coming first. The AWK
program
AWKDIFF evaluates this situation (as well as records from objects existing in
only one of the directories), and writes target state of synchronized
directories with actions to reach that target state.
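The core union + sort can be illustrated with a toy input. Here 'B' and 'S'
stand for the Backup/Source indicators; the real CSV metadata files have more
columns and a different layout than shown:

```shell
# Toy illustration of the core union + sort: records from both directories
# are sorted by path, with the <backupDir> ('B') record first on ties.
# (The real CSV metadata layout differs; this is illustrative only.)
united=$(printf 'S\tdocs/a\nB\tdocs/a\nS\tdocs/new\n' |
  sort -t "$(printf '\t')" -k2,2 -k1,1)
printf '%s\n' "$united"
```

With this ordering, a sequential pass (like AWKDIFF's) always sees the backup
record before the source record for the same path.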
The output of AWKDIFF is then sorted by the files' paths in reverse order (so
that
parent directories come after their children) and post-processed by AWKPOSTPROC.
AWKPOSTPROC modifies actions on parent directories of files to REV.NEW and
objects to KEEP only in <backupDir>.
The remaining code uses the produced data to perform actual work, and should be
self-explanatory.
An interactive JavaScript flowchart exists that explains the internal processing
within Zaloha in a graphical and intuitive manner.
Interactive JavaScript flowchart: https://fitus.github.io/flowchart.html
Understanding AWKDIFF is the key to understanding Zaloha as a whole. An important
hint to AWKDIFF is that there can be five types of filesystem objects in
<sourceDir> and four types of filesystem objects in <backupDir>. At any given
path, each type in <sourceDir> can meet each type in <backupDir>, plus each
type can be standalone in either <sourceDir> or <backupDir>. Mathematically,
this results in ( 5 x 4 ) + 5 + 4 = 29 cases to be handled by AWKDIFF:
              backupDir:       d    f    l   other  (none)
---------------------------------------------------------------------------
sourceDir:  directory      d |  1    2    3    4      21
            file           f |  5    6    7    8      22
            hardlink       h |  9   10   11   12      23
            symbolic link  l | 13   14   15   16      24
            other p/s/c/b/D  | 17   18   19   20      25
            (none)           | 26   27   28   29
---------------------------------------------------------------------------
Note 1: Hardlinks (h) cannot occur in <backupDir>, because the type "h" is not
reported by FIND but determined by AWKHLINKS that can operate only on
<sourceDir>.
Note 2: Please see section Following Symbolic Links on when symbolic links
are reported as symbolic links by FIND.
The AWKDIFF code is commented on key places to make orientation easier.
A good case to begin with is case 6 (file in <sourceDir>, file in <backupDir>),
as this is the most important (and complex) case.
If you are a database developer, you can think of the CSV metadata files as
tables, and Zaloha as a program that operates on these tables: It fills them
with data obtained from the filesystems (via FIND), then processes the data
(defined sequence of sorts, sequential processings, unions and selects), then
converts the data to shellscripts, and finally executes the shellscripts
to apply the required changes back to the filesystems.
Among the operations which Zaloha performs, there is no operation which would
require the CSV metadata to fit as a whole into memory. This means that the size
of memory does not constrain Zaloha on how big "tasks" it can handle.
The critical operations from this perspective are the sorts. However,
GNU sort, for instance, is able to intelligently switch to an external
sort-merge algorithm, if it determines that the data is "too big",
thus mitigating this concern.
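With GNU sort, the in-memory buffer and the location of the temporary
sort-merge files can even be controlled explicitly (a small illustration;
the option values are arbitrary):

```shell
tmp=$(mktemp -d)
printf 'b\na\nc\n' > "$tmp/unsorted.txt"
# -S caps the in-memory buffer (larger data spills to external sort-merge),
# -T places the temporary merge files on a volume with enough space.
LC_ALL=C sort -S 64M -T "$tmp" "$tmp/unsorted.txt"
rm -rf "$tmp"
```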
Side remark here: The theoretical time complexity of Zaloha is O(n * log n)
due to the sorts, but practically the runtime is dominated by the FIND scans.
Talking further in database developer's language: The data model of all CSV
metadata files is the same and is described in form of comments in AWKPARSER.
Files 310 and 320 do not qualify as tables, as their fields and records are
broken by any tabs and newlines in filenames. In files 330 through 370,
field 2 is the Source/Backup indicator. In files 380 through 555, field 2 is
the Action Code.
The natural primary key in files 330 through 360 is the file's path (column 14).
In files 370 through 505, the natural primary key is combined column 14 with
column 2. In files 510 through 555, the natural primary key is again
column 14 alone.
The need for the combined primary key in file 505 is obvious, e.g., in the case
of an "other" object in <sourceDir> meeting an "other" object in <backupDir>:
File 505 then contains an OK record for the former and a KEEP record for the
latter, both with the same file's path (column 14).
Data model as HTML table: https://fitus.github.io/data_model.html
TECHNIQUES USED BY ZALOHA TO HANDLE WEIRD CHARACTERS IN FILENAMES
Handling of "weird" characters in filenames was a special focus during
development of Zaloha. Actually, it was an exercise in how far one can go with
a shellscript alone, without resorting to a C program. The tested characters were:
!"#$%&'()*+,-.:;<=>?@[\]^`{|}~, spaces, tabs, newlines, alert (bell) and
a few national characters (beyond ASCII 127). Please note that some filesystem
types and operating systems do not permit some of these weird characters at all.
Zaloha internally uses tab-separated CSV files, so tabs and newlines are major
disruptors. The solution is based on the following idea: POSIX (the most
"liberal" standard under which Zaloha must function) says that filenames may
contain all characters except slash (/, the directory separator) and ASCII NUL.
Hence, no character other than these two can be used as an escape character
(unless some re-coding were introduced). Further, ASCII NUL is not
suitable, as it is widely used as a string delimiter. Then, let's have a look
at the directory separator itself: It cannot occur inside of filenames.
It separates file and directory names in the paths. As filenames cannot have
zero length, no two slashes can appear in sequence. The only exception is the
naming convention for network-mounted directories, which may contain two
consecutive slashes at the beginning. But three consecutive slashes
(a triplet ///) are impossible. Hence, it is a waterproof escape sequence.
This opens the way to represent a tab as ///t and a newline as ///n.
For the display of filenames on the terminal (and only there), control
characters (other than tabs and newlines) are displayed as ///c, to avoid
disrupting the terminal. (Such control characters are kept unaltered in the
CSV metadata files.)
Further, /// is used as the first field in the CSV metadata files, to allow easy
separation of record lines from continuation lines caused by newlines in
filenames (a continuation line cannot have /// as its first field, because
filenames cannot contain the newline + /// sequence).
Finally, /// is used as a terminator field in the CSV metadata files, to make it
possible to determine where a filename ends when it contains tabs and newlines
(a filename cannot produce a field containing /// alone, because filenames
cannot contain the tab + /// sequence).
With these preparations, see how the AWKCLEANER works: For columns 14 and 16,
process CSV fields and records until a field containing /// is found. In such
special processing mode (in AWK code: fpr has value 1), every switch to a new
CSV field is a tab in the path, and every switch to a new record is a newline
in the path. AWKCLEANER assembles the fragments contained in the CSV fields
with the tabs (escaped as ///t) and newlines (escaped as ///n) to build the
resulting escaped paths that contain neither real tabs nor real newlines.
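The assembly logic can be illustrated by a stripped-down cleaner for a
simplified record format ///<TAB>filename<TAB>/// (the real AWKCLEANER handles
many more fields; this sketch covers only the re-assembly idea):

```shell
# Filename "a<TAB>b<NEWLINE>c" arrives from FIND broken into CSV fields and
# lines; the cleaner re-assembles it, escaping tabs as ///t, newlines as ///n.
printf '///\ta\tb\nc\t///\n' |
awk -F '\t' '{
    if ($1 == "///") { out = $2; i = 3 }             # a new record starts
    else             { out = out "///n" $1; i = 2 }  # continuation = newline
    for (; i <= NF; i++) {
        if ($i == "///") { print "///\t" out "\t///"; next }
        out = out "///t" $i                          # field switch = tab
    }
}'
```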
Zaloha checks that no input parameters contain ///, to prevent the internal
escape logic from being broken from the outside. The only exceptions are
<findSourceOps> and <findGeneralOps>, which may contain the ///d/ placeholder.
Additionally, the internal escape logic might be broken by target paths of
symbolic links: Unfortunately, the OSes do not normalize target paths with
consecutive slashes while writing them to the filesystems, and FIND does not
normalize them either in the -printf %l output. Actually, there seem to be no
constraints on the target paths of symbolic links. Hence, the /// triplets can
occur there as well. This prohibits their safe processing within the above
described FIND-AWKCLEANER algorithm. Instead, a special solution is implemented
that involves running an auxiliary script (205_read_slink.sh) for each symbolic
link that contains three or more consecutive slashes (found by FIND expression
-lname *///*). This script obtains the target paths of such symbolic links and
escapes slashes by ///s, tabs by ///t and newlines by ///n. The escaped target
paths are then put into extra records in files 310 and 320, and AWKCLEANER
merges them into the regular records (column 16) in the cleaned files 330
and 340. Performance-wise, running the auxiliary script 205 per symbolic link
is not ideal, but the above described symbolic links should be rare occurrences.
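The detection and escaping ideas can be sketched as follows (simplified; the
real work is done by 205_read_slink.sh, and the link names below are invented
for the demo):

```shell
tmp=$(mktemp -d); cd "$tmp"
ln -s 'weird///target' link1    # pathological target path with a /// triplet
ln -s 'normal/target' link2
# only symbolic links whose targets contain /// need the special treatment
find . -type l -lname '*///*' -printf '%p\n'     # matches only link1
# escape every slash in the target as ///s (tabs/newlines analogously)
readlink link1 | sed 's|/|///s|g'
cd /; rm -rf "$tmp"
```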
An additional challenge is passing of variable values to AWK. During its
lexical parsing, AWK interprets backslash-led escape sequences. To avoid this,
backslashes are converted to ///b in the BASH script, and ///b are converted
back to backslashes in the AWK programs.
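A minimal round-trip sketch of this ///b technique (the sample value is
arbitrary; without the escaping, AWK would interpret \t in the -v assignment):

```shell
val='C:\temp\new'
# escape backslashes before handing the value to AWK via -v ...
esc=$(printf '%s' "$val" | sed 's|\\|///b|g')
# ... and restore them inside the AWK program (///b -> backslash)
awk -v x="$esc" 'BEGIN { gsub(/\/\/\/b/, "\\\\", x); print x }'
```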
In the shellscripts produced by Zaloha, single quoting is used, hence single
quotes are disruptors. As a solution, the '"'"' quoting technique is used.
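A minimal demo of the '"'"' technique (the filename is hypothetical): every
single quote in the name is replaced by '"'"' so that the whole name can be
embedded in a single-quoted shell word.

```shell
tmp=$(mktemp -d); cd "$tmp"
f="don't panic.txt"
# replace every ' with '"'"' for safe embedding in single quotes
esc=$(printf '%s' "$f" | sed "s/'/'\"'\"'/g")
cmd="touch '$esc'"
echo "$cmd"    # touch 'don'"'"'t panic.txt'
eval "$cmd"    # safely creates the file literally named: don't panic.txt
cd /; rm -rf "$tmp"
```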
The SORT commands are invoked with the LC_ALL=C environment variable, to avoid
problems caused by some locales that ignore slashes and other punctuation
during sorting.
In the CSV metadata files 330 through 500 (i.e. those which undergo the sorts),
file's paths (field 14) have directory separators (/) appended and all
directory separators then converted to ///s. This is to ensure correct sort
ordering. Imagine the ordering bugs that would happen otherwise:
Case 1: given dir and dir!, they would be sorted as:
dir, dir!, dir!/subdir, dir/subdir.
Case 2: given dir and dir<tab>ectory, they would be sorted as:
dir/!subdir1, dir///tectory, dir/subdir2.
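This can be verified with a tiny demo: after appending a slash and converting
slashes to ///s, each directory subtree stays contiguous in the sort order
(dir! and its child no longer split dir from its own child):

```shell
# dir, dir/subdir, dir!, dir!/subdir after the ///s conversion:
printf '%s\n' 'dir///s' 'dir///ssubdir///s' 'dir!///s' 'dir!///ssubdir///s' |
LC_ALL=C sort
# result: dir!///s, dir!///ssubdir///s, dir///s, dir///ssubdir///s
```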
Zaloha does not contain any explicit handling of national characters in
filenames (= characters beyond ASCII 127). It is assumed that the commands used
by Zaloha handle them transparently (which should be tested on environments
where this topic is relevant). <sourceDir> and <backupDir> must use the same
code page for national characters in filenames, because Zaloha does not contain
any code page conversions.
ADVANCED USE OF ZALOHA - REMOTE SOURCE AND REMOTE BACKUP MODES
Remote Source Mode
------------------
In the Remote Source Mode, <sourceDir> is on a remote source host that can be
reached via SSH/SCP, and <backupDir> is available locally. This mode is
activated by the --sourceUserHost option.
The FIND scan of <sourceDir> is run on the remote side in an SSH session, the
FIND scan of <backupDir> runs locally. The subsequent sorts + AWK processing
steps occur locally. The Exec1/2/3/4/5 steps are then executed as follows:
Exec1: The shellscript 610 is executed locally.
Exec2: All three shellscripts 621, 622 and 623 are executed locally. The script
622 contains SCP commands instead of CP commands.
Exec3: The shellscript 631 contains pre-copy actions and is run on the remote
side "in one batch". The shellscript 632 contains the individual SCP commands
to be executed locally. The shellscript 633 contains post-copy actions and
is run on the remote side "in one batch".
Exec4 (shellscript 640): same as Exec1
Exec5 (shellscripts 651, 652 and 653): same as Exec2
Remote Backup Mode
------------------
In the Remote Backup Mode, <sourceDir> is available locally, and <backupDir> is
on a remote backup host that can be reached via SSH/SCP. This mode is activated
by the --backupUserHost option.
The FIND scan of <sourceDir> runs locally, the FIND scan of <backupDir> is run
on the remote side in an SSH session. The subsequent sorts + AWK processing
steps occur locally. The Exec1/2/3/4/5 steps are then executed as follows:
Exec1: The shellscript 610 is run on the remote side "in one batch", because it
contains only RMDIR and REMOVE actions to be executed on <backupDir>.
Exec2: The shellscript 621 contains pre-copy actions and is run on the remote
side "in one batch". The shellscript 622 contains the individual SCP commands
to be executed locally. The shellscript 623 contains post-copy actions and
is run on the remote side "in one batch".
Exec3: All three shellscripts 631, 632 and 633 are executed locally. The script
632 contains SCP commands instead of CP commands.
Exec4 (shellscript 640): same as Exec1
Exec5 (shellscripts 651, 652 and 653): same as Exec2
Note
----
Running multiple actions on the remote side via SSH "in one batch" has
positive performance effects on networks with high latency, compared with
running individual commands via SSH individually (which would require a network
round-trip for each individual command).
SSH connection
--------------
For all SSH/SCP-related setups, read the SSH/SCP documentation first.
It is recommended to use SSH connection multiplexing, where a master connection
is established before invoking Zaloha. The subsequent SSH and SCP commands
invoked by Zaloha then connect to it, thus avoiding repeated overheads of
establishing new connections. This also removes the need for repeated entering
of passwords, which is necessary if no other authentication method is used,
e.g. the SSH Public Key authentication.
The SSH master connection is typically created as follows:
ssh -nNf -o ControlMaster=yes \
-o ControlPath='~/.ssh/cm-%r@%h:%p' \
<remoteUserHost>
To instruct the SSH and SCP commands invoked by Zaloha to use the SSH master
connection, use the options --sshOptions and --scpOptions:--sshOptions='-o ControlMaster=no -o ControlPath=~/.ssh/cm-%r@%h:%p'
--scpOptions='-o ControlMaster=no -o ControlPath=~/.ssh/cm-%r@%h:%p'
After use, the SSH master connection should be terminated as follows:
ssh -O exit -o ControlPath='~/.ssh/cm-%r@%h:%p' <remoteUserHost>
SCP Progress Meter
------------------
SCP contains a Progress Meter that is useful when copying large files.
It continuously displays the percent of transfer done, the amount transferred,
the bandwidth usage and the estimated time of arrival.
In Zaloha, the SCP Progress Meters appear both in the analysis phase
(copying of metadata files to/from the remote side) as well as in the
execution phase (executions of the scripts 622, 632 and 652).
In the analysis phase, the display of the SCP Progress Meters (along with all
other analysis messages) can be switched off by the --noProgress option.
Internally, this translates to the "-q" option for the respective SCP commands.
In the execution phase, the display of the SCP Progress Meters can be switched
off via the option --scpExecOpt (= override <scpOptions> by SCP options with
"-q" added).
Technical note: SCP never displays its Progress Meter if it detects that its
standard output is not connected to a terminal. To support the SCP Progress
Meters in the execution phase, Zaloha does an I/O redirection which pipes the
shell traces through the AWK filter 102 but keeps the standard output of the
copy scripts connected to its own standard output.
Windows / Cygwin notes:
-----------------------
Make sure you use the Cygwin version of OpenSSH, not the Windows version.
As of OpenSSH_8.3p1, the SSH connection multiplexing on Cygwin (still) doesn't
seem to work, not even in the Proxy Multiplexing mode (-O proxy).
To avoid repeated entering of passwords, use the SSH Public Key authentication.
Other SSH/SCP-related remarks:
------------------------------
If the path of the remote <sourceDir> or <backupDir> is given as a relative
path, then it is relative to the SSH login directory of the user on the
remote host.
To use a different port, use also the options --sshOptions and --scpOptions
to pass the options "-p <port>" to SSH and "-P <port>" to SCP.
The SCP commands that copy from remote to local may require the "-T" option
to disable the (broken?) SCP-internal check that results in false findings like
"filename does not match request" or "invalid brace pattern". Use --scpOptions
to pass the "-T" option to SCP.
The individual option words in <sshOptions> and <scpOptions> are separated by
spaces. Neither SSH nor SCP allows or requires option words that themselves
contain spaces or metacharacters subject to additional shell expansion, so
Zaloha does not implement any sophisticated handling of <sshOptions> and
<scpOptions>.
The option --scpExecOpt can be used to override <scpOptions> specially for
the SCP commands used during the execution phase. If the option --scpExecOpt
is not given, <scpOptions> applies to all SCP commands (= to those used in the
analysis phase as well as to those used in the execution phase).
Zaloha does not use the "-p" option of scp to preserve times of files, because
this option has a side effect (that is not always wanted) of preserving the
modes too. Explicit TOUCH commands in the post-copy scripts are used instead.
They preserve the modification times (only).
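The principle can be sketched locally (assuming GNU touch and stat; the file
names are invented for the demo):

```shell
tmp=$(mktemp -d)
printf 'data' > "$tmp/orig"
touch -d '@1700000000' "$tmp/orig"     # known modification time (epoch secs)
mtime=$(stat -c %Y "$tmp/orig")
cp "$tmp/orig" "$tmp/copy"             # plain copy: mtime of copy is "now"
touch -d "@${mtime}" "$tmp/copy"       # align the modification time only;
stat -c %Y "$tmp/copy"                 # the mode stays as created (no -p)
rm -rf "$tmp"
```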
Any "at" signs (@) and colons (:) contained in directory names should not be
misinterpreted as users or hosts by SCP, because Zaloha prepends relative paths
with "./" and SCP does not interpret "at" signs (@) or colons (:) after the
first slash in file/directory names.
ADVANCED USE OF ZALOHA - COMPARING CONTENTS OF FILES
First, let's make it clear that comparing contents of files will increase the
runtime dramatically, because instead of reading just the directory data,
the files themselves must be read.
ALTERNATIVE 1: option --byteByByte (suitable if both filesystems are local)
Option --byteByByte forces Zaloha to compare "byte by byte" files that appear
identical (more precisely, files for which either "no action" (OK) or just
"update of attributes" (ATTR) has been prepared). If additional updates of files
result from this comparison, they will be executed in step Exec5.
ALTERNATIVE 2: option --sha256 (compare contents of files via SHA-256 hashes)
Files are almost certainly identical if they have equal sizes and equal
SHA-256 hashes. The --sha256 option instructs Zaloha to prepare
FIND expressions that, besides collecting the usual metadata via the -printf
operand, cause SHA256SUM to be invoked on each file to calculate the SHA-256
hash. These calculated hashes are contained in extra records in files 310 and
320, and AWKCLEANER merges them into the regular records in the cleaned files
330 and 340 (the SHA-256 hashes go into column 13).
If additional updates of files result from comparisons of SHA-256 hashes,
they will be executed in step Exec5 (same principle as for the --byteByByte
option).
Additionally, Zaloha handles situations where the files have identical sizes
and SHA-256 hashes, but different modification times: it then prevents copying
of such files and only aligns their modification times (ATTR:T).
The --sha256 option has been developed for the Remote Modes, where the files
to be compared reside on different hosts: The SHA-256 hashes are calculated
on the respective hosts and for the comparisons of file contents, just the
hashes are transferred over the network, not the files themselves.
The --sha256 option is not limited to the Remote Modes - it can be used in
the Local Mode too. Having CSV metadata that contains the SHA-256 hashes may
be useful for other purposes as well, e.g. for de-duplication of files by
content in the source directory: By sorting the CSV file 330 by the SHA-256
hashes (column 13) one obtains a CSV file where the files with identical
contents are located in adjacent records.
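A sketch of such a de-duplication, here computing the hashes directly for
brevity instead of reading them from file 330 (the sample files are invented
for the demo; paths without embedded whitespace are assumed):

```shell
tmp=$(mktemp -d)
printf 'same' > "$tmp/one"; printf 'same' > "$tmp/two"; printf 'diff' > "$tmp/three"
# sort by hash: files with identical contents end up in adjacent records
find "$tmp" -type f -exec sha256sum {} + | LC_ALL=C sort |
awk '$1 == prev { print "duplicate content:", $2 } { prev = $1 }'
rm -rf "$tmp"
```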
ADVANCED USE OF ZALOHA - COPYING FILES IN PARALLEL
First, let's clarify when parallel operations do not make sense: When copying
files locally, even one single process will probably fully utilize the available
bus capacity. In such cases, copying files in parallel does not make sense.
By contrast, imagine what happens when a process copies a small file over
a network with high latency: sending out the small file takes microseconds,
but waiting for the network round-trip to finish takes milliseconds. The
process is idle most of the time, and the network capacity is under-utilized.
In such cases (typically when many small files are copied over a network),
running the copy commands in parallel will speed up the process significantly.
Zaloha provides support for parallel operations of up to 8 parallel processes
(constant MAXPARALLEL). How to utilize this support:
Let's take the script 622_exec2_copy.sh as an example: Make 8 copies of the
script. In the header of the first copy, keep only CP1, TOUCH1 (or SCP1)
assigned to real commands, and assign all other "command variables" to the empty
command (shell builtin ":"). Adjust the other copies accordingly. This way,
each of the 8 copies will process only its own portion of files, so they can be
run in parallel.
These manipulations should, of course, be automated by a wrapper script: The
wrapper script should invoke Zaloha with the --noExec and --no622Hdr
options, so that Zaloha prepares the 622 script without a header (i.e. body
only).
The wrapper script should prepare the 8 different headers and use them
with the header-less 622 script (of which only one copy is needed then).
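A wrapper sketch generating the 8 headers (the variable names CP1..CP8 and
TOUCH1..TOUCH8 follow the scheme described above; the file names header_<i>.sh
and 622_body.sh are invented for the demo):

```shell
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
  for j in 1 2 3 4 5 6 7 8; do
    if [ "$j" -eq "$i" ]; then
      printf "CP%s='cp'\nTOUCH%s='touch'\n" "$j" "$j"   # real commands
    else
      printf "CP%s=':'\nTOUCH%s=':'\n" "$j" "$j"        # empty command
    fi
  done > "$tmp/header_$i.sh"
done
# each parallel instance would then run:
#   ( . "header_$i.sh"; . 622_body.sh ) &
grep "CP3='cp'" "$tmp/header_3.sh"
rm -rf "$tmp"
```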
ADVANCED USE OF ZALOHA - TECHNICAL INTEGRATION OPTIONS
Zaloha contains several options to make technical integrations easy. In the
extreme case, Zaloha can be used as a mere "difference engine" which takes
the FIND data from <sourceDir> and/or <backupDir> as inputs and produces the
CSV metadata and the Exec1/2/3/4/5 scripts as outputs.
First useful option is --noDirChecks: This switches off the checks for
existence of <sourceDir> and <backupDir>.
In Local Mode, if <backupDir> is not available locally, it is necessary to use
the --metaDir option to place the Zaloha metadata directory in a location
accessible to Zaloha.
Next useful options are --noFindSource and/or --noFindBackup: They instruct
Zaloha to not run FIND on <sourceDir> and/or <backupDir>, but use externally
supplied CSV metadata files 310 and/or 320 instead. This means that these files
must be produced externally and downloaded to the Zaloha metadata directory
before invoking Zaloha. These files must, of course, have the same names and
contents as the CSV metadata files that would otherwise be produced by the
scripts 210 and/or 220.
The --noFindSource and/or --noFindBackup options are also useful when
network-mounted directories are available locally, but running FIND on them is
slow. Running the FINDs directly on the respective file servers in SSH sessions
should be much quicker.
The --noExec option can be used to prevent execution of the Exec1/2/3/4/5
scripts by Zaloha itself.
Last set of useful options are --no610Hdr through --no653Hdr. They instruct
Zaloha to produce header-less Exec1/2/3/4/5 scripts (i.e. bodies only).
The headers normally contain definitions used in the bodies of the scripts.
Header-less scripts can be easily used with alternative headers that contain
different definitions. This gives much flexibility:
The "command variables" can be assigned to different commands or own shell
functions. The "directory variables" sourceDir and backupDir can be re-assigned
as needed, e.g. to empty strings (which will cause the paths passed to the
commands to be not prefixed by <sourceDir> and <backupDir>).
CYBER SECURITY TOPICS
Standard security practices should be followed on environments exposed to
potential attackers: Potential attackers should not be allowed to modify the
command line that invokes Zaloha, the PATH variable, BASH init scripts or other
items that may influence how Zaloha works and invokes operating system commands.
Further, the following security threats arise from backup of a directory that is
writable by a potential attacker:
Backup media overflow attack via hardlinks
------------------------------------------
The attacker might hard-link a huge file many times, hoping that the backup
program writes each link as a physical copy to the backup media ...
Mitigation with Zaloha: Perform hardlink detection (use the --detectHLinksS
option)
Backup media overflow attack via symbolic links
-----------------------------------------------
The attacker might create many symbolic links pointing to directories with huge
contents (or to huge files), hoping that the backup program writes the contents
pointed to by each such link as a physical copy to the backup media ...
Mitigation with Zaloha: Do not follow symbolic links on <sourceDir> (do not use
the --followSLinksS option)
Unauthorized access via symbolic links
--------------------------------------
The attacker might create symbolic links to locations to which he has no access,
hoping that within the restore process (which he might explicitly request for
this purpose) the linked contents will be restored to his home directory ...
Mitigation with Zaloha: Do not follow symbolic links on <sourceDir> (do not use
the --followSLinksS option)
Privilege escalation attacks
----------------------------
The attacker might create a rogue executable program in his home directory with
the SetUID and/or SetGID bits set, hoping that within the backup process (or
within the restore process, which he might explicitly request for this purpose),
the user/group ownership of his rogue program changes to a user/group with
higher privileges (ideally root), the SetUID and/or SetGID bits will be restored
and he will have access to this program ...
Mitigation with Zaloha: Prevent this scenario. Be especially careful with options
--pMode and --pRevMode and with the restore script
860_restore_mode.sh
Attack on Zaloha metadata
-------------------------
The attacker might manipulate files in the Metadata directory of Zaloha, or in
the Temporary Metadata directory of Zaloha, while Zaloha runs ...
Mitigation with Zaloha: Make sure that the files in the Metadata directories
are not writeable/executable by other users (set up correct umasks, review
ownerships and modes of files that already exist).
Shell code injection attacks
----------------------------
The attacker might create a file in his home directory with a name that is
actually a rogue shell code (e.g. '; rm -Rf ..'), hoping that the shell code
will, due to some program flaw, be executed by a user with higher privileges ...
Mitigation with Zaloha: We are currently not aware of any such vulnerability
within Zaloha.
If found, please open a high priority issue on GitHub.