In bionode-watermill we use `uid`s, which are basically hashes. These `uid`s
are used to check whether a task has already run. For instance, imagine
you have run part of your pipeline before and it failed somewhere in the
middle. When you attempt to run your `pipeline.js` script again,
bionode-watermill will know that some of your tasks have already run and
will not run them again. Instead, it will be able to crawl into the `./data`
folder to look for the inputs required by the tasks that still need to be
run.
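
The resume behaviour described above can be pictured with a small sketch. It assumes each task carries a precomputed `uid` and that the `uid`s of tasks finished in an earlier attempt are kept in a `Set`. The names `resumePipeline` and `completedUids` are hypothetical and are not part of bionode-watermill's API.

```javascript
// Hypothetical sketch, not bionode-watermill's API: tasks whose uid was
// recorded by an earlier (failed) attempt are skipped on the next attempt.
function resumePipeline (tasks, completedUids) {
  const executed = []
  for (const task of tasks) {
    if (completedUids.has(task.uid)) {
      // Ran previously: its outputs are expected to already be in ./data
      continue
    }
    task.run()
    completedUids.add(task.uid)
    executed.push(task.name)
  }
  return executed
}

// The first attempt finished 'download' and 'trim', then failed at 'align'
const completed = new Set(['uid-download', 'uid-trim'])
const tasks = [
  { name: 'download', uid: 'uid-download', run: () => {} },
  { name: 'trim', uid: 'uid-trim', run: () => {} },
  { name: 'align', uid: 'uid-align', run: () => {} }
]
const ran = resumePipeline(tasks, completed)
console.log(ran) // [ 'align' ]
```

Only the task that never completed is executed again. The two tasks that already ran are recognised purely by their `uid`.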
The `uid` at the task level is generated in two different ways:

- First, it is generated from the `props` (`input`, `output` and `params`)
  passed by the user to the task definition (this is done in
  `lib/reducers/task.js`). Let's call this `uid` the task default `uid`
  (before running).
- Second, if we want a task to be run twice in the same pipeline, it cannot
  have the same `uid` both times, otherwise bionode-watermill will not allow
  the second execution of that task. Bionode-watermill handles this with a
  second level of `uid` generation while the task is running: a task's `uid`
  is modified at run time to obtain its final, or run, `uid`. This new `uid`
  is generated by taking the task default `uid` together with the `uid`s of
  its parent tasks and making a unique hash of all these `uid`s. As a result,
  the same task can be run twice in the pipeline if its `trajectory` is
  different, i.e., if the two runs have different parent tasks.
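
The two levels of `uid` generation can be sketched as follows. The example uses Node's built-in `crypto` module with SHA-256 and serializes the props with `JSON.stringify`. These choices, and the helpers `hash`, `defaultUid` and `runUid`, are assumptions made for illustration. The actual code in `lib/reducers/task.js` may use a different hash or serialization.

```javascript
const crypto = require('crypto')

// Hypothetical helper: SHA-256 hex digest of a string
function hash (str) {
  return crypto.createHash('sha256').update(str).digest('hex')
}

// Default uid: depends only on the props given in the task definition
function defaultUid ({ input, output, params }) {
  return hash(JSON.stringify({ input, output, params }))
}

// Run uid: the default uid combined with the (run) uids of the parent tasks
function runUid (taskDefaultUid, parentUids) {
  return hash([taskDefaultUid, ...parentUids].join(':'))
}

// The same filtering task, reached through two different parents
const filter = defaultUid({ input: '*.fastq', output: '*.filtered.fastq', params: { quality: 20 } })
const parentA = defaultUid({ input: '*.sra', output: '*.fastq', params: { tool: 'fastq-dump' } })
const parentB = defaultUid({ input: '*.fastq', output: '*.trimmed.fastq', params: { tool: 'trim' } })

const runA = runUid(filter, [parentA])
const runB = runUid(filter, [parentB])
console.log(runA === runB) // false: same default uid, different trajectory
```

Both runs share the same default `uid`. Their run `uid`s differ because the parent `uid`s differ, so the second execution is not mistaken for the first.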
Orchestrators also have `uid`s, which are basically a hash of the `uid`s of
the tasks contained in them. These `uid`s are used in
`lib/orchestrators/join.js` to control `uid` generation for `fork`, because
each downstream task after a `fork` must be multiplied as many times as there
are tasks inside the `fork` (for further details on this, see Forkception).
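
The following sketch illustrates the idea of orchestrator `uid`s and the multiplication of downstream tasks after a `fork`. It is not the code in `lib/orchestrators/join.js`. The helpers `orchestratorUid` and `multiplyAfterFork` are hypothetical, and the example assumes SHA-256 from Node's `crypto` module.

```javascript
const crypto = require('crypto')

// Hypothetical helper: SHA-256 hex digest of a string
function hash (str) {
  return crypto.createHash('sha256').update(str).digest('hex')
}

// An orchestrator's uid: a hash of the uids of the tasks it contains
function orchestratorUid (childUids) {
  return hash(childUids.join(':'))
}

// After a fork with N branches, a downstream task is "multiplied" N times:
// one run uid per branch, each derived from a different parent uid.
function multiplyAfterFork (downstreamDefaultUid, forkBranchUids) {
  return forkBranchUids.map(branchUid =>
    hash([downstreamDefaultUid, branchUid].join(':')))
}

// A fork over two aligners, followed by one variant-calling task
const branches = [hash('bowtie2'), hash('bwa')]
const forkUid = orchestratorUid(branches)
const callVariants = multiplyAfterFork(hash('call-variants'), branches)
console.log(callVariants.length) // 2
```

The downstream task ends up with one distinct run `uid` per `fork` branch, so each copy can be tracked independently.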