Home
AID: Adaptive Impact Driven Detection library for corruption detection
AID lets HPC users who run dynamic simulations over multiple time steps detect data corruptions that affect the results of their execution.
AID is designed to monitor the state data of the application, that is, the variables that hold the outcome of the execution.
AID is a library offering functions that help programmers define which variables should be monitored.
AID offers detection only. It can be used in combination with a separate recovery library, such as FTI (Fault Tolerance Interface: https://github.com/leobago/fti).
AID is simple to use.
Users annotate their MPI application code in only four steps:
(1) initialize the detector by calling SDC_Init();
(2) specify the key variables to protect by calling SDC_Protect(var,ierr);
(3) annotate the execution iterations by inserting SDC_Snapshot() into the key loop;
(4) release the memory by calling SDC_Finalize() at the end.
AID supports both C and Fortran.
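
The four annotation steps can be pictured with a minimal C sketch. It does not link against AID or MPI: the `mock_SDC_*` functions are hypothetical stand-ins that only count calls, and their argument lists are assumptions made for illustration, not AID's actual prototypes (check the library headers for those). The toy smoothing loop and the names `run_simulation`, `N`, and `MAX_VARS` are likewise invented for this example. What the sketch shows is only where each call would sit relative to the time-stepping loop.

```c
#include <assert.h>
#include <stddef.h>

#define N 8          /* grid points in the toy simulation (assumption) */
#define MAX_VARS 4   /* capacity of the mock registry (assumption) */

/* State of the mock detector. A real detector would keep far more. */
static const double *tracked[MAX_VARS];
static int n_tracked = 0;
static int n_snapshots = 0;

/* (1) Hypothetical stand-in for SDC_Init(): reset the detector state. */
static void mock_SDC_Init(void) {
    n_tracked = 0;
    n_snapshots = 0;
}

/* (2) Hypothetical stand-in for SDC_Protect(var, ierr): register one
 * variable. Sets *ierr to 0 on success and 1 on failure. */
static void mock_SDC_Protect(const double *var, int *ierr) {
    if (var == NULL || n_tracked >= MAX_VARS) {
        *ierr = 1;
        return;
    }
    tracked[n_tracked++] = var;
    *ierr = 0;
}

/* (3) Hypothetical stand-in for SDC_Snapshot(): the real library would
 * inspect the protected data here; the mock only counts the call. */
static void mock_SDC_Snapshot(void) {
    n_snapshots++;
}

/* (4) Hypothetical stand-in for SDC_Finalize(): drop all registrations. */
static void mock_SDC_Finalize(void) {
    n_tracked = 0;
}

/* Runs `steps` iterations of a toy smoothing kernel and returns the number
 * of snapshots taken, or -1 if the variable could not be registered. */
int run_simulation(int steps) {
    double u[N] = {0.0};
    int ierr = 0;

    assert(steps >= 0);
    u[N / 2] = 1.0;                      /* initial condition: one spike */

    mock_SDC_Init();                     /* step 1: initialize */
    mock_SDC_Protect(u, &ierr);          /* step 2: protect the state data */
    if (ierr != 0) {
        mock_SDC_Finalize();
        return -1;
    }

    for (int t = 0; t < steps; t++) {    /* the key loop */
        double next[N];
        next[0] = u[0];
        next[N - 1] = u[N - 1];
        for (int i = 1; i < N - 1; i++) {
            next[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];
        }
        for (int i = 0; i < N; i++) {
            u[i] = next[i];
        }
        mock_SDC_Snapshot();             /* step 3: once per iteration */
    }

    int taken = n_snapshots;
    mock_SDC_Finalize();                 /* step 4: release at the end */
    return taken;
}
```

Calling `run_simulation(10)` from a `main` function returns 10, one snapshot per time step. In a real application the mock calls would be replaced by the corresponding AID functions, with whatever arguments the library actually requires.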