Architecture
We have modeled the communication of the TracEX project in the following FMC diagram.
The main components in the communication are:
- User: Interacts with the system through a web browser.
- Django Frontend: Serves the user interface.
- Input: Collects user inputs as patient journeys and filters.
- Patient Journey Generator: Generates artificial patient journey data.
- Orchestrator: Central communication interface; coordinates requests and responses between modules and the front end.
- SQLite Database: Stores intermediate and final results, allowing for data retrieval and persistence.
- Modules: Perform specific tasks in the data processing pipeline. Details about the modules will be provided in the following section.
- OpenAI API: Provides an LLM interface for use in the modules.
The following class diagram demonstrates the relationship between the `Orchestrator` class and the modules.
The abstract base class of the modules provides a unified interface for all pipeline steps. Because each module implements the `execute` function in its own way, the orchestrator can call all modules without knowing anything about their implementation. The `ExtractionConfiguration` class links the modules to the orchestrator.
Each instance of this class represents one pipeline execution and contains the attributes that matter for that execution: the `patient_journey`, the `event_types` and `locations` to be extracted, the `modules` to be executed, and the `activity_key`, which specifies which groups are shown in the resulting directly-follows graph.
We have chosen this architecture to make the pipeline steps more modular and improve maintainability and extensibility.
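The interplay described above can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: the `execute` signature, the `run` method, the dictionary keys, the default `activity_key`, and the toy behaviour of the two modules are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Module(ABC):
    """Stand-in for the abstract base class shared by all pipeline steps."""

    name = "Module"
    description = "Abstract pipeline step."

    @abstractmethod
    def execute(self, events, patient_journey):
        """Take the intermediate result and return the updated version."""


class Preprocessor(Module):
    name = "Preprocessor"
    description = "Cleans and prepares data."

    def execute(self, events, patient_journey):
        # Assumed toy behaviour: every non-empty line becomes one raw event.
        lines = patient_journey.splitlines()
        return [{"text": line.strip()} for line in lines if line.strip()]


class ActivityLabeler(Module):
    name = "Activity Labeler"
    description = "Identifies and labels activities."

    def execute(self, events, patient_journey):
        # Assumed toy behaviour: the lower-cased text serves as the label.
        return [{**event, "activity": event["text"].lower()} for event in events]


@dataclass
class ExtractionConfiguration:
    """One instance describes one pipeline execution."""

    patient_journey: str
    event_types: list = field(default_factory=list)
    locations: list = field(default_factory=list)
    activity_key: str = "event_type"  # assumed default
    # The insertion order of this dictionary is the execution order.
    modules: dict = field(
        default_factory=lambda: {
            "preprocessing": Preprocessor,
            "activity_labeling": ActivityLabeler,
        }
    )


class Orchestrator:
    def __init__(self, configuration):
        self.configuration = configuration

    def run(self):
        events = []
        # The orchestrator only relies on the shared execute() interface.
        for module_class in self.configuration.modules.values():
            module = module_class()
            events = module.execute(events, self.configuration.patient_journey)
        return events


config = ExtractionConfiguration(patient_journey="Felt sick\nVisited doctor\n")
result = Orchestrator(config).run()
```

Since the orchestrator iterates over the `modules` dictionary and calls only `execute`, swapping, adding, or removing a pipeline step does not require any change to the orchestrator loop itself.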
The pipeline modules are:
- Preprocessor: Cleans and prepares data.
- Activity Labeler: Identifies and labels activities.
- Time Extractor: Extracts timestamps.
- Event Type Classifier: Classifies event types.
- Location Extractor: Extracts location data.
- Metrics Analyzer: Analyzes metrics.
- Cohort Tagger: Extracts patient information.
To add a new module, please follow these steps:
- Create a new Python file in the `modules/` folder and follow the naming convention of `module_<your_name>.py`.
- Import the abstract base class for modules with `from extraction.logic.module import Module` and let your own module class inherit from it. Please follow the naming convention of using a subject in CamelCase, e.g. `ActivityLabeler`.
- Define a constructor for your class that sets the `name` and `description` class variables.
- Implement the `execute` function from the abstract base class.
- Import your new module class in the `orchestrator.py` file.
- Add your module to the dictionary `modules`, which is a variable of the `ExtractionConfiguration` class. Be aware that the order of this dictionary determines the order of execution of the modules.
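As a rough illustration of these steps, a new module might look like the sketch below. The module `DurationFlagger`, its file name, the dictionary key, and the `execute` signature are hypothetical. The `Module` class defined here is only a stand-in so that the example runs outside the project; inside the project it would be imported as described above.

```python
# Inside the project, this stand-in would be replaced by:
#   from extraction.logic.module import Module
from abc import ABC, abstractmethod


class Module(ABC):
    """Minimal stand-in for the project's abstract base class."""

    def __init__(self):
        self.name = None
        self.description = None

    @abstractmethod
    def execute(self, events):
        """Process the events and return the updated list (assumed signature)."""


# Would live in modules/module_duration_flagger.py (hypothetical file name).
class DurationFlagger(Module):
    """Hypothetical module that flags events lasting longer than one day."""

    def __init__(self):
        super().__init__()
        self.name = "Duration Flagger"
        self.description = "Flags events whose duration exceeds 24 hours."

    def execute(self, events):
        # Add a boolean flag to every event without mutating the input.
        return [
            {**event, "long_running": event.get("duration_hours", 0) > 24}
            for event in events
        ]


# In orchestrator.py, the class would be added to the modules dictionary.
# The position in the dictionary decides when the module runs.
modules = {
    "duration_flagging": DurationFlagger,
}

events = [
    {"activity": "hospital stay", "duration_hours": 72},
    {"activity": "check-up", "duration_hours": 1},
]
for module_class in modules.values():
    events = module_class().execute(events)
```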
Congratulations, you have added your own module!
We have modeled the database schema of the TracEX project in the following class diagram.
The main components of the database schema are:
- Patient Journey: It has a unique `name` and a field called `patient_journey` that stores the content of a patient journey in string representation. It has a foreign key relationship to a variable number of traces, which allows for extracting multiple traces from one journey and comparing them.
- Trace: Is a collection of `Event` objects. It has a foreign key relationship to a variable number of events and to exactly one patient journey and one cohort. A side effect of linking each trace to exactly one cohort is that one trace corresponds to one pipeline execution, because the cohort information is extracted during the execution.
- Event: Is one row in the resulting trace. It has attributes such as `activity`, `event_type`, the time information `start`, `end` and `duration`, and a `location` where the event occurred. These attributes are extracted from the patient journey by the respective modules.
- Cohort: Stores information about one patient from one pipeline execution. It has the attributes `age`, `sex`, `origin`, `condition` and `preexisting_condition`.
- Prompt: Stores the prompts that the modules use for the different execution steps of the pipeline. This allows the user to change the prompts without looking into the source code.
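The relationships listed above can be approximated in plain SQLite as shown below. This is only a sketch under assumptions: the project defines its schema through Django models, so the table names, key column names, column types, the columns of the `prompt` table, and the sample values are illustrative and not taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE patient_journey (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        patient_journey TEXT NOT NULL
    );
    CREATE TABLE cohort (
        id INTEGER PRIMARY KEY,
        age INTEGER, sex TEXT, origin TEXT,
        condition TEXT, preexisting_condition TEXT
    );
    -- One journey can have many traces; each trace has one cohort.
    CREATE TABLE trace (
        id INTEGER PRIMARY KEY,
        patient_journey_id INTEGER NOT NULL REFERENCES patient_journey(id),
        cohort_id INTEGER UNIQUE REFERENCES cohort(id)
    );
    -- One trace can have many events ("end" is quoted: it is an SQL keyword).
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        trace_id INTEGER NOT NULL REFERENCES trace(id),
        activity TEXT, event_type TEXT,
        start TEXT, "end" TEXT, duration TEXT, location TEXT
    );
    -- Column names of this table are assumptions.
    CREATE TABLE prompt (
        id INTEGER PRIMARY KEY,
        name TEXT,
        text TEXT
    );
    """
)

journey_id = conn.execute(
    "INSERT INTO patient_journey (name, patient_journey) VALUES (?, ?)",
    ("journey_1", "I felt sick and went to the doctor."),
).lastrowid
cohort_id = conn.execute(
    "INSERT INTO cohort (age, sex, condition) VALUES (?, ?, ?)",
    (34, "female", "influenza"),
).lastrowid
trace_id = conn.execute(
    "INSERT INTO trace (patient_journey_id, cohort_id) VALUES (?, ?)",
    (journey_id, cohort_id),
).lastrowid
conn.executemany(
    "INSERT INTO event (trace_id, activity, event_type, location) VALUES (?, ?, ?, ?)",
    [
        (trace_id, "feeling sick", "Symptom Onset", "Home"),
        (trace_id, "visiting doctor", "Doctor Visit", "Doctors"),
    ],
)

# Collect all events that were extracted from one patient journey.
rows = conn.execute(
    """
    SELECT e.activity, e.event_type
    FROM event AS e
    JOIN trace AS t ON e.trace_id = t.id
    JOIN patient_journey AS p ON t.patient_journey_id = p.id
    WHERE p.name = ?
    ORDER BY e.id
    """,
    ("journey_1",),
).fetchall()
```

Running the pipeline a second time on the same journey would add another `trace` row with its own `cohort` row, which is what makes the traces of one journey comparable.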
The database can be accessed through the Django admin view.