
Architecture

Pit Buttchereit edited this page May 25, 2024 · 18 revisions

Components Communication

We have modeled the communication of the TracEX project in the following FMC diagram.

TracEX FMC

The main components in the communication are:

  • User: Interacts with the system through a web browser.
  • Django Frontend: Serves the user interface.
  • Input: Collects user inputs as patient journeys and filters.
  • Patient Journey Generator: Generates artificial patient journey data.
  • Orchestrator: Central communication interface; coordinates requests and responses between modules and the front end.
  • SQLite Database: Stores intermediate and final results, allowing for data retrieval and persistence.
  • Modules: Perform specific tasks in the data processing pipeline. Details about the modules will be provided in the following section.
  • OpenAI API: Provides an LLM interface for use in the modules.

Modules

The following class diagram demonstrates the relationship between the Orchestrator class and the modules.

TracEX Modules Class Diagram

The abstract base class of the modules provides a unified interface for all pipeline steps. Because each module implements the execute function in its own way, the orchestrator can call all modules without knowing anything about their implementation. The ExtractionConfiguration class links the modules to the orchestrator.

Each instance of ExtractionConfiguration represents one pipeline execution and holds the attributes needed for that execution: the patient_journey, the event_types and locations to be extracted, the modules to be executed, and the activity_key, which specifies what groups will be illustrated in the resulting directly-follows graph.
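The attributes listed above can be pictured as a small data container. The following is only a sketch under assumptions: the real ExtractionConfiguration may be structured differently, and the field types, default values, and all example values (including the dictionary keys) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionConfiguration:
    """Illustrative stand-in: one instance describes one pipeline execution."""

    patient_journey: str
    event_types: list = field(default_factory=list)
    locations: list = field(default_factory=list)
    # Maps a key to a module; insertion order is the execution order.
    modules: dict = field(default_factory=dict)
    activity_key: str = "event_type"


# All values below are made up for illustration.
config = ExtractionConfiguration(
    patient_journey="I had a fever and stayed at home for a week.",
    event_types=["Symptom Onset", "Treatment"],
    locations=["Home", "Hospital"],
    modules={"preprocessing": "Preprocessor", "activity_labeling": "ActivityLabeler"},
    activity_key="event_type",
)

module_order = list(config.modules)
```

Here module_order reflects the order in which the entries were added, because Python dictionaries preserve insertion order.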

We have chosen this architecture to make the pipeline steps more modular and improve maintainability and extensibility.
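To make the idea of a unified module interface concrete, here is a minimal sketch. It does not reproduce the TracEX source: the Module class is a stand-in, the signature of execute is an assumption, and StripModule, UppercaseModule, and run_pipeline are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class Module(ABC):
    """Stand-in for the abstract base class of all pipeline modules."""

    def __init__(self):
        self.name = None
        self.description = None

    @abstractmethod
    def execute(self, data):
        """Run this pipeline step and return its result (assumed signature)."""


class StripModule(Module):
    """Hypothetical module: removes surrounding whitespace."""

    def __init__(self):
        super().__init__()
        self.name = "Strip"
        self.description = "Removes leading and trailing whitespace."

    def execute(self, data):
        return data.strip()


class UppercaseModule(Module):
    """Hypothetical module: converts text to upper case."""

    def __init__(self):
        super().__init__()
        self.name = "Uppercase"
        self.description = "Converts the text to upper case."

    def execute(self, data):
        return data.upper()


def run_pipeline(modules, data):
    """Call every module through the shared interface, in the given order."""
    for module in modules:
        data = module.execute(data)
    return data


result = run_pipeline([StripModule(), UppercaseModule()], "  fever since monday ")
```

The caller only relies on execute, so a new step can be added to the list without changing run_pipeline.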

Current Modules

  • Preprocessor: Cleans and prepares data.
  • Activity Labeler: Identifies and labels activities.
  • Time Extractor: Extracts timestamps.
  • Event Type Classifier: Classifies event types.
  • Location Extractor: Extracts location data.
  • Metrics Analyzer: Analyzes metrics.
  • Cohort Tagger: Extracts patient information.

How to add new modules

To add a new module, please follow these steps:

  1. Create a new Python file in the modules/ folder and follow the naming convention of module_<your_name>.py.
  2. Import the abstract base class for modules with from extraction.logic.module import Module and let your own module class inherit from it. Please name the class with a noun in CamelCase, e.g. ActivityLabeler.
  3. Define a constructor for your class that sets the name and description class variables.
  4. Implement the execute function from the abstract base class.
  5. Import your new module class in the orchestrator.py file.
  6. Add your module to the dictionary modules, which is a variable of the ExtractionConfiguration class. Be aware that the order of this dictionary determines the order of execution of the modules.

Congratulations, you have added your own module!

Database Schema

We have modeled the database schema of the TracEX project in the following class diagram.

TracEX Database Schema

The main components of the database schema are:

  • Patient Journey: It has a unique name and a field called patient_journey that stores the content of a patient journey in string representation. It has a foreign key relationship to a variable number of traces that allows for extracting multiple traces from one journey and comparing them.
  • Trace: Is a collection of Event objects. It has a foreign key relationship to a variable number of events and to exactly one patient journey and one cohort. Because each trace references exactly one cohort, and the cohort information is extracted during the execution, one trace corresponds to one pipeline execution.
  • Event: Is one row in the resulting trace. It has attributes such as activity, event_type, start, end and duration time information, and the location where the event occurred. These attributes are extracted from the patient journey by the respective modules.
  • Cohort: Stores information about one patient from one pipeline execution. It has the attributes age, sex, origin, condition and preexisting_condition.
  • Prompt: Stores the prompts that the modules use in the different execution steps of the pipeline. This allows the user to change the prompts without looking into the source code.

The database can be accessed through the Django admin view.
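The relationships described above can be illustrated with a small in-memory SQLite schema. This is a sketch only: TracEX defines its schema through Django, so the table names, column names, and column types below are assumptions, the example row values are made up, and the Prompt table is left out because its attributes are not listed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE patient_journey (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL,
        patient_journey TEXT NOT NULL
    );
    CREATE TABLE cohort (
        id INTEGER PRIMARY KEY,
        age INTEGER, sex TEXT, origin TEXT,
        condition TEXT, preexisting_condition TEXT
    );
    CREATE TABLE trace (
        id INTEGER PRIMARY KEY,
        patient_journey_id INTEGER NOT NULL REFERENCES patient_journey(id),
        cohort_id INTEGER UNIQUE REFERENCES cohort(id)
    );
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        trace_id INTEGER NOT NULL REFERENCES trace(id),
        activity TEXT, event_type TEXT,
        start TEXT, "end" TEXT, duration TEXT, location TEXT
    );
    """
)

# One patient journey can have several traces, one per pipeline execution.
journey_id = conn.execute(
    "INSERT INTO patient_journey (name, patient_journey) VALUES (?, ?)",
    ("journey_1", "I had a fever and stayed at home for a week."),
).lastrowid

for _ in range(2):
    cohort_id = conn.execute(
        "INSERT INTO cohort (condition) VALUES (?)", ("fever",)
    ).lastrowid
    conn.execute(
        "INSERT INTO trace (patient_journey_id, cohort_id) VALUES (?, ?)",
        (journey_id, cohort_id),
    )

trace_count = conn.execute(
    "SELECT COUNT(*) FROM trace WHERE patient_journey_id = ?", (journey_id,)
).fetchone()[0]
```

In this sketch the foreign keys are stored on the trace and event tables, which is the usual way to model a one-to-many relationship in a relational database.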
