Home
NFDI4Health (National Research Data Infrastructure for personal health data), T3.7 “Distributed data analysis infrastructure”
DFG project number: 442326535
Authors: Sofia Maria Siampani, Carolina Schwedhelm, Katharina Nimptsch, Tobias Pischon
Contact person: Sofia Maria Siampani ([email protected])
Affiliation: Molecular Epidemiology Research Group, Max Delbrück Center for Molecular Medicine, Berlin, Germany ([email protected])
Version: 3.0
Last updated: 27.11.2024
This is the Standard Operating Procedure (SOP) for Installing and Configuring Opal/DataSHIELD for the NFDI4Health consortium. It is based on an earlier version that was developed within the ENPADASI consortium by DIfE.
Opal is a data management application that is integrated with R. It is used together with DataSHIELD (a series of R packages), which allows advanced statistical analysis across multiple studies without sharing or disclosing any individual-level data.
If needed, you can find the official Opal documentation here: https://opaldoc.obiba.org/en/latest/. For troubleshooting and addressing software issues you can go here: https://groups.google.com/g/obiba-users or https://github.com/obiba/opal/issues.
This SOP is for internal use within NFDI4Health and will be updated as NFDI4Health progresses. If you have any feedback on the SOP or require support, please contact me at [email protected].
OBiBa software is open source, free of charge, and made available under the GPL3 licence.
The work presented herein was made possible using the OBiBa suite (www.obiba.org), a software suite developed by Maelstrom Research (www.maelstrom-research.org) and Epigeny (www.epigeny.io).
The following server requirements need to be met:
- Recommended operating system: RedHat/CentOS
- Recent server-grade or high-end consumer-grade processor
- 8GB or more disk space
- 4GB or more RAM
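The disk and RAM minimums above can be checked on the target server before installation. The following is a minimal sketch, not part of the official Opal tooling. It assumes that "8GB disk space" means free space on the root file system, that GB is counted in decimal units, and that the host is Linux (the RAM lookup uses `os.sysconf`). The function names are illustrative.

```python
import os
import shutil

GB = 10 ** 9               # assumption: decimal gigabytes
MIN_DISK_BYTES = 8 * GB    # "8GB or more disk space"
MIN_RAM_BYTES = 4 * GB     # "4GB or more RAM"


def check_requirements(free_disk_bytes, total_ram_bytes):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if free_disk_bytes < MIN_DISK_BYTES:
        problems.append(f"disk: {free_disk_bytes / GB:.1f} GB free, need at least 8 GB")
    if total_ram_bytes < MIN_RAM_BYTES:
        problems.append(f"RAM: {total_ram_bytes / GB:.1f} GB, need at least 4 GB")
    return problems


def read_host_values(path="/"):
    """Read free disk space for `path` and total physical RAM (Linux only)."""
    free_disk = shutil.disk_usage(path).free
    total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return free_disk, total_ram


if __name__ == "__main__":
    issues = check_requirements(*read_host_values())
    print("OK" if not issues else "\n".join(issues))
```

If Opal's data will live on a different partition, pass that mount point to `read_host_values` instead of `/`.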
Since every institute has its own hardware preferences/requirements, we present the following options based on the official Opal documentation^2:
- Native installation
  - Debian package installation (Linux based operating systems) or
  - RPM package installation (Red Hat Linux based operating systems)
- Installation using Docker
Both options support federated analysis; we recommend that you choose the one that best meets your institute’s security needs.
The person who is setting up Opal/DataSHIELD needs to have administrative rights on the server(s).
- Install Java 21 if it’s not already installed. To install JDK 21 for your operating system, see: https://www.oracle.com/java/technologies/downloads/?er=221886#java21.
- Install and configure the database^3 : Opal uses a database for storing data. Currently the supported SQL database engines are MySQL, MariaDB and PostgreSQL. Here we are suggesting MySQL >= 5.5.x but the other supported databases also work.
- Reference for installing MySQL based on your operating system: https://dev.mysql.com/downloads/mysql/.
- For post-installation setup and testing, follow: https://dev.mysql.com/doc/mysql-installation-excerpt/5.7/en/default-privileges.html
- Create the databases: A fully operational Opal server requires at least two registered databases: one for identifier mapping storage and at least one for data storage.
- Log into the database shell with the root password you set before:
sudo mysql -u root -p
- After successful login you will see the database prompt and can continue the set up.
mysql>
- Enter these commands one by one at the database prompt to create the databases and the user required by Opal. Once done, you can quit the shell.
CREATE DATABASE opal_data;
CREATE DATABASE opal_ids;
CREATE USER 'opal'@'localhost' IDENTIFIED BY '<opal-user-password>';
GRANT ALL ON opal_data.* TO 'opal'@'localhost';
GRANT ALL ON opal_ids.* TO 'opal'@'localhost';
FLUSH PRIVILEGES;
QUIT;
- Install R >= 4.x: see https://cran.r-project.org/ for installing R and the recommended packages (r-base, r-dev) on your operating system.
- Install Opal and Rock^4. For Rock, both Java and R must be installed on the same host.
- For Debian: https://www.obiba.org/pages/pkg/
- For RPM package installation: https://www.obiba.org/pages/rpm/
During this step you will be asked to set a strong password, which you will need later to log in as administrator to Opal’s web interface. The password must contain at least 8 characters, with at least one digit, one uppercase letter, one lowercase letter, one special character (such as @#$%^&+=!) and no white space.
- Edit the file OPAL_HOME/conf/opal-config.properties to match your server needs (see https://opaldoc.obiba.org/en/latest/admin/configuration.html).
- Set up Reverse Proxy Configuration (e.g. Apache^5) and replace Opal’s default certificate with a valid one in order to match your institute’s security needs.
- Now Opal is ready for the first login. Opal’s administrative web interface can be accessed with any modern web browser at:
https://<host>:8443
- The first page you will see is the login page. Sign in with ‘administrator’ as username and the Opal password you set previously.
- Go to “Administration” > “Databases”. In the “Identifiers Database” section, click on “Register” and select the database type that you set up (“SQL” or “MongoDB”). Provide the name of the database along with the connection information (URL, username, and password) and save the settings. Do the same for the “Data Databases”. Upon successful configuration, a blue notification box pops up saying “Connection successful”.
- Go to “Administration” > “DataSHIELD”. Click on “Add Package”, choose the option ‘Install all DataSHIELD packages’ and click ‘Install’. The DataSHIELD packages should now appear in the list and the “Methods” section should be populated with entries. The initial set up is now complete.
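Before opening the browser, it can help to confirm that the Opal web interface answers on the expected address. The sketch below is only a connectivity smoke test using the Python standard library; it does not log in or call any Opal API. The function names and the `verify_certificate` switch are illustrative assumptions, and the host and port must be taken from your own setup.

```python
import ssl
import urllib.error
import urllib.request


def opal_base_url(host, port):
    """Build the HTTPS address of the Opal web interface."""
    return f"https://{host}:{port}"


def is_reachable(url, timeout=5.0, verify_certificate=True):
    """Return True if any HTTP response comes back from `url`."""
    context = ssl.create_default_context()
    if not verify_certificate:
        # Only for a first smoke test while Opal still uses its default certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except OSError:
        # Covers refused connections, timeouts, DNS and TLS failures.
        return False


# Example (hypothetical host name; use the port from the address above):
# print(is_reachable(opal_base_url("opal.example.org", 8443)))
```

Once a valid certificate is in place behind the reverse proxy, the check should pass with `verify_certificate=True`, which is the default.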
If you installed Opal via the Debian package, you may update it using the command:
apt-get install opal
If you installed Opal via the RPM package, you may update it using the command:
yum install opal-server
OBiBa is an early adopter of the Docker technology, providing its own images from the Docker Hub repository. The Docker platform runs natively on Linux (on x86-64, ARM and many other CPU architectures) and on Windows (x86-64). Before proceeding with this option, make sure that this technology matches your institute’s security needs^6.
- Install Docker^7 and create an account.
- Open a text editor (e.g. Notepad) on your machine and paste the following:
version: '3'
services:
  opal:
    image: obiba/opal:latest
    ports:
      - "8843:8443"
      - "8880:8080"
    links:
      - mysqldata
      - mysqlids
      - rock
    environment:
      - JAVA_OPTS=-Xms1G -Xmx8G -XX:+UseG1GC
      - OPAL_ADMINISTRATOR_PASSWORD=password
      - MYSQLDATA_HOST=mysqldata
      - MYSQLDATA_USER=opal
      - MYSQLDATA_PASSWORD=password
      - MYSQLIDS_HOST=mysqlids
      - MYSQLIDS_USER=opal
      - MYSQLIDS_PASSWORD=password
      - ROCK_HOSTS=rock:8085
      - ROCK_ADMINISTRATOR_USER=administrator
      - ROCK_ADMINISTRATOR_PASSWORD=password
    volumes:
      - ./opal:/srv
  mysqldata:
    image: mysql:5
    environment:
      - MYSQL_DATABASE=opal
      - MYSQL_ROOT_PASSWORD=password
      - MYSQL_USER=opal
      - MYSQL_PASSWORD=password
    volumes:
      - ./mysqldata:/var/lib/mysql
  mysqlids:
    image: mysql:5
    environment:
      - MYSQL_DATABASE=opal_ids
      - MYSQL_ROOT_PASSWORD=password
      - MYSQL_USER=opal
      - MYSQL_PASSWORD=password
    volumes:
      - ./mysqlids:/var/lib/mysql
  rock:
    image: datashield/rock-base:latest
    environment:
      - ROCK_ADMINISTRATOR_NAME=administrator
      - ROCK_ADMINISTRATOR_PASSWORD=password
- See https://opaldoc.obiba.org/en/latest/admin/installation.html#docker-image-installation for all environment variables. Adjust the values and make sure you use strong passwords. Each password must contain at least 8 characters, with at least one digit, one uppercase letter, one lowercase letter, one special character (such as @#$%^&+=!) and no white space. Also make sure that ROCK_ADMINISTRATOR_PASSWORD has the same value in both places where it appears (the opal and rock services).
- Save the file as “docker-compose.yml”.
- Open the command line on your machine and run the following command in the directory where the docker-compose.yml file is stored.
docker-compose -f docker-compose.yml up -d
This will create and configure the Opal server, the MySQL databases, a DataSHIELD-ready R server and all useful plugins.
- Set up Reverse Proxy Configuration (e.g. Apache^5) and replace Opal’s default certificate with a valid one in order to match your institute’s security needs.
- Now Opal is ready for the first login. Opal’s administrative web interface can be accessed with any modern web browser at:
https://<host>:8843
- The first page you will see is the login page. Sign in with ‘administrator’ as username and the Opal password you set previously.
- Go to “Administration” > “Databases” and confirm that the databases are successfully configured and connected.
- Go to “Administration” > “DataSHIELD” and confirm that “dsBase” is installed.
The initial set up is now complete.
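Two points from the Docker steps above are easy to overlook when editing docker-compose.yml: ROCK_ADMINISTRATOR_PASSWORD must match in both services, and the example value "password" should be replaced everywhere. The following sketch checks both before the containers are started. It is a plain text scan of `- KEY=value` lines, not a real YAML parser, and it assumes the environment entries are written in the list style used in the example file. The function names are illustrative.

```python
import re
import sys

# Matches list-style environment entries such as "- KEY=value".
ENV_LINE = re.compile(r"^\s*-\s*([A-Z_][A-Z0-9_]*)=(.*)$")


def read_env_entries(compose_text):
    """Return (key, value) pairs for every '- KEY=value' line, in file order."""
    entries = []
    for line in compose_text.splitlines():
        match = ENV_LINE.match(line)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
    return entries


def check_compose(compose_text, placeholder="password"):
    """Return a list of problems found in the compose file text."""
    problems = []
    entries = read_env_entries(compose_text)
    rock_passwords = {v for k, v in entries if k == "ROCK_ADMINISTRATOR_PASSWORD"}
    if len(rock_passwords) > 1:
        problems.append("ROCK_ADMINISTRATOR_PASSWORD differs between services")
    for key, value in entries:
        if key.endswith("PASSWORD") and value == placeholder:
            problems.append(f"{key} still uses the placeholder value")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # Pass the path to docker-compose.yml as the only argument.
    with open(sys.argv[1], encoding="utf-8") as handle:
        found = check_compose(handle.read())
    print("OK" if not found else "\n".join(found))
```

The scan does not judge password strength; it only catches mismatches and leftover example values.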
To upgrade a Docker installation, change the Docker image version and restart the Docker container. If the Opal home directory was mounted in user space, it will be reused.
After the harmonization process, you will end up having two files:
- a data dictionary in an xls/xlsx format
- a harmonized dataset in a CSV format
These two files need to be uploaded before they can be imported. The files need to be accessible from the Opal server. For that purpose, Opal has a "file system" where the data files can be uploaded.
From the "Dashboard" page, click on "Manage Files" and navigate to the directory where the data file will be uploaded. Click on the "Upload" button; this opens a "File Upload" window in which you can select a file from your computer. Click on "Choose File" and select the data dictionary file (*-dictionary.xlsx). Once done, click on the "Upload" button. Do the same with the data source file (.csv).
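A malformed CSV file is easier to fix before upload than after a failed import. The sketch below runs a few generic structural checks on the harmonized dataset: a non-empty header, unique column names, and the same number of fields in every record. These checks are an assumption about what is worth verifying; they are not Opal's own validation, and they do not compare the CSV against the data dictionary.

```python
import csv
import io


def check_csv(text, delimiter=","):
    """Return a list of structural problems found in CSV text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")
    # Records are counted from 1, with the header as record 1.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to `check_csv`; adjust `delimiter` if the dataset uses semicolons or tabs.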
In Opal, a project is a workspace for managing data. It is required to create a project before importing data into Opal.
- Go to the “Projects” page.
- Click on the “Add Project” button.
- In the “Add Project” pop-up window:
  - Enter the name of the project, “NFDI4Health”.
  - Select the name of the database created during installation as the default storage.
  - Optionally enter the project title and description.
  - “Save” the project.
- Go to this project’s table section.
- Click on “Add Table” and select “Add/update from dictionary…”.
- In the “Add/Update Tables” pop-up window:
  - Click on “Browse”. In the “File Selector” pop-up window:
    - Go to the projects section and click on the project name.
    - Choose the previously uploaded dictionary file *-dictionary.xlsx by ticking the checkbox to the left of the file name and click on “Select”.
  - Click on “Next”.
  - In the next window, choose the dataset file (in CSV format) and click on “Finish”.
The imported table should now appear in the project’s table section.
- Click on “Import” (the second one in the button group on the right) in the tables section.
- In the “Import Data” pop-up window:
  - Select the CSV data format and click on “Next”.
  - Click on “Browse”.
  - In the “File Selector” pop-up window:
    - Go to the projects section and click on the project name.
    - Choose the previously uploaded CSV data file by ticking the checkbox to the left of the file name and click on “Select”.
  - Enter the name of the table (as described in the data dictionary) as the destination table and click on “Next”.
  - Click on “Next” in the next window, keeping the default settings.
  - Tick the checkbox to the left of the test table and click on “Next”.
  - Click on “Finish”.
The data import is now complete. You can see the data by clicking on the table test in the table section of the project’s page.
Once your harmonized dataset is imported, the researchers must be given permissions to perform federated analysis.
- Open the Opal web interface and log in as administrator,
- Go to the “Administration” page and click on the “Users and Groups” section,
- Click “Add User” and select “Add user with password”.
- Enter a login name, the password and optionally the name of a group the user will be a member of (if the group does not exist, a new one will be created), and save. The password must be at least 8 characters long, including one digit, one uppercase letter, one lowercase letter, one special character (e.g., @#$%^&+=!), and no white space.
To add more users, repeat the steps above.
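The password rules stated above can be checked before a password is typed into Opal. The following sketch encodes them literally; it assumes that the accepted special characters are exactly the ones listed (@#$%^&+=!), which the text gives only as examples, and the function name is illustrative.

```python
# Assumption: only the special characters listed in the SOP are accepted.
SPECIAL_CHARACTERS = set("@#$%^&+=!")


def password_problems(password):
    """Return the list of password rules that `password` violates."""
    problems = []
    if len(password) < 8:
        problems.append("shorter than 8 characters")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c in SPECIAL_CHARACTERS for c in password):
        problems.append("no special character")
    if any(c.isspace() for c in password):
        problems.append("contains white space")
    return problems


if __name__ == "__main__":
    # Example values only; never hard-code real passwords.
    for candidate in ("Abcdef1!", "abcdef1!"):
        print(candidate, password_problems(candidate) or "OK")
```

Returning the violated rules rather than a single true/false value makes it clear what has to change when a password is rejected.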
The users have to get access to functions provided by DataSHIELD. Take the following steps to set it up:
- Go back to the “Administration” page and click on the “DataSHIELD” section,
- Click “Add Permission” in the “Permissions” section, and select “Add user permission” or “Add group permission”, depending on your earlier setup,
- Type the name of the user/group you defined,
- Leave the default selection of “Use” and save.
Now all the members of the group are allowed to execute the provided methods. The access can be revoked at any time.
To make data visible for analyses, access privileges for the table have to be configured:
- Go to the project page, “Tables” section,
- Open the page of the table,
- On the “Permissions” tab, click “Add” and select “Add user permission” or “Add group permission”,
- Enter the name of the user/group,
- Leave the default permission of “View dictionary and summaries”, and submit.
After saving the settings, all users in the group can run DataSHIELD methods on the table data.