These instructions assume you use macOS and that you are on the internal Broad network or the VPN. If the VPN is not installed, follow the instructions at this link.
During this process, you will need your GitHub and Docker Hub usernames, passwords, and personal access tokens for multiple steps, so keep them handy. If you don't have those yet, see the section below; otherwise, you can skip to Request Required Access.
GitHub is where the Broad stores our code and projects. Docker Hub allows the development team to easily deploy software without having to install lots of dependencies.
Sign up to these services with your personal email:
Create a personal access token so you can interact with GitHub on the command line.
Ensure that you have access to the required team resources. If you encounter a permission error, it is likely because you are missing appropriate access.
- DataBiosphere: Join the `#github` Slack channel, click the lightning bolt in the channel header, and select `Join DataBiosphere`. Once you've been granted access to DataBiosphere, you should have write access to our repositories via membership in the DataBiosphere/broadwrite team. This level of permission should be sufficient for most contributions from across DSP.
- If needed, repository admin access is conferred via membership in the DataBiosphere/data-custodian-journeys team, among others.
- Google Groups: Ask a team member for access to Google Groups including `jade-internal` and `dsde-engineering`.
Make sure 2-factor authentication (2FA) is activated on your Broad and GitHub account before starting this process!
Connect your GitHub account to your Broad profile:
- Go to Broad people and select the My Profile tab.
- Link your profile to GitHub by clicking under Other Profiles.
- Check if the account is successfully linked.
- Open each of the following GitHub groups and Request to join by going to the Members tab: Broad Institute Read, Prometheus, DSDE Engineering
- To avoid being overwhelmed with notifications, add your Broad email address, route the notifications to that email, and unfollow projects that are not relevant to your team.
Connect your Docker Hub account to your Broad profile by contacting the DevOps team.
The Data Repo and Terra use Sam to abstract identity and access management. To gain access to these services, first create a non-Broad email address through Gmail. This email address will specifically be used for development purposes in our non-prod environments.
BITS requires that these development accounts have multi-factor authentication (MFA) enabled. Follow Google's instructions for enabling two-step authentication. When complete, document your development account with a screenshot showing that it has MFA enabled here.
Next, to register as a new user, click the `Sign in with Google` button in each of the environments with the newly created email address and follow the prompts:
For production, you will need to register using a `firecloud.org` email. In order to get an account, you must become suitable, which requires following these steps.
Ask a member of the team to add you to the admins group for each of these environments.
Homebrew is a package manager which enables the installation of software using a single, convenient command line interface. To automatically install development tools necessary for the team, a Brewfile is used:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
curl -LO https://raw.githubusercontent.com/DataBiosphere/jade-data-repo/develop/docs/Brewfile
brew bundle --no-lock install
The Brewfile automatically installs the following tools:
- Git is a version control tool for tracking changes in projects and code.
- jq is a command line JSON processing tool.
- Docker is a tool to deliver software in packages called containers. Docker for MacOS also includes Kubernetes, which deploys groups of containers together in clusters.
- Helm streamlines the process of defining, installing, and upgrading Kubernetes deployments, which are otherwise challenging to manage. Some manual configuration is required below.
- Helmfile streamlines deploying multiple helm charts.
- Vault is an encrypted database used to store many of the team's secrets such as keys and passwords.
- Google Cloud SDK is a command-line interface to Google Cloud services. Once it is installed, you'll need to allow auth access and configure Docker to connect to the appropriate Google Cloud endpoint when necessary, which is done with the configuration below.
- IntelliJ IDEA is an integrated development environment (IDE) for Java. There are two versions available: Ultimate (paid) and Community (open-source). We recommend the Ultimate Edition to Broad employees for its database navigation capabilities (please reach out to a team member for the Broad server license address). Alternatively, the Community Edition has all the features needed for development, and this version can be installed by switching `intellij-idea` with `intellij-idea-ce` in the Brewfile.
- Skaffold is a command line tool that facilitates continuous development for Kubernetes applications. It is used to test local changes against personal environments.
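After the Brewfile install finishes, it can be worth confirming that the command line tools are actually on your PATH. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the executable names passed to it are the usual binary names for the tools listed above (`gcloud` for the Google Cloud SDK). Note that `docker` may only appear once Docker Desktop has been launched.

```shell
# Hypothetical helper: print each named executable that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Usage: prints nothing when every tool is installed.
missing_tools git jq docker helm helmfile vault gcloud skaffold
```

Any name that is printed points at a tool that still needs installing or a PATH entry that is missing.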
Unfortunately, some manual configuration is also necessary:
# configure vault
export VAULT_ADDR=https://clotho.broadinstitute.org:8200
# configure helm
helm repo add datarepo-helm https://broadinstitute.github.io/datarepo-helm
helm plugin install https://github.com/thomastaylor312/helm-namespace
helm plugin install https://github.com/databus23/helm-diff
helm repo update
# launch docker desktop - this installs docker in /usr/local/bin
open -a docker
# configure google-cloud-sdk
# login with an account that has access to your project. This will save credentials locally.
gcloud auth login
gcloud auth application-default login
#If you are using multiple accounts, you can switch to the correct one using this command:
gcloud config set account <account email>
gcloud auth configure-docker
# setup kubectl plugin
gcloud components install gke-gcloud-auth-plugin
The GitHub token verifies team permissions. This token is necessary for the next step, Login to Vault. To create a token:
- Go to the GitHub Personal Access Token page and click Generate new token.
- Give the token a descriptive name, give it only the following two scopes, and then click Generate token.
- `read:org` scope under `admin:org`
- `workflow` (this will give you access to kick off GitHub Actions from the command line)
- Store this token in a file:
GH_VAULT_TOKEN=<<GITHUB TOKEN VALUE>>
echo $GH_VAULT_TOKEN > ~/.gh_token
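Because the token grants access to your GitHub account, you may prefer to create the file with permissions that only your user can read. This is a minimal sketch assuming a POSIX-compatible shell; `save_token` is a hypothetical helper, not something the team's tooling provides.

```shell
# Hypothetical helper: write a secret to a file that only the owner can read or write.
save_token() {
  token="$1"
  target="$2"
  # The subshell keeps the restrictive umask from leaking into the rest of the session.
  ( umask 077 && printf '%s\n' "$token" > "$target" )
  # Also tighten permissions in case the file already existed with looser ones.
  chmod 600 "$target"
}

# Usage (commented out so that pasting this block does not overwrite an existing token file):
# save_token "$GH_VAULT_TOKEN" ~/.gh_token
```

The variable is deliberately not called `path`, because zsh ties that name to `PATH`.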
Vault access tokens can be obtained using the GitHub token from earlier as follows:
vault login -method=github token=$(cat ~/.gh_token)
It may be useful to create a folder for Broad projects in your home directory.
Set up GitHub SSH
Download the team's projects:
git clone git@github.com:DataBiosphere/jade-data-repo.git
git clone git@github.com:DataBiosphere/jade-data-repo-ui.git
git clone git@github.com:DataBiosphere/jade-data-repo-cli.git
git clone git@github.com:broadinstitute/datarepo-helm.git
git clone git@github.com:broadinstitute/datarepo-helm-definitions.git
git clone git@github.com:broadinstitute/terra-helmfile.git
git clone git@github.com:broadinstitute/terraform-ap-deployments.git
git clone git@github.com:broadinstitute/terraform-jade.git
- Log in to Google Cloud Platform. In the top-left corner, select the BROADINSTITUTE.ORG organization. Select broad-jade-dev from the list of projects.
- From the left-hand sidebar, select Kubernetes Engine -> Clusters under COMPUTE.
- Click Connect on the dev-master cluster. (You can also navigate here via direct link.) This gives you a `kubectl` command to copy and paste into the terminal:
gcloud container clusters get-credentials dev-master --region us-central1 --project broad-jade-dev
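To confirm that the credentials were picked up, you can compare the active kubectl context with the name that `gcloud container clusters get-credentials` normally generates. This is a sketch under the assumption that the context follows the usual `gke_<project>_<location>_<cluster>` naming; `expected_gke_context` is a hypothetical helper.

```shell
# Hypothetical helper: build the context name gcloud typically writes to your kubeconfig.
expected_gke_context() {
  project="$1"
  location="$2"
  cluster="$3"
  echo "gke_${project}_${location}_${cluster}"
}

expected_gke_context broad-jade-dev us-central1 dev-master
# Compare the printed name with the output of: kubectl config current-context
```

If the two names differ, the most likely cause is that another cluster's context is currently selected.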
Postgres is an advanced open-source database. Postgres.app is used to manage a local installation of Postgres. The latest release can be found on the GitHub releases page. For compatibility, make sure to select a version which supports all the older versions of Postgres including 9.6. After launching the application, create a new version 11 database as follows:
- Click the sidebar icon (bottom left-hand corner) and then click the plus sign
- Name the new server, making sure to select version 11, and then Initialize it
- Add `/Applications/Postgres.app/Contents/Versions/latest/bin` to your path (there are multiple ways to achieve this)
- Switch to the `jade-data-repo` repository, and create the data repo database and user following the database readme:
psql -f db/create-data-repo-db
# verify that the `datarepo` and `stairway` databases exist
psql --list
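If you would rather script the verification than read the listing by eye, the sketch below checks for a database name in the tuples-only listing produced by `psql -lqt`. `db_listed` is a hypothetical helper and reads the listing from standard input, so it assumes the database name is the first `|`-separated column.

```shell
# Hypothetical helper: succeed if the database name appears in a `psql -lqt` listing read from stdin.
db_listed() {
  cut -d '|' -f 1 | tr -d ' ' | grep -qx -- "$1"
}

# Usage (requires a running local Postgres, so it is left commented out):
# psql -lqt | db_listed datarepo && echo "datarepo exists"
# psql -lqt | db_listed stairway && echo "stairway exists"
```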
You will need to have an Azure account created (see https://docs.google.com/spreadsheets/d/1Q6CldqVPrATkWCAXljKrwlLz8oFsCQwcfOz_io-gcrA) and granted access to the TDR application in Azure and added to the jadedev group.
The Azure user should look like @azure.dev.envs-terra.bio
Both are performed by a teammate in the Azure portal: https://portal.azure.com
You must have your own managed application in order to create a TDR azure billing profile. Create a "tdr-dev" managed application:
- Azure portal -> Marketplace -> "My Marketplace" -> "Private plans" -> There you should see the "tdr-dev" plan.
- Create a new tdr-dev plan with the following setup:
- Subscription: 8201558_TDR_testuser1 (if you don't have access, ask for help from team)
- Resource group: TDR
- Application Name:
- Hit "next"
- On the next screen, pay attention to the email you set in this field. It will be the email you must log in as in order to create a TDR billing profile. It should be a gmail account.
- Hit create!
There are several ways to go about this, but here is one way that works. You can set up a Z-shell configuration to keep your system environment variables. If you don't already have one created, you can create one by running `touch ~/.zshrc`. Then, you can open the file in a text editor with `open ~/.zshrc`.
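If you add exports to `~/.zshrc` from a script rather than by hand, it helps to avoid appending the same line twice. The following is a minimal sketch; `add_line_once` is a hypothetical helper and is not part of the repository.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already present.
add_line_once() {
  line="$1"
  file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (commented out so that pasting this block does not modify your real configuration):
# add_line_once 'export IT_JADE_API_URL=http://localhost:8080' ~/.zshrc
```

Running the helper repeatedly with the same line leaves the file unchanged after the first call.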
Below you'll find a list of environment variables needed to run TDR and tests locally. When you run `./render-configs.sh`, it populates key and txt files with secrets from vault and environment-specific values. On daily setup, you'll need to run the following two commands in order:
./render-configs.sh
source ~/.zshrc
An alternative is to run `./render-configs.sh -i`, which will put the variables into your clipboard. You can then paste these values into an IntelliJ bootRun or test run profile.
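Several of the exports listed in this document read from files that `./render-configs.sh` writes, so a quick check that those files exist and are non-empty can save a confusing startup failure. This is a sketch only: `missing_files` is a hypothetical helper, and the paths in the usage line are taken from the variable list in this document.

```shell
# Hypothetical helper: print each file that is missing or empty.
missing_files() {
  for file in "$@"; do
    if [ ! -s "$file" ]; then
      echo "$file"
    fi
  done
}

# Usage: prints nothing when every rendered file is present and non-empty.
missing_files /tmp/jade-dev-account.json /tmp/jade-dev-account.pem /tmp/rbs-pool-id.txt
```

Any path that is printed suggests re-running `./render-configs.sh`, possibly after logging in to vault again.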
While not exhaustive, here's a list that notes the important environment variables to set when running `jade-data-repo` locally. These variables override settings in jade-data-repo/application.properties. You can convert any application.property to an environment variable by switching to upper case and replacing every "." with "_".
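The conversion rule can be sketched as a small shell function. `to_env_var` is a hypothetical helper, and the two property names are inferred from the environment variables in the list below, so their exact spelling in application.properties is an assumption.

```shell
# Hypothetical helper: upper-case a property name and replace every "." with "_".
to_env_var() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '_'
}

to_env_var azure.synapse.initialize          # prints AZURE_SYNAPSE_INITIALIZE
to_env_var google.allowReuseExistingBuckets  # prints GOOGLE_ALLOWREUSEEXISTINGBUCKETS
```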
- Instances of `ZZ` are only needed if you have a personal development environment set up. It is no longer recommended to set this up. But, if used, `ZZ` should be replaced by your initials or the environment (e.g. `dev`).
export JADE_USER_EMAIL=<EMAIL_YOU_CREATED_FOR_DEVELOPMENT>
# Integration test setting
export IT_JADE_API_URL=http://localhost:8080
# This file will be populated when you run ./render-configs.sh
export GOOGLE_APPLICATION_CREDENTIALS=/tmp/jade-dev-account.json
export GOOGLE_SA_CERT=/tmp/jade-dev-account.pem
# Setting for credentials to test on Azure - these files are populated in the render-configs.sh script
# Defaults to dev; you can switch to integration by running `./render-configs.sh -a integration`
export AZURE_SYNAPSE_WORKSPACENAME=$(cat /tmp/azure-synapse-workspacename.txt)
export AZURE_CREDENTIALS_HOMETENANTID=$(cat /tmp/jade-dev-tenant-id.key)
export AZURE_CREDENTIALS_APPLICATIONID=$(cat /tmp/jade-dev-client-id.key)
export AZURE_CREDENTIALS_SECRET=$(cat /tmp/jade-dev-azure.key)
export AZURE_SYNAPSE_SQLADMINUSER=$(cat /tmp/jade-dev-synapse-admin-user.key)
export AZURE_SYNAPSE_SQLADMINPASSWORD=$(cat /tmp/jade-dev-synapse-admin-password.key)
export AZURE_SYNAPSE_ENCRIPTIONKEY=$(cat /tmp/jade-dev-synapse-encryption-key.key)
export AZURE_SYNAPSE_INITIALIZE=false
# RBS
# Defaults to RBS tools; you can switch to dev by running `./render-configs.sh -r dev`
export RBS_POOLID=$(cat /tmp/rbs-pool-id.txt)
export RBS_INSTANCEURL=$(cat /tmp/rbs-instance-url.txt)
# Pact contract test settings
export PACT_BROKER_USERNAME=$(cat /tmp/pact-ro-username.key)
export PACT_BROKER_PASSWORD=$(cat /tmp/pact-ro-password.key)
# Setting for testing environment (Further explained in oncall playbook)
export GOOGLE_ALLOWREUSEEXISTINGBUCKETS=true
# If you're not on a **Broad-provided** computer, you may need to set the host to `localhost`
# instead of `http://local.broadinstitute.org`:
export HOST=localhost
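Once the exports are in place and your shell has been reloaded, a short check can confirm that the variables actually reached the environment. This is a sketch, not part of the repository: `missing_vars` is a hypothetical helper, and because it relies on `printenv` it only sees exported variables.

```shell
# Hypothetical helper: print each named variable that is unset or empty in the environment.
missing_vars() {
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
    fi
  done
}

# Usage: prints nothing when all three variables are exported and non-empty.
missing_vars JADE_USER_EMAIL GOOGLE_APPLICATION_CREDENTIALS RBS_POOLID
```

An empty value usually means the file behind a `$(cat ...)` export had not been rendered when the shell configuration was sourced.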
- Start postgres
- Ensure docker is running
- You may need to re-auth with vault every so often. Run `vault login -method=github token=$(cat ~/.gh_token)`
- Run `./render-configs.sh` to pull secrets from vault
- Refresh your Z-shell configuration by running `source ~/.zshrc`
- Build the code and run the unit tests:
./gradlew build # build jade-data-repo and run unit tests
./gradlew bootRun # build jade-data-repo with Spring Boot features
./gradlew check # linters and unit tests
We don't recommend running the entire connected test suite locally, as it takes over an hour to run. Instead, you can select a specific test to run either in Intellij or the command line. First, make sure you have run through the following steps:
- Start postgres
- Ensure docker is running
- You may need to re-auth with vault every so often. Run `vault login -method=github token=$(cat ~/.gh_token)`
- Run `./render-configs.sh` to pull secrets from vault
- Refresh your Z-shell configuration by running `source ~/.zshrc`
- Note: `TERRA_COMMON_STAIRWAY_FORCECLEANSTART` needs to be set to false for connected tests to pass
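The note about `TERRA_COMMON_STAIRWAY_FORCECLEANSTART` is easy to forget, so a small guard before starting connected tests may help. This is a sketch: `stairway_clean_start_disabled` is a hypothetical helper, and it treats an unset variable as a failure, which may be stricter than necessary if the application default is already false.

```shell
# Hypothetical helper: succeed only when the variable is explicitly set to "false".
stairway_clean_start_disabled() {
  [ "${TERRA_COMMON_STAIRWAY_FORCECLEANSTART:-}" = "false" ]
}

if stairway_clean_start_disabled; then
  echo "ok: clean start is disabled"
else
  echo "set TERRA_COMMON_STAIRWAY_FORCECLEANSTART=false before running connected tests"
fi
```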
**Run test in the Command Line**
- Run `./gradlew :testConnected --tests '*<test name>'` to run a specific connected test
**Run or Debug test in IntelliJ**
- If you just refreshed your Z-shell configuration, you may need to restart IntelliJ to get the environment variables to populate the IntelliJ run configurations. Alternatively, you can run `./render-configs.sh -i`, which will put all the environment variables into your clipboard, and then you can paste them into the IntelliJ test setup.
- Select test in IntelliJ UI, select 'testConnected' and run or debug it
We don't recommend running the entire integrated test suite locally, as it takes over two hours to run. Instead, you can select a specific test to run either in Intellij or the command line. First, make sure you have run through the following steps:
- Start postgres
- Ensure docker is running
- You may need to re-auth with vault every so often. Run `vault login -method=github token=$(cat ~/.gh_token)`
- Run `./render-configs.sh -a integration` to pull secrets from vault. For Azure Integration tests, we must point to the integration environment.
- Make sure you have this environment variable set in the context of the test run: `export IT_JADE_API_URL=http://localhost:8080`
- Refresh your Z-shell configuration by running `source ~/.zshrc`
**Run test in the Command Line**
- Start the app locally with `./gradlew bootRun`
- Open a new command line window, while bootRun runs in the background
- Run `./gradlew :testIntegration --tests '*<test name>'` to run a specific integration test (e.g. `./gradlew :testIntegration --tests '*testSnapshotBuilder'`)
**Run or Debug test in IntelliJ**
- If you just refreshed your Z-shell configuration, you may need to restart IntelliJ to get the environment variables to populate the IntelliJ run configurations. Alternatively, you can run `./render-configs.sh -i -a integration`, which will put all the environment variables into your clipboard, and then you can paste them into the IntelliJ test setup.
- Start application by running `./gradlew bootRun`
- Select test in IntelliJ UI, select 'testIntegration' and run or debug it
This can be achieved by rendering a small set of Pact-specific configurations first:
./src/test/render-pact-configs.sh
# Reload your environment variables, e.g. source ~/.zshrc
./gradlew verifyPacts # verify contracts published with TDR as the provider
Note that connected and integration test suites can each take 90+ minutes to run. In normal development, you'll likely rely on GitHub Actions / automated PR test runs to run all tests, initially running locally those tests which pertain to your work.
To run a subset of tests, you can specify `--tests <pattern>` when running the above test commands. More specific examples are available in Gradle documentation.
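Gradle accepts the `--tests` option more than once, so several tests can be selected in a single run. The sketch below only prints the command rather than running it; `gradle_test_cmd` is a hypothetical helper, and `FirstTest` and `SecondTest` are placeholder names rather than real tests in the repository.

```shell
# Hypothetical helper: print a Gradle command that selects several tests in one invocation.
gradle_test_cmd() {
  task="$1"
  shift
  cmd="./gradlew :$task"
  for name in "$@"; do
    cmd="$cmd --tests '*$name'"
  done
  echo "$cmd"
}

# Prints: ./gradlew :testConnected --tests '*FirstTest' --tests '*SecondTest'
gradle_test_cmd testConnected FirstTest SecondTest
```

Printing first makes it easy to review the pattern before copying the command into the terminal.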
Follow the setup instructions to build the `jade-data-repo-ui` repository.
By setting the `PROXY_URL` environment variable, you can point the UI to your local data repo instance.
export PROXY_URL=http://localhost:8080
You need to have data repo running with `./gradlew bootRun` and the UI running with `npm start`.
After running bootRun, you may want to create some datasets locally for use in testing. To do this, you can point the python setup script to your locally running data repo instance by setting the --host flag to http://localhost:8080. See the README for more information.
You can also run some of the notebooks from the Jade Client examples, such as `AzureY1Demo.ipynb`.
You can follow these instructions to get a BEE setup to work with TDR.
Additionally, you can point the python setup script to your BEE by setting the --host flag to the BEE url.
- Sam - set environment variable `SAM_BASEPATH` to `https://local.broadinstitute.org:50443`
Ensure that:
- You are on the Broad Non-split VPN. See earlier instructions. (Note: This is not needed for most operations)
- Docker is running.
- Postgres database is started.
- Logged in with vault (see above instructions for more details): `vault login -method=github token=$(cat ~/.gh_token)`
- Environment variables are set. See list of environment variables above.
- Set Java Version in Intellij: You may need to manually set the java version in Intellij for the jade-data-repo project.
- File -> Project Structure -> Project -> SDKs -> add SDK -> Download JDK -> Version: 17, Vendor: AdoptOpenJDK 17 (I used Temurin)
- You can also make sure this is correctly set under Intellij IDEA -> Preferences -> Build, Execution, Deployment -> Gradle -> Gradle JVM
We're moving away from setting up personal dev environments for every developer. Instead, we are moving towards using BEEs (BEE url, TDR on BEEs). However, there are still some use cases for personal dev environments.
Throughout these instructions, replace all instances of `ZZ` with your initials.
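When copying commands or URLs from these instructions, the substitution can be done with `sed` instead of by hand. This is a minimal sketch: `with_initials` is a hypothetical helper, `ab` stands in for your own initials, and the pattern replaces every occurrence of `ZZ`, so check the result if the text could contain `ZZ` for another reason.

```shell
# Hypothetical helper: replace every "ZZ" read from stdin with the given initials.
with_initials() {
  sed "s/ZZ/$1/g"
}

# Prints: https://jade-ab.datarepo-dev.broadinstitute.org
echo 'https://jade-ZZ.datarepo-dev.broadinstitute.org' | with_initials ab
```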
There is a video of us walking through these steps in our Jade Google Drive Folder.
- Follow the instructions in our terraform-jade repository to add your initials to the terraform templates and generate the static resources needed to deploy your personal development environment. Apply the changes and create a pull request to merge your additions to `terraform-jade`.
- Create your datarepo helm definition:
- In `datarepo-helm-definitions/dev` directory, copy an existing developer definition and change all initials to your own. Double-check with the team if you're not sure what to use, but the most recently added is probably the best choice.
- By default, leave release chart versions unspecified in your `helmfile.yaml` so that latest versions are automatically picked up when running helmfile commands. Otherwise, verify that specified versions match the latest dependency versions.
- Create a pull request with these changes in datarepo-helm-definitions.
- Log in to Google Cloud Platform. In the top-left corner, select the BROADINSTITUTE.ORG organization. Select broad-jade-dev from the list of projects.
- From the left-hand sidebar, select Kubernetes Engine -> Clusters under COMPUTE.
- Click Connect on the dev-master cluster. (You can also navigate here via direct link.) This gives you a `kubectl` command to copy and paste into the terminal:
gcloud container clusters get-credentials dev-master --region us-central1 --project broad-jade-dev
- Starting from your project directory in `datarepo-helm-definitions`, bring up Helm services (note it will take up to 10-15 minutes for ingress and cert creation):
Note: Make sure you are on the VPN, otherwise the helmfile apply will fail.
cd datarepo-helm-definitions/dev/ZZ
helmfile apply
# check that the deployments were created
helm list --namespace ZZ
- Update the following authorized domains within the Jade Data Repository OAuth2 Client configuration:
- Under Authorized JavaScript origins, add `https://jade-ZZ.datarepo-dev.broadinstitute.org`
- Under Authorized redirect URIs, add `https://jade-ZZ.datarepo-dev.broadinstitute.org/login/google` and `https://jade-ZZ.datarepo-dev.broadinstitute.org/webjars/springfox-swagger-ui/oauth2-redirect.html`
- Connect to your new dev postgres database instance: Note that this is a different instance than the local one you will configure in step 10. The following command connects to the database via a proxy.
cd jade-data-repo/ops
DB=datarepo-ZZ SUFFIX=ZZ ENVIRONMENT=dev ./db-connect.sh
- Now that you're connected to your dev database, run the following command (Once DR-1156 is done, this will no longer be needed):
create extension pgcrypto;
- Create a pull request to `terraform-ap-deployments` to add `https://jade-ZZ.datarepo-dev.broadinstitute.org` under the 'personal deployments' section of `dev.tfvars/b2c_tdr_hosts`. This allows B2C as a means of authentication, which is the default across environments.