These instructions assume you use macOS, and that you are on the internal Broad network or the VPN. If the VPN is not installed, follow the instructions at this link.
During this process, you will need your GitHub and Docker Hub username, password, and personal access token for multiple steps, so make sure to have those handy. If you don't have those yet, see the section below; otherwise, you can skip to Request Required Access.
GitHub is where the Broad stores our code and projects. Docker Hub allows the development team to easily deploy software without having to install lots of dependencies.
Sign up to these services with your personal email:
Create a personal access token so you can interact with GitHub on the command line.
Ensure that you have access to the required team resources. If you encounter a permission error, it is likely because you are missing appropriate access.
- DataBiosphere: Join the #github Slack channel, click the lightning bolt in the channel header, and select Join DataBiosphere. Once you've been granted access to DataBiosphere, you should have write access to our repositories via membership in the DataBiosphere/broadwrite team. This level of permission should be sufficient for most contributions from across DSP.
  - If needed, repository admin access is conferred via membership in the DataBiosphere/data-custodian-journeys team, among others.
- Google Groups: Ask a team member for access to Google Groups including `jade-internal` and `dsde-engineering`.
Make sure 2-factor authentication (2FA) is activated on both your Broad and GitHub accounts before starting this process!
Connect your GitHub account to your Broad profile:
- Go to Broad people and select the My Profile tab.
- Link your profile to GitHub by clicking under Other Profiles.
- Check if the account is successfully linked.
- Open each of the following GitHub groups and Request to join by going to the Members tab: Broad Institute Read, Prometheus, DSDE Engineering
- To avoid being overwhelmed with notifications, add your Broad email address, route the notifications to that email, and unfollow projects that are not relevant to your team.
Connect your Docker Hub account to your Broad profile by contacting the DevOps team.
The Data Repo and Terra use Sam to abstract identity and access management. To gain access to these services, first create a non-Broad email address through Gmail. This email address will specifically be used for development purposes in our non-prod environments.
BITS requires that these development accounts have multi-factor authentication (MFA) enabled. Follow Google's instructions for enabling two-step authentication.
Next, to register as a new user, click the Sign in with Google button in each of the environments with the newly created email address and follow the prompts:
For production, you will need to register using a `firecloud.org` email. In order to get an account, you must become suitable, which requires following these steps.
Ask a member of the team to add you to the admins group for each of these environments.
Homebrew is a package manager that installs software through a single, convenient command-line interface. The team uses a Brewfile to install the necessary development tools automatically:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
curl -LO https://raw.githubusercontent.com/DataBiosphere/jade-data-repo/develop/docs/Brewfile
brew bundle --no-lock install
The Brewfile automatically installs the following tools:
- Git is a version control tool for tracking changes in projects and code.
- jq is a command line JSON processing tool.
- Helm streamlines the process of defining, installing, and upgrading Kubernetes deployments, which are otherwise challenging to manage. Some manual configuration is required below.
- Helmfile streamlines deploying multiple helm charts.
- Google Cloud SDK is a command-line interface to Google Cloud services. Once it is installed, you'll need to allow auth access and configure Docker to connect to the appropriate Google Cloud endpoint when necessary, which is done with the configuration below.
- IntelliJ IDEA is an integrated development environment (IDE) for Java. There are two versions available: Ultimate (paid) and Community (open-source). We recommend the Ultimate Edition to Broad employees for its database navigation capabilities (please reach out to a team member for the Broad server license address). Alternatively, the Community Edition has all the features needed for development, and this version can be installed by switching `intellij-idea` with `intellij-idea-ce` in the Brewfile.
- Skaffold is a command line tool that facilitates continuous development for Kubernetes applications. It is used to test local changes against personal environments.
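After `brew bundle` finishes, a quick check that the command-line tools landed on your `PATH` can save debugging later. The following is a minimal sketch, not part of the repository; the helper name `missing_tools` is hypothetical, and the command names are assumed from the tool list above (IntelliJ is omitted because it is not a command-line tool).

```shell
# Print each tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Command names assumed for the Brewfile tools listed above.
missing="$(missing_tools git jq helm helmfile gcloud skaffold)"
if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing"
else
  echo "All expected command-line tools are installed."
fi
```

If anything is reported as missing, re-running `brew bundle` or opening a new terminal session is a reasonable first step.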
Unfortunately, some manual configuration is also necessary:
# configure helm
helm repo add datarepo-helm https://broadinstitute.github.io/datarepo-helm
helm plugin install https://github.com/thomastaylor312/helm-namespace
helm plugin install https://github.com/databus23/helm-diff
helm repo update
# launch docker desktop - this installs docker in /usr/local/bin
open -a docker
# configure google-cloud-sdk
# login with an account that has access to your project. This will save credentials locally.
gcloud auth login
gcloud auth application-default login
# If you are using multiple accounts, you can switch to the correct one using this command:
gcloud config set account <account email>
gcloud auth configure-docker
# setup kubectl plugin
gcloud components install gke-gcloud-auth-plugin
It may be useful to create a folder for Broad projects in your home directory.
Set up GitHub SSH
Download the team's projects:
git clone git@github.com:DataBiosphere/jade-data-repo.git
git clone git@github.com:DataBiosphere/jade-data-repo-ui.git
git clone git@github.com:DataBiosphere/jade-data-repo-cli.git
git clone git@github.com:broadinstitute/datarepo-helm.git
git clone git@github.com:broadinstitute/datarepo-helm-definitions.git
git clone git@github.com:broadinstitute/terra-helmfile.git
git clone git@github.com:broadinstitute/terraform-ap-deployments.git
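If you prefer to keep all checkouts under one folder, the clone commands above can be generated in a loop. This is a sketch under assumptions: the `BROAD_DIR` variable, its default of `$HOME/broad`, and the `clone_cmds` helper are hypothetical names chosen for illustration. The function only prints the commands, so nothing is cloned until you choose to run them.

```shell
# Hypothetical parent directory for the checkouts; override BROAD_DIR to change it.
BROAD_DIR="${BROAD_DIR:-$HOME/broad}"

# Print a git clone command for every repository that is not yet checked out.
clone_cmds() {
  for repo in \
    DataBiosphere/jade-data-repo \
    DataBiosphere/jade-data-repo-ui \
    DataBiosphere/jade-data-repo-cli \
    broadinstitute/datarepo-helm \
    broadinstitute/datarepo-helm-definitions \
    broadinstitute/terra-helmfile \
    broadinstitute/terraform-ap-deployments
  do
    name="${repo##*/}"
    # Skip repositories that already have a checkout in BROAD_DIR.
    [ -d "$BROAD_DIR/$name/.git" ] && continue
    printf 'git clone git@github.com:%s.git "%s"\n' "$repo" "$BROAD_DIR/$name"
  done
}

clone_cmds
```

After reviewing the printed commands, something like `mkdir -p "$BROAD_DIR" && clone_cmds | sh` would execute them.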
- Log in to Google Cloud Platform. In the top-left corner, select the BROADINSTITUTE.ORG organization. Select broad-jade-dev from the list of projects.
- From the left-hand sidebar, select Kubernetes Engine -> Clusters under COMPUTE.
- Click Connect on the dev-master cluster. (You can also navigate here via direct link.) This gives you a command that configures `kubectl`; copy and paste it into the terminal:
gcloud container clusters get-credentials dev-master --region us-central1 --project broad-jade-dev
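To confirm that the credentials command took effect, you can compare the active `kubectl` context with the name GKE normally generates, which follows the pattern `gke_<project>_<location>_<cluster>`. The sketch below assumes that naming pattern applies here; the helper `gke_context_name` is a hypothetical name, and the check is skipped when `kubectl` is not installed.

```shell
# Build the kubeconfig context name GKE typically uses:
# gke_<project>_<location>_<cluster>.
gke_context_name() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

expected="$(gke_context_name broad-jade-dev us-central1 dev-master)"
if command -v kubectl >/dev/null 2>&1; then
  # current-context fails when no context is set, so tolerate the error.
  current="$(kubectl config current-context 2>/dev/null || true)"
  if [ "$current" = "$expected" ]; then
    echo "kubectl is pointed at $expected"
  else
    echo "kubectl context is '${current:-unset}', expected $expected"
  fi
else
  echo "kubectl not found; expected context would be $expected"
fi
```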
In order to do Azure development, do the following steps:
You will need to have an Azure account created (see https://docs.google.com/spreadsheets/d/1Q6CldqVPrATkWCAXljKrwlLz8oFsCQwcfOz_io-gcrA) and be granted access to the TDR application in Azure and added to the jadedev group.
The Azure user should look like @azure.dev.envs-terra.bio
Both are performed by a teammate in the Azure portal: https://portal.azure.com
You must have your own managed application in order to create a TDR Azure billing profile. Create a "tdr-dev" managed application:
- Azure portal -> Marketplace -> "My Marketplace" -> "Private plans" -> There you should see the "tdr-dev" plan.
- Create a new tdr-dev plan with the following setup:
  - Subscription: 8201558_TDR_testuser1 (if you don't have access, ask for help from team)
  - Resource group: TDR
  - Application Name:
- Hit "next"
- On the next screen, pay attention to the email you set in this field. It is the email you must log in as in order to create a TDR billing profile. It should be a Gmail account.
- Hit create!
There are several ways to go about this, but here is one way that works. You can set up a Z-shell
configuration to keep your system environment variables. If you don't already have one created, you
can create one by running `touch ~/.zshrc`. Then, you can open the file in a text editor with
`open ~/.zshrc`. When you run `./scripts/render-configs.sh`, it populates key and txt files with
secrets from Google Cloud Secrets and environment-specific values. If you are using the setup
script, `./scripts/render-configs.sh` should automatically run.
An alternative is to run `./scripts/render-configs.sh -i`, which will put the variables into your
clipboard. You can then paste these values into an IntelliJ bootRun or test run profile.
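When keeping variables in `~/.zshrc`, it helps to avoid appending the same `export` line twice. The following is a minimal sketch of an idempotent helper; `add_export` is a hypothetical function, not something provided by the repository, and the demo writes to a temporary file rather than your real `~/.zshrc`.

```shell
# Append 'export NAME="VALUE"' to a shell rc file unless NAME is already exported there.
add_export() {
  rc_file="$1"; name="$2"; value="$3"
  touch "$rc_file"
  if grep -q "^export ${name}=" "$rc_file"; then
    echo "$name already set in $rc_file; leaving it unchanged"
  else
    printf 'export %s="%s"\n' "$name" "$value" >> "$rc_file"
  fi
}

# Demonstrate against a temporary file instead of the real ~/.zshrc.
demo_rc="$(mktemp)"
add_export "$demo_rc" HOST localhost
add_export "$demo_rc" HOST localhost   # second call leaves the file unchanged
cat "$demo_rc"
rm -f "$demo_rc"
```

To use it for real, pass `~/.zshrc` as the first argument and then reload the file with `source ~/.zshrc`.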
While not exhaustive, here's a list that notes the important environment variables to set when running
`jade-data-repo` locally that are not set by `./scripts/render-configs.sh`. These variables override
settings in jade-data-repo/application.properties.
You can convert any application.property to an environment variable by switching to upper case and
changing every "." to "_".
- Instances of `ZZ` are only needed if you have a personal development environment set up. It is no longer recommended to set this up. But, if used, `ZZ` should be replaced by your initials or the environment (e.g. `dev`).
export JADE_USER_EMAIL=<EMAIL_YOU_CREATED_FOR_DEVELOPMENT>
export AZURE_SYNAPSE_INITIALIZE=false
# Pact contract test settings
export PACT_BROKER_USERNAME=$(cat /tmp/pact-ro-username.key)
export PACT_BROKER_PASSWORD=$(cat /tmp/pact-ro-password.key)
# Setting for testing environment (Further explained in oncall playbook)
export GOOGLE_ALLOWREUSEEXISTINGBUCKETS=true
# If you're not on a **Broad-provided** computer, you may need to set the host to `localhost`
# instead of `http://local.broadinstitute.org`:
export HOST=localhost
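The naming rule described above (upper-case the property and replace every "." with "_") can be sketched as a small shell function. The helper name `prop_to_env` is hypothetical, and the property name used in the example is inferred from the `GOOGLE_ALLOWREUSEEXISTINGBUCKETS` variable above rather than checked against application.properties.

```shell
# Convert a property name to its environment variable form by upper-casing it
# and replacing every "." with "_".
prop_to_env() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '_'
}

# The property name below is an assumption inferred from the variable list,
# not a verified key in application.properties.
prop_to_env "google.allowReuseExistingBuckets"
# prints GOOGLE_ALLOWREUSEEXISTINGBUCKETS
```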
- Ensure docker is running
- Authenticate as your broadinstitute.org account to pull from Google Secrets Manager
gcloud auth login <you>@broadinstitute.org
- Run `./scripts/run-db start` to start the DB in a docker container
- Run `./scripts/run start_local` to run TDR locally or `./scripts/run start_docker` to run TDR in a docker container
- To build the code and run the unit tests:
./scripts/build project # build jade-data-repo and run unit tests
./scripts/run tests # linters and unit tests
We don't recommend running the entire connected test suite locally, as it takes over an hour to run. Instead, you can select a specific test to run either in IntelliJ or on the command line. First, make sure you have run through the following steps:
- Ensure docker is running
- Authenticate as your broadinstitute.org account to pull from Google Secrets Manager
gcloud auth login <you>@broadinstitute.org
- Run `./scripts/run-db start` to start the DB in a docker container
Run test in the Command Line
- Run `GRADLE_ARGS='--tests *<specific test name>' ./scripts/run connected` to run a specific connected test
Run or Debug test in Intellij
- Run `./scripts/render-configs.sh -i`, which will put all the environment variables into your clipboard; you can then paste them into the IntelliJ test setup.
- Select the test in the IntelliJ UI, select 'testConnected', and run or debug it
We don't recommend running the entire integrated test suite locally, as it takes an hour to run. Instead, you can select a specific test to run either in IntelliJ or on the command line. First, make sure you have run through the following steps:
- Ensure docker is running
- Authenticate as your broadinstitute.org account to pull from Google Secrets Manager
gcloud auth login <you>@broadinstitute.org
- Run `./scripts/run-db start` to start the DB in a docker container
Run test in the Command Line
- Run `GRADLE_ARGS='--tests *<specific test name>' ./scripts/run integration` to run a specific integration test
Run or Debug test in Intellij
- Run `./scripts/render-configs.sh -i -a integration`, which will put all the environment variables into your clipboard; you can then paste them into the IntelliJ test setup.
- Start the application by running `./scripts/run local` (or in docker with `./scripts/run docker`)
- Select the test in the IntelliJ UI, select 'testIntegration', and run or debug it
This can be achieved by rendering a small set of Pact-specific configurations first:
./src/test/render-pact-configs.sh
# Reload your environment variables, e.g. source ~/.zshrc
./gradlew verifyPacts # verify contracts published with TDR as the provider
Note that connected and integration test suites can each take 90+ minutes to run. In normal development, you'll likely rely on GitHub Actions / automated PR test runs to run all tests, initially running locally those tests which pertain to your work.
To run a subset of tests, you can specify `--tests <pattern>` when running the above test commands. More specific examples are available in Gradle documentation.
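As an illustration of the `--tests` filter, the helper below assembles the command line for a single connected or integration test in the form used earlier in this guide. It is a sketch only: `tdr_test_cmd` is a hypothetical function, `ExampleDatasetTest` is a made-up class name, and the function prints the command rather than running it.

```shell
# Build the command line for running one connected or integration test.
tdr_test_cmd() {
  suite="$1"; pattern="$2"
  case "$suite" in
    connected|integration) ;;
    *) echo "suite must be 'connected' or 'integration'" >&2; return 1 ;;
  esac
  printf "GRADLE_ARGS='--tests *%s' ./scripts/run %s\n" "$pattern" "$suite"
}

# ExampleDatasetTest is a made-up test class name used only for illustration.
tdr_test_cmd connected ExampleDatasetTest
# prints GRADLE_ARGS='--tests *ExampleDatasetTest' ./scripts/run connected
```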
To do API and UI development simultaneously, follow the setup instructions to build the `jade-data-repo-ui` repository.
By setting the `PROXY_URL` environment variable, you can point the UI to your local data repo instance.
export PROXY_URL=http://localhost:8080
You need to have data repo running with `./gradlew bootRun` and the UI running with `npm start`.
Testing in a BEE (Branch Engineering Environment)
- You can test your changes in a BEE by following the instructions here
- You can point the python setup script to your BEE by setting the --host flag to the BEE url.
Testing Helm Chart Changes (holdover until datarepo-helm moves to terra-helmfile)
- Helm chart changes in datarepo-helm can be tested by spinning up a personal dev environment. See instructions in datarepo-helm-definitions for more information.
After running bootRun, you may want to create some datasets locally for use in testing. To do this, you can point the python setup script to your locally running data repo instance by setting the --host flag to http://localhost:8080. See the README for more information.
- Sam - set environment variable `SAM_BASEPATH` to `https://local.broadinstitute.org:50443`
Ensure that:
- You are on the Broad Non-split VPN. See earlier instructions. (Note: This is not needed for most operations)
- Docker is running.
- Postgres database is started.
- You are authenticated as your broadinstitute.org account.
- Environment variables are set. See list of environment variables above.
- Ensure `./scripts/render-configs.sh` has been run and sourced to the command line
- Set Java Version in IntelliJ: You may need to manually set the Java version in IntelliJ for the jade-data-repo project.
  - File -> Project Structure -> Project -> SDKs -> add SDK -> Download JDK -> Version: 17, Vendor: AdoptOpenJDK 17 (I used Temurin)
  - You can also make sure this is correctly set under IntelliJ IDEA -> Preferences -> Build, Execution, Deployment -> Gradle -> Gradle JVM
- `TERRA_COMMON_STAIRWAY_FORCECLEANSTART` needs to be set to false for tests to pass
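Parts of the checklist above can be automated. The sketch below checks that the Docker daemon is reachable and that a few of the environment variables listed earlier in this guide are set. The helper names `unset_vars` and `docker_running` are hypothetical, and the variable list is a sample rather than a complete set.

```shell
# Print the names of variables from the argument list that are unset or empty
# in the environment.
unset_vars() {
  for name in "$@"; do
    [ -n "$(printenv "$name")" ] || printf '%s\n' "$name"
  done
}

# Succeed only if the docker CLI exists and the daemon answers.
docker_running() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

docker_running || echo "Docker does not appear to be running"

# Variable names taken from the environment variable list earlier in this guide.
for name in $(unset_vars JADE_USER_EMAIL PACT_BROKER_USERNAME PACT_BROKER_PASSWORD); do
  echo "Environment variable $name is not set"
done
```

No output means the checked items look fine; the VPN, database, and authentication items still need to be verified by hand.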
- Stairway Flight Developer Guide - Data Repo utilizes Stairway to run asynchronous operations throughout the code base.
- Data Repo Service - The Data Repo implements parts of the Data Repository Service (DRS) specification.
- Data Exploration Team Common Problems doc - A document written by the Data Exploration team to help solve some common issues with TDR and Terra UI