Getting Started

These instructions assume you use macOS and that you are on the internal Broad network or the VPN. If the VPN is not installed, follow the instructions at this link.

During this process, you will need your GitHub and Docker Hub usernames, passwords, and personal access token for multiple steps, so keep them handy. If you don't have them yet, see the section below; otherwise, skip to Request Required Access.

1. Create a GitHub and Docker Hub account

GitHub is where the Broad stores our code and projects. Docker Hub allows the development team to easily deploy software without having to install lots of dependencies.

Sign up to these services with your personal email:

Create a personal access token so you can interact with GitHub on the command line.

2. Request Required Access

Ensure that you have access to the required team resources. If you encounter a permission error, it is likely because you are missing appropriate access.

  • DataBiosphere: Join the #github Slack channel, click the lightning bolt in the channel header, and select Join DataBiosphere. Once you've been granted access to DataBiosphere, you should have write access to our repositories via membership in the DataBiosphere/broadwrite team. This level of permission should be sufficient for most contributions from across DSP.
  • Google Groups: Ask a team member for access to Google Groups including jade-internal and dsde-engineering.

3. Connect accounts

Make sure 2-factor authentication (2FA) is activated on your Broad and GitHub account before starting this process!

Connect your GitHub account to your Broad profile:

  1. Go to Broad people and select the My Profile tab.
  2. Link your profile to GitHub by clicking under Other Profiles.
  3. Check if the account is successfully linked.
  4. Open each of the following GitHub groups and Request to join by going to the Members tab: Broad Institute Read, Prometheus, DSDE Engineering
  5. To avoid being overwhelmed with notifications, add your Broad email address, route the notifications to that email, and unfollow projects that are not relevant to your team.

Connect your Docker Hub account to your Broad profile by contacting the DevOps team.

4. Create Terra Accounts

The Data Repo and Terra use Sam to abstract identity and access management. To gain access to these services, first create a non-Broad email address through Gmail. This email address will specifically be used for development purposes in our non-prod environments.

BITS requires that these development accounts have multi-factor authentication (MFA) enabled. Follow Google's instructions for enabling two-step authentication. When complete, document your development account with a screenshot showing that it has MFA enabled here.

Next, to register as a new user, click the Sign in with Google button in each of the environments with the newly created email address and follow the prompts:

For production, you will need to register using a firecloud.org email. In order to get an account, you must become suitable, which requires following these steps.

Ask a member of the team to add you to the admins group for each of these environments.

5. Install Homebrew

Homebrew is a package manager which enables the installation of software using a single, convenient command line interface. To automatically install development tools necessary for the team, a Brewfile is used:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
curl -LO https://raw.githubusercontent.com/DataBiosphere/jade-data-repo/develop/docs/Brewfile
brew bundle --no-lock install

The Brewfile automatically installs the following tools:

  1. Git is a version control tool for tracking changes in projects and code.
  2. jq is a command line JSON processing tool.
  3. Docker is a tool to deliver software in packages called containers. Docker for macOS also includes Kubernetes, which deploys groups of containers together in clusters.
  4. Helm streamlines the process of defining, installing, and upgrading Kubernetes deployments, which are otherwise challenging to manage. Some manual configuration is required below.
  5. Helmfile streamlines deploying multiple helm charts.
  6. Vault is an encrypted database used to store many of the team's secrets such as keys and passwords.
  7. Google Cloud SDK is a command-line interface to Google Cloud services. Once it is installed, you'll need to allow auth access and configure Docker to connect to the appropriate Google Cloud endpoint when necessary, which is done with the configuration below.
  8. IntelliJ IDEA is an integrated development environment (IDE) for Java. There are two versions available: Ultimate (paid) and Community (open-source). We recommend the Ultimate Edition to Broad employees for its database navigation capabilities (Please reach out to a team member for the Broad server license address). Alternatively, the Community Edition has all the features needed for development, and this version can be installed by switching intellij-idea with intellij-idea-ce in the Brewfile.
  9. Skaffold is a command line tool that facilitates continuous development for Kubernetes applications. It is used to test local changes against personal environments.
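
Once `brew bundle` finishes, it can help to confirm that the command-line tools are actually on your PATH. The following is a minimal sketch and not part of the repository: `missing_tools` is a hypothetical helper name, and the command names in the example call are the usual binary names for the tools listed above (IntelliJ IDEA has no command to check).

```shell
# Hypothetical helper (not part of jade-data-repo): print every command
# name from the argument list that cannot be found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call, left commented out so that sourcing this snippet has no side effects:
# missing_tools git jq docker helm helmfile vault gcloud skaffold
```

An empty result means every tool was found. Note that `docker` may be reported as missing until Docker Desktop has been launched once, since launching it is what installs the `docker` binary.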

Unfortunately, some manual configuration is also necessary:

# configure vault
export VAULT_ADDR=https://clotho.broadinstitute.org:8200

# configure helm
helm repo add datarepo-helm https://broadinstitute.github.io/datarepo-helm
helm plugin install https://github.com/thomastaylor312/helm-namespace
helm plugin install https://github.com/databus23/helm-diff
helm repo update

# launch docker desktop - this installs docker in /usr/local/bin
open -a docker

# configure google-cloud-sdk
# login with an account that has access to your project. This will save credentials locally.
gcloud auth login
gcloud auth application-default login

# If you are using multiple accounts, you can switch to the correct one using this command:
gcloud config set account <account email>

gcloud auth configure-docker

# setup kubectl plugin
gcloud components install gke-gcloud-auth-plugin
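
If you juggle several Google accounts, a quick check of the active one can save a confusing permission error later. This is a minimal sketch under the assumption that `gcloud config get-value account` prints the active account; `gcloud_account_is` is a hypothetical helper name, not something the repository provides.

```shell
# Hypothetical helper: succeed only if the active gcloud account matches
# the expected email address; otherwise explain the mismatch on stderr.
gcloud_account_is() {
  local expected=$1 actual
  actual="$(gcloud config get-value account 2>/dev/null)"
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  printf 'active gcloud account is "%s", expected "%s"\n' "$actual" "$expected" >&2
  return 1
}

# Example call (commented out; replace the placeholder with your own account):
# gcloud_account_is "<account email>" || gcloud config set account "<account email>"
```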

6. Create GitHub token

The GitHub token verifies team permissions. This token is necessary for the next step, Login to Vault. To create a token:

  1. Go to the GitHub Personal Access Token page and click Generate new token.
  2. Give the token a descriptive name, grant it only the following two scopes, and then click Generate token:
  • read:org scope under admin:org
  • workflow (this lets you kick off GitHub Actions from the command line)
  3. Store this token in a file:
GH_VAULT_TOKEN=<<GITHUB TOKEN VALUE>>
echo $GH_VAULT_TOKEN > ~/.gh_token
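
Because the token file holds a credential, you may prefer to create it so that only your user can read it. The sketch below is one way to do that; `store_token` is a hypothetical helper name, and the restrictive file mode is a suggestion rather than a documented team requirement.

```shell
# Hypothetical helper: write a token to a file readable only by the current user.
# Usage: store_token <token> [file]   (file defaults to ~/.gh_token)
store_token() {
  local token=$1
  local token_file=${2:-"$HOME/.gh_token"}
  (
    umask 077                             # files created in this subshell get mode 600
    printf '%s\n' "$token" > "$token_file"
  )
  chmod 600 "$token_file"                 # also tighten a file that already existed
}

# Example call (commented out):
# store_token "$GH_VAULT_TOKEN"
```

The file content is the same as with the `echo` command above, so later steps that read `~/.gh_token` work unchanged.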

7. Login to Vault

Vault access tokens can be obtained using the GitHub token from earlier as follows:

vault login -method=github token=$(cat ~/.gh_token)
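
Vault tokens expire, so you will repeat this login from time to time. As a minimal sketch, the helper below only logs in again when the current token is no longer valid. `ensure_vault_login` is a hypothetical name; the sketch assumes `VAULT_ADDR` is already exported and relies on `vault token lookup` failing when there is no usable token.

```shell
# Hypothetical helper: reuse the current Vault token if it is still valid,
# otherwise log in again with the GitHub token stored in ~/.gh_token.
ensure_vault_login() {
  if vault token lookup >/dev/null 2>&1; then
    echo "vault token still valid"
  else
    vault login -method=github token="$(cat "$HOME/.gh_token")"
  fi
}

# Example call (commented out):
# ensure_vault_login
```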

8. Code Checkout

It may be useful to create a folder for Broad projects in your home directory.

Set up GitHub SSH

Download the team's projects:

git clone git@github.com:DataBiosphere/jade-data-repo.git
git clone git@github.com:DataBiosphere/jade-data-repo-ui.git
git clone git@github.com:DataBiosphere/jade-data-repo-cli.git
git clone git@github.com:broadinstitute/datarepo-helm.git
git clone git@github.com:broadinstitute/datarepo-helm-definitions.git
git clone git@github.com:broadinstitute/terra-helmfile.git
git clone git@github.com:broadinstitute/terraform-ap-deployments.git
git clone git@github.com:broadinstitute/terraform-jade.git

9. Google Cloud Platform setup

  1. Log in to Google Cloud Platform. In the top-left corner, select the BROADINSTITUTE.ORG organization. Select broad-jade-dev from the list of projects.

  2. From the left hand sidebar, select Kubernetes Engine -> Clusters under COMPUTE.

  3. Click Connect on the dev-master cluster. (You can also navigate here via direct link.) This gives you a gcloud command to copy and paste into the terminal; it configures kubectl for the cluster:

gcloud container clusters get-credentials dev-master --region us-central1 --project broad-jade-dev
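
To confirm that kubectl now points at the right cluster, you can compare the current context with the name that `get-credentials` normally creates. This sketch assumes the usual `gke_<project>_<location>_<cluster>` naming for GKE contexts; `gke_context_name` is a hypothetical helper.

```shell
# Hypothetical helper: build the kubectl context name that
# `gcloud container clusters get-credentials` conventionally creates
# (assumed format: gke_<project>_<location>_<cluster>).
gke_context_name() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Example check (commented out):
# [ "$(kubectl config current-context)" = "$(gke_context_name broad-jade-dev us-central1 dev-master)" ] \
#   && echo "kubectl is pointing at dev-master"
```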

10. Install Postgres 11

Postgres is an advanced open-source database. Postgres.app is used to manage a local installation of Postgres. The latest release can be found on the GitHub releases page. For compatibility, make sure to select a version which supports all the older versions of Postgres including 9.6. After launching the application, create a new version 11 database as follows:

  1. Click the sidebar icon (bottom left-hand corner) and then click the plus sign
  2. Name the new server, making sure to select version 11, and then Initialize it
  3. Add /Applications/Postgres.app/Contents/Versions/latest/bin to your PATH (there are multiple ways to achieve this)
  4. Switch to the jade-data-repo repository, and create the data repo database and user following the database readme:
psql -f db/create-data-repo-db
# verify that the `datarepo` and `stairway` databases exist
psql --list
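
If you would rather script the verification than read the listing by eye, the database names can be filtered out of psql's unaligned output. The following is a minimal sketch; `has_db` is a hypothetical helper that reads a pipe-separated listing on standard input and checks the first column for an exact match.

```shell
# Hypothetical helper: read a pipe-separated database listing on stdin
# and succeed if the first column contains exactly the given name.
has_db() {
  cut -d '|' -f 1 | grep -qx -- "$1"
}

# Example calls (commented out; they need a running local Postgres):
# psql --list --quiet --tuples-only --no-align | has_db datarepo && echo "datarepo exists"
# psql --list --quiet --tuples-only --no-align | has_db stairway && echo "stairway exists"
```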

11. Configure Azure

1. Get Azure Account

You will need an Azure account (see https://docs.google.com/spreadsheets/d/1Q6CldqVPrATkWCAXljKrwlLz8oFsCQwcfOz_io-gcrA) that has been granted access to the TDR application in Azure and added to the jadedev group.

The Azure user should look like @azure.dev.envs-terra.bio

A teammate performs both steps (granting access and adding you to the group) in the Azure portal: https://portal.azure.com

2. Create your own managed application in Azure

You must have your own managed application in order to create a TDR Azure billing profile. Create a "tdr-dev" managed application:

  • Azure portal -> Marketplace -> "My Marketplace" -> "Private plans" -> There you should see the "tdr-dev" plan.
  • Create a new tdr-dev plan with the following setup:
    • Subscription: 8201558_TDR_testuser1 (if you don't have access, ask the team for help)
    • Resource group: TDR
    • Application Name:
    • Hit "next"
  • On the next screen, pay attention to the email you set in this field. It is the email you must log in as in order to create a TDR billing profile. It should be a Gmail account.
  • Hit create!

12. Set Up Environment Variables

There are several ways to go about this; here is one that works. Keep your environment variables in a Z-shell configuration file. If you don't already have one, create it by running touch ~/.zshrc, then open it in a text editor with open ~/.zshrc. Below you'll find a list of environment variables needed to run TDR and its tests locally.

Running ./render-configs.sh populates key and txt files with secrets from Vault and environment-specific values. As part of your daily setup, run the following two commands in order:

./render-configs.sh
source ~/.zshrc

Alternatively, run ./render-configs.sh -i, which copies the variables to your clipboard. You can then paste these values into an IntelliJ bootRun or test run profile.
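
Several variables in this section are read from files under /tmp with `$(cat ...)`, so a stale or missing file produces an empty value rather than an obvious failure. As a minimal sketch, the hypothetical helper below lists which of the given files are missing or empty, so you can tell whether ./render-configs.sh needs to be rerun before sourcing ~/.zshrc.

```shell
# Hypothetical helper: print every path from the argument list that
# does not exist or is empty.
missing_files() {
  local f
  for f in "$@"; do
    [ -s "$f" ] || printf '%s\n' "$f"
  done
}

# Example call (commented out), using file names referenced in this section:
# missing_files /tmp/jade-dev-account.json /tmp/jade-dev-account.pem /tmp/rbs-pool-id.txt
```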

Environment Variables

While not exhaustive, here's a list of the important environment variables to set when running jade-data-repo locally. These variables override settings in jade-data-repo/application.properties. You can convert any property in application.properties to an environment variable name by switching to upper case and replacing every "." with "_".

  • Instances of ZZ are only needed if you have a personal development environment set up, which is no longer recommended. If you do use one, replace ZZ with your initials or the environment (e.g. dev).
export JADE_USER_EMAIL=<EMAIL_YOU_CREATED_FOR_DEVELOPMENT>

# Integration test setting
export IT_JADE_API_URL=http://localhost:8080

# These files will be populated when you run ./render-configs.sh
export GOOGLE_APPLICATION_CREDENTIALS=/tmp/jade-dev-account.json
export GOOGLE_SA_CERT=/tmp/jade-dev-account.pem

# Setting for credentials to test on Azure - these files are populated in the render-configs.sh script
# Defaults to dev; you can switch to integration by running `./render-configs.sh -a integration`
export AZURE_SYNAPSE_WORKSPACENAME=$(cat /tmp/azure-synapse-workspacename.txt)
export AZURE_CREDENTIALS_HOMETENANTID=$(cat /tmp/jade-dev-tenant-id.key)
export AZURE_CREDENTIALS_APPLICATIONID=$(cat /tmp/jade-dev-client-id.key)
export AZURE_CREDENTIALS_SECRET=$(cat /tmp/jade-dev-azure.key)
export AZURE_SYNAPSE_SQLADMINUSER=$(cat /tmp/jade-dev-synapse-admin-user.key)
export AZURE_SYNAPSE_SQLADMINPASSWORD=$(cat /tmp/jade-dev-synapse-admin-password.key)
export AZURE_SYNAPSE_ENCRIPTIONKEY=$(cat /tmp/jade-dev-synapse-encryption-key.key)
export AZURE_SYNAPSE_INITIALIZE=false

# RBS
# Defaults to RBS tools; you can switch to dev by running `./render-configs.sh -r dev`
export RBS_POOLID=$(cat /tmp/rbs-pool-id.txt)
export RBS_INSTANCEURL=$(cat /tmp/rbs-instance-url.txt)

# Pact contract test settings
export PACT_BROKER_USERNAME=$(cat /tmp/pact-ro-username.key)
export PACT_BROKER_PASSWORD=$(cat /tmp/pact-ro-password.key)

# Setting for testing environment (Further explained in oncall playbook)
export GOOGLE_ALLOWREUSEEXISTINGBUCKETS=true

# If you're not on a **Broad-provided** computer, you may need to set the host to `localhost`
# instead of `http://local.broadinstitute.org`:
export HOST=localhost
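
The naming rule described above (upper case, with every "." replaced by "_") can be sketched as a small helper. `prop_to_env` is a hypothetical name, and the property names in the examples are illustrative spellings inferred from the variables in this list, not copied from application.properties.

```shell
# Hypothetical helper: turn an application.properties key into the
# matching environment variable name (dots become underscores, then upper case).
prop_to_env() {
  printf '%s\n' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}

# Illustrative property names (assumed spellings):
prop_to_env azure.synapse.workspaceName   # prints AZURE_SYNAPSE_WORKSPACENAME
prop_to_env rbs.poolId                    # prints RBS_POOLID
```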

13. Repository Setup

1. Build, run and Unit Test jade-data-repo

  • Start postgres
  • Ensure docker is running
  • You may need to re-auth with vault every so often. Run vault login -method=github token=$(cat ~/.gh_token)
  • Run ./render-configs.sh to pull secrets from vault
  • Refresh your Z-shell configuration by running source ~/.zshrc
  • Build the code and run the unit tests:
./gradlew build           # build jade-data-repo and run unit tests
./gradlew bootRun         # build jade-data-repo and run it locally with Spring Boot
./gradlew check           # linters and unit tests

2. Run connected tests

We don't recommend running the entire connected test suite locally, as it takes over an hour to run. Instead, you can select a specific test to run either in IntelliJ or on the command line. First, make sure you have run through the following steps:

  • Start postgres
  • Ensure docker is running
  • You may need to re-auth with vault every so often. Run vault login -method=github token=$(cat ~/.gh_token)
  • Run ./render-configs.sh to pull secrets from vault
  • Refresh your Z-shell configuration by running source ~/.zshrc
  • Note: TERRA_COMMON_STAIRWAY_FORCECLEANSTART needs to be set to false for connected tests to pass

Run test in the Command Line

  • Run ./gradlew :testConnected --tests '*<test name>' to run a specific connected test

Run or Debug test in IntelliJ

  • If you just refreshed your Z-shell configuration, you may need to restart IntelliJ so that the environment variables populate the IntelliJ run configurations. Alternatively, you can run ./render-configs.sh -i, which copies all the environment variables to your clipboard so that you can paste them into the IntelliJ test setup.
  • Select the test in the IntelliJ UI, select 'testConnected', and run or debug it

3. Run Integration tests

We don't recommend running the entire integration test suite locally, as it takes over two hours to run. Instead, you can select a specific test to run either in IntelliJ or on the command line. First, make sure you have run through the following steps:

  • Start postgres
  • Ensure docker is running
  • You may need to re-auth with vault every so often. Run vault login -method=github token=$(cat ~/.gh_token)
  • Run ./render-configs.sh -a integration to pull secrets from vault. For Azure Integration tests, we must point to the integration environment.
  • Make sure you have this environment variable set in the context of the test run: export IT_JADE_API_URL=http://localhost:8080
  • Refresh your Z-shell configuration by running source ~/.zshrc

Run test in the Command Line

  • Start the app locally with ./gradlew bootRun
  • Open a new command line window, while bootRun runs in the background
  • Run ./gradlew :testIntegration --tests '*<test name>' to run a specific integration test (e.g. ./gradlew :testIntegration --tests '*testSnapshotBuilder')
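
The connected and integration commands differ only in the Gradle task name, so a small wrapper can cut down on typing. The sketch below only prints the command rather than running it; `tdr_test_cmd` is a hypothetical helper name, not something the repository provides.

```shell
# Hypothetical helper: print the Gradle command for running a single
# connected or integration test by name.
# Usage: tdr_test_cmd <connected|integration> <test name>
tdr_test_cmd() {
  local suite=$1 pattern=$2 task
  case "$suite" in
    connected)   task=':testConnected' ;;
    integration) task=':testIntegration' ;;
    *) echo "unknown suite: $suite" >&2; return 1 ;;
  esac
  printf "./gradlew %s --tests '*%s'\n" "$task" "$pattern"
}

tdr_test_cmd integration testSnapshotBuilder
# prints: ./gradlew :testIntegration --tests '*testSnapshotBuilder'
```

Printing the command keeps the helper safe to try out; to execute it, paste the output into the terminal from the root of the jade-data-repo checkout.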

Run or Debug test in IntelliJ

  • If you just refreshed your Z-shell configuration, you may need to restart IntelliJ so that the environment variables populate the IntelliJ run configurations. Alternatively, you can run ./render-configs.sh -i -a integration, which copies all the environment variables to your clipboard so that you can paste them into the IntelliJ test setup.
  • Start the application by running ./gradlew bootRun
  • Select the test in the IntelliJ UI, select 'testIntegration', and run or debug it

4. Running Pact tests

This can be achieved by rendering a small set of Pact-specific configurations first:

./src/test/render-pact-configs.sh
# Reload your environment variables, e.g. source ~/.zshrc
./gradlew verifyPacts     # verify contracts published with TDR as the provider

Note that the connected and integration test suites can each take 90+ minutes to run. In normal development, you'll likely rely on GitHub Actions / automated PR test runs to run all tests, and run locally only the tests that pertain to your work.

To run a subset of tests, you can specify --tests <pattern> when running the above test commands. More specific examples are available in Gradle documentation.

5. Build jade-data-repo-ui

Follow the setup instructions to build the jade-data-repo-ui repository.

By setting the PROXY_URL environment variable, you can point the UI to your local data repo instance.

export PROXY_URL=http://localhost:8080

You need to have data repo running with ./gradlew bootRun and the UI running with npm start.

14. Set up TDR resources

After running bootRun, you may want to create some datasets locally for use in testing. To do this, you can point the Python setup script to your locally running data repo instance by setting the --host flag to http://localhost:8080. See the README for more information.

You can also run some of the notebooks from the Jade Client examples, such as AzureY1Demo.ipynb

15. Set up TDR on BEEs

You can follow these instructions to get a BEE set up to work with TDR.

Additionally, you can point the Python setup script to your BEE by setting the --host flag to the BEE URL.

16. Running locally with other locally running services

  1. Sam - set environment variable SAM_BASEPATH to https://local.broadinstitute.org:50443

Common Issues

Ensure that:

  1. You are on the Broad Non-split VPN. See earlier instructions. (Note: This is not needed for most operations)
  2. Docker is running.
  3. Postgres database is started.
  4. You are logged in to Vault (see the instructions above for more details):
    vault login -method=github token=$(cat ~/.gh_token)
  5. Environment variables are set. See list of environment variables above.
  6. The Java version is set in IntelliJ: you may need to manually set the Java version in IntelliJ for the jade-data-repo project.
  • File -> Project Structure -> Project -> SDKs -> add SDK -> Download JDK -> Version: 17, Vendor: AdoptOpenJDK 17 (Temurin also works)
  • You can also make sure this is correctly set under IntelliJ IDEA -> Preferences -> Build, Execution, Deployment -> Gradle -> Gradle JVM
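
For the environment variable item in the checklist above, a quick scripted check can point out what is unset. This is a minimal sketch: `missing_vars` is a hypothetical helper, and it only sees exported variables, which are the ones a Gradle run started from the same shell inherits.

```shell
# Hypothetical helper: print every variable name from the argument list
# that is not exported or is exported with an empty value.
missing_vars() {
  local name
  for name in "$@"; do
    [ -n "$(printenv "$name")" ] || printf '%s\n' "$name"
  done
}

# Example call (commented out), using a few names from the list of environment variables:
# missing_vars JADE_USER_EMAIL GOOGLE_APPLICATION_CREDENTIALS RBS_POOLID RBS_INSTANCEURL
```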

Appendix

Personal Dev Environment Setup

We're moving away from setting up personal dev environments for every developer. Instead, we are moving towards using BEEs (BEE url, TDR on BEEs). However, there are still some use cases for personal dev environments.

Throughout these instructions, replace all instances of ZZ with your initials.

There is a video of us walking through these steps in our Jade Google Drive Folder.

  1. Follow the instructions in our terraform-jade repository to add your initials to the terraform templates and generate the static resources needed to deploy your personal development environment. Apply the changes and create a pull request to merge your additions to terraform-jade.

  2. Create your datarepo helm definition:

  • In datarepo-helm-definitions/dev directory, copy an existing developer definition and change all initials to your own. Double-check with the team if you're not sure what to use, but the most recently added is probably the best choice.
  • By default, leave release chart versions unspecified in your helmfile.yaml so that latest versions are automatically picked up when running helmfile commands. Otherwise, verify that specified versions match the latest dependency versions.
  • Create a pull request with these changes in datarepo-helm-definitions.
  3. Log in to Google Cloud Platform. In the top-left corner, select the BROADINSTITUTE.ORG organization. Select broad-jade-dev from the list of projects.

  4. From the left hand sidebar, select Kubernetes Engine -> Clusters under COMPUTE.

  5. Click Connect on the dev-master cluster. (You can also navigate here via direct link.) This gives you a gcloud command to copy and paste into the terminal; it configures kubectl for the cluster:

gcloud container clusters get-credentials dev-master --region us-central1 --project broad-jade-dev
  6. Starting from your project directory in datarepo-helm-definitions, bring up Helm services (note that ingress and cert creation can take 10-15 minutes):

Note: Make sure you are on the VPN, otherwise the helmfile apply will fail.

cd datarepo-helm-definitions/dev/ZZ
helmfile apply

# check that the deployments were created
helm list --namespace ZZ
  7. Update the following authorized domains within the Jade Data Repository OAuth2 Client configuration:
  • Under Authorized JavaScript origins, add https://jade-ZZ.datarepo-dev.broadinstitute.org
  • Under Authorized redirect URIs, add https://jade-ZZ.datarepo-dev.broadinstitute.org/login/google and https://jade-ZZ.datarepo-dev.broadinstitute.org/webjars/springfox-swagger-ui/oauth2-redirect.html
  8. Connect to your new dev postgres database instance. Note that this is a different instance than the local one you will configure in step 10. The following command connects to the database via a proxy.
cd jade-data-repo/ops
DB=datarepo-ZZ SUFFIX=ZZ ENVIRONMENT=dev ./db-connect.sh
  9. Now that you're connected to your dev database, run the following command (once DR-1156 is done, this will no longer be needed):
create extension pgcrypto;
  10. Create a pull request to terraform-ap-deployments to add https://jade-ZZ.datarepo-dev.broadinstitute.org under the 'personal deployments' section of dev.tfvars/b2c_tdr_hosts. This allows B2C as a means of authentication, which is the default across environments.
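
Since these steps repeat the ZZ placeholder in several places, it can help to print the substituted values once and copy them from there. The sketch below does only that; `personal_env_values` is a hypothetical helper, and the value formats are taken from the commands and URLs in the steps above.

```shell
# Hypothetical helper: print the ZZ-dependent values used in the steps above
# for a given set of initials.
personal_env_values() {
  local zz=$1
  local origin="https://jade-${zz}.datarepo-dev.broadinstitute.org"
  printf 'namespace=%s\n' "$zz"
  printf 'database=datarepo-%s\n' "$zz"
  printf 'origin=%s\n' "$origin"
  printf 'redirect=%s/login/google\n' "$origin"
  printf 'redirect=%s/webjars/springfox-swagger-ui/oauth2-redirect.html\n' "$origin"
}

# Example call (commented out; replace ab with your own initials):
# personal_env_values ab
```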