Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #180 from broadinstitute/FE-287-implement-copy-from-tdr-to-gcs
FE-287 implement copy_from_tdr_to_gcs_hca
Showing 14 changed files with 442 additions and 93 deletions.
.github/workflows/build_and_push_docker_copy_from_tdr_to_gcs_hca_dev.yaml
40 changes: 40 additions & 0 deletions
```yaml
name: Build and Publish Dev Images for scripts/tdr/copy_from_tdr_to_gcs_hca
on:
  push:
    branches-ignore: [main]
    paths:
      # Must match the script's actual location under scripts/tdr/
      - scripts/tdr/copy_from_tdr_to_gcs_hca/**
      - .github/workflows/build_and_push_docker_copy_from_tdr_to_gcs_hca_dev.yaml
env:
  GCP_PROJECT_ID: dsp-fieldeng-dev
  GCP_REPOSITORY: horsefish
  GITHUB_SHA: ${{ github.sha }}

jobs:
  build-and-push-dev-images:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      - name: Login to GCP
        uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.BASE64_SAKEY_DSPFIELDENG_GARPUSHER }}

      - name: Configure Docker to use the Google Artifact Registry
        run: gcloud auth configure-docker us-east4-docker.pkg.dev

      - name: Build and Push copy_from_tdr_to_gcs_hca Docker Image
        run: |
          docker build -t us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:$GITHUB_SHA -f scripts/tdr/copy_from_tdr_to_gcs_hca/Dockerfile scripts/tdr/copy_from_tdr_to_gcs_hca
          docker push us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:$GITHUB_SHA

      - name: Set image tag to 'dev'
        run: |
          docker tag us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:$GITHUB_SHA us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:dev
          docker push us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:dev
```
.github/workflows/build_and_push_docker_copy_from_tdr_to_gcs_hca_main.yaml
43 changes: 43 additions & 0 deletions
```yaml
name: Build and Publish Latest Images for scripts/tdr/copy_from_tdr_to_gcs_hca
on:
  pull_request_target:
    types:
      - closed
    branches:
      - main
    paths:
      - scripts/tdr/copy_from_tdr_to_gcs_hca/**
      - .github/workflows/build_and_push_docker_copy_from_tdr_to_gcs_hca_main.yaml
env:
  GCP_PROJECT_ID: dsp-fieldeng-dev
  GCP_REPOSITORY: horsefish
  GITHUB_SHA: ${{ github.sha }}

jobs:
  build-and-push-dev-images:
    # "closed" also fires for PRs closed without merging; only publish on merge
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      - name: Login to GCP
        uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.BASE64_SAKEY_DSPFIELDENG_GARPUSHER }}

      - name: Configure Docker to use the Google Artifact Registry
        run: gcloud auth configure-docker us-east4-docker.pkg.dev

      - name: Build and Push copy_from_tdr_to_gcs_hca Docker Image
        run: |
          docker build -t us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:$GITHUB_SHA -f scripts/tdr/copy_from_tdr_to_gcs_hca/Dockerfile scripts/tdr/copy_from_tdr_to_gcs_hca
          docker push us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:$GITHUB_SHA

      - name: Set image tag to 'latest'
        run: |
          docker tag us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:$GITHUB_SHA us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:latest
          docker push us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:latest
```
```python
# Usage: python test.py file1 file2
# file1: file to loop through
# file2: file to search for matching string
# if no matching string found, print the string
# if found, do nothing
# output: print the string if no matching string found, print "found" if all strings are found

import sys

with open(sys.argv[1]) as f:
    lines = [line.strip() for line in f.readlines()]

with open(sys.argv[2], 'r') as f2:
    data = f2.read()

counter = 0  # number of strings from file1 not found in file2
for line in lines:
    if line not in data:
        print(f'Not Found: {line}')
        counter += 1

# Print "found" only when every string was found, as described above
if counter == 0:
    print("found")
```
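The check above can also be wrapped in functions, which makes it easier to reuse or test. This is a minimal sketch, assuming the same substring semantics as the script (a line counts as found if it appears anywhere in file2); `find_missing` and `report_missing` are hypothetical names, not part of the repo:

```python
def find_missing(needles: list[str], haystack: str) -> list[str]:
    """Return the needles that do not occur anywhere in haystack.

    Uses the same substring test as the script (`line not in data`), so a
    needle also counts as found when it is part of a longer string.
    """
    return [needle for needle in needles if needle not in haystack]


def report_missing(needles_path: str, haystack_path: str) -> list[str]:
    """Print each missing string, or "found" if every string was found."""
    with open(needles_path) as f:
        # Skip blank lines; an empty string is always "in" the haystack anyway
        needles = [line.strip() for line in f if line.strip()]
    with open(haystack_path) as f:
        haystack = f.read()

    missing = find_missing(needles, haystack)
    for needle in missing:
        print(f"Not Found: {needle}")
    if not missing:
        print("found")
    return missing
```

From the command line this could be driven with `report_missing(sys.argv[1], sys.argv[2])`; returning the list also lets a caller turn the result into an exit status.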
This file was deleted.
scripts/tdr/copy_from_tdr_to_gcs/from_bash_copy_from_tdr.py
69 changes: 0 additions & 69 deletions
This file was deleted.
```dockerfile
FROM us.gcr.io/broad-dsp-gcr-public/base/python:3.12-alpine

ENV PATH=/google-cloud-sdk/bin:$PATH
# Record the CPU architecture so the matching gcloud CLI tarball is downloaded
RUN if [ `uname -m` = 'x86_64' ]; then echo -n "x86_64" > /tmp/arch; else echo -n "arm" > /tmp/arch; fi;
RUN ARCH=`cat /tmp/arch` && apk --no-cache upgrade && apk --no-cache add \
        bash \
        curl \
        python3 \
        py3-crcmod \
        py3-openssl \
        libc6-compat \
        openssh-client \
        git \
        gnupg \
    && curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-linux-${ARCH}.tar.gz && \
    tar xzf google-cloud-cli-linux-${ARCH}.tar.gz && \
    rm google-cloud-cli-linux-${ARCH}.tar.gz && \
    gcloud config set core/disable_usage_reporting true && \
    gcloud config set component_manager/disable_update_check true && \
    gcloud config set metrics/environment docker_image_alpine && \
    gcloud --version
RUN git config --system credential.'https://source.developers.google.com'.helper gcloud.sh
VOLUME ["/root/.config"]

WORKDIR /scripts/tdr/copy_from_tdr_to_gcs_hca

# copy the contents of /scripts/tdr/copy_from_tdr_to_gcs_hca to the WORKDIR
COPY * .

RUN pip install -r requirements.txt

ENV PYTHONPATH="/scripts:${PYTHONPATH}"
CMD ["/bin/bash"]
```
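The architecture check at the top of the Dockerfile maps the output of `uname -m` to the suffix of a gcloud CLI Linux tarball (`x86_64`, or `arm` for anything else). The Python sketch below restates that mapping for illustration only; `gcloud_cli_tarball_url` is a hypothetical helper, and it assumes the unversioned `google-cloud-cli-linux-<arch>.tar.gz` file names on the rapid channel used in the Dockerfile's `curl` command:

```python
import platform
from typing import Optional

# Base URL taken from the Dockerfile's curl command
GCLOUD_DOWNLOADS = "https://dl.google.com/dl/cloudsdk/channels/rapid/downloads"


def gcloud_cli_tarball_url(machine: Optional[str] = None) -> str:
    """Return the gcloud CLI tarball URL for a CPU architecture.

    Mirrors the Dockerfile's shell check: `x86_64` keeps its name and any other
    value (for example `aarch64`) falls back to the `arm` build.
    """
    if machine is None:
        # On Linux this is the same value `uname -m` reports
        machine = platform.machine()
    arch = "x86_64" if machine == "x86_64" else "arm"
    return f"{GCLOUD_DOWNLOADS}/google-cloud-cli-linux-{arch}.tar.gz"
```

Like the shell version, this treats every non-x86_64 machine as ARM, which is fine for the two platforms Docker Desktop and GitHub runners commonly use but would be wrong on, say, 32-bit x86.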
# Copy from TDR to GCS
This was originally a bash script written by Samantha Velasquez,\
[get_snapshot_files_and_transfer.sh](get_snapshot_files_and_transfer.sh),\
to copy files from a TDR snapshot to an Azure bucket.\
Bobbie then translated it to Python using Copilot, and it ballooned from there:\
[copy_from_tdr_to_gcs.py](copy_from_tdr_to_gcs.py)\
The bash script is kept here for posterity, since it previously lived only in Slack.
It has not been tested in the Docker image created for the Python script.

## Running the Script
**IMPORTANT**\
You will need to be in either the [Monster Group](https://groups.google.com/a/broadinstitute.org/g/monster)
or the [Field Eng group](https://groups.google.com/a/broadinstitute.org/g/dsp-fieldeng) to run this script.

Clone the whole horsefish repo, if you have not done so already.

You will also need a manifest file to run the script.\
The format of this manifest is identical to the one used for [HCA ingest](https://docs.google.com/document/d/1NQCDlvLgmkkveD4twX5KGv6SZUl8yaIBgz_l1EcrHdA/edit#heading=h.cg8d8o5kklql).
A sample manifest, `dcpTEST_manifest.csv`, is provided in the project directory.\
(Note that this is a test manifest: you will first have to load its data into TDR before you can use it. See the HCA ingest Ops manual linked above.)\
It's probably easiest to copy the relevant rows from the original ingest manifest into a new manifest,
then move that file into this project directory so that Docker compose mounts it into the container.

If you are not already logged in to gcloud and Docker, do so before running the Docker compose command:\
`gcloud auth application-default login` \
`gcloud auth configure-docker us-east4-docker.pkg.dev`

To start up the run/dev Docker compose environment:\
`docker compose run app bash`\
This pulls the latest image from Artifact Registry, starts the container, and mounts the project directory,
so changes in your local project directory are reflected in the container.

Next, authenticate with gcloud using your Broad credentials:\
`gcloud auth login`\
`gcloud config set project dsp-fieldeng-dev`* \
`gcloud auth application-default login`

Then run the script using the following command syntax:\
`python3 copy_from_tdr_to_gcs_hca.py <manifest_file>`

Contact Field Eng for any issues that arise. \
_*If you are not in dsp-fieldeng-dev, set the project to the Monster HCA prod project, mystical-slate-284720, instead._

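## Building the Docker Image
The image is built by two GitHub Actions workflows, each of which pushes a commit-SHA tag plus a moving tag:
- `.github/workflows/build_and_push_docker_copy_from_tdr_to_gcs_hca_main.yaml` ("Build and Publish Latest Images for scripts/tdr/copy_from_tdr_to_gcs_hca") runs when a pull request into `main` is closed and pushes `:$GITHUB_SHA` and `:latest`
- `.github/workflows/build_and_push_docker_copy_from_tdr_to_gcs_hca_dev.yaml` ("Build and Publish Dev Images for scripts/tdr/copy_from_tdr_to_gcs_hca") runs on pushes to any other branch and pushes `:$GITHUB_SHA` and `:dev`

The full image reference is `us-east4-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY/copy_from_tdr_to_gcs_hca:<tag>`.
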
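Both workflows push the same image under two tags: an immutable tag named after the commit SHA, and a moving tag (`dev` from the dev workflow, `latest` from the main workflow). As a hedged illustration of how those references are composed, here is a small Python sketch; `image_ref` and `workflow_tags` are hypothetical helpers, and the registry host and image name are the ones hard-coded in the workflows:

```python
REGISTRY_HOST = "us-east4-docker.pkg.dev"
IMAGE_NAME = "copy_from_tdr_to_gcs_hca"


def image_ref(project_id: str, repository: str, tag: str) -> str:
    """Build an Artifact Registry image reference: HOST/PROJECT/REPO/IMAGE:TAG."""
    return f"{REGISTRY_HOST}/{project_id}/{repository}/{IMAGE_NAME}:{tag}"


def workflow_tags(project_id: str, repository: str, git_sha: str, on_main: bool) -> list[str]:
    """Return the two references a workflow run pushes.

    Every run pushes an immutable tag named after the commit SHA, then retags
    the same image with a moving tag: 'latest' on main, 'dev' elsewhere.
    """
    moving_tag = "latest" if on_main else "dev"
    return [
        image_ref(project_id, repository, git_sha),
        image_ref(project_id, repository, moving_tag),
    ]
```

Keeping the SHA tag alongside the moving tag means any `dev` or `latest` image can be traced back to the exact commit that produced it.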
### To manually build and run locally
From this project directory (the Docker build context):\
`docker build -t us-east4-docker.pkg.dev/dsp-fieldeng-dev/horsefish/copy_from_tdr_to_gcs_hca:<new_version> .` \
`docker run --rm -it us-east4-docker.pkg.dev/dsp-fieldeng-dev/horsefish/copy_from_tdr_to_gcs_hca:<new_version>`

### To build and push to Artifact Registry
- make sure you are logged in to gcloud, that application default credentials are set, and that Docker is configured for Artifact Registry \
`gcloud auth login` \
`gcloud config set project dsp-fieldeng-dev` \
`gcloud auth application-default login` \
`gcloud auth configure-docker us-east4-docker.pkg.dev`
- choose a `<new_version>`, then build and push the image \
`docker build -t us-east4-docker.pkg.dev/dsp-fieldeng-dev/horsefish/copy_from_tdr_to_gcs_hca:<new_version> .` \
`docker push us-east4-docker.pkg.dev/dsp-fieldeng-dev/horsefish/copy_from_tdr_to_gcs_hca:<new_version>`

## Possible improvements*
- update the script with conditional logic to accept a snapshot ID and destination instead of a manifest
- update the script to check lowercase institution names against lowercase institution keys (see ~line 86)
- update the script to merge `validate_input()` and `_parse_csv()` into one function
- consider adding a copy manifest to this command, so that instead of validating the number of files copied (line 187), you can specifically highlight the files that were not copied successfully

*This is likely to be used only rarely, and mostly by the author, as a stopgap until partial updates have been implemented.
As such, we are attempting to keep it as light as possible, so as not to introduce unnecessary complexity.

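Two of the improvements listed above are small enough to sketch. The snippet below is illustrative only and assumes data shapes this README does not describe: `institution_map` stands in for whatever institution lookup the script uses around line 86, and the `expected`/`copied` path lists stand in for a copy manifest. Both function names are hypothetical.

```python
from collections.abc import Iterable
from typing import Optional


def lookup_institution(institution: str, institution_map: dict[str, str]) -> Optional[str]:
    """Case-insensitive institution lookup.

    Normalizes both the keys and the query (strip + lower) so that
    'Broad Institute' and 'broad institute' resolve to the same entry.
    Returns None when there is no match.
    """
    normalized = {key.strip().lower(): value for key, value in institution_map.items()}
    return normalized.get(institution.strip().lower())


def files_not_copied(expected: Iterable[str], copied: Iterable[str]) -> list[str]:
    """Return expected paths that are absent from the copied set, in manifest order.

    Reporting the specific paths is more actionable than comparing counts.
    """
    copied_set = set(copied)
    return [path for path in expected if path not in copied_set]
```

The set conversion keeps the membership check linear in the number of files, which matters if a copy manifest lists many thousands of paths.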