Remove Python package version pins.
Also:
* added package `plotnine` per DataBiosphere#126
* replaced package `tensorflow` with `tensorflow_cpu` to get rid of the warnings about GPUs being unavailable on Terra Cloud Runtimes
* added package `google-resumable-media` as an explicit dependency to ensure a more recent version of it is used; `pandas-gbq` depends on it for table uploads
* the `--use_rest_api` flag is now needed for the `%%bigquery` magic
  * As of release [google-cloud-bigquery 1.26.0 (2020-07-20)](https://github.com/googleapis/python-bigquery/blob/master/CHANGELOG.md#1260-2020-07-20), the BigQuery Python client uses the BigQuery Storage client by default.
  * This currently causes an error on Terra Cloud Runtimes: `the user does not have 'bigquery.readsessions.create' permission for '<Terra billing project id>'`.
  * To work around this, we must uninstall the dependency `google-cloud-bigquery-storage` so that the `--use_rest_api` flag can be used with `%%bigquery`, falling back to the older, slower data transfer mechanism; a sketch of the workaround follows below.
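
A minimal sketch of the workaround as two notebook cells. The public table queried is just an illustrative placeholder, and the kernel may need a restart after the uninstall:

    # Cell 1: remove the storage client so %%bigquery falls back to the REST API
    !pip3 uninstall --yes google-cloud-bigquery-storage

    %%bigquery results --use_rest_api
    -- Cell 2: this query runs via the REST API; 'results' becomes a pandas DataFrame
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5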
deflaux committed Mar 5, 2021
1 parent a86cb30 commit 47c1820
Showing 10 changed files with 541 additions and 82 deletions.
14 changes: 12 additions & 2 deletions .github/workflows/test-terra-jupyter-aou.yml
@@ -15,8 +15,18 @@ on:
     paths:
     - 'terra-jupyter-aou/**'
     - '.github/workflows/test-terra-jupyter-aou.yml'
-  # Note: secrets are not passed to pull requests from forks, so the dev team will need to use the manual workflow
-  # dispatch trigger when receiving community contributions.
+
+  push:
+    # Note: GitHub secrets are not passed to pull requests from forks. For community contributions from
+    # regular contributors, it's a good idea for the contributor to configure the GitHub actions to run correctly
+    # in their fork as described above.
+    #
+    # For occasional contributors, the dev team will merge the PR fork branch to a branch in upstream named
+    # test-community-contribution-<PR#> to run all the GitHub Action smoke tests.
+    branches: [ 'test-community-contribution*' ]
+    paths:
+    - 'terra-jupyter-aou/**'
+    - '.github/workflows/test-terra-jupyter-aou.yml'

   workflow_dispatch:
     # Allows manual triggering of the workflow on a selected branch via the GitHub Actions tab.
120 changes: 120 additions & 0 deletions .github/workflows/test-terra-jupyter-gatk.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
name: Test terra-jupyter-gatk
# Perform smoke tests on the terra-jupyter-gatk Docker image to have some amount of confidence that
# Python package versions are compatible.
#
# To configure the minimal auth needed for these tests to be able to read public data from Google Cloud Platform:
# Step 1: Create a service account per these instructions:
# https://github.com/google-github-actions/setup-gcloud/blob/master/setup-gcloud/README.md
# Step 2: Give the service account the following permissions within the project: BigQuery User
# Step 3: Store its key and project id as GitHub repository secrets GCP_SA_KEY and GCP_PROJECT_ID.
# https://docs.github.com/en/free-pro-team@latest/actions/reference/encrypted-secrets#creating-encrypted-secrets-for-a-repository
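#
# As a rough sketch (not part of the original workflow; the service account name and key file
# path below are placeholders), steps 1 and 2 might look like:
#   gcloud iam service-accounts create github-actions-tester
#   gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
#       --member "serviceAccount:github-actions-tester@${PROJECT_ID}.iam.gserviceaccount.com" \
#       --role roles/bigquery.user
#   gcloud iam service-accounts keys create key.json \
#       --iam-account "github-actions-tester@${PROJECT_ID}.iam.gserviceaccount.com"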

on:
  pull_request:
    branches: [ master ]
    paths:
    - 'terra-jupyter-gatk/**'
    - '.github/workflows/test-terra-jupyter-gatk.yml'

  push:
    # Note: GitHub secrets are not passed to pull requests from forks. For community contributions from
    # regular contributors, it's a good idea for the contributor to configure the GitHub actions to run correctly
    # in their fork as described above.
    #
    # For occasional contributors, the dev team will merge the PR fork branch to a branch in upstream named
    # test-community-contribution-<PR#> to run all the GitHub Action smoke tests.
    #
    # TODO(deflaux) remove the add-more-smoke-tests, update-python-versions, and bump-aou-versions
    # branch triggers after testing is complete.
    branches: [ 'test-community-contribution*', add-more-smoke-tests, update-python-versions, bump-aou-versions ]
    paths:
    - 'terra-jupyter-gatk/**'
    - '.github/workflows/test-terra-jupyter-gatk.yml'

  workflow_dispatch:
    # Allows manual triggering of the workflow on a selected branch via the GitHub Actions tab.
    # GitHub blog demo: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/.

env:
  GOOGLE_PROJECT: ${{ secrets.GCP_PROJECT_ID }}

jobs:

  test_docker_image:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.7

    - name: Set up Cloud SDK
      uses: google-github-actions/setup-gcloud@master
      with:
        project_id: ${{ secrets.GCP_PROJECT_ID }}
        service_account_key: ${{ secrets.GCP_SA_KEY }}
        export_default_credentials: true

    - name: Build Docker image and base images too, if needed
      run: |
        gcloud auth configure-docker
        ./build_smoke_test_image.sh terra-jupyter-gatk

    - name: Run Python code specific to notebooks with nbconvert
      # Run all notebooks from start to finish, regardless of errors, so that we can capture the
      # result as a workflow artifact.
      # See also https://github.com/marketplace/actions/run-notebook if a more complicated
      # workflow for notebooks is needed in the future.
      run: |
        chmod a+w -R $GITHUB_WORKSPACE
        docker run \
          --env GOOGLE_PROJECT \
          --volume "${{ env.GOOGLE_APPLICATION_CREDENTIALS }}:/tmp/credentials.json:ro" \
          --env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \
          --volume $GITHUB_WORKSPACE:/tests \
          --workdir=/tests \
          --entrypoint="" \
          terra-jupyter-gatk:smoke-test \
          /bin/bash -c 'for nb in {terra-jupyter-python/tests,terra-jupyter-gatk/tests}/*ipynb ; do jupyter nbconvert --to html --ExecutePreprocessor.allow_errors=True --execute "${nb}" ; done'

    - name: Upload workflow artifacts
      uses: actions/upload-artifact@v2
      with:
        name: notebook-execution-results
        path: |
          terra-jupyter-python/tests/*.html
          terra-jupyter-gatk/tests/*.html
        retention-days: 30

    - name: Test Python code with pytest
      run: |
        docker run \
          --env GOOGLE_PROJECT \
          --volume "${{ env.GOOGLE_APPLICATION_CREDENTIALS }}:/tmp/credentials.json:ro" \
          --env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \
          --volume $GITHUB_WORKSPACE:/tests \
          --workdir=/tests \
          --entrypoint="" \
          terra-jupyter-gatk:smoke-test \
          /bin/bash -c 'pip3 install pytest ; pytest terra-jupyter-python/tests/ terra-jupyter-gatk/tests/'

    - name: Test Python code specific to notebooks with nbconvert
      # Simply 'Cell -> Run All' these notebooks and expect no errors in the case of a successful run of the test suite.
      # If the tests throw any exceptions, execution of the notebooks will halt at that point. Look at the workflow
      # artifacts to understand if there are more failures than just the one that caused this task to halt.
      run: |
        docker run \
          --env GOOGLE_PROJECT \
          --volume "${{ env.GOOGLE_APPLICATION_CREDENTIALS }}:/tmp/credentials.json:ro" \
          --env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \
          --volume $GITHUB_WORKSPACE:/tests \
          --workdir=/tests \
          --entrypoint="" \
          terra-jupyter-gatk:smoke-test \
          /bin/bash -c 'for nb in {terra-jupyter-python/tests,terra-jupyter-gatk/tests}/*ipynb ; do jupyter nbconvert --to html --execute "${nb}" ; done'
120 changes: 120 additions & 0 deletions .github/workflows/test-terra-jupyter-hail.yml
@@ -0,0 +1,120 @@
name: Test terra-jupyter-hail
# Perform smoke tests on the terra-jupyter-hail Docker image to have some amount of confidence that
# Python package versions are compatible.
#
# To configure the minimal auth needed for these tests to be able to read public data from Google Cloud Platform:
# Step 1: Create a service account per these instructions:
# https://github.com/google-github-actions/setup-gcloud/blob/master/setup-gcloud/README.md
# Step 2: Give the service account the following permissions within the project: BigQuery User
# Step 3: Store its key and project id as GitHub repository secrets GCP_SA_KEY and GCP_PROJECT_ID.
# https://docs.github.com/en/free-pro-team@latest/actions/reference/encrypted-secrets#creating-encrypted-secrets-for-a-repository

on:
  pull_request:
    branches: [ master ]
    paths:
    - 'terra-jupyter-hail/**'
    - '.github/workflows/test-terra-jupyter-hail.yml'

  push:
    # Note: GitHub secrets are not passed to pull requests from forks. For community contributions from
    # regular contributors, it's a good idea for the contributor to configure the GitHub actions to run correctly
    # in their fork as described above.
    #
    # For occasional contributors, the dev team will merge the PR fork branch to a branch in upstream named
    # test-community-contribution-<PR#> to run all the GitHub Action smoke tests.
    #
    # TODO(deflaux) remove the add-more-smoke-tests, update-python-versions, and bump-aou-versions
    # branch triggers after testing is complete.
    branches: [ 'test-community-contribution*', add-more-smoke-tests, update-python-versions, bump-aou-versions ]
    paths:
    - 'terra-jupyter-hail/**'
    - '.github/workflows/test-terra-jupyter-hail.yml'

  workflow_dispatch:
    # Allows manual triggering of the workflow on a selected branch via the GitHub Actions tab.
    # GitHub blog demo: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/.

env:
  GOOGLE_PROJECT: ${{ secrets.GCP_PROJECT_ID }}

jobs:

  test_docker_image:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.7

    - name: Set up Cloud SDK
      uses: google-github-actions/setup-gcloud@master
      with:
        project_id: ${{ secrets.GCP_PROJECT_ID }}
        service_account_key: ${{ secrets.GCP_SA_KEY }}
        export_default_credentials: true

    - name: Build Docker image and base images too, if needed
      run: |
        gcloud auth configure-docker
        ./build_smoke_test_image.sh terra-jupyter-hail

    - name: Run Python code specific to notebooks with nbconvert
      # Run all notebooks from start to finish, regardless of errors, so that we can capture the
      # result as a workflow artifact.
      # See also https://github.com/marketplace/actions/run-notebook if a more complicated
      # workflow for notebooks is needed in the future.
      run: |
        chmod a+w -R $GITHUB_WORKSPACE
        docker run \
          --env GOOGLE_PROJECT \
          --volume "${{ env.GOOGLE_APPLICATION_CREDENTIALS }}:/tmp/credentials.json:ro" \
          --env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \
          --volume $GITHUB_WORKSPACE:/tests \
          --workdir=/tests \
          --entrypoint="" \
          terra-jupyter-hail:smoke-test \
          /bin/bash -c 'for nb in {terra-jupyter-python/tests,terra-jupyter-hail/tests}/*ipynb ; do jupyter nbconvert --to html --ExecutePreprocessor.allow_errors=True --execute "${nb}" ; done'

    - name: Upload workflow artifacts
      uses: actions/upload-artifact@v2
      with:
        name: notebook-execution-results
        path: |
          terra-jupyter-python/tests/*.html
          terra-jupyter-hail/tests/*.html
        retention-days: 30

    - name: Test Python code with pytest
      run: |
        docker run \
          --env GOOGLE_PROJECT \
          --volume "${{ env.GOOGLE_APPLICATION_CREDENTIALS }}:/tmp/credentials.json:ro" \
          --env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \
          --volume $GITHUB_WORKSPACE:/tests \
          --workdir=/tests \
          --entrypoint="" \
          terra-jupyter-hail:smoke-test \
          /bin/bash -c 'pip3 install pytest ; pytest terra-jupyter-python/tests/ terra-jupyter-hail/tests/'

    - name: Test Python code specific to notebooks with nbconvert
      # Simply 'Cell -> Run All' these notebooks and expect no errors in the case of a successful run of the test suite.
      # If the tests throw any exceptions, execution of the notebooks will halt at that point. Look at the workflow
      # artifacts to understand if there are more failures than just the one that caused this task to halt.
      run: |
        docker run \
          --env GOOGLE_PROJECT \
          --volume "${{ env.GOOGLE_APPLICATION_CREDENTIALS }}:/tmp/credentials.json:ro" \
          --env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \
          --volume $GITHUB_WORKSPACE:/tests \
          --workdir=/tests \
          --entrypoint="" \
          terra-jupyter-hail:smoke-test \
          /bin/bash -c 'for nb in {terra-jupyter-python/tests,terra-jupyter-hail/tests}/*ipynb ; do jupyter nbconvert --to html --execute "${nb}" ; done'
14 changes: 12 additions & 2 deletions .github/workflows/test-terra-jupyter-python.yml
@@ -15,8 +15,18 @@ on:
     paths:
     - 'terra-jupyter-python/**'
     - '.github/workflows/test-terra-jupyter-python.yml'
-  # Note: secrets are not passed to pull requests from forks, so the dev team will need to use the manual workflow
-  # dispatch trigger when receiving community contributions.
+
+  push:
+    # Note: GitHub secrets are not passed to pull requests from forks. For community contributions from
+    # regular contributors, it's a good idea for the contributor to configure the GitHub actions to run correctly
+    # in their fork as described above.
+    #
+    # For occasional contributors, the dev team will merge the PR fork branch to a branch in upstream named
+    # test-community-contribution-<PR#> to run all the GitHub Action smoke tests.
+    branches: [ 'test-community-contribution*' ]
+    paths:
+    - 'terra-jupyter-python/**'
+    - '.github/workflows/test-terra-jupyter-python.yml'

   workflow_dispatch:
     # Allows manual triggering of the workflow on a selected branch via the GitHub Actions tab.
7 changes: 1 addition & 6 deletions terra-jupyter-aou/Dockerfile
@@ -1,4 +1,4 @@
-FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-python:0.0.23 AS python
+FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-python:0.0.24 AS python

 FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:1.0.13

@@ -85,9 +85,4 @@ ENV USER jupyter-user
 USER $USER

 RUN pip3 install --upgrade \
-  pandas-profiling==2.10.1 \
-  plotnine==0.7.1 \
-  # Parent image pins tensorflow to an old alpha version. Override here for now.
-  tensorflow==2.3.0 \
-  numpy==1.18.5 \
   "git+git://github.com/all-of-us/workbench-snippets.git#egg=terra_widgets&subdirectory=py"
95 changes: 95 additions & 0 deletions terra-jupyter-gatk/tests/gatk_smoke_test.ipynb
@@ -0,0 +1,95 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test cases requiring or benefiting from the context of a notebook\n",
"\n",
"If the notebook runs successfully from start to finish, the test is successful!\n",
"\n",
"TODO(all): Add additional tests and/or tests with particular assertions, as we encounter Python package version incompatibilities not currently detected by these tests.\n",
"\n",
"In general, only add test cases here that require the context of a notebook. This is because this notebook, as currently written, will abort at the **first** failure. Compare this to a proper test suite where all cases are run, giving much more information about the full extent of any problems encountered."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Package versions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip3 freeze"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip3 show plotnine pandas google-cloud-storage google-cloud-bigquery google-resumable-media"
]
},
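{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example of the assertion-style tests the TODO above invites; the\n",
"# minimum version below is a placeholder, not a known requirement. pkg_resources\n",
"# ships with setuptools, so it is available on this image's Python 3.7.\n",
"from pkg_resources import get_distribution, parse_version\n",
"\n",
"assert parse_version(get_distribution('google-resumable-media').version) >= parse_version('1.0.0')"
]
},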
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test cases requiring the context of a notebook "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test cases benefiting from the context of a notebook "
]
}
],
"metadata": {
"environment": {
"name": "r-cpu.3-6.m56",
"type": "gcloud",
"uri": "gcr.io/deeplearning-platform-release/r-cpu.3-6:m56"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 4
}
