env config not working with GCP Batch #5623

Open
nick-youngblut opened this issue Dec 21, 2024 · 4 comments

Comments

@nick-youngblut
Contributor

Bug report

My nextflow.config contains:

env {
    MY_ENV_VAR = "my_value"
}

When the executor is local, the process using MY_ENV_VAR via os.environ["MY_ENV_VAR"] runs successfully.
However, when the executor is google-batch, the process throws KeyError: 'MY_ENV_VAR'.
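For reference, the difference between the two lookup styles used in this report can be sketched in plain Python. This is only an illustration of standard-library behaviour, not of anything Nextflow does; the variable is removed first to simulate a task environment where it was never set.

```python
import os

NAME = "MY_ENV_VAR"

# Simulate a task environment in which the variable was never exported.
os.environ.pop(NAME, None)

# Subscript access raises KeyError when the variable is missing.
try:
    os.environ[NAME]
    lookup_error = None
except KeyError as exc:
    lookup_error = repr(exc)

# os.getenv returns None instead of raising.
fallback = os.getenv(NAME)

# os.environ.get accepts an explicit default.
default = os.environ.get(NAME, "unset")

print(lookup_error, fallback, default)
```

This is why one script fails with KeyError while another, using os.getenv, silently prints None for the same missing variable.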

https://www.nextflow.io/docs/latest/reference/config.html#env does not state that env is not supported for google-batch, so I'm assuming that there is either a bug or a lack of docs.

Expected behavior and actual behavior

Environment variables set via the env config scope should be available to GCP Batch jobs, or the docs should explicitly state that env is not supported for GCP Batch.

Steps to reproduce the problem

  • Set variables in the env scope
  • Run locally; processes using the env variable should succeed
  • Run on GCP Batch; processes using the env variable should fail

Environment

  • Nextflow version: 24.10.2
  • Java version: 21.0.0
  • Operating system: Ubuntu 22.04.5
  • Bash version: 5.1.16
@pditommaso
Member

I'm unable to replicate this. Can you please provide a test case?

@nick-youngblut
Contributor Author

My reproducible example:

main.nf

workflow { 
    PRINT_ENV()   
    PRINT_ENV_PY()  
}

process PRINT_ENV {
    publishDir file(params.output_dir),  mode: "copy", overwrite: true
    container "ubuntu:latest"
    
    output:
    path "out.txt"

    script:
    """
    echo "$my_env_var" > out.txt
    """
}

process PRINT_ENV_PY {
    publishDir file(params.output_dir),  mode: "copy", overwrite: true
    container "python:3.11"
    
    output:
    path "out_py.txt"

    script:
    """
    test.py > out_py.txt
    """
}

nextflow.config

params {
    output_dir = "pipeline_output"
}

env {
    my_env_var = "TEST"
}

profiles {
    gcp {
        workDir            = "gs://###/sandbox/work"
        fusion.enabled     = false
        wave.enabled       = false
        params.output_dir  = "gs://###/sandbox/pipeline_output"
        process {
            executor       = "google-batch"
            errorStrategy  = "retry"
            maxRetries     = 2
            scratch        = true
        }
        google {
            project   = "###"
            location  = "###"
            batch {
                serviceAccountEmail = "###"
                spot                = true
                maxSpotAttempts     = 3
            }
        }
    }
}

test.py

#!/usr/bin/env python3
import os

if __name__ == '__main__':
    print(os.getenv("my_env_var"))

Results

out.txt will always contain TEST when run locally and on GCP Batch. However, out_py.txt will contain TEST when run locally and None when run on GCP.

Notes

If the following is used:

process PRINT_ENV_PY {
    publishDir file(params.output_dir),  mode: "copy", overwrite: true
    container "python:3.11"
    
    output:
    path "out_py.txt"

    script:
    """
    export my_env_var="$my_env_var"
    test.py > out_py.txt
    """
}

Then, the env variable value ("TEST") is written out to out_py.txt. Importantly, exporting the env variable is only needed when running the pipeline on GCP Batch (versus local).
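Why the explicit export matters can be sketched without Nextflow at all: a child process only sees variables that are actually present in the environment it is launched with. The snippet below is a minimal illustration under that assumption; `run_child` is a hypothetical helper, and the child command mirrors test.py.

```python
import os
import subprocess
import sys

# Same lookup as test.py, run as a separate child process.
CHILD = 'import os; print(os.getenv("my_env_var"))'


def run_child(env):
    """Run the child with the given environment and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the current environment, minus the variable under test.
base_env = {k: v for k, v in os.environ.items() if k != "my_env_var"}

without_var = run_child(base_env)                          # variable not exported
with_var = run_child({**base_env, "my_env_var": "TEST"})   # variable exported

print(without_var, with_var)
```

The first call corresponds to what out_py.txt shows on GCP Batch, the second to what it shows locally or after the manual export.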

@bentsherman
Member

Shouldn't the bash version be this?

    script:
    """
    echo "\$my_env_var" > out.txt
    """

If you don't escape the $ then it will use the global variable my_env_var, which can fall back to the environment variables in your config. Though I think we will move away from this pattern in favor of env('my_env_var'). Anyway, I think that is why your bash version and modified python version both work.
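As an analogy only (this is Python's `string.Template`, not Nextflow's Groovy interpolation, and the escape there is `$$` rather than `\$`), the difference between substituting a value into the script text and leaving a reference for the shell to resolve looks like this. `config_env` is a stand-in for the config env scope.

```python
from string import Template

# Stand-in for the values declared in the config env scope.
config_env = {"my_env_var": "TEST"}

# Unescaped: the value is baked into the script text before it ever runs,
# so the task environment is never consulted.
unescaped = Template('echo "$my_env_var" > out.txt').substitute(config_env)

# Escaped: the reference survives templating, and the shell must resolve it
# from the task environment at run time.
escaped = Template('echo "$$my_env_var" > out.txt').substitute(config_env)

print(unescaped)
print(escaped)
```

In the unescaped case the script no longer depends on the task environment, which would explain why it produces TEST on both executors.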

So the issue seems to be that the config env is not being added to the task environment with Google Batch. And indeed when I check the .command.run the config env vars are not there:

    set -u
    # config env should be here
    [[ $NXF_SCRATCH ]] && cd $NXF_SCRATCH
    export NXF_TASK_WORKDIR="$PWD"
    nxf_stage
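A quick way to run the same check on any task directory is to scan the wrapper text for export lines. This is a rough sketch: `missing_exports` is a hypothetical helper, and it assumes config env variables would appear in .command.run as `export NAME=...` lines, which is not confirmed in this thread.

```python
import re

# Matches lines of the form: export NAME=...
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=", re.MULTILINE)


def missing_exports(wrapper_text, expected_names):
    """Return the expected variable names that have no export line in the text."""
    exported = set(EXPORT_RE.findall(wrapper_text))
    return [name for name in expected_names if name not in exported]


# Excerpt of .command.run as quoted above.
sample = """\
set -u
# config env should be here
[[ $NXF_SCRATCH ]] && cd $NXF_SCRATCH
export NXF_TASK_WORKDIR="$PWD"
nxf_stage
"""

print(missing_exports(sample, ["my_env_var"]))
```

In practice the text would be read from the .command.run file in the task work directory instead of an inline string.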

@nick-youngblut
Contributor Author

Yeah, that must be it. I'd have to go back and check whether I only dropped the escape for $ in my example, but my guess is that I also forgot to escape $ in my real code and found that it still worked. I was more hung up on the difference in env variable handling between local and GCP Batch.
