nomad_csi_volume missing context block #417

Open
CarbonCollins opened this issue Jan 7, 2024 · 10 comments · May be fixed by #503

Comments

@CarbonCollins

Terraform Version

Terraform v1.6.6
on linux_amd64

  • provider registry.terraform.io/hashicorp/nomad v2.1.0
  • provider registry.terraform.io/hashicorp/vault v3.14.0

Nomad Version

1.7.2

Provider Configuration

Which values are you setting in the provider configuration?

provider "nomad" {}

Environment Variables

Do you have any Nomad specific environment variable set in the machine running Terraform?

NOMAD_REGION=se
NOMAD_NAMESPACE=c3-networking
NOMAD_ADDR={REDACTED}
NOMAD_TOKEN={REDACTED}

Affected Resource(s)

Please list the resources as a list, for example:

  • nomad_csi_volume
  • nomad_csi_volume_registration

Terraform Configuration Files

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key.

Expected Behavior

Be able to define a context block on the nomad_csi_volume resource, as you can on the nomad_csi_volume_registration resource.

Actual Behavior

The nomad_csi_volume resource does not accept a context block.
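For illustration, the requested configuration would look roughly like the sketch below. This is hypothetical syntax: provider v2.1.0 rejects a context argument on nomad_csi_volume, and the values are borrowed from the volume specification posted later in this thread. The map form mirrors how context is written on nomad_csi_volume_registration.

```hcl
# HYPOTHETICAL: provider v2.1.0 does not accept "context" on nomad_csi_volume.
# This only illustrates what the issue is asking for.
resource "nomad_csi_volume" "example" {
  plugin_id    = "soc-axion-smb"
  volume_id    = "loki-data[0]"
  name         = "loki-data[0]"
  capacity_min = "1GiB"
  capacity_max = "2GiB"

  capability {
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  # Requested: pass volume context at creation time, the same way the
  # "context" block in a Nomad volume spec does for "nomad volume create".
  context = {
    node_attach_driver = "smb"
  }
}
```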

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Important Factoids

There is nothing atypical, but when I tried to deploy a job that uses one of the volumes created and registered via the Terraform provider, the job would not start and failed with this error:

failed to setup alloc: pre-run hook "csi_hook" failed: mounting volumes: rpc error: code = InvalidArgument desc = unknown/unsupported node_attach_driver: undefined

References

Context docs: csi_volume_registration - context (https://registry.terraform.io/providers/hashicorp/nomad/latest/docs/resources/csi_volume_registration#context)

@CarbonCollins
Author

As some extra info on this one: I tried creating the volume from a volume specification file containing the same information as the nomad_csi_volume resource, and it was created successfully without the unsupported node_attach_driver error.

@lgfa29
Contributor

lgfa29 commented Jan 29, 2024

Hi @CarbonCollins 👋

According to the docs, volume context should only be set on volume registration, which is why it's only present in the resource_csi_volume_registration resource.
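A minimal sketch of that registration path, assuming the volume already exists on the SMB share. The plugin ID, namespace, and volume ID come from this thread; the external_id value is a hypothetical placeholder and would have to match the ID the plugin actually uses for the volume. The node_attach_driver key is taken from the working volume spec and the node_attach_driver error in this issue.

```hcl
# Registers an existing CSI volume with Nomad and sets its context.
resource "nomad_csi_volume_registration" "loki_data" {
  namespace   = "c3-monitoring"
  plugin_id   = "soc-axion-smb"
  volume_id   = "loki-data[0]"
  name        = "loki-data[0]"
  external_id = "example-external-id" # hypothetical placeholder

  capability {
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  # Context is handed to the plugin when the volume is mounted; the error in
  # this issue suggests the SMB plugin expects node_attach_driver here.
  context = {
    node_attach_driver = "smb"
  }
}
```

Because nomad_csi_volume exports external_id, the value can also be wired in from a created volume instead of hard-coded, which is the approach of the workaround posted in this thread.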

Could you share the volume spec you used to create the volume that worked?

@clarkbains

clarkbains commented Jan 29, 2024

Likewise, I'm used to specifying context within the volume specification and then just running nomad volume create; no further registration is needed.
Example volume specification:

id = "nextclouddata"
name = "Next Cloud Data"
type = "csi"
plugin_id = "cephfs-csi"
capacity_min = "20M"
capacity_max = "20GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type = "ext4"
  mount_flags = ["noatime"]
}
secrets {
  adminID  = "admin"
  adminKey = "<removed>"
  userID   = "nomad-nextcloud"
  userKey  = "<removed>"
}

parameters {
  clusterID = "<removed>"
  fsName = "proxmox-data"
}

context {
  monitors = "10.7.0.2"
  provisionVolume = "false"
  rootPath = "/nomad/nextcloud"
  mounter = "fuse"
}

However, I was able to get the provider to do what I need without my PR by chaining the nomad_csi_volume and nomad_csi_volume_registration resources. I did find that some attributes had to be re-declared on the registration as well, though that could be down to my CSI plugin.

resource "nomad_csi_volume" "data_volume" {
  depends_on = [time_sleep.wait_5_seconds]
  lifecycle {
    prevent_destroy = false
  }

  plugin_id    = "cephfs-csi"
  volume_id    = local.sanitized_name
  name         = var.name
  capacity_min = var.max-size
  capacity_max = var.max-size

  capability {
    access_mode     = var.access_mode
    attachment_mode = "file-system"
  }

  mount_options {
    mount_flags = [ "noatime", "fsid=${var.ceph_creds.fsid}" ]
  }

  secrets = {
    adminID  = var.ceph_creds.client
    adminKey = var.ceph_creds.token
  }

  parameters = {
    fsName    = var.data_pool
    clusterID = var.ceph_creds.fsid
  }
}

resource "nomad_csi_volume_registration" "addCtx" {
  external_id = nomad_csi_volume.data_volume.external_id
  name        = nomad_csi_volume.data_volume.name
  volume_id   = nomad_csi_volume.data_volume.volume_id
  plugin_id   = nomad_csi_volume.data_volume.plugin_id

  capability {
    access_mode     = var.access_mode
    attachment_mode = "file-system"
  }

  secrets = {
    adminID  = var.ceph_creds.client
    adminKey = var.ceph_creds.token
  }

  context = {
    monitors        = join(",", var.ceph_creds.monitors)
    provisionVolume = "false"
    rootPath        = "nomad/${local.sanitized_name}"
    mounter         = "kernel"
    pool            = var.data_pool
  }
}

@CarbonCollins
Author

> Hi @CarbonCollins 👋
>
> According to the docs volume context should only be set on volume registration, that's why it's only present in the resource_csi_volume_registration resource.
>
> Could you share the volume spec you used to create the volume that worked?

Sure, it's pretty much the following.

Terraform configuration that did not work:

resource "nomad_csi_volume" "loki_data" {
  count      = 1
  depends_on = [data.nomad_plugin.storage]

  lifecycle {
    prevent_destroy = true
  }

  namespace = "c3-monitoring"

  plugin_id    = data.nomad_plugin.storage.plugin_id
  volume_id    = format("loki-data[%d]", count.index)
  name         = format("loki-data[%d]", count.index)
  capacity_min = "1GiB"
  capacity_max = "2GiB"

  capability {
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  mount_options {
    fs_type = "cifs"
    mount_flags = [
      "vers=3.0",
      format("uid=%d", var.cifs_user_id),
      format("gid=%d", var.cifs_group_id),
      "file_mode=0600",
      "dir_mode=0700",
      "noperm",
      "nobrl",
      format("username=%s", data.vault_kv_secret_v2.volume_credentials.data["user"]),
      format("password=%s", data.vault_kv_secret_v2.volume_credentials.data["pass"])
    ]
  }
}

The equivalent Nomad volume specification that did work:

id = "loki-data[0]"
name = "loki-data[0]"
type = "csi"
plugin_id = "soc-axion-smb"
capacity_min = "1GiB"
capacity_max = "2GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type = "cifs"
  mount_flags = [
      "vers=3.0",
      "uid={redacted}",
      "gid={redacted}",
      "file_mode=0600",
      "dir_mode=0700",
      "noperm",
      "nobrl",
      "username={redacted}",
      "password={redacted}",
  ]
}

context {
  node_attach_driver = "smb"
}

@lgfa29
Contributor

lgfa29 commented Jan 31, 2024

Thanks for the extra info @clarkbains and @CarbonCollins!

Do you happen to have plugin logs for the volume creation using Terraform (so the one without context)?

The context should've been set automatically by the plugin on volume creation, so if you're registering an existing volume, you would need to match it. Looking at the CSI plugin logs could shed some light on what's going on here.

@CarbonCollins
Author

I don't have any of the previous logs, but I could try generating some new ones with a new volume.

@CarbonCollins
Author

Terraform apply log
Terraform will perform the following actions:

  # module.soc_volumes.nomad_csi_volume.tempo_data[0] will be created
  + resource "nomad_csi_volume" "tempo_data" {
      + capacity                = (known after apply)
      + capacity_max            = "2.0 GiB"
      + capacity_max_bytes      = (known after apply)
      + capacity_min            = "1.0 GiB"
      + capacity_min_bytes      = (known after apply)
      + controller_required     = (known after apply)
      + controllers_expected    = (known after apply)
      + controllers_healthy     = (known after apply)
      + external_id             = (known after apply)
      + id                      = (known after apply)
      + name                    = "tempo-data[0]"
      + namespace               = "c3-monitoring"
      + nodes_expected          = (known after apply)
      + nodes_healthy           = (known after apply)
      + plugin_id               = "soc-axion-smb"
      + plugin_provider         = (known after apply)
      + plugin_provider_version = (known after apply)
      + schedulable             = (known after apply)
      + topologies              = (known after apply)
      + volume_id               = "tempo-data[0]"

      + capability {
          + access_mode     = "single-node-writer"
          + attachment_mode = "file-system"
        }

      + mount_options {
          + fs_type     = "cifs"
          + mount_flags = [
              + "vers=3.0",
              + "uid={redacted}",
              + "gid={redacted}",
              + "file_mode=0664",
              + "dir_mode=0775",
              + "noperm",
              + "nobrl",
              + (sensitive value),
              + (sensitive value),
            ]
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
2024-01-31T21:00:06.792+0100 [DEBUG] command: asking for input: "\nDo you want to perform these actions?"

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

2024-01-31T21:00:09.216+0100 [INFO]  backend/local: apply calling Apply
2024-01-31T21:00:09.216+0100 [DEBUG] Building and walking apply graph for NormalMode plan
2024-01-31T21:00:09.217+0100 [DEBUG] Resource state not found for node "module.soc_volumes.nomad_csi_volume.tempo_data[0]", instance module.soc_volumes.nomad_csi_volume.tempo_data[0]
2024-01-31T21:00:09.219+0100 [DEBUG] adding implicit provider configuration provider["registry.terraform.io/hashicorp/vault"], implied first by module.soc_volumes.data.vault_kv_secret_v2.volume_credentials (expand)
2024-01-31T21:00:09.219+0100 [DEBUG] ProviderTransformer: "module.soc_volumes.nomad_csi_volume.tempo_data (expand)" (*terraform.nodeExpandApplyableResource) needs provider["registry.terraform.io/hashicorp/nomad"]
2024-01-31T21:00:09.219+0100 [DEBUG] ProviderTransformer: "module.soc_volumes.data.vault_kv_secret_v2.volume_credentials (expand)" (*terraform.nodeExpandApplyableResource) needs provider["registry.terraform.io/hashicorp/vault"]
2024-01-31T21:00:09.219+0100 [DEBUG] ProviderTransformer: "module.intentions.consul_config_entry.tempo_intention (expand)" (*terraform.nodeExpandApplyableResource) needs provider["registry.terraform.io/hashicorp/consul"]
2024-01-31T21:00:09.219+0100 [DEBUG] ProviderTransformer: "module.intentions.consul_config_entry.tempo_zipkin_intention (expand)" (*terraform.nodeExpandApplyableResource) needs provider["registry.terraform.io/hashicorp/consul"]
2024-01-31T21:00:09.219+0100 [DEBUG] ProviderTransformer: "module.soc_volumes.data.nomad_plugin.storage (expand)" (*terraform.nodeExpandApplyableResource) needs provider["registry.terraform.io/hashicorp/nomad"]
2024-01-31T21:00:09.219+0100 [DEBUG] ProviderTransformer: "module.soc_volumes.nomad_csi_volume.tempo_data[0]" (*terraform.NodeApplyableResourceInstance) needs provider["registry.terraform.io/hashicorp/nomad"]
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes.data.vault_kv_secret_v2.volume_credentials (expand)" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes.var.plugin_id (expand)" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes.var.cifs_user_id (expand)" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.intentions (expand)" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes (close)" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.intentions.consul_config_entry.tempo_intention (expand)" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.intentions.consul_config_entry.tempo_zipkin_intention (expand)" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "provider[\"registry.terraform.io/hashicorp/nomad\"]" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "provider[\"registry.terraform.io/hashicorp/vault\"]" references: []
2024-01-31T21:00:09.221+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes (expand)" references: []
2024-01-31T21:00:09.222+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes.nomad_csi_volume.tempo_data (expand)" references: [module.soc_volumes.data.nomad_plugin.storage (expand) module.soc_volumes.data.nomad_plugin.storage (expand) module.soc_volumes.var.cifs_user_id (expand) module.soc_volumes.var.cifs_group_id (expand) module.soc_volumes.data.vault_kv_secret_v2.volume_credentials (expand) module.soc_volumes.data.vault_kv_secret_v2.volume_credentials (expand)]
2024-01-31T21:00:09.222+0100 [DEBUG] ReferenceTransformer: "provider[\"registry.terraform.io/hashicorp/consul\"]" references: []
2024-01-31T21:00:09.222+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes.data.nomad_plugin.storage (expand)" references: [module.soc_volumes.var.plugin_id (expand)]
2024-01-31T21:00:09.222+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes.var.cifs_group_id (expand)" references: []
2024-01-31T21:00:09.222+0100 [DEBUG] ReferenceTransformer: "module.soc_volumes.nomad_csi_volume.tempo_data[0]" references: [module.soc_volumes.data.nomad_plugin.storage (expand) module.soc_volumes.data.nomad_plugin.storage (expand) module.soc_volumes.var.cifs_user_id (expand) module.soc_volumes.var.cifs_group_id (expand) module.soc_volumes.data.vault_kv_secret_v2.volume_credentials (expand) module.soc_volumes.data.vault_kv_secret_v2.volume_credentials (expand)]
2024-01-31T21:00:09.222+0100 [DEBUG] ReferenceTransformer: "module.intentions (close)" references: []
2024-01-31T21:00:09.225+0100 [DEBUG] pruneUnusedNodes: module.intentions.consul_config_entry.tempo_intention (expand) is no longer needed, removing
2024-01-31T21:00:09.225+0100 [DEBUG] pruneUnusedNodes: module.intentions.consul_config_entry.tempo_zipkin_intention (expand) is no longer needed, removing
2024-01-31T21:00:09.225+0100 [DEBUG] pruneUnusedNodes: module.intentions (expand) is no longer needed, removing
2024-01-31T21:00:09.225+0100 [DEBUG] pruneUnusedNodes: provider["registry.terraform.io/hashicorp/consul"] is no longer needed, removing
2024-01-31T21:00:09.228+0100 [DEBUG] Starting graph walk: walkApply
2024-01-31T21:00:09.229+0100 [DEBUG] created provider logger: level=debug
2024-01-31T21:00:09.229+0100 [INFO]  provider: configuring client automatic mTLS
2024-01-31T21:00:09.258+0100 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/vault/3.24.0/linux_amd64/terraform-provider-vault_v3.24.0_x5 args=[".terraform/providers/registry.terraform.io/hashicorp/vault/3.24.0/linux_amd64/terraform-provider-vault_v3.24.0_x5"]
2024-01-31T21:00:09.259+0100 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/vault/3.24.0/linux_amd64/terraform-provider-vault_v3.24.0_x5 pid=570881
2024-01-31T21:00:09.259+0100 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/vault/3.24.0/linux_amd64/terraform-provider-vault_v3.24.0_x5
2024-01-31T21:00:09.350+0100 [INFO]  provider.terraform-provider-vault_v3.24.0_x5: configuring server automatic mTLS: timestamp="2024-01-31T21:00:09.349+0100"
2024-01-31T21:00:09.369+0100 [DEBUG] provider: using plugin: version=5
2024-01-31T21:00:09.369+0100 [DEBUG] provider.terraform-provider-vault_v3.24.0_x5: plugin address: address=/tmp/plugin1466548974 network=unix timestamp="2024-01-31T21:00:09.369+0100"
2024-01-31T21:00:09.386+0100 [DEBUG] created provider logger: level=debug
2024-01-31T21:00:09.387+0100 [INFO]  provider: configuring client automatic mTLS
2024-01-31T21:00:09.399+0100 [WARN]  ValidateProviderConfig from "provider[\"registry.terraform.io/hashicorp/vault\"]" changed the config value, but that value is unused
2024-01-31T21:00:09.399+0100 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/nomad/2.1.0/linux_amd64/terraform-provider-nomad_v2.1.0_x5 args=[".terraform/providers/registry.terraform.io/hashicorp/nomad/2.1.0/linux_amd64/terraform-provider-nomad_v2.1.0_x5"]
2024-01-31T21:00:09.399+0100 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/nomad/2.1.0/linux_amd64/terraform-provider-nomad_v2.1.0_x5 pid=570893
2024-01-31T21:00:09.399+0100 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/nomad/2.1.0/linux_amd64/terraform-provider-nomad_v2.1.0_x5
2024-01-31T21:00:09.410+0100 [INFO]  provider.terraform-provider-nomad_v2.1.0_x5: configuring server automatic mTLS: timestamp="2024-01-31T21:00:09.410+0100"
2024-01-31T21:00:09.431+0100 [DEBUG] provider: using plugin: version=5
2024-01-31T21:00:09.431+0100 [DEBUG] provider.terraform-provider-nomad_v2.1.0_x5: plugin address: address=/tmp/plugin1420660728 network=unix timestamp="2024-01-31T21:00:09.431+0100"
2024-01-31T21:00:09.452+0100 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-01-31T21:00:09.458+0100 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/vault/3.24.0/linux_amd64/terraform-provider-vault_v3.24.0_x5 pid=570881
2024-01-31T21:00:09.458+0100 [DEBUG] provider: plugin exited
2024-01-31T21:00:09.468+0100 [WARN]  ValidateProviderConfig from "provider[\"registry.terraform.io/hashicorp/nomad\"]" changed the config value, but that value is unused
2024-01-31T21:00:09.478+0100 [WARN]  Provider "registry.terraform.io/hashicorp/nomad" produced an invalid plan for module.soc_volumes.nomad_csi_volume.tempo_data[0], but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .capacity_max: planned value cty.StringVal("2.0 GiB") does not match config value cty.StringVal("2GiB")
      - .capacity_min: planned value cty.StringVal("1.0 GiB") does not match config value cty.StringVal("1GiB")
module.soc_volumes.nomad_csi_volume.tempo_data[0]: Creating...
2024-01-31T21:00:09.479+0100 [INFO]  Starting apply for module.soc_volumes.nomad_csi_volume.tempo_data[0]
2024-01-31T21:00:09.480+0100 [DEBUG] module.soc_volumes.nomad_csi_volume.tempo_data[0]: applying the planned Create change
2024-01-31T21:00:09.484+0100 [INFO]  provider.terraform-provider-nomad_v2.1.0_x5: 2024/01/31 21:00:09 [DEBUG] setting computed for "topologies" from ComputedKeys: timestamp="2024-01-31T21:00:09.484+0100"
2024-01-31T21:00:09.484+0100 [INFO]  provider.terraform-provider-nomad_v2.1.0_x5: 2024/01/31 21:00:09 [DEBUG] creating CSI volume "tempo-data[0]" in namespace "": timestamp="2024-01-31T21:00:09.484+0100"
2024-01-31T21:00:09.484+0100 [INFO]  provider.terraform-provider-nomad_v2.1.0_x5: 2024/01/31 21:00:09 [DEBUG] Waiting for state to become: [success]: timestamp="2024-01-31T21:00:09.484+0100"
2024-01-31T21:00:09.674+0100 [INFO]  provider.terraform-provider-nomad_v2.1.0_x5: 2024/01/31 21:00:09 [DEBUG] CSI volume "tempo-data[0]" created in namespace "": timestamp="2024-01-31T21:00:09.674+0100"
2024-01-31T21:00:09.674+0100 [INFO]  provider.terraform-provider-nomad_v2.1.0_x5: 2024/01/31 21:00:09 [DEBUG] reading information for CSI volume "tempo-data[0]" in namespace "c3-monitoring": timestamp="2024-01-31T21:00:09.674+0100"
2024-01-31T21:00:09.686+0100 [INFO]  provider.terraform-provider-nomad_v2.1.0_x5: 2024/01/31 21:00:09 [DEBUG] found CSI volume "tempo-data[0]" in namespace "c3-monitoring": timestamp="2024-01-31T21:00:09.685+0100"
module.soc_volumes.nomad_csi_volume.tempo_data[0]: Creation complete after 1s [id=tempo-data[0]]
2024-01-31T21:00:09.694+0100 [DEBUG] State storage *remote.State declined to persist a state snapshot
2024-01-31T21:00:09.696+0100 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-01-31T21:00:09.701+0100 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/nomad/2.1.0/linux_amd64/terraform-provider-nomad_v2.1.0_x5 pid=570893
2024-01-31T21:00:09.701+0100 [DEBUG] provider: plugin exited
2024-01-31T21:00:09.702+0100 [DEBUG] states/remote: state read serial is: 5; serial is: 5
2024-01-31T21:00:09.702+0100 [DEBUG] states/remote: state read lineage is: 8c27428e-cb98-acc6-d77b-7270b9794b11; lineage is: 8c27428e-cb98-acc6-d77b-7270b9794b11

@lgfa29
Contributor

lgfa29 commented Jan 31, 2024

Thanks! So the volume is created successfully but you can't mount it to a task?

If you read the volume back (for example with nomad operator api /v1/volume/csi/:volume_id, since the nomad volume status command doesn't seem to print context data), does it have a context value?

And when the error you reported happens, do you see anything in the CSI plugin logs (meaning the logs of the CSI plugin job allocations you ran)?

@CarbonCollins
Author

I just checked with the nomad operator api call, and the context seems to have been populated correctly.

I can see "node_attach_driver": "smb". I haven't deployed a job that uses the volume yet to verify that mounting works, but from the API call it at least seems to have been set correctly.
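One way to run that mount check is a throwaway batch job that only mounts the volume and lists its contents. This is a sketch, assuming the tempo-data[0] volume from the log above, the Docker driver on the client, and a hypothetical job name and busybox image. The namespace is set explicitly because NOMAD_NAMESPACE in this environment points at c3-networking, not c3-monitoring.

```hcl
# Hypothetical test job that only mounts the CSI volume and lists it.
job "csi-mount-test" {
  namespace = "c3-monitoring"
  type      = "batch"

  group "test" {
    # With per_alloc = true, Nomad appends the allocation index to source,
    # so "tempo-data" resolves to "tempo-data[0]" for the first allocation.
    volume "data" {
      type            = "csi"
      source          = "tempo-data"
      per_alloc       = true
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "ls" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "ls"
        args    = ["-la", "/data"]
      }

      # Mounts the group-level volume into the task at /data.
      volume_mount {
        volume      = "data"
        destination = "/data"
      }
    }
  }
}
```

If the context is missing, the allocation would be expected to fail in the csi_hook with the same node_attach_driver error reported at the top of this issue.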

@tgross tgross linked a pull request Jan 16, 2025 that will close this issue
@tgross
Member

tgross commented Jan 16, 2025

I've got a draft PR fixing this: #503. I still need to do some testing.

4 participants