NFS export policy is missing one k8s node's IP #965

Open
ptrkmkslv opened this issue Jan 21, 2025 · 13 comments

@ptrkmkslv

We hit a bug where one of the cluster nodes is unable to mount a PVC (NFS), with the following error:

71s Warning FailedMount pod/grafana-5b7b4f4dc7-9r4zf MountVolume.SetUp failed for volume "pvc-4c0caf5e-ac71-4dc2-a9d1-329913b244a6" : rpc error: code = Internal desc = error mounting NFS volume x.x.x.x/trident_pvc_4c0caf5e_ac71_4dc2_a9d1_329913b244a6 on mountpoint /opt/rke/var/lib/kubelet/pods/cd1b1ad1-f380-4fb2-9e9b-eff4806121b4/volumes/kubernetes.io~csi/pvc-4c0caf5e-ac71-4dc2-a9d1-329913b244a6/mount: exit status 32

After investigation we discovered that the NFS export policy on the SVM is missing this node's IP (the policy had 16 entries, while the cluster consists of 17 nodes).

trident-node / trident-controller did not produce any useful error messages about publishing the volume to the node, and the SVM did not report any problem either.

The issue was resolved manually by the storage team: the missing node was added to the export policy, after which the pod was immediately able to mount the PVC.
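
For reference, a minimal sketch of the ONTAP CLI commands a storage admin could use to inspect and patch the export policy by hand. The SVM name, policy name, and rule options below are placeholders/assumptions, not taken from this environment; when adding a rule, mirror the options of an existing rule that Trident created.

```
# Show which client IPs the export policy currently allows
vserver export-policy rule show -vserver <svm-name> -policyname <export-policy-name>

# Add the missing node's IP (rule options are an assumption; copy them from an existing rule)
vserver export-policy rule create -vserver <svm-name> -policyname <export-policy-name> -clientmatch <missing-node-ip> -protocol nfs -rorule sys -rwrule sys -superuser sys
```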

Environment
The tridentbackendconfigs.trident.netapp.io resource (TridentBackendConfig) for the NFS share uses both parameters: autoExportPolicy: true and autoExportCIDRs: set to the /24 subnet that contains the k8s storage interfaces.

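For context, a minimal sketch of that kind of TridentBackendConfig; the name, namespace, LIF, SVM, secret, and CIDR values are placeholders, not our actual configuration:

```yaml
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: ontap-nas-backend          # placeholder
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: x.x.x.x           # placeholder SVM management LIF
  svm: svm_placeholder             # placeholder
  autoExportPolicy: true           # Trident manages the export policy rules automatically
  autoExportCIDRs: ["10.0.0.0/24"] # /24 subnet containing the k8s storage interfaces
  credentials:
    name: ontap-backend-secret     # placeholder secret holding SVM credentials
```
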
  • Trident version: v24.10
  • Kubernetes version: v1.30.6
  • Container runtime: docker://26.1.0
  • Kubernetes orchestrator: Rancher (custom cluster)
  • OS: Flatcar Container Linux by Kinvolk 4081.2.0
  • NetApp backend types: ONTAP AFF (ONTAP 9.12.1P12)

Expected behavior

A complete export policy covering all k8s worker nodes' IPs.

Any advice? What should we do if the problem occurs again? (Are there any troubleshooting commands we can use?)
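
(For reference, a sketch of checks that might help narrow this down, assuming Trident is installed in the trident namespace; resource and pod names may differ per install:)

```
# Nodes that Trident has registered; each should eventually get an export policy rule
kubectl get tridentnodes -n trident

# Per-node CSI pods; the affected node's trident-node pod should be Running
kubectl get pods -n trident -o wide | grep trident-node

# Then compare the registered nodes against the export policy rules on the SVM
# (see the export-policy rule show command earlier in this issue)
```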

ptrkmkslv added the bug label on Jan 21, 2025
@enneitex

Hi, same issue here:

Trident version: v24.10
Kubernetes version: v1.29.10
Container runtime: containerd
Kubernetes orchestrator: Kubeadm
OS: RHEL9
NetApp backend types: ONTAP NAS

A few IPs are missing from the export policy while using autoExportPolicy: true and the default autoExportCIDRs.
Even after deleting the node and adding it back to the Kubernetes cluster, its IP is still missing.

@ptrkmkslv
Author

We are almost certain the problem is related to 24.10: after analysis, the first NFS-related problem occurred the day after the upgrade (from version 24.02).

@enneitex

Same, we were using 24.06.1 before and never hit this issue while removing and adding a lot of nodes in our clusters.

@ptrkmkslv
Author

In our environment, clusters (mostly) have a fixed number of nodes, so it is even stranger that the export policy suddenly does not include all of them; this is not a problem of adding/removing nodes in dynamic clusters.

@wonderland

Just to see if it can be broken down more specifically: Are you using driver name ontap-nas or ontap-nas-economy?

@ptrkmkslv
Author

storageDriverName: ontap-nas

@enneitex

Same, ontap-nas

@torirevilla
Contributor

How long are you waiting after the node is added to check if the new node IP is included in the export policy?
In trying to reproduce this issue, I have found that the node IP is added to the export policy during the reconcileNodeAccess loop; the update completes soon after a node is added, but not immediately.

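A hedged sketch of how one might check for that reconcile activity in the controller logs (the deployment and container names assume a default operator install in the trident namespace, and the grep terms are a guess):

```
# Controller logs around the time a node is added; adjust the filter as needed
kubectl logs deploy/trident-controller -n trident -c trident-main --since=1h | grep -iE "reconcile|export"
```
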
I have a few more questions to better reproduce the issue:
When are you adding the node, before or after upgrading to 24.10?
How are you performing the upgrade, using the operator or tridentctl cli?

@enneitex

Hi,
Even after a few days, some IPs are still missing.

We added a lot of nodes in our clusters when running previous versions of Trident and never faced this issue.
Now it is happening with nodes added after we upgraded Trident with the operator.
We opened a case: 2010276594

@ptrkmkslv
Author

In our case we have a static number of nodes, and after the upgrade the export policies are missing some IPs.

@ptrkmkslv
Author

Just to clarify:

> I have a few more questions to better reproduce the issue: When are you adding the node, before or after upgrading to 24.10? How are you performing the upgrade, using the operator or tridentctl cli?

Regarding the upgrade: we are using the operator installed via Helm: https://netapp.github.io/trident-helm-chart
Regarding adding nodes: in our case the clusters have a static number of nodes and were not scaled up or down when the issue occurred.

@tijmenvandenbrink

Any update from NetApp on this issue?

@sjpeeris
Collaborator

Hi @ptrkmkslv, can you please open a NetApp Support case for this issue? Our support team can work with you on collecting the required logs and data to investigate further.
