Duplicated Initiator Group causes failure to re-mount PVCs after system reboot #69

Open
AgentK20 opened this issue Jan 11, 2025 · 1 comment
Labels
bug (Something isn't working), next release (This will be closed in the next release)

@AgentK20

Hi there! After doing some maintenance on my TrueNAS server last week, I brought it back up and found that pods using my StorageClass were stuck in ContainerCreating, complaining that the PVC was still being deleted. The logs showed a Python error, AttributeError: 'list' object has no attribute 'get', raised from line 69 of truenascsp.py. I didn't think the logDebug flag alone would give me enough information to debug, so I forked the project, added a few debug lines, and ran it again with logDebug enabled. That revealed two exact duplicates of the same Initiator. Because of this, the call api.fetch('iscsi/initiator', field='comment', value=content.get('host_uuid')) returns a list rather than a dict, causing the error.
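The debug log further down suggests that the fetch helper returns a single dict when exactly one record matches and a plain list when several match. The sketch below reproduces that behaviour with a hypothetical stand-in, fetch_by_field, which is not the CSP's actual api.fetch; the "no match returns None" case is an assumption.

```python
from typing import Any, Union

Record = dict[str, Any]


def fetch_by_field(records: list[Record], field: str, value: str) -> Union[Record, list[Record], None]:
    """Hypothetical stand-in for api.fetch(..., field=..., value=...).

    Mirrors what the debug log shows: one match -> dict,
    several matches -> list. No match -> None is assumed.
    """
    matches = [r for r in records if r.get(field) == value]
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    return matches


# The two duplicated initiator groups from the log (ids 1 and 2).
initiators = [
    {"id": 1, "initiators": ["iqn.1993-08.org.debian:01:4a3f49871177"],
     "comment": "116dffff-815c-1c47-5e93-9085c7ecca7a"},
    {"id": 2, "initiators": ["iqn.1993-08.org.debian:01:4a3f49871177"],
     "comment": "116dffff-815c-1c47-5e93-9085c7ecca7a"},
]

host = fetch_by_field(initiators, "comment", "116dffff-815c-1c47-5e93-9085c7ecca7a")
try:
    host.get("id")  # fails because host is a list, as in truenascsp.py line 69
except AttributeError as err:
    print(err)
```

Running it prints 'list' object has no attribute 'get', the same error the unpublish handler hit.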

First, I'm not sure how that Initiator got duplicated. The Helm chart was a stock install with no values.yaml and is still on revision 1, so there were never two of the pods running at once. I was able to resolve it by deleting both Initiators, and the service recreated one on the next startup. (I had mistakenly thought they were cruft left over from fiddling with something elsewhere on the device, not realizing they were a key part of the CSP.)

Second, would it make sense to modify the logic that handles the API response so that it supports a list? If so, I can send in a quick PR.
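One possible shape for such a change is a small normalizing helper applied to the fetch result before .get() is called. This is a minimal sketch, not the project's code: the helper name first_match, the "keep the lowest id" tie-break, and the warning text are all assumptions.

```python
import logging
from typing import Any, Optional, Union

Record = dict[str, Any]
log = logging.getLogger("truenascsp-sketch")  # hypothetical logger name


def first_match(result: Union[Record, list[Record], None]) -> Optional[Record]:
    """Collapse a fetch result that may be a dict, a list, or None into one record.

    When several records match, keep the one with the lowest id (an assumed
    tie-break) and log a warning so the duplicates can be cleaned up.
    """
    if result is None or isinstance(result, dict):
        return result
    if not result:
        return None
    if len(result) > 1:
        log.warning("Expected one record, found %d (ids %s)",
                    len(result), [r.get("id") for r in result])
    return min(result, key=lambda r: r.get("id", 0))
```

Hypothetical usage would be host = first_match(api.fetch('iscsi/initiator', field='comment', value=content.get('host_uuid'))). Warning rather than silently picking one keeps the duplicate visible; raising a clearer error instead would be an equally valid design choice.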

Relevant section of the debug log:

Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG API fetch caught 2 items
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG API Key detected. Will use token authentication.
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG TrueNAS GET request URI: iscsi/initiator
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG TrueNAS response: [
 {
  "id": 1,
  "initiators": [
   "iqn.1993-08.org.debian:01:4a3f49871177"
  ],
  "comment": "116dffff-815c-1c47-5e93-9085c7ecca7a"
 },
 {
  "id": 2,
  "initiators": [
   "iqn.1993-08.org.debian:01:4a3f49871177"
  ],
  "comment": "116dffff-815c-1c47-5e93-9085c7ecca7a"
 },
 {
  "id": 11,
  "initiators": [
   "iqn.1993-08.org.debian:01:4a3f49871177"
  ],
  "comment": "pvc-4ce59844-0a4c-4300-a11f-c196b567dd27"
 },
<...redacted for brevity>
 {
  "id": 18,
  "initiators": [
   "iqn.1993-08.org.debian:01:4a3f49871177"
  ],
  "comment": "pvc-afdb2626-bedd-4018-81e0-6dfaa1637880"
 }
]
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG Looking for field=comment and value=pvc-afdb2626-bedd-4018-81e0-6dfaa1637880
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG Looking for field=comment and value=pvc-afdb2626-bedd-4018-81e0-6dfaa1637880
<...redacted for brevity>
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG Looking for field=comment and value=pvc-afdb2626-bedd-4018-81e0-6dfaa1637880
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG API fetch caught 1 item
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG Host from backend: [{'id': 1, 'initiators': ['iqn.1993-08.org.debian:01:4a3f49871177'], 'comment': '116dffff-815c-1c47-5e93-9085c7ecca7a'}, {'id': 2, 'initiators': ['iqn.1993-08.org.debian:01:4a3f49871177'], 'comment': '116dffff-815c-1c47-5e93-9085c7ecca7a'}]
Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG Initiator from backend: {'id': 18, 'initiators': ['iqn.1993-08.org.debian:01:4a3f49871177'], 'comment': 'pvc-afdb2626-bedd-4018-81e0-6dfaa1637880'}
Sat, 11 Jan 2025 05:10:06 +0000 backend ERROR Exception during unpublish: Traceback (most recent call last):
  File "/app/truenascsp.py", line 69, in on_put
    api.logger.debug('Initiator host requested to be unpublished: %s', host.get('id'))
                                                                       ^^^^^^^^
AttributeError: 'list' object has no attribute 'get'

Sat, 11 Jan 2025 05:10:06 +0000 backend DEBUG Falcon Response (to HPE CSI): 500 Internal Server Error
datamattsson added the bug and next release labels on Jan 11, 2025
@datamattsson
Collaborator

Thanks for reporting this. We've seen duplicate publish requests come in from the Kubernetes control plane to the CSI driver. This is unexpected, and we're handling it per CSP for now. I'll fix this in the next release; expect it towards the end of February.
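One general way a CSP can tolerate duplicate publish requests is to make initiator creation idempotent: look up the group by host UUID and create it only if it is absent, with the check and the create done atomically. The sketch below illustrates that idea against a hypothetical in-memory InitiatorStore; it is not the fix shipped in the project, and the lock only protects a single process. Against the real TrueNAS API the lookup and the create are separate HTTP calls, so several CSP replicas would need some other coordination.

```python
import threading
from itertools import count
from typing import Any

Record = dict[str, Any]


class InitiatorStore:
    """Hypothetical in-memory stand-in for the TrueNAS iscsi/initiator collection."""

    def __init__(self) -> None:
        self._records: list[Record] = []
        self._ids = count(1)
        self._lock = threading.Lock()

    def ensure(self, host_uuid: str, iqns: list[str]) -> Record:
        """Return the group tagged with host_uuid, creating it only if absent.

        The lookup and the create happen under one lock, so two concurrent
        publish requests for the same host cannot both create a record.
        """
        with self._lock:
            for rec in self._records:
                if rec["comment"] == host_uuid:
                    return rec
            rec = {"id": next(self._ids), "initiators": list(iqns), "comment": host_uuid}
            self._records.append(rec)
            return rec

    def all(self) -> list[Record]:
        with self._lock:
            return list(self._records)


# Simulate two duplicate publish requests for the same node arriving at once.
store = InitiatorStore()
iqn = "iqn.1993-08.org.debian:01:4a3f49871177"
uuid = "116dffff-815c-1c47-5e93-9085c7ecca7a"
threads = [threading.Thread(target=store.ensure, args=(uuid, [iqn])) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(store.all()))
```

Both requests resolve to the same record, so only one initiator group exists afterwards.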

The workaround for now is to manually delete the duplicate initiator (or both initiators) and restart your workload.
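To find which groups to delete, the iscsi/initiator response can be grouped by its comment and IQN list. This is a minimal sketch under the assumption that two records with the same comment and the same initiators are duplicates; the function name duplicate_initiator_ids is hypothetical, and keeping the lowest id is an arbitrary choice (deleting both also worked for the reporter, since the CSP recreated the group on startup).

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def duplicate_initiator_ids(records: list[Record]) -> list[int]:
    """Return ids of initiator groups that duplicate another group.

    Records are grouped by (comment, sorted initiators); within each group
    the lowest id is kept and the remaining ids are reported for deletion.
    """
    groups: dict[tuple[str, tuple[str, ...]], list[int]] = defaultdict(list)
    for rec in records:
        key = (rec.get("comment", ""), tuple(sorted(rec.get("initiators", []))))
        groups[key].append(rec["id"])
    dupes: list[int] = []
    for ids in groups.values():
        dupes.extend(sorted(ids)[1:])
    return sorted(dupes)


# Records taken from the debug log above (the redacted entries are omitted).
iqn = "iqn.1993-08.org.debian:01:4a3f49871177"
records = [
    {"id": 1, "initiators": [iqn], "comment": "116dffff-815c-1c47-5e93-9085c7ecca7a"},
    {"id": 2, "initiators": [iqn], "comment": "116dffff-815c-1c47-5e93-9085c7ecca7a"},
    {"id": 11, "initiators": [iqn], "comment": "pvc-4ce59844-0a4c-4300-a11f-c196b567dd27"},
    {"id": 18, "initiators": [iqn], "comment": "pvc-afdb2626-bedd-4018-81e0-6dfaa1637880"},
]
print(duplicate_initiator_ids(records))
```

For the log data this reports only id 2. The per-PVC groups share the IQN but carry different comments, so they are not flagged. The reported ids can then be removed through the TrueNAS UI or API before restarting the workload.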
