When the operator's service account has limited access to the Kubernetes cluster (such as RBAC rules that only give it access to the current namespace), the watchers may die (for example, due to a temporary auth issue) and never recover. The operator then keeps running but no longer monitors resources for changes. This appears to happen only for operators that handle custom resources.
# monkeypatch the errors.check_response method so we can simulate when an auth error occurs via
# pkill -SIGUSR1 -nf kopf
import logging

import kopf
from kopf._cogs.clients import errors

logger = logging.getLogger(__name__)

old_check_response = errors.check_response
BROKEN_AUTH = False


def check_response(*args, **kwargs):
    logger.info("Running monkey patched checked response")
    if BROKEN_AUTH:
        logger.info("Auth is broken, raising error.")
        raise errors.APIUnauthorizedError(None, status=401)
    return old_check_response(*args, **kwargs)


errors.check_response = check_response

import signal


def break_auth(*_):
    global BROKEN_AUTH
    logger.info("Breaking auth")
    BROKEN_AUTH = True


signal.signal(signal.SIGUSR1, break_auth)


@kopf.on.update(CR_GROUP, CR_VERSION, CR_KIND)
@kopf.on.create(CR_GROUP, CR_VERSION, CR_KIND)
def monitor_custom_resource(
    name: str,
    namespace: str,
    status: kopf.Status,
    labels: kopf.Labels,
    **_,
): ...
Logs
[2024-12-10 15:43:22,060] kopf._core.engines.a [INFO ] Initial authentication has been initiated.
[2024-12-10 15:43:22,070] kopf.activities.auth [INFO ] Activity 'login_via_client' succeeded.
[2024-12-10 15:43:22,070] kopf._core.engines.a [INFO ] Initial authentication has finished.
[2024-12-10 15:43:22,080] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,081] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,083] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,084] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,087] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,088] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,091] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,091] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,091] kopf._core.reactor.o [WARNING ] Not enough permissions to list namespaces. Falling back to a list of namespaces which are assumed to exist: {'default'}
[2024-12-10 15:43:22,093] kopf._core.reactor.o [WARNING ] Not enough permissions to watch for resources: changes (creation/deletion/updates) will not be noticed; the resources are only refreshed on operator restarts.
[2024-12-10 15:43:22,094] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,094] kopf._core.reactor.o [WARNING ] Not enough permissions to watch for namespaces: changes (deletion/creation) will not be noticed; the namespaces are only refreshed on operator restarts.
[2024-12-10 15:43:22,115] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:22,126] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:43,996] __kopf_script_0__/Us [INFO ] Breaking auth
[2024-12-10 15:43:48,157] __kopf_script_0__/Us [INFO ] Running monkey patched checked response
[2024-12-10 15:43:48,157] __kopf_script_0__/Us [INFO ] Auth is broken, raising error.
[2024-12-10 15:43:48,158] kopf._core.engines.a [INFO ] Re-authentication has been initiated.
[2024-12-10 15:43:48,167] kopf.activities.auth [INFO ] Activity 'login_via_client' succeeded.
[2024-12-10 15:43:48,167] kopf._core.engines.a [INFO ] Re-authentication has finished.
[2024-12-10 15:43:48,167] kopf.objects [ERROR ] [default/custom-resource] Throttling for 1 seconds due to an unexpected error: LoginError('Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/')
Traceback (most recent call last):
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 50, in wrapper
response = await fn(*args, **kwargs, context=context)
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 85, in request
await errors.check_response(response) # but do not parse it!
File "/Users/jamesmchugh/git/operators/test_operator_bug.py", line 23, in check_response
raise errors.APIUnauthorizedError(None, status=401)
kopf._cogs.clients.errors.APIUnauthorizedError: (None, None)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/actions/throttlers.py", line 44, in throttled
yield should_run
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/reactor/processing.py", line 130, in process_resource_event
applied = await application.apply(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/actions/application.py", line 60, in apply
await patch_and_check(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/actions/application.py", line 131, in patch_and_check
resulting_body = await patching.patch_obj(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/patching.py", line 47, in patch_obj
patched_body = await api.patch(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 155, in patch
response = await request(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 56, in wrapper
await vault.invalidate(key, exc=e)
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 297, in invalidate
raise LoginError("Ran out of valid credentials. Consider installing "
kopf._cogs.structs.credentials.LoginError: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
[2024-12-10 15:43:49,168] kopf.objects [INFO ] [default/custom-resource] Throttling is over. Switching back to normal operations.
[2024-12-10 15:43:49,169] kopf.objects [ERROR ] [default/custom-resource] Throttling for 1 seconds due to an unexpected error: LoginError('Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/')
Traceback (most recent call last):
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/actions/throttlers.py", line 44, in throttled
yield should_run
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/reactor/processing.py", line 130, in process_resource_event
applied = await application.apply(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/actions/application.py", line 60, in apply
await patch_and_check(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/actions/application.py", line 131, in patch_and_check
resulting_body = await patching.patch_obj(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/patching.py", line 47, in patch_obj
patched_body = await api.patch(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 155, in patch
response = await request(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 48, in wrapper
async for key, info, context in vault.extended(APIContext, 'contexts'):
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 158, in extended
async for key, item in self._items():
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 195, in _items
yielded_key, yielded_item = self.select()
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 214, in select
raise LoginError("Ran out of valid credentials. Consider installing "
kopf._cogs.structs.credentials.LoginError: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
[2024-12-10 15:43:50,165] kopf._core.reactor.q [WARNING ] Unprocessed streams left for [(custom-resource.v1beta1.foo.com, 'd782af8b-1cf4-42bc-abc3-c02ff635470f')].
[2024-12-10 15:43:50,166] kopf._core.reactor.o [ERROR ] Watcher for custom-resource.v1beta1.foo.com@default has failed: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
Traceback (most recent call last):
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 96, in guard
await coro
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
async for raw_event in stream:
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
async for raw_event in stream:
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
async for raw_input in stream:
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
async for raw_input in api.stream(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/api.py", line 200, in stream
response = await request(
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/clients/auth.py", line 48, in wrapper
async for key, info, context in vault.extended(APIContext, 'contexts'):
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 158, in extended
async for key, item in self._items():
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 195, in _items
yielded_key, yielded_item = self.select()
File "/Users/jamesmchugh/anaconda3/envs/python-3.10/lib/python3.10/site-packages/kopf/_cogs/structs/credentials.py", line 214, in select
raise LoginError("Ran out of valid credentials. Consider installing "
kopf._cogs.structs.credentials.LoginError: Ran out of valid credentials. Consider installing an API client library or adding a login handler. See more: https://kopf.readthedocs.io/en/stable/authentication/
# operator continues running, but doing nothing
Additional information
To reproduce this scenario, create a CRD and set the CR_* vars in the code above. Additionally, create a service account with roles that only have access to the resources in the namespace the operator is monitoring, such as below:
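The manifest referred to here did not survive in this report. As a rough sketch only, and not the author's actual manifest, namespace-scoped RBAC of this kind might look like the following. It is written as Python that prints a JSON list, which could be piped to `kubectl apply -f -`. The service account name `operator-test` and the `default` namespace come from the commands in this report; the role name, the verbs, and the `foo.com` / `custom-resource` placeholders (taken from the logs) are assumptions.

```python
import json

NAMESPACE = "default"              # namespace passed to `kopf run -n default`
SERVICE_ACCOUNT = "operator-test"  # name used by `kubectl create token operator-test`
ROLE_NAME = "operator-test-role"   # hypothetical name, chosen for illustration
CR_GROUP = "foo.com"               # placeholder group, as it appears in the logs
CR_PLURAL = "custom-resource"      # placeholder resource name, as it appears in the logs


def build_manifests() -> list[dict]:
    """Return a ServiceAccount, a Role, and a RoleBinding limited to one namespace."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": SERVICE_ACCOUNT, "namespace": NAMESPACE},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",  # a Role (unlike a ClusterRole) is namespace-scoped
        "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
        "rules": [
            {
                "apiGroups": [CR_GROUP],
                "resources": [CR_PLURAL],
                # Assumed verb set; adjust to what the handlers need.
                "verbs": ["get", "list", "watch", "patch"],
            }
        ],
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
        "subjects": [
            {"kind": "ServiceAccount", "name": SERVICE_ACCOUNT, "namespace": NAMESPACE}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": ROLE_NAME,
        },
    }
    return [service_account, role, role_binding]


def as_kubernetes_list(items: list[dict]) -> str:
    """Wrap the objects in a v1 List so they can be applied in one go."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(as_kubernetes_list(build_manifests()))
```

Because the role grants nothing on namespaces or cluster-wide resources, the operator starts with the "Not enough permissions" warnings shown in the logs above.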
Create a token for that service account (kubectl create token operator-test) and add it as a new user to your kubeconfig. Change contexts so this new context with the new user is being actively used.
Run the operator with
kopf run -n default <filename>
After startup, send the SIGUSR1 signal to the process to make the monkeypatched check_response raise an APIUnauthorizedError (a 401 auth error) the next time it runs.
pkill -SIGUSR1 -nf kopf
Create or update the custom resource. Observe that the operator logs an error but continues running. Future create/update events of the custom resource (or any other resource if multiple handlers are used) are not observed.
When the operator is not restricted to a single namespace, the resource-observer and namespace-observer tasks run, and both are core operator tasks. An error such as an auth failure therefore causes those tasks to fail and the operator to die. This is not the case when observing a single namespace.
Additionally, for resources that are not custom, the event-poster core task uses the events API to report when handlers succeed or fail. This too fails in the face of an auth issue, causing the event-poster task to die and, in turn, the operator to die.
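The difference between a failure that stops the operator and one that goes unnoticed can be illustrated with a simplified asyncio model. This is not kopf's code: the task names, the `run_operator` function, and its flags are hypothetical, and the sketch only assumes that a supervisor reacts to failures in the tasks it waits on and ignores all others.

```python
import asyncio


async def failing_task(name: str) -> None:
    """Stand-in for a task that hits an auth error shortly after starting."""
    await asyncio.sleep(0.01)
    raise RuntimeError(f"{name}: ran out of valid credentials")


async def healthy_task() -> None:
    """Stand-in for a task that keeps running normally."""
    await asyncio.sleep(3600)


async def run_operator(core_fails: bool, supervise_watchers: bool) -> str:
    """Run a toy operator briefly and report how the supervisor reacted."""
    core = asyncio.create_task(failing_task("core") if core_fails else healthy_task())
    watcher = asyncio.create_task(failing_task("watcher"))  # the watcher always dies

    # Only the tasks in this set are checked for failures.
    supervised = {core, watcher} if supervise_watchers else {core}
    done, _ = await asyncio.wait(
        supervised, timeout=0.2, return_when=asyncio.FIRST_EXCEPTION
    )
    failed = [task for task in done if task.exception() is not None]

    # Clean up so no task outlives the demonstration.
    for task in (core, watcher):
        task.cancel()
    await asyncio.gather(core, watcher, return_exceptions=True)

    return "operator exits" if failed else "operator keeps running"


if __name__ == "__main__":
    # A supervised core task fails: the failure is noticed.
    print(asyncio.run(run_operator(core_fails=True, supervise_watchers=False)))
    # Only the unsupervised watcher fails: nothing reacts.
    print(asyncio.run(run_operator(core_fails=False, supervise_watchers=False)))
    # The watcher is supervised as well: its failure is noticed.
    print(asyncio.run(run_operator(core_fails=False, supervise_watchers=True)))
```

In the second call the watcher has already failed, yet the supervisor reports nothing, which mirrors the "operator continues running, but doing nothing" state in the logs.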
In the case of observing a custom resource within a single namespace, neither of the above safety nets gets triggered. This results in the operator silently dying. From reviewing the code, I think a fix for this could be to update the orchestrator (kopf/kopf/_core/reactor/orchestration.py, lines 104 to 132 in c158bae) so that it monitors the status of the tasks in the ensemble and raises an exception if they fail. With that change, watcher tasks whose exit status was not previously monitored would be monitored, and exceptions in them would cause the operator to exit. I am not sure whether this approach has other side effects, though.
For some additional context, the auth-related error I mentioned is the one trigger I found to reproduce this issue. However, it may not be the only trigger.
In theory, this can also be reproduced by dropping all of the signal handling and monkeypatching from the above code, and instead just deleting the service account (or removing its rolebinding/role) to trigger the bug.
Kopf version
1.37.2
Kubernetes version
1.29.5
Python version
3.10.14