Docker service scaling recovery problems #291
Are there any error logs around these incidents, such as from the dispatcher or scaler containers?
The dispatcher doesn't seem to have any errors, but the scaler has thrown quite a few. Here is an example. It may correlate with the issue, but it's hard to pinpoint exactly. Either way, the scaler doesn't seem to be operating normally.
{
"@timestamp": "2024-12-08 23:38:38,879",
"event": {
"module": "assemblyline",
"dataset": "assemblyline.scaler"
},
"host": {
"ip": "x.x.x.x",
"hostname": "9b0f4e6f0c71"
},
"log": {
"level": "ERROR",
"logger": "assemblyline.scaler"
},
"process": {
"pid": "1"
},
"message": "Crash in scaler: update_scaling\nTraceback (most recent call last):\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 789, in urlopen\n response = self._make_request(\n ^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 495, in _make_request\n conn.request(\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/connection.py\", line 441, in request\n self.endheaders()\n File \"/usr/local/lib/python3.11/http/client.py\", line 1298, in endheaders\n self._send_output(message_body, encode_chunked=encode_chunked)\n File \"/usr/local/lib/python3.11/http/client.py\", line 1058, in _send_output\n self.send(msg)\n File \"/usr/local/lib/python3.11/http/client.py\", line 996, in send\n self.connect()\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/transport/unixconn.py\", line 26, in connect\n sock.connect(self.unix_socket)\nBlockingIOError: [Errno 11] Resource temporarily unavailable\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/requests/adapters.py\", line 667, in send\n resp = conn.urlopen(\n ^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/elasticapm/instrumentation/packages/base.py\", line 213, in call_if_sampling\n return self.call(module, method, wrapped, instance, args, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/elasticapm/instrumentation/packages/urllib3.py\", line 132, in call\n response = wrapped(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 843, in urlopen\n retries = retries.increment(\n ^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/util/retry.py\", line 474, in increment\n raise reraise(type(error), error, _stacktrace)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/util/util.py\", line 38, in reraise\n raise value.with_traceback(tb)\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 789, in urlopen\n response = self._make_request(\n ^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 495, in _make_request\n conn.request(\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/urllib3/connection.py\", line 441, in request\n self.endheaders()\n File \"/usr/local/lib/python3.11/http/client.py\", line 1298, in endheaders\n self._send_output(message_body, encode_chunked=encode_chunked)\n File \"/usr/local/lib/python3.11/http/client.py\", line 1058, in _send_output\n self.send(msg)\n File \"/usr/local/lib/python3.11/http/client.py\", line 996, in send\n self.connect()\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/transport/unixconn.py\", line 26, in connect\n sock.connect(self.unix_socket)\nurllib3.exceptions.ProtocolError: ('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable'))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/scaler/scaler_server.py\", line 414, in with_logs\n fn(*args, **kwargs)\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/scaler/scaler_server.py\", line 742, in update_scaling\n raw_targets = self.controller.get_targets()\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/scaler/controllers/docker_ctl.py\", line 360, in get_targets\n return {name: self.get_target(name) for name in names}\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/scaler/controllers/docker_ctl.py\", line 360, in <dictcomp>\n return {name: self.get_target(name) for name in names}\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/scaler/controllers/docker_ctl.py\", line 347, in get_target\n for container in self.client.containers.list(filters=filters, ignore_removed=True):\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/models/containers.py\", line 1018, in list\n containers.append(self.get(r['Id']))\n ^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/models/containers.py\", line 954, in get\n resp = self.client.api.inspect_container(container_id)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/utils/decorators.py\", line 19, in wrapped\n return f(self, resource_id, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/api/container.py\", line 794, in inspect_container\n self._get(self._url(\"/containers/{0}/json\", container)), True\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/utils/decorators.py\", line 44, in inner\n return f(self, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/docker/api/client.py\", line 246, in _get\n return self.get(url, **self._set_request_timeout(kwargs))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/requests/sessions.py\", line 602, in get\n return self.request(\"GET\", url, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/requests/sessions.py\", line 589, in request\n resp = self.send(prep, **send_kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/elasticapm/instrumentation/packages/base.py\", line 213, in call_if_sampling\n return self.call(module, method, wrapped, instance, args, kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/elasticapm/instrumentation/packages/requests.py\", line 58, in call\n response = wrapped(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/requests/sessions.py\", line 703, in send\n r = adapter.send(request, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/requests/adapters.py\", line 682, in send\n raise ConnectionError(err, request=request)\nrequests.exceptions.ConnectionError: ('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable'))\n"
}
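The traceback above shows `get_target` calling the docker SDK's `containers.list(...)`, which by default follows the list request with one `inspect_container` request per container (`containers.append(self.get(r['Id']))`). The number of requests sent to the Docker socket per scaling pass therefore grows with the number of containers. `containers.list()` also accepts `sparse=True`, which skips that per-container inspect and builds objects from the list response alone. Below is a minimal sketch, assuming only each container's state is needed; `count_running` and the filters passed to it are hypothetical, and this is not how Assemblyline's `docker_ctl.py` currently works. Sparse objects only carry the fields of the list response, so some `Container` properties may need `reload()` before use.

```python
from typing import Any, Mapping


def count_running(client: Any, filters: Mapping[str, Any]) -> int:
    """Count running containers matching `filters` using one list request.

    Hypothetical helper. `client` is assumed to be a docker SDK client
    (e.g. from `docker.from_env()`).
    """
    # sparse=True: skip the per-container inspect_container() call and
    # build Container objects straight from the list endpoint's response.
    containers = client.containers.list(filters=dict(filters), sparse=True)
    # For sparse objects, Container.status returns the plain state string
    # from the list response (e.g. "running" or "paused").
    return sum(1 for c in containers if c.status == "running")
```

Whether this is enough for the scaler depends on which container attributes `get_target` actually reads, which the traceback does not show.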
This seems to be an issue when there's heavy I/O and a socket belonging to Docker is "unavailable". Since this does seem to happen during […] Currently the only way to do this is by mounting over that file in the scaler container, but if increasing that value does work, we could make it more configurable.
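If the `BlockingIOError` (errno 11, `EAGAIN`) is transient under load, another option, alongside increasing that value, is to retry the Docker call with capped exponential backoff instead of letting `update_scaling` crash. Below is a minimal sketch, assuming a short in-loop retry is acceptable for the scaler; `call_with_backoff` and its parameters are hypothetical and not part of Assemblyline or the docker SDK. By default it retries on `OSError`, which covers both `BlockingIOError` and `requests.exceptions.ConnectionError` (a subclass of `IOError`, i.e. `OSError`), so it is deliberately broad.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying `retry_on` errors with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: random wait in [0, delay] so concurrent callers
            # do not hit the Docker socket in lockstep.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

For example, `call_with_backoff(self.controller.get_targets)` would wrap the call seen in the traceback. Whether retrying inside one scaling pass beats simply waiting for the next pass is a judgment call.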
I can certainly try this, thank you. I'll report back when I get it working.
Describe the bug
In some cases, services do not scale back up properly after they fail for any reason (OOM or otherwise). As a result, the overall throughput of the deployment drops to nothing, because a service with samples queued for processing has 0 instances running. What seems to help is disabling and then re-enabling the service in the Administrator panel. I have seen this affect every type of service that ships with Assemblyline; recently I had to apply that remedy to CAPA (4.5.0.stable9), DeobfuScripter (4.5.0.stable14), Batchdeobfuscator (4.5.0.stable19), and Espresso (4.5.0.stable7).
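The stalled state described here (samples queued for a service that has 0 running instances) can be expressed as a simple check. Below is a minimal sketch, assuming per-service queue lengths and running-instance counts are already available from somewhere; `find_stalled_services` and its inputs are hypothetical, and the numbers in the usage line are made up for illustration.

```python
from typing import List, Mapping


def find_stalled_services(
    queue_lengths: Mapping[str, int],
    running_instances: Mapping[str, int],
) -> List[str]:
    """Return services with queued work but no running instances.

    Services missing from `running_instances` are treated as having 0.
    """
    return sorted(
        name
        for name, queued in queue_lengths.items()
        if queued > 0 and running_instances.get(name, 0) == 0
    )


# Illustrative input only:
print(find_stalled_services({"CAPA": 12, "Espresso": 0, "DeobfuScripter": 3},
                            {"CAPA": 0, "DeobfuScripter": 2}))
# -> ['CAPA']
```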
To Reproduce
Steps to reproduce the behavior:
Expected behavior
After a failure, services should recover within a reasonable time.
Screenshots
N/A
Environment (please complete the following information if pertinent):
Assemblyline Docker deployment 0.4.5 stable, last updated 2 weeks ago
Additional context
I have created a service that detects this condition using the client's socketio log listener and then disables and re-enables the affected service. I have been running it for about 4 days now and have seen large throughput improvements. However, I wanted to pass this along in the hope of finding the root cause.
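The disable/enable workaround can be sketched as a small watchdog that only acts once a service has stayed stalled past a grace period, and then waits out a cooldown before touching that service again, so a slow-starting service is not toggled repeatedly. This is a minimal sketch under stated assumptions, not the reporter's implementation: `StallWatchdog` is hypothetical, `restart` stands in for whatever mechanism disables and re-enables a service, the caller supplies the currently stalled services on each poll, and the default timings are arbitrary.

```python
import time
from typing import Callable, Dict, Iterable, List


class StallWatchdog:
    """Restart services that stay stalled longer than a grace period."""

    def __init__(
        self,
        restart: Callable[[str], None],
        grace_seconds: float = 300.0,
        cooldown_seconds: float = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._restart = restart  # hypothetical disable-then-enable callback
        self._grace = grace_seconds
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._stalled_since: Dict[str, float] = {}
        self._last_restart: Dict[str, float] = {}

    def observe(self, stalled: Iterable[str]) -> List[str]:
        """Record currently stalled services; return those restarted now."""
        now = self._clock()
        current = set(stalled)
        # Forget services that recovered on their own.
        for name in list(self._stalled_since):
            if name not in current:
                del self._stalled_since[name]
        restarted = []
        for name in sorted(current):
            since = self._stalled_since.setdefault(name, now)
            last = self._last_restart.get(name)
            in_cooldown = last is not None and now - last < self._cooldown
            if now - since >= self._grace and not in_cooldown:
                self._restart(name)
                self._last_restart[name] = now
                # Restart the grace timer before any further attempt.
                self._stalled_since[name] = now
                restarted.append(name)
        return restarted
```

Passing an injectable `clock` keeps the timing logic testable without real waits. Like the reporter's service, this only treats the symptom; it does not address why the scaler leaves the service at 0 instances.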