Issue: Slave containers failing intermittently with "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Environment: Docker swarm
Docker Engine: Enterprise
Docker version: 19.03.11
API version: 1.40 (minimum version 1.12)
I have confirmed that the Docker daemon on each worker node in the swarm cluster is running and healthy. The nodes run other workloads without problems, and Jenkins builds succeed on some runs and fail on others; so far 5 out of 10 builds have failed with this issue.
The failure appears to be intermittent and may be reproducible only in my environment.
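A minimal way to verify the socket from inside a failing agent container, assuming the container is supposed to reach the host daemon through a bind-mounted /var/run/docker.sock (the /_ping endpoint is part of the Docker Engine API; the mount itself is an assumption about the plugin's container template):
/ # ls -l /var/run/docker.sock                                       # socket should exist if the host socket is mounted
/ # curl --unix-socket /var/run/docker.sock http://localhost/_ping   # prints OK when the daemon is reachable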
Jenkins logs:
Apr 20, 2021 5:40:56 PM INFO com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Apr 20, 2021 5:40:56 PM INFO com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Apr 20, 2021 5:41:15 PM INFO com.github.kostyasha.yad.DockerSlave _terminate
Requesting disconnect for computer: '<container-id>'
Apr 20, 2021 5:41:15 PM INFO jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
Computer.threadPoolForRemoting [#7901] for <container-id> terminated: java.nio.channels.ClosedChannelException
Apr 20, 2021 5:41:18 PM INFO org.jenkinsci.plugins.workflow.job.WorkflowRun finish
pcm/sgi-ta/stout7 #11 completed: FAILURE
Apr 20, 2021 5:41:19 PM INFO com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Apr 20, 2021 5:41:23 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Connection #2,295 failed: java.io.EOFException
Apr 20, 2021 5:41:23 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Accepted JNLP4-connect connection #2,296 from xx.xxx.x.xxx/xx.xxx.x.xxx:yyyy
Apr 20, 2021 5:41:27 PM INFO jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
IOHub#1: Worker[channel:java.nio.channels.SocketChannel[connected local=/xx.xxx.x.xxx:yyyy remote=xx.xxx.x.xxx/xx.xxx.x.xxx:yyyy]] / Computer.threadPoolForRemoting [#7890] for <container-id> terminated: java.nio.channels.ClosedChannelException
Apr 20, 2021 5:41:28 PM INFO com.github.kostyasha.yad.DockerSlave _terminate
Stopped container <container-id>
Apr 20, 2021 5:41:30 PM INFO com.github.kostyasha.yad.DockerSlave _terminate
Removed container <container-id>
Apr 20, 2021 5:41:30 PM WARNING org.jenkinsci.plugins.cloudstats.CloudStatistics$SlaveCompletionDetector onDeleted
Activity for deleted node <container-id> already completed
java.lang.Exception
at org.jenkinsci.plugins.cloudstats.CloudStatistics$SlaveCompletionDetector.onDeleted(CloudStatistics.java:596)
at jenkins.model.NodeListener.fireOnDeleted(NodeListener.java:99)
at jenkins.model.Nodes.removeNode(Nodes.java:279)
at jenkins.model.Jenkins.removeNode(Jenkins.java:2127)
at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:93)
at com.github.kostyasha.yad.strategy.DockerOnceRetentionStrategy.lambda$done$0(DockerOnceRetentionStrategy.java:127)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
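The EOFException and ClosedChannelException entries above indicate the JNLP channel between the agent container and the master dropped. A quick reachability check from a worker node, assuming the default JNLP agent port 50000 and using jenkins-master as a placeholder for the actual master hostname:
$ nc -zv jenkins-master 50000   # confirms the TcpSlaveAgentListener port is reachable from the worker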
Failed container logs:
/ # docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
/ # cd usr/bin/
/usr/bin # dockerd
WARN[2021-04-19T16:27:20.860527934Z] could not change group /var/run/docker.sock to docker: group docker not found
INFO[2021-04-19T16:27:20.861744587Z] libcontainerd: started new docker-containerd process pid=571
INFO[0000] starting containerd module=containerd revision=773c489c9c1b21a6d78b5c538cd395416ec50f88 version=v1.0.3
INFO[0000] loading plugin "io.containerd.content.v1.content"... module=containerd type=io.containerd.content.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.btrfs"... module=containerd type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module=containerd
INFO[0000] loading plugin "io.containerd.snapshotter.v1.overlayfs"... module=containerd type=io.containerd.snapshotter.v1
INFO[0000] loading plugin "io.containerd.metadata.v1.bolt"... module=containerd type=io.containerd.metadata.v1
WARN[0000] could not use snapshotter btrfs in metadata plugin error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module="containerd/io.containerd.metadata.v1.bolt"
INFO[0000] loading plugin "io.containerd.differ.v1.walking"... module=containerd type=io.containerd.differ.v1
INFO[0000] loading plugin "io.containerd.gc.v1.scheduler"... module=containerd type=io.containerd.gc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.containers"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.content"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.diff"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.events"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.healthcheck"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.images"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.leases"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.namespaces"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.snapshots"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.monitor.v1.cgroups"... module=containerd type=io.containerd.monitor.v1
INFO[0000] loading plugin "io.containerd.runtime.v1.linux"... module=containerd type=io.containerd.runtime.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.tasks"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.version"... module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.introspection"... module=containerd type=io.containerd.grpc.v1
INFO[0000] serving... address="/var/run/docker/containerd/docker-containerd-debug.sock" module="containerd/debug"
INFO[0000] serving... address="/var/run/docker/containerd/docker-containerd.sock" module="containerd/grpc"
INFO[0000] containerd successfully booted in 0.003044s module=containerd
ERRO[2021-04-19T16:27:20.885550673Z] 'overlay2' is not supported over overlayfs
ERRO[2021-04-19T16:27:20.896274023Z] 'overlay' is not supported over overlayfs
ERRO[2021-04-19T16:27:20.896287457Z] Failed to built-in GetDriver graph devicemapper /var/lib/docker
INFO[2021-04-19T16:27:20.899619968Z] Graph migration to content-addressability took 0.00 seconds
INFO[2021-04-19T16:27:20.900577852Z] Loading containers: start.
WARN[2021-04-19T16:27:20.909560463Z] Running modprobe bridge br_netfilter failed with message: `ip: can't find device nf_conntrack_ipv6,nf_nat_ipv6,ip_vs,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_nat_ipv4,nf_nat
modprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1
INFO[2021-04-19T16:27:21.000527007Z] Default bridge (docker0) is assigned with an IP address xxx.xx.x.x/xx. Daemon option --bip can be used to set a preferred IP address
INFO[2021-04-19T16:27:21.023839299Z] Loading containers: done.
INFO[2021-04-19T16:27:21.032122568Z] Docker daemon commit=9ee9f40 graphdriver(s)=vfs version=18.03.1-ce
INFO[2021-04-19T16:27:21.032243356Z] Daemon has completed initialization
INFO[2021-04-19T16:27:21.037143001Z] API listen on /var/run/docker.sock
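Note that the dockerd started by hand above is a second daemon inside the container, not the host daemon the build was supposed to talk to: it falls back to the vfs graph driver because overlay2/overlay cannot run on top of an overlayfs-backed container filesystem (the ERRO lines above). If the agent container is meant to use the host daemon rather than run its own, the usual pattern is to bind-mount the host socket; a minimal sketch with plain docker run, standing in for whatever volume configuration the yet-another-docker-plugin template uses:
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock docker:19.03 docker ps   # CLI inside the container talks to the host daemon via the mounted socket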
Any suggestions would be helpful. Thanks in advance.