
Slave containers failing with Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? #294

Open
Naveenraaj opened this issue Apr 20, 2021 · 0 comments

Naveenraaj commented Apr 20, 2021

Issue: Slave containers failing intermittently with "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"

Environment: Docker swarm
Docker Engine: Enterprise
Docker version: 19.03.11
API version: 1.40 (minimum version 1.12)

I have confirmed that the Docker daemons on the worker nodes in the swarm cluster are running and healthy: the nodes are running other workloads, and some Jenkins builds complete successfully on them, while others fail at random. So far 5 out of 10 builds have failed with this error.

The problem appears to be intermittent and may only be reproducible in my environment, which is currently exhibiting the issue.
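A quick way to see what state the daemon is in when a build fails (assuming the slave image has a POSIX shell and the docker CLI, as in the session captured further below) is something like:

# run inside the affected slave container
ls -l /var/run/docker.sock                  # does the socket exist at all?
docker version >/dev/null 2>&1 && echo "daemon reachable" || echo "daemon NOT reachable"
ps aux | grep '[d]ockerd'                   # is an inner dockerd process running?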

Jenkins logs:


Apr 20, 2021 5:40:56 PM INFO com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Apr 20, 2021 5:40:56 PM INFO com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Apr 20, 2021 5:41:15 PM INFO com.github.kostyasha.yad.DockerSlave _terminate
Requesting disconnect for computer: '<container-id>'
Apr 20, 2021 5:41:15 PM INFO jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
Computer.threadPoolForRemoting [#7901] for <container-id> terminated: java.nio.channels.ClosedChannelException
Apr 20, 2021 5:41:18 PM INFO org.jenkinsci.plugins.workflow.job.WorkflowRun finish
pcm/sgi-ta/stout7 #11 completed: FAILURE
Apr 20, 2021 5:41:19 PM INFO com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Apr 20, 2021 5:41:23 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Connection #2,295 failed: java.io.EOFException
Apr 20, 2021 5:41:23 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Accepted JNLP4-connect connection #2,296 from xx.xxx.x.xxx/xx.xxx.x.xxx:yyyy
Apr 20, 2021 5:41:27 PM INFO jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
IOHub#1: Worker[channel:java.nio.channels.SocketChannel[connected local=/xx.xxx.x.xxx:yyyy remote=xx.xxx.x.xxx/xx.xxx.x.xxx:yyyy]] / Computer.threadPoolForRemoting [#7890] for <container-id> terminated: java.nio.channels.ClosedChannelException
Apr 20, 2021 5:41:28 PM INFO com.github.kostyasha.yad.DockerSlave _terminate
Stopped container <container-id>
Apr 20, 2021 5:41:30 PM INFO com.github.kostyasha.yad.DockerSlave _terminate
Removed container <container-id>
Apr 20, 2021 5:41:30 PM WARNING org.jenkinsci.plugins.cloudstats.CloudStatistics$SlaveCompletionDetector onDeleted
Activity for deleted node <container-id> already completed
java.lang.Exception
	at org.jenkinsci.plugins.cloudstats.CloudStatistics$SlaveCompletionDetector.onDeleted(CloudStatistics.java:596)
	at jenkins.model.NodeListener.fireOnDeleted(NodeListener.java:99)
	at jenkins.model.Nodes.removeNode(Nodes.java:279)
	at jenkins.model.Jenkins.removeNode(Jenkins.java:2127)
	at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:93)
	at com.github.kostyasha.yad.strategy.DockerOnceRetentionStrategy.lambda$done$0(DockerOnceRetentionStrategy.java:127)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Shell session inside a failed slave container (docker ps fails; dockerd then started manually):

/ # docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
/ # cd usr/bin/
/usr/bin # dockerd
WARN[2021-04-19T16:27:20.860527934Z] could not change group /var/run/docker.sock to docker: group docker not found
INFO[2021-04-19T16:27:20.861744587Z] libcontainerd: started new docker-containerd process  pid=571
INFO[0000] starting containerd                           module=containerd revision=773c489c9c1b21a6d78b5c538cd395416ec50f88 version=v1.0.3
INFO[0000] loading plugin "io.containerd.content.v1.content"...  module=containerd type=io.containerd.content.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.btrfs"...  module=containerd type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.btrfs  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module=containerd
INFO[0000] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  module=containerd type=io.containerd.snapshotter.v1
INFO[0000] loading plugin "io.containerd.metadata.v1.bolt"...  module=containerd type=io.containerd.metadata.v1
WARN[0000] could not use snapshotter btrfs in metadata plugin  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module="containerd/io.containerd.metadata.v1.bolt"
INFO[0000] loading plugin "io.containerd.differ.v1.walking"...  module=containerd type=io.containerd.differ.v1
INFO[0000] loading plugin "io.containerd.gc.v1.scheduler"...  module=containerd type=io.containerd.gc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.containers"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.content"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.diff"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.events"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.healthcheck"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.images"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.leases"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.namespaces"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.snapshots"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.monitor.v1.cgroups"...  module=containerd type=io.containerd.monitor.v1
INFO[0000] loading plugin "io.containerd.runtime.v1.linux"...  module=containerd type=io.containerd.runtime.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.tasks"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.version"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.introspection"...  module=containerd type=io.containerd.grpc.v1
INFO[0000] serving...                                    address="/var/run/docker/containerd/docker-containerd-debug.sock" module="containerd/debug"
INFO[0000] serving...                                    address="/var/run/docker/containerd/docker-containerd.sock" module="containerd/grpc"
INFO[0000] containerd successfully booted in 0.003044s   module=containerd
ERRO[2021-04-19T16:27:20.885550673Z] 'overlay2' is not supported over overlayfs
ERRO[2021-04-19T16:27:20.896274023Z] 'overlay' is not supported over overlayfs
ERRO[2021-04-19T16:27:20.896287457Z] Failed to built-in GetDriver graph devicemapper /var/lib/docker
INFO[2021-04-19T16:27:20.899619968Z] Graph migration to content-addressability took 0.00 seconds
INFO[2021-04-19T16:27:20.900577852Z] Loading containers: start.
WARN[2021-04-19T16:27:20.909560463Z] Running modprobe bridge br_netfilter failed with message: ip: can't find device nf_conntrack_ipv6,nf_nat_ipv6,ip_vs,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_nat_ipv4,nf_nat
modprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1
INFO[2021-04-19T16:27:21.000527007Z] Default bridge (docker0) is assigned with an IP address xxx.xx.x.x/xx. Daemon option --bip can be used to set a preferred IP address
INFO[2021-04-19T16:27:21.023839299Z] Loading containers: done.
INFO[2021-04-19T16:27:21.032122568Z] Docker daemon                                 commit=9ee9f40 graphdriver(s)=vfs version=18.03.1-ce
INFO[2021-04-19T16:27:21.032243356Z] Daemon has completed initialization
INFO[2021-04-19T16:27:21.037143001Z] API listen on /var/run/docker.sock
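If the slave image is meant to run its own daemon (Docker-in-Docker) rather than bind-mounting the host's /var/run/docker.sock, the intermittent error would be consistent with builds racing against dockerd startup, since the manual start above does eventually bring the daemon up. A readiness wait in the entrypoint would look roughly like this (a sketch only; the exec'd command at the end is a placeholder for whatever agent command the image actually runs):

#!/bin/sh
dockerd >/var/log/dockerd.log 2>&1 &        # start the inner daemon in the background
i=0
until docker info >/dev/null 2>&1; do       # wait until the socket actually answers
    i=$((i + 1))
    [ "$i" -ge 30 ] && echo "dockerd did not come up within 30s" >&2 && exit 1
    sleep 1
done
exec "$@"                                   # only then hand off to the Jenkins agent command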

Any suggestions would be helpful. Thanks in advance.
