
Cannot get velero version from velero client when the network is slower than 250 milliseconds #8620

glowing-axolotl opened this issue Jan 15, 2025 · 0 comments


glowing-axolotl commented Jan 15, 2025

What steps did you take and what happened:
When running velero version, no output is returned for the server version:

# velero version
Client:
        Version: v1.14.1
        Git commit: 8afe3cea8b7058f7baaf447b9fb407312c40d2da
Server:
        Version:
# WARNING: the client version does not match the server version. Please update server

Initially suspecting some sort of corruption, I started digging, since the solution for #3287 (reapplying the CRDs) didn't work.

A ServerStatusRequest is correctly being created, with valid YAML and no errors whatsoever:

# oc get serverstatusrequests.velero.io -A
NAMESPACE   NAME               AGE
velero      velero-cli-d4jtr   2m38s
velero      velero-cli-tqknd   57s

Velero itself is running without problems:

# oc -n velero get pods
NAME                     READY   STATUS    RESTARTS   AGE
node-agent-46qlg         1/1     Running   0          13d
node-agent-5pnqp         1/1     Running   0          13d
node-agent-868hk         1/1     Running   0          13d
node-agent-8lkjf         1/1     Running   0          13d
velero-86664c55d-q9dvk   1/1     Running   0          44m

From the velero logs, we can see that the ServerStatusRequest is received, but nothing else happens (aside from the normal BackupStorageLocation logs, etc.):

oc -n velero logs velero-86664c55d-q9dvk -f
...
time="2025-01-15T13:24:17Z" level=info msg="Processing new ServerStatusRequest" controller=server-status-request logSource="pkg/controller/server_status_request_controller.go:105" phase= serverStatusRequest=velero/velero-cli-wblkm
...
<nothing aside from standard logs>

Inspecting the source code for the "version" CLI command:
https://github.com/vmware-tanzu/velero/blob/main/pkg/cmd/cli/version/version.go
https://github.com/vmware-tanzu/velero/blob/main/pkg/cmd/cli/serverstatus/server_status.go#L70

I found the following line, which polls for a response every 250 milliseconds:

wait.Until(checkFunc, 250*time.Millisecond, ctx.Done())
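
For context, here is a minimal standalone sketch of that polling pattern, assuming the standard wait.Until helper from k8s.io/apimachinery: checkFunc is simply re-run every 250 milliseconds until the surrounding context's Done channel closes. The timeout and the names in the sketch are illustrative, not the actual Velero values, but the point is that if each round trip to the API server is slowed down (for example by a ~1 second DNS lookup), the deadline can expire before a processed ServerStatusRequest is ever observed:

package main

import (
    "context"
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

func main() {
    // Illustrative overall deadline; the real value comes from the CLI's timeout handling.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    checkFunc := func() {
        // In Velero this re-fetches the ServerStatusRequest from the API server and
        // records the server version once the request is marked as processed.
        // A slow DNS lookup on every call eats into the overall deadline.
        fmt.Println("polling ServerStatusRequest...")
    }

    // Runs checkFunc every 250ms until ctx.Done() is closed, then returns.
    wait.Until(checkFunc, 250*time.Millisecond, ctx.Done())

    // context.DeadlineExceeded here means we simply ran out of time.
    fmt.Println("stopped polling:", ctx.Err())
}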

I did notice that when running kubectl/velero commands, the CLI would hang for about half a second, but the rest of the cluster was working correctly.

Trying to GET the apiserver also shows slowness:

# time curl -k 'https://api.mycluster.mydomain.internal:6443/healthz'
ok
real    0m2.292s
user    0m0.021s
sys     0m0.012s

From this I was able to confirm that the DNS server was slow to respond; it takes about a second to get a reply:

# time dig +short api.mycluster.mydomain.internal
192.168.1.12
real    0m1.022s
user    0m0.011s
sys     0m0.009s
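
As a side note, the same one-second lookup can be reproduced from Go with the system resolver (the hostname is of course specific to my cluster, and depending on how the binary is built Go may use its own resolver instead of glibc):

package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

func main() {
    start := time.Now()
    // Resolve the API server hostname, roughly what a client has to do before connecting.
    addrs, err := net.DefaultResolver.LookupHost(context.Background(), "api.mycluster.mydomain.internal")
    fmt.Printf("addrs=%v err=%v elapsed=%s\n", addrs, err, time.Since(start))
}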

Just to make sure, I tried adding the DNS record directly to /etc/hosts:

vim /etc/hosts
# Test velero
192.168.1.12 api.mycluster.mydomain.internal

This finally solved the problem:

# velero version
Client:
        Version: v1.14.1
        Git commit: 8afe3cea8b7058f7baaf447b9fb407312c40d2da
Server:
        Version: v1.14.0

What did you expect to happen:

I'm writing this mainly to help other people with the same problem find this through a search engine, hopefully saving them time.

A possible solution would be to display a Timeout error of sorts, but I understand that the problem in this case is on the DNS side (It's always DNS!).

Would the developers be interested in implementing a timeout error for the problem above? Otherwise, I can simply close the issue.
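
To make the suggestion concrete, here is a hypothetical sketch of the kind of check the CLI could run after polling finishes; the function and variable names are mine, not Velero's:

package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// reportServerVersion is a hypothetical helper: if polling stopped because the
// context deadline expired and no server version was ever received, it returns a
// timeout error instead of silently printing an empty "Version:" field.
func reportServerVersion(ctx context.Context, serverVersion string) error {
    if serverVersion == "" && errors.Is(ctx.Err(), context.DeadlineExceeded) {
        return fmt.Errorf("timed out waiting for the ServerStatusRequest to be processed (slow API server or DNS?): %w", ctx.Err())
    }
    fmt.Printf("Server:\n    Version: %s\n", serverVersion)
    return nil
}

func main() {
    // An already-expired context, just to demonstrate the timeout path.
    ctx, cancel := context.WithTimeout(context.Background(), -time.Millisecond)
    defer cancel()

    if err := reportServerVersion(ctx, ""); err != nil {
        fmt.Println("Error:", err)
    }
}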

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
No need since the problem was already identified

Anything else you would like to add:
Nothing in particular

Environment:

  • Velero version (use velero version): v1.14.0
  • Velero features (use velero client config get features): features: <NOT SET>
  • Kubernetes version (use kubectl version):
    Running on OKD:
    Client Version: 4.15.0-0.okd-2024-03-10-010116
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: 4.15.0-0.okd-2024-03-10-010116
    Kubernetes Version: v1.28.2-3598+6e2789bbd58938-dirty
  • Kubernetes installer & version:
    OKD 4.15, see above
  • Cloud provider or hardware configuration: OpenShift on VMWare
  • OS (e.g. from /etc/os-release): Fedora CoreOS 39.20240210.3.0

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"