
Flatcar machines stuck in Waiting... instead of pulling new release(s) #853

Open
tylerauerbeck opened this issue Oct 2, 2024 · 2 comments

Comments

@tylerauerbeck

Description

After flipping the pin to the latest stable release, a number of machines downloaded it. We had paused downloads/reboots over the weekend, and when we came back a few days later to resume, a number of machines were stuck with a Current Status of "Waiting...". In the update_engine logs, the only relevant message I see is "omaha_request_action.cc:629] HTTP reported success but Omaha reports an error." When I look for a matching error in the Nebraska logs for that machine ID, the only entry I see is "update complete.error". Is there any additional logging I can enable to find the root cause of this problem? I've tried restarting update_engine, Nebraska, etc. to get things unstuck, without any luck.

Impact

Further downloads are not occurring, and current_status does not accurately reflect the status of this rollout.

Environment and steps to reproduce

  1. Set-up: Nebraska 2.9 attempting to roll out 3975.2.1
  2. Task: Flipped channel pin to 3975.2.1 to begin rollout
  3. Action(s): Updated the pin to begin the rollout, then paused updates of machines for 2+ days (so machines stayed in the Downloaded state for a while before the rollout was continued)
  4. Error:
    • omaha_request_action.cc:629] HTTP reported success but Omaha reports an error.
    • update complete.error

Expected behavior

Nebraska accurately reflects the current status, and additional nodes continue to download the new release.

Additional information

N/A

@tylerauerbeck
Author

If it helps at all, the nodes in Waiting... seem to bunch up under On Hold in the dashboard for the channel.

@ErvinRacz
Contributor

ErvinRacz commented Jan 16, 2025

Thank you for taking the time to report the issue, @tylerauerbeck!

I noticed it was reported a couple of months ago, and I wanted to check in to see if it's still relevant.

I only started learning how the update server works a few weeks ago, but I once had a similar experience while testing the update policy settings and reboot strategies: the nodes got stuck in the same "Waiting..." status until I realized that I had turned off the automatic reboot strategy.

The following questions may help us investigate what happened:

  • Were there any nodes that were successfully updated before the "pause"?
  • What do you mean by pause? Turning off reboots or disabling updates from Nebraska UI?
  • The reboot strategy can be checked in the following files:
/usr/share/flatcar/update.conf
/etc/flatcar/update.conf (overrides the settings in the previous file)
  • update_engine_client -status is helpful for checking the status of the update process on an individual node
  • What was the update policy set in Nebraska for the group?
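The two per-node checks suggested above (effective reboot strategy, then update_engine state) can be combined into a small triage helper. This is a minimal Python sketch, not a Flatcar or Nebraska tool, and it rests on assumptions not stated in the thread: both update.conf files use plain KEY=VALUE lines, the reboot setting is the REBOOT_STRATEGY key (values such as reboot, etcd-lock, off), and update_engine_client -status prints KEY=VALUE lines on stdout including CURRENT_OP, with UPDATE_STATUS_UPDATED_NEED_REBOOT meaning "downloaded, waiting for reboot". The function names are hypothetical.

```python
"""Hypothetical triage helper for a Flatcar node stuck in "Waiting...".

Assumptions: update.conf files and `update_engine_client -status` output
are simple KEY=VALUE lines; the status output contains CURRENT_OP.
"""
import shutil
import subprocess
from pathlib import Path

# Defaults first, then the local override: later files win.
CONF_FILES = [Path("/usr/share/flatcar/update.conf"), Path("/etc/flatcar/update.conf")]
NEED_REBOOT = "UPDATE_STATUS_UPDATED_NEED_REBOOT"  # assumed status name


def parse_kv(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, '#' comments and lines without '='."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip().strip('"')
    return result


def effective_config(paths=CONF_FILES) -> dict:
    """Merge existing config files in order so /etc overrides /usr/share."""
    merged = {}
    for path in paths:
        if path.is_file():
            merged.update(parse_kv(path.read_text()))
    return merged


def read_status() -> str:
    """Return `update_engine_client -status` stdout, or "" if the tool is absent."""
    if shutil.which("update_engine_client") is None:
        return ""
    proc = subprocess.run(
        ["update_engine_client", "-status"], capture_output=True, text=True, check=False
    )
    return proc.stdout


def diagnose(config: dict, status_output: str) -> str:
    """Combine the reboot strategy and update_engine state into a short hint."""
    strategy = config.get("REBOOT_STRATEGY", "<unset>")
    op = parse_kv(status_output).get("CURRENT_OP", "<unknown>")
    if op == NEED_REBOOT and strategy == "off":
        return "update downloaded, waiting for a manual reboot (REBOOT_STRATEGY=off)"
    if op == NEED_REBOOT:
        return f"update downloaded, waiting for reboot (REBOOT_STRATEGY={strategy})"
    return f"update_engine state: {op} (REBOOT_STRATEGY={strategy})"


if __name__ == "__main__":
    print(diagnose(effective_config(), read_status()))
```

Run on an affected node, a result like "waiting for a manual reboot" would match the scenario described above, where nodes sit in "Waiting..." because automatic reboots are off. It only inspects the node; it does not query Nebraska or explain the "Omaha reports an error" message.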
