Nomad client not reporting pending job during GC #24777
Comments
Hi @EtienneBruines, I'm struggling to reproduce this. As noted in another issue of yours, client GC should remove objects asynchronously and not interfere with the client's ability to place workloads. I'm starting to wonder whether GC is a red herring here and the issue lies elsewhere entirely. Could you give more details about your cluster setup and your workloads?
Hi @pkazmierczak!

Workloads

We have a mix of three workloads:
On average, no more than one job needs to be scheduled every two minutes.

Cluster setup

We are running three servers in a single region, with clients in that same region. Network latency from clients to servers is between 1 and 2 ms. Our clients have
This setting does not seem to work quite as well, though:
Every 10 seconds, the number of allocations drops by 2 (the default value, not 'our' value).
The GC is definitely an issue, probably closely related to #2463. The root cause might be elsewhere, but the GC is currently the only thing that 'shows'. As long as the number of allocations is decreasing but still above 50, the client will not process any jobs assigned to it, nor will it report them in the metrics.

@pkazmierczak Perhaps some useful background info: #19917
Hi @EtienneBruines, I'm unable to reproduce this issue with the current information, but there does seem to be a problem with GC, as also mentioned in #24778 and #19917. Can you please share as much information as possible to help us see what you are seeing, such as the server and client configuration as well as the job specs? Thank you!
Nomad version
Nomad v1.9.4
BuildDate 2024-12-18T15:16:22Z
Revision 5e49fcd+CHANGES
Operating system and Environment details
Ubuntu 22.04.5 LTS on amd64
Issue
Situation:
number of allocations (68) is over the limit (50)

Problems:
nomad.client.allocations.pending, as reported by the client, is set to 0, despite there being active pending allocations (pending for over 20 minutes already).

Reproduction steps
pending
nomad.client.allocations.pending metric being 0
Expected Result
Ideally (in this order):
pending for too long (Reschedule long-pending allocs #24780)

But in any case:
nomad.client.allocations.pending to report on the pending jobs

Actual Result
The nomad.client.allocations.pending metric is 0.
Job file (if appropriate)
Not applicable.
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
Only logs this:
After GC has completed (perhaps 20 minutes or so later), it starts the alloc and logs things like: