EvalStatus on placement failure #24824

devashishraj · 2025-01-09T15:47:35Z

Nomad version

Output from nomad version
1.9.4

Operating system and Environment details

Clang: 16.0.0 build 1600
Git: 2.47.1 => /opt/homebrew/bin/git
Curl: 8.7.1 => /usr/bin/curl
macOS: 15.2-arm64
CLT: 16.2.0.0.1.1733547573
Xcode: 16.0
Rosetta 2: false

Issue

On a placement failure, the status returned in the evaluation structure is "complete".
No error is reported. The only thing I can use to detect the failure is the 'trigger-by' field, which I would have to compare against the string "job-deregister".

I understand that the evaluation itself is complete (it concluded that the job cannot be placed), so its status is "complete", but I don't see a simple way to find out whether the job actually got placed.

Reproduction steps

A sample job file is included below.

Expected Result

I expected EvalStatus to be blocked or cancelled.

Actual Result

Job file (if appropriate)

job "docs" {
  datacenters = ["hz1"]

  group "example" {
    network {
      port "http" {
        static = 5678
      }
    }

    task "server" {
      driver = "docker"
      constraint {
        attribute = "${meta.my_custom_value}"
        operator  = ">"
        value     = "3"
      }
      config {
        image = "hashicorp/http-echo"
        ports = ["http"]
        args = [
          "-listen",
          ":5678",
          "-text",
          "hello world"
        ]
      }
      
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)


schmichael · 2025-01-17T00:28:30Z

This is really confusing! Thanks for filing an issue. I think this is probably best fixed with documentation (I can't find any!), but it's possible there are some CLI improvements we could make as well.

What's going on?

When you schedule a job it creates an evaluation as you have observed.

When that job cannot be placed (due to lack of cluster capacity), a blocked evaluation is created and the original evaluation is marked as complete.

The CLI makes this very confusing:

$ nomad job run jobs/unschedulable.nomad.hcl

==> 2025-01-16T16:14:24-08:00: Monitoring evaluation "a7756693"
    2025-01-16T16:14:24-08:00: Evaluation triggered by job "unschedulable"
    ...snipped...
    2025-01-16T16:14:24-08:00: Evaluation status changed: "pending" -> "complete"
==> 2025-01-16T16:14:24-08:00: Evaluation "a7756693" finished with status "complete" but failed to place all allocations:
    2025-01-16T16:14:24-08:00: Task Group "unschedulable" (failed to place 1 allocation):
      * No nodes were eligible for evaluation
    2025-01-16T16:14:24-08:00: Evaluation "1735df8a" waiting for additional capacity to place remainder
==> 2025-01-16T16:14:24-08:00: Monitoring deployment "09307da4"
  ⠏ Deployment "09307da4" in progress...

...

And the deployment spins until it hits its progress deadline or capacity becomes available.

So you did the natural thing and peeked at the evaluation to see what happened:

$ nomad eval status a7756693
ID                 = a7756693
...
Status             = complete
Placement Failures = true

Failed Placements
Task Group "unschedulable" (failed to place 1 allocation):
  * No nodes were eligible for evaluation

Evaluation "1735df8a" waiting for additional capacity to place remainder

That very last line is the missing piece of your puzzle! Let's look at 1735df8a:

$ nomad eval status 1735df8a
ID                 = 1735df8a
...
Status             = blocked
Status Description = created to place remaining allocations
...

That's the blocked evaluation that needs to be monitored!
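As a rough sketch of how the same check could be done programmatically, the snippet below inspects the JSON for a single evaluation instead of relying on Status alone. It assumes the evaluation has already been fetched as a dict (for example from the HTTP API) and that it carries the FailedTGAllocs and BlockedEval fields; the helper name placement_summary is made up for illustration.

```python
def placement_summary(evaluation: dict) -> dict:
    """Summarize whether an evaluation placed everything it was asked to.

    Assumes `evaluation` is the decoded JSON of one Nomad evaluation.
    A "complete" status only means the scheduler finished processing;
    failed placements are reported separately.
    """
    # FailedTGAllocs maps task group name -> failure metrics (absent or null if none).
    failed = evaluation.get("FailedTGAllocs") or {}
    return {
        "status": evaluation.get("Status"),
        "placement_failed": bool(failed),
        "failed_groups": sorted(failed),
        # ID of the follow-up blocked evaluation, if one was created.
        "blocked_eval": evaluation.get("BlockedEval") or None,
    }
```

With the evaluation from this thread, the summary would report status "complete", placement_failed True, and blocked_eval "1735df8a", which is the evaluation to watch next.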

So the evaluations form a chain that's a bit tricky to follow:

For the original eval (a7756693), the BlockedEval field is set to the ID of the follow-up evaluation when a placement couldn't be made.
For the blocked eval (1735df8a), you can peek at the PreviousEval field to trace the chain of events backward.
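The chain described above can be followed forward in a few lines of code. This is only a sketch under some assumptions: it reads evaluations from the HTTP API at /v1/evaluation/<id> on a local agent at 127.0.0.1:4646 with no ACL token, and it expects full evaluation IDs rather than the 8-character prefixes the CLI prints. The function names fetch_eval and follow_blocked_chain are hypothetical.

```python
import json
import urllib.request
from typing import Callable, List


def fetch_eval(eval_id: str, addr: str = "http://127.0.0.1:4646") -> dict:
    """Fetch one evaluation as a dict (assumes an unauthenticated local agent)."""
    with urllib.request.urlopen(f"{addr}/v1/evaluation/{eval_id}") as resp:
        return json.load(resp)


def follow_blocked_chain(
    eval_id: str,
    fetch: Callable[[str], dict] = fetch_eval,
    max_hops: int = 10,
) -> List[dict]:
    """Follow BlockedEval links forward, starting from eval_id.

    Returns the evaluations in order; the last entry is the one to monitor.
    """
    chain: List[dict] = []
    seen = set()
    current = eval_id
    # Stop on an empty link, a repeated ID, or after max_hops evaluations.
    while current and current not in seen and len(chain) < max_hops:
        seen.add(current)
        evaluation = fetch(current)
        chain.append(evaluation)
        current = evaluation.get("BlockedEval") or ""
    return chain
```

Passing the fetch function as a parameter keeps the traversal separate from the network call, so it can be exercised against canned data. If the last evaluation in the returned list has Status "blocked", the job is still waiting for capacity.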

The UI makes this much clearer: go to the Evaluations tab on the job.

The Solution

I'm open to ideas!

I think the Scheduling > Placement doc is the most obvious place to explain blocked evaluations. I'm shocked it doesn't mention "blocked" or even "eval" once!

devashishraj added the type/bug label Jan 9, 2025

jrasell added this to Nomad - Community Issues Triage Jan 14, 2025

github-project-automation bot moved this to Needs Triage in Nomad - Community Issues Triage Jan 14, 2025

schmichael added the theme/docs Documentation issues and enhancements label Jan 17, 2025

schmichael moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Jan 17, 2025

schmichael added the theme/scheduling label Jan 17, 2025
