
EvalStatus on placement failure #24824

Open
devashishraj opened this issue Jan 9, 2025 · 1 comment
Labels
theme/docs Documentation issues and enhancements theme/scheduling type/bug

Comments

@devashishraj

devashishraj commented Jan 9, 2025

Nomad version

Output from nomad version
1.9.4

Operating system and Environment details

Clang: 16.0.0 build 1600
Git: 2.47.1 => /opt/homebrew/bin/git
Curl: 8.7.1 => /usr/bin/curl
macOS: 15.2-arm64
CLT: 16.2.0.0.1.1733547573
Xcode: 16.0
Rosetta 2: false

Issue

On a placement failure, the status returned in the evaluation structure is "complete".
There's no error; the only thing I can use to detect this is the `TriggeredBy` field, which I would have to compare against the string "job-deregister".

I understand that the status is complete because the evaluation itself is complete (it concluded that the job cannot be placed), but I don't see a simple way to find out whether the job got placed.

Reproduction steps

Added a sample job file (see below).

Expected Result

I expected EvalStatus to be blocked or cancelled.

Actual Result

Job file (if appropriate)

job "docs" {
  datacenters = ["hz1"]

  group "example" {
    network {
      port "http" {
        static = 5678
      }
    }

    task "server" {
      driver = "docker"
      constraint {
        attribute = "${meta.my_custom_value}"
        operator  = ">"
        value     = "3"
      }
      config {
        image = "hashicorp/http-echo"
        ports = ["http"]
        args = [
          "-listen",
          ":5678",
          "-text",
          "hello world"
        ]
      }
      
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@schmichael
Member

This is really confusing! Thanks for filing an issue. I think this is probably best fixed with documentation (I can't find any!), but it's possible there are some CLI improvements we could make as well.

What's going on?

When you schedule a job it creates an evaluation as you have observed.

When that job cannot be placed (for example, due to a lack of cluster capacity, or because no node satisfies the job's constraints), a blocked evaluation is created and the original evaluation is marked as complete.

The CLI makes this very confusing:

$ nomad job run jobs/unschedulable.nomad.hcl

==> 2025-01-16T16:14:24-08:00: Monitoring evaluation "a7756693"
    2025-01-16T16:14:24-08:00: Evaluation triggered by job "unschedulable"
    ...snipped...
    2025-01-16T16:14:24-08:00: Evaluation status changed: "pending" -> "complete"
==> 2025-01-16T16:14:24-08:00: Evaluation "a7756693" finished with status "complete" but failed to place all allocations:
    2025-01-16T16:14:24-08:00: Task Group "unschedulable" (failed to place 1 allocation):
      * No nodes were eligible for evaluation
    2025-01-16T16:14:24-08:00: Evaluation "1735df8a" waiting for additional capacity to place remainder
==> 2025-01-16T16:14:24-08:00: Monitoring deployment "09307da4"
  ⠏ Deployment "09307da4" in progress...

...

And the deployment spins until it hits its progress deadline or capacity becomes available.

So you did the natural thing and peeked at the evaluation to see what happened:

$ nomad eval status a7756693
ID                 = a7756693
...
Status             = complete
Placement Failures = true

Failed Placements
Task Group "unschedulable" (failed to place 1 allocation):
  * No nodes were eligible for evaluation

Evaluation "1735df8a" waiting for additional capacity to place remainder

That very last line is the missing piece of your puzzle! Let's look at 1735df8a:

$ nomad eval status 1735df8a
ID                 = 1735df8a
...
Status             = blocked
Status Description = created to place remaining allocations
...

That's the blocked evaluation that needs to be monitored!
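For scripts that need to answer the original question ("did my job get placed?"), the same information appears to be available in the evaluation's JSON. Below is a minimal Python sketch, assuming the evaluation is fetched from the HTTP API endpoint `/v1/evaluation/<id>` on a local agent at `http://127.0.0.1:4646` with ACLs disabled; the helper names `fetch_eval` and `placement_summary` are hypothetical. It treats a non-empty `FailedTGAllocs` map as a placement failure (what the CLI appears to report as `Placement Failures = true`) and returns the `BlockedEval` ID, which is the evaluation to monitor next.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: a local Nomad agent with ACLs disabled.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_eval(eval_id: str, addr: str = NOMAD_ADDR) -> dict[str, Any]:
    """Fetch one evaluation's JSON (assumes the full ID, not the short CLI prefix)."""
    with urllib.request.urlopen(f"{addr}/v1/evaluation/{eval_id}") as resp:
        return json.load(resp)


def placement_summary(evaluation: dict[str, Any]) -> tuple[bool, Optional[str]]:
    """Return (placement_failed, blocked_eval_id) for an evaluation.

    A "complete" evaluation can still have failed placements: failures are
    recorded per task group in FailedTGAllocs, and BlockedEval holds the ID
    of the follow-up evaluation that waits for capacity.
    """
    placement_failed = bool(evaluation.get("FailedTGAllocs"))
    blocked_eval_id = evaluation.get("BlockedEval") or None
    return placement_failed, blocked_eval_id


# Offline example shaped like the evaluation above (IDs shortened).
original = {
    "ID": "a7756693",
    "Status": "complete",
    "FailedTGAllocs": {"unschedulable": {}},
    "BlockedEval": "1735df8a",
}
print(placement_summary(original))  # (True, '1735df8a')
```

Checking `Status` alone is not enough here: in the example above the status is "complete" even though placement failed.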

So the evaluations form a chain that's a bit tricky to follow:

  • For the original eval (a7756693), the BlockedEval field is set to the ID of the next evaluation when a placement can't be made.
  • For the blocked eval (1735df8a) you can peek at the PreviousEval field to trace the chain of events backward.
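Walking the chain backward can be sketched the same way. `trace_back` below is a hypothetical helper, not part of Nomad; it assumes `get_eval` is any function that returns an evaluation's JSON by ID (for example, a wrapper around `/v1/evaluation/<id>`). It follows `PreviousEval` until the field is empty and stops if it ever revisits an ID.

```python
from typing import Any, Callable


def trace_back(eval_id: str, get_eval: Callable[[str], dict[str, Any]]) -> list[str]:
    """Return evaluation IDs from the first evaluation up to eval_id.

    Follows each evaluation's PreviousEval field backward; an empty or
    missing PreviousEval marks the start of the chain.
    """
    chain: list[str] = []
    seen: set[str] = set()
    current = eval_id
    while current and current not in seen:  # stop at chain start or on a cycle
        seen.add(current)
        chain.append(current)
        current = get_eval(current).get("PreviousEval") or ""
    chain.reverse()  # oldest evaluation first
    return chain


# Offline example mirroring the two evaluations above (IDs shortened).
evals = {
    "a7756693": {"Status": "complete", "BlockedEval": "1735df8a", "PreviousEval": ""},
    "1735df8a": {"Status": "blocked", "PreviousEval": "a7756693"},
}
print(trace_back("1735df8a", evals.__getitem__))  # ['a7756693', '1735df8a']
```

Passing the lookup in as a function keeps the chain logic testable offline; a live script could pass an HTTP-backed fetcher instead.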

The UI makes this much clearer: open the job and go to its Evaluations tab:

[Screenshot: the job's Evaluations tab in the Nomad UI]

The Solution

I'm open to ideas!

I think the Scheduling > Placement doc is the most obvious place to explain blocked evaluations. I'm shocked it doesn't mention "blocked" or even "eval" once!

@schmichael schmichael added the theme/docs Documentation issues and enhancements label Jan 17, 2025
@schmichael schmichael moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Jan 17, 2025