Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove addresses from node class hash #24942

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tgross
Copy link
Member

@tgross tgross commented Jan 24, 2025

When a node is fingerprinted, we calculate a "computed class" from a hash over a subset of its fields and attributes. In the scheduler, when a given node fails feasibility checking (before fit checking) we know that no other node of that same class will be feasible, and we add the hash to a map so we can reject them early. This hash cannot include any values that are unique to a given node, otherwise no other node will have the same hash and we'll never save ourselves the work of feasibility checking those nodes.

In #4390 we introduce the nomad.advertise.address attribute and in #19969 we introduced consul.dns.addr attribute. Both of these are unique per node and break the hash.

Additionally, we were not correctly filtering attributes out when checking if a node escaped the class by not filtering for attributes that start with unique.. The test for this introduced in #708 had an inverted assertion, which allowed this to pass unnoticed since the early days of Nomad.

Ref: #708
Ref: #4390
Ref: #19969

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

Comment on lines +126 to +127
case strings.HasPrefix(target, "${unique."):
return true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure there actually are values that fall into this case (the unique attributes from fingerprinting would be attr.unique), but it was one of the test cases.

@tgross
Copy link
Member Author

tgross commented Jan 24, 2025

FWIW it's not entirely clear to me that the computed class hash is a useful optimization as it's currently designed -- most clusters aren't going to have uniform-enough machines around items like CPU models. Maybe we'd get more mileage out of this feature if it were based on an allowlist of attributes instead of "all attributes except ones specifically marked unique". But that's more of a design improvement than this bugfix.

@tgross tgross added backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.9.x backport to 1.9.x release line labels Jan 24, 2025
@tgross tgross force-pushed the b-node-class-escaped-constraints branch from 7c69a85 to f798367 Compare January 24, 2025 18:18
When a node is fingerprinted, we calculate a "computed class" from a hash over a
subset of its fields and attributes. In the scheduler, when a given node fails
feasibility checking (before fit checking) we know that no other node of that
same class will be feasible, and we add the hash to a map so we can reject them
early. This hash cannot include any values that are unique to a given node,
otherwise no other node will have the same hash and we'll never save ourselves
the work of feasibility checking those nodes.

In #4390 we introduce the `nomad.advertise.address` attribute and in #19969 we
introduced `consul.dns.addr` attribute. Both of these are unique per node and
break the hash.

Additionally, we were not correctly filtering attributes out when checking if a
node escaped the class by not filtering for attributes that start with
`unique.`. The test for this introduced in #708 had an inverted assertion, which
allowed this to pass unnoticed since the early days of Nomad.

Ref: #708
Ref: #4390
Ref: #19969
@tgross tgross force-pushed the b-node-class-escaped-constraints branch from f798367 to 68cbaee Compare January 24, 2025 18:19
@tgross tgross marked this pull request as ready for review January 24, 2025 19:03
@tgross tgross requested review from a team as code owners January 24, 2025 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.9.x backport to 1.9.x release line theme/scheduling type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant