Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --nodown Option to Exclude Nodes That Are Down #449

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

hawartens
Copy link
Contributor

@hawartens hawartens commented Sep 16, 2020

Add in an option that is similar to the -v option in pdsh. The idea
is to skip all hosts that are currently down for one reason or another.
The groups.conf file would have something like this in it:

[genders]
map: nodeattr -n $GROUP
all: nodeattr -n -A
list: nodeattr -l
down: whatsup -n -d || /bin/true

Example usage without -D:

> clush -a -b /path/to/command.sh
host8: mcmd: connect failed: No route to host
host12: mcmd: connect failed: No route to host
clush: host[8,12] (2): exited with exit code 1
---------------
host[1-7,9-11] (10)
---------------
Hello World

Example usage with -D:
> clush -D -a -b /path/to/command.sh
---------------
host[1-7,9-11] (10)
---------------
Hello World

Add in an option that is similar to the -v option in pdsh. The idea
is to skip all hosts that are currently down for one reason or another.
The groups.conf file would have something like this in it:

[genders]
map: nodeattr -n $GROUP
all: nodeattr -n -A
list: nodeattr -l
down: whatsup -n -d || /bin/true

Example usage without -D:
> clush -a -b /path/to/command.sh
host8: mcmd: connect failed: No route to host
host12: mcmd: connect failed: No route to host
clush: host[8,12] (2): exited with exit code 1
---------------
host[1-7,9-11] (10)
---------------
Hello World

Example usage with -D:
> clush -D -a -b /path/to/command.sh
---------------
host[1-7,9-11] (10)
---------------
Hello World
@hawartens
Copy link
Contributor Author

Looks like I broke a test. Will fix. Thanks.

Use named parameters instead of relying on position when initializing
UpcallGroupSource class.
@hawartens
Copy link
Contributor Author

Hello. Would like to start a conversation about this pull request. Would like to give clush the ability as described here to skip nodes that are "down." On large clusters, there may be several nodes that are down regularly and would prefer not to send commands to them at all. This pull request is a start at that. Would love your feedback! Thanks!

@degremont
Copy link
Collaborator

Thanks for this PR. Few questions to understand this better

  • is whatsup an independent tool or is it related to genders? Any pointer?
  • how would you apply that to Slurm group source by example?
  • Do you have use cases where you want to run your command on all hosts (even those which you know are down)?

My experience running supercomputer indicates that the source giving the group of nodes is often a totally different one to the one giving the status of nodes. ie:

roles:
    compute: node[1-3]
    login: node[4-5]
[states]
   map: nodehealth -f $GROUP

where you usually combine them with: clush -w @roles:compute&@states:up or clush -w @roles:compute&@states:down. That's why I'm wondering if having such default rules, per source is the right one.

There are only two hard things in Computer Science: cache invalidation and naming things.

I'm looking toward naming the option --up. I prefer positive flags instead of double negation ("no" "down")

Copy link
Collaborator

@martinetd martinetd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a whole I agree with @degremont -- this would be nice to be able to use a group from another source instead of hardcoding a down mapping in each source.

Maybe add a config option that automatically 'ands' with a group specified in config unless explicitely disabled? That is more generic and would suit your usage while avoiding having to redefine the down attribute everywhere.
I agree the '&states:up' idiom is a bit cumbersome so having some shortcut would definitely be nice though.

@@ -224,12 +227,12 @@ def _upcall_cache(self, upcall, cache, key, **args):
raise GroupSourceNoUpcall(upcall, self)

# Purge expired data from cache
if key in cache and cache[key][1] < time.time():
if key in cache and cache[key]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't look right, you're purging the whole cache instead of a time-based expiration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Needs to be fixed. Was seeing a problem where cache[key][1] was not defined and was blowing up. Did this temporarily and forgot to go back and fix. Thanks!

self.logger.debug("PURGE EXPIRED (%d)'%s'", cache[key][1], key)
del cache[key]

# Fetch the data if unknown of just purged
if key not in cache:
if key not in cache or not cache[key]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one could make sense but could use its own commit at least, possibly its own PR.
You're basically saying not to cache empty groups and I'm not sure I agree e.g. if your "down" set is currently empty then it won't be cached and getting the list of down nodes is possibly expensive.

@hawartens
Copy link
Contributor Author

hawartens commented Sep 22, 2020

Thanks for this PR. Few questions to understand this better

  • is whatsup an independent tool or is it related to genders? Any pointer?

whatsup is a tool that calculates what hosts are up/down in a cluster
https://github.com/chaos/whatsup

  • how would you apply that to Slurm group source by example?

Not sure there is a need. The "down" is optional. I am not a slurm expert, but you might do it like:
down: sinfo -h --dead -o "%N"
The note thing is that each site can define "down" for themselves.

  • Do you have use cases where you want to run your command on all hosts (even those which you know are down)?

I think there may be times where we want to run the command everywhere and visibly see that we got an error connecting to that host. Can't think of a great reason at the moment, but probably just tired.

My experience running supercomputer indicates that the source giving the group of nodes is often a totally different one to the one giving the status of nodes. ie:

roles:
    compute: node[1-3]
    login: node[4-5]
[states]
   map: nodehealth -f $GROUP

where you usually combine them with: clush -w @roles:compute&@states:up or clush -w @roles:compute&@states:down. That's why I'm wondering if having such default rules, per source is the right one.

There are only two hard things in Computer Science: cache invalidation and naming things.

I'm looking toward naming the option --up. I prefer positive flags instead of double negation ("no" "down")

Totally fine with changing the name/polarity of the option if you prefer. I was just mapping this to the -v option in pdsh (and was trying to give it a similar feel). From pdsh manpage:

nodeupdown module options
       -v     Eliminate target nodes that are considered "down" by libnodeupdown.

@degremont
Copy link
Collaborator

I feel like we should go toward something like

  • Having a proper source for up/down (or other states)
[whatsup]
list: echo -e "up\ndown"
map: if [ $GROUP == "up" ]; then whatsup --up; elif [ $GROUP == "down" ]; then whatsup --down; fi; true

Then find a proper way to declare the upcall, maybe in clush.conf. Easier to do, but not available to library? Like:

downnodes: @whatsup:down

And add to clush

--up:   excludes known "down" nodes using `downnodes` callback.

I was just mapping this to the -v option in pdsh (and was trying to give it a similar feel). From pdsh manpage:

In my thinking, running on upnodes was the same than skipping down nodes. So the same feel than what you're doing with pdsh, just using --up instead of -v. We can think of a short option though...

To be discussed...

@thiell
Copy link
Collaborator

thiell commented Sep 22, 2020

Thanks @hawartens for the info! And I like @degremont's idea very much. It would only require modification of clush, and we don't need to add a new group upcall. What do you think @hawartens, would that fit your needs?

@martinetd
Copy link
Collaborator

A couple of points:

  • I'm not fan of limiting this to clush. Sometimes you need nodeset/cluset to "delegate" the work (think milkcheck or shine; but I'm sure other sites have shell scripts that do similar works?) ; so this should be available to other tools and having this in nodeset makes more sense
  • I think it's actually possible already with the source infrastructure we have, e.g. it isn't hard to make a "filter" source that just intersects or substract a noteset from whatever was given so e.g. @up:foo would take @foo and filter only up nodes (or @up:source:group if you use sources, nodeset cuts on the first colon so it would work appropriately I tested that quickly)

With that in mind I don't think adding a new option to clush would be worth it, I'm not fan of having too many options. A source has the drawback of being heavier syntax if you're going to list many nodeset or nodes but there might be some syntax trick to make @up:node[0-1000] or @up:group1,group2 to work, like escaping the comma? would need to play a bit more with that but it doesn't look impossible to me.

We definitely should ship an example of such a source if you come up with one, though, so others can also benefit more readily.

@hawartens
Copy link
Contributor Author

Thanks @hawartens for the info! And I like @degremont's idea very much. It would only require modification of clush, and we don't need to add a new group upcall. What do you think @hawartens, would that fit your needs?

@thiell I am good with any idea we have here that ends up making it very simple to avoid nodes that are down as the pdsh -v option does. @degremont's idea sounds good to me and seems like it would fit my needs.

@hawartens
Copy link
Contributor Author

hawartens commented Sep 23, 2020

A couple of points:

  • I'm not fan of limiting this to clush. Sometimes you need nodeset/cluset to "delegate" the work (think milkcheck or shine; but I'm sure other sites have shell scripts that do similar works?) ; so this should be available to other tools and having this in nodeset makes more sense
  • I think it's actually possible already with the source infrastructure we have, e.g. it isn't hard to make a "filter" source that just intersects or substract a noteset from whatever was given so e.g. @up:foo would take @foo and filter only up nodes (or @up:source:group if you use sources, nodeset cuts on the first colon so it would work appropriately I tested that quickly)

With that in mind I don't think adding a new option to clush would be worth it, I'm not fan of having too many options. A source has the drawback of being heavier syntax if you're going to list many nodeset or nodes but there might be some syntax trick to make @up:node[0-1000] or @up:group1,group2 to work, like escaping the comma? would need to play a bit more with that but it doesn't look impossible to me.

We definitely should ship an example of such a source if you come up with one, though, so others can also benefit more readily.

Not entirely sure I agree here. I prefer your initial assessment I agree the '&states:up' idiom is a bit cumbersome so having some shortcut would definitely be nice though. Maybe some day we can add the shortcut and I can owe you a beer (or your preferred beverage)! :)

@degremont
Copy link
Collaborator

I think @thiell and I agreed on processing with a new configuration entry in clush.conf and a command-line option to be added to the CLI for that.

  1. Add a new entry in main section of clush.conf named down_nodes. Add it in ClushConfig, no need for a new property. Add this in documentation.
  2. Add a new --up option in OptionParser.install_nodes_options(). Add this in doc.
  3. In clush.conf, the same way you did for nodown, test if the flag exists, and if true, resolve the content of the down_nodes and excludes these nodes from the computed list.
  4. Add this in nodeset CLI too
  5. Add tests for that.

Don't hesitate to force-push your branch after doing these changes

@hawartens
Copy link
Contributor Author

I think @thiell and I agreed on processing with a new configuration entry in clush.conf and a command-line option to be added to the CLI for that.

  1. Add a new entry in main section of clush.conf named down_nodes. Add it in ClushConfig, no need for a new property. Add this in documentation.
  2. Add a new --up option in OptionParser.install_nodes_options(). Add this in doc.
  3. In clush.conf, the same way you did for nodown, test if the flag exists, and if true, resolve the content of the down_nodes and excludes these nodes from the computed list.
  4. Add this in nodeset CLI too
  5. Add tests for that.

Don't hesitate to force-push your branch after doing these changes

Sorry I have not gotten to this yet, been focused on other things. Will get back to it soon.

@hawartens
Copy link
Contributor Author

hawartens commented Oct 14, 2020

  1. In clush.conf, the same way you did for nodown, test if the flag exists, and if true, resolve the content of the down_nodes and excludes these nodes from the computed list.

@degremont Finally looking at this again. Just want to make sure I really understand what you are asking for here. It sounds to me like instead of adding the genders section to groups.conf, you would prefer that we just have a down_nodes attribute in clush.conf. What do you expect down_nodes to be set to? Would it still be something like:

down_nodes: whatsup -n -d || /bin/true

And other sites can define down_nodes something else (or unset by default). Not 100% sure, but I believe this also means that I still need to add resolver code for that into NodeSet.py. Am I understanding this correctly? Just clarifying so I make sure that I understand what you guys are expecting. Thanks!

@degremont
Copy link
Collaborator

Actually, the idea is indeed to add a new entry in clush.conf, but the content will have a nodeset syntax, like:

down_nodes: @whatsup:down

which could be anything else, like:

down_nodes: @slurm:offline

or any nodeset syntax

down_nodes: @slurm:offline,@foo:bar&@prod:all

That way, this will handle the nodeset resolution for you and you will not have nothing to do.
I put an example of what a whatsup config could be.

You just need in the code something like (kind of):

down_nodes = NodeSet(config.get("down_nodes"))

@hawartens
Copy link
Contributor Author

@degremont Okay. Sounds good. Thanks for the clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants