feat: add KEP for NodeResourceFitPlus and ScarceResourceAvoidance #194

Open
wants to merge 5 commits into
base: main

LY-today

What would you like to be added?

What is your proposal:
The native Kubernetes NodeResourcesFit plugin can apply only one scoring strategy, such as MostRequestedPriority or LeastRequestedPriority, to all resources. In industrial practice, this design does not fit some scenarios. For example, in AI scenarios, workloads that request GPUs prefer to fill an entire GPU machine first to prevent GPU fragmentation, while workloads that request only CPU and memory should be spread across non-GPU machines first. Otherwise, CPU and memory on GPU machines are consumed excessively, and tasks that actually request GPUs end up Pending because the non-GPU resources on those machines are insufficient. It is therefore hoped that both strategies can be extended to address this business need.

Why is this needed:
See the proposal description above.

Is there a suggested solution, if so, please add it:

plugin-one

config:

resources: 
  nvidia.com/gpu:
    type: MostAllocated
    weight: 2
  cpu:
    type: LeastAllocated
    weight: 1
  memory:
    type: LeastAllocated
    weight: 1

config description: [image not included]

node score:

finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN * resourceN)] / (weight1 + weight2 + … + weightN)
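
The weighted average above can be sketched in Go as follows. This is a minimal illustration, not the plugin's actual implementation: the `ResourcePolicy` type and the `resourceScore` and `finalScoreNode` function signatures are hypothetical, the per-resource formulas for `MostAllocated` and `LeastAllocated` are assumed to follow the usual "requested over allocatable" and "free over allocatable" definitions, and skipping resources that a node does not expose is also an assumption. `maxNodeScore` stands in for `framework.MaxNodeScore` (100).

```go
package main

import "fmt"

// maxNodeScore stands in for framework.MaxNodeScore (100) so the sketch has no dependencies.
const maxNodeScore int64 = 100

// Strategy names as they appear in the config above.
const (
	MostAllocated  = "MostAllocated"
	LeastAllocated = "LeastAllocated"
)

// ResourcePolicy is a hypothetical type for one entry of the `resources` map.
type ResourcePolicy struct {
	Type   string
	Weight int64
}

// resourceScore scores a single resource on a single node.
// Assumed formulas: MostAllocated favors high utilization, LeastAllocated favors free capacity.
func resourceScore(strategy string, requested, allocatable int64) int64 {
	if allocatable <= 0 || requested > allocatable {
		return 0
	}
	switch strategy {
	case MostAllocated:
		return requested * maxNodeScore / allocatable
	case LeastAllocated:
		return (allocatable - requested) * maxNodeScore / allocatable
	default:
		return 0
	}
}

// finalScoreNode computes the weighted average from the formula above.
// requested holds the per-resource totals on the node including the incoming pod.
func finalScoreNode(policies map[string]ResourcePolicy, requested, allocatable map[string]int64) int64 {
	var weighted, weightSum int64
	for name, p := range policies {
		alloc, ok := allocatable[name]
		if !ok || alloc <= 0 {
			continue // assumption: resources the node does not expose are skipped
		}
		weighted += p.Weight * resourceScore(p.Type, requested[name], alloc)
		weightSum += p.Weight
	}
	if weightSum == 0 {
		return 0
	}
	return weighted / weightSum
}

func main() {
	policies := map[string]ResourcePolicy{
		"nvidia.com/gpu": {Type: MostAllocated, Weight: 2},
		"cpu":            {Type: LeastAllocated, Weight: 1},
		"memory":         {Type: LeastAllocated, Weight: 1},
	}
	requested := map[string]int64{"nvidia.com/gpu": 4, "cpu": 16, "memory": 128}
	allocatable := map[string]int64{"nvidia.com/gpu": 8, "cpu": 64, "memory": 256}
	// Per-resource scores: gpu 50, cpu 75, memory 50; (2*50 + 75 + 50) / 4 = 56 with integer division.
	fmt.Println(finalScoreNode(policies, requested, allocatable))
}
```

With the sample numbers, the GPU term carries half of the total weight, so a node whose GPUs are more fully used gains more score than one with free CPU or memory.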

plugin-two

config:

resources: 
- nvidia.com/gpu 

config description: [image not included]

node score:

finalScoreNode = (allocatablesResourcesNum - requestsResourcesNum) * framework.MaxNodeScore / allocatablesResourcesNum
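
The arithmetic of this formula can be sketched in Go as follows. The description of `allocatablesResourcesNum` and `requestsResourcesNum` was an image that is not available here, so the sketch treats both as opaque integer inputs and does not assume what they count. The function name `scarceAvoidanceScore` and the guards for zero or out-of-range inputs are hypothetical additions; `maxNodeScore` stands in for `framework.MaxNodeScore` (100).

```go
package main

import "fmt"

// maxNodeScore stands in for framework.MaxNodeScore (100) so the sketch has no dependencies.
const maxNodeScore int64 = 100

// scarceAvoidanceScore evaluates
//   (allocatableNum - requestedNum) * maxNodeScore / allocatableNum
// using integer arithmetic. The guards below are assumptions added for safety.
func scarceAvoidanceScore(allocatableNum, requestedNum int64) int64 {
	if allocatableNum <= 0 {
		return 0 // avoid division by zero
	}
	if requestedNum < 0 {
		requestedNum = 0
	}
	if requestedNum > allocatableNum {
		requestedNum = allocatableNum // clamp so the score never goes negative
	}
	return (allocatableNum - requestedNum) * maxNodeScore / allocatableNum
}

func main() {
	fmt.Println(scarceAvoidanceScore(3, 2)) // (3-2)*100/3 = 33
	fmt.Println(scarceAvoidanceScore(4, 2)) // (4-2)*100/4 = 50
	fmt.Println(scarceAvoidanceScore(2, 2)) // (2-2)*100/2 = 0
}
```

The score falls to 0 when the two counts are equal and approaches `maxNodeScore` as the gap between them grows.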

Why is this needed?

It’s introduced above

Signed-off-by: LY-today <[email protected]>
Signed-off-by: LY-today <[email protected]>
@LY-today
Author

LY-today commented Jan 6, 2025

@ZiMengSheng Hi, is there anything else that needs to be adjusted?

Signed-off-by: LY-today <[email protected]>
Signed-off-by: LY-today <[email protected]>
@saintube
Member

/lgtm

@saintube added the lgtm label Jan 20, 2025
@saintube changed the title from "feat: add kep md" to "feat: add KEP for NodeResourceFitPlus and ScarceResourceAvoidance" Jan 20, 2025
@LY-today
Author

@hormes Is there any other work to be done?


4 participants