-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
embed SELinux security context into credential metadata #70
Comments
Email conversations with Steve Lawrence (@stevedlawrence) and Tim Wickberg (@wickberg): From: Chris Dunlap The MUNGE daemon communicates with a client over a Unix domain socket. Under Linux, it uses the Is that basically how you see this working? What other attributes are you needing to pass, how are these obtained, etc.? From: Steve Lawrence That all sounds good to me, except for one potential concern, which is the use of A domain transition is somewhat analogous to the setuid/setgid bits in standard Linux DAC. When these bits are set, when a user executes a file, the effective uid and gid of the new process are changed to that of the executable. A domain transition works similarly -- when a user executes a file, there are ways to change the SELinux context of the new process. This is often used to restrict or elevate privileges for the new process and is very common. As an example, say we have a very restricted user that has no network access enforced by SELinux, running under the SELinux context "restricted_user_t". When the user executes srun, we could then perform a domain transition to a more privileged context "srun_t", which does have network access so slurm can submit the job to the master. The issue here is that if srun runs under "srun_t", One option to resolve this potential issue is to configure the SELinux policy to just not do a domain transition when a user executes srun. This way, srun will run as the user's context "restricted_user_t", and In the case of this project, I don't think that would violate any security requirements, but I haven't looked in detail at what srun does to submit a job. Though, if effort is being put into making this change, it might be worth the effort to make it general purpose enough so that if we or someone else does want to separate srun into a different context, then they can do so. If this weren't an issue, I would very likely recommend doing the domain transition, so this isn't a totally made up use case. Another option, munge could determine the process id of the submitting process (not srun, but whatever executed srun) and then use As an example, the process tree with contexts might look something like this:
In this case, munge would want to find the pid of the bash process and use Another related issue is the tools that wrap srun (e.g., salloc, sbatch). With these wrapper tools, the process tree might look something like this if we transitioned on the tools in addition to srun:
Note that in SELinux, salloc and srun are likely considered security equivalent and would run under the same context. In this case, munge would need to find the parent of salloc and get its context, rather than the parent of srun. An alternative here would be to just not transition when salloc is executed, but instead only when srun is executed. In which case the process tree looks like this:
In this case, it would be reasonable to get the context of srun's parent, which is salloc with the same context as bash (i.e., we can just always get the context of the parent of srun). I'm not familiar enough with salloc and other wrappers to know if they have any privileges that a restricted user shouldn't have, or if they really are just fairly thin wrappers of srun. My guess is they don't really have excessive privilege requirements, and so this option would be viable and munge can just always get the context of the parent of srun. I'm also not familiar with what privileges srun needs, so it may be entirely reasonable to run it as the user context, and From: Chris Dunlap Thanks, Steve! This really helps. I was worried 1. From a security perspective, is there a difference between having srun obtain the security context vs. having MUNGE obtain it? In particular, srun could call 2. Is there a way to algorithmically determine how far back in the process tree you need to go to get the appropriate security context? For a general purpose solution, I'm worried there could be multiple (or no) wrappers. Should you keep traversing the parent PIDs until the process cmdline matches a shell? From: Steve Lawrence 1. From an SELinux perspective, I don't think there's really a difference between MUNGE or srun getting the security context. But from a general security perspective, it might make sense to have MUNGE get it. I don't really know the details of how MUNGE and srun interact, but if srun sends the context to MUNGE or just appends something to MUNGE's payload, that means there's some inherent trust in srun to do the right thing. And there might be a potential attack where a malicious user could download and execute a modified srun that appends an arbitrary context that has elevated privileges. So I'm not sure we can really trust having srun get the context. And I believe this is why MUNGE exists in the first place. I believe it has private access to keys used to sign the uid/gid credentials. Those keys wouldn't be accessible by a modified malicious binary so it would be difficult to forge the SELinux context without the master noticing. I presumed this is why MUNGE is trusted to get the uid/gid right now, because of this signing. Side-note, SELinux could easily prevent the attack I mentioned of a user running a modified srun, but security is all about layers, so if we can add an extra layer of whatever security MUNGE provides, that would be preferred. 2. This is where things get tricky. It's probably best not to look at parents to find a command shell, because there could be cases where a slurm job isn't submitted from a shell. E.g., maybe it's submitted from a GUI or from cron or something that isn't necessarily a shell. It would be fine for our use case, but I can imagine cases where it might not work. The best solution I've thought of so far is to have the ability to configure Slurm/MUNGE with a list of slurm-related SELinux contexts, and MUNGE would just keep looking at parent processes until it found one that doesn't have one of these slurm contexts. The benefit to this is that if we wanted to have different SELinux contexts for srun and the wrappers we could do so. I imagine the pseudo-code would look something like this:
I'm not a huge fan of this, seems a bit over-engineered. But it is generic and should support all potential use cases. And it does use SELinux contexts to determine which processes to ignore, which I like since a user has no control over their SELinux context. Instead, the SELinux policy author has full control over how slurm determines which SELinux context to use without needing to mess with slurm at all. And this type of list of contexts isn't unprecedented. There's a handful of files in All that said, if that all sounds overly complex, it would probably be reasonable to just always get the context of the parent of srun, it's just not the most flexible. Doing that would limit us to not writing a policy for wrappers, which is probably sufficient, but without an in-depth analysis of these wrappers it's hard to say for sure. If we want to play it safe and leave that door open, something like the above suggestion might be needed. From: Chris Dunlap MUNGE and srun interact via the C library MUNGE was originally developed to address several problems we were having with user authentication in the context of a cluster resource manager. We needed a portable API that worked across several flavors of Unix. We couldn't rely on root privileges or reserved ports. The remote daemons on the compute nodes within the cluster needed to reliably determine the user's identity. We couldn't trust the client for the user's identity. The credential had to be tamper-proof since it was being forwarded by untrusted intermediaries. And encoding/decoding the credential had to be fast at scale. I think having MUNGE obtain the SELinux security context makes sense. It falls within the bailiwick of its user authentication. The issue has more to do with the time frame. I'm currently envisioning adding a
The exclusion list is sent to munged in the request (via
If the exclusion list is simply NULL, the for-loop will exit with the SELinux context of the parent of the peer process (e.g., the parent of srun). The uid check ensures the uid of the selected process matches the uid of the peer process that called A few more questions... Do you foresee any issues with the uid check above? Am I correct in assuming the uid of the matching process must match the uid of the process that requested the credential? Would the SELinux contexts in the exclusion list always be specified as user:role:domain, or could they be abbreviated with just the domain (e.g., "system_u:system_r:srun_t" vs. "srun_t")? And what about security levels? From: Steve Lawrence Sounds like the right approach to me.
Any particular reason to get the context list from srun rather than MUNGE having the list? srun really doesn't need to know anything about SELinux, so if all the logic, including what a srun context is, could be kept in MUNGE that might simplify things a bit. I'd also probably prefer that the slurm_contexts not be hard coded. An SELinux policy is free to run srun and wrappers in whatever context they want, so hard-coding types could be limiting. For example, some policies are starting to drop the "_t" suffix, some policies are starting to have types like "srun.process", etc., so it isn't unrealistic that srun could run as something different than the list you have. A configuration file like To get the contexts directory of the currently loaded policy you can use the As to the questions:
That check doesn't seem unreasonable to me. The only time I think it could cause issues is if someone had the setuid bit set on srun or a wrapper, but I can't imagine a system that would do that, and it would probably break other things if they did. That said, I can't think of a way where the loop would get up to init if the uid check were removed, but it can't hurt and is added protection from something dangerous. Another option would be instead of having a list of slurm contexts, have whitelist of contexts that are allowed to submit slurm jobs. Then you traverse up the process tree until you find one of the whitelisted contexts. If it never finds one, the job isn't submitted. This is probably better from a security perspective, but is more difficult to maintain since it requires updates anytime something new is allowed to execute slurm. I prefer the existing solution, but both have their pros and cons.
I think all we really care about is the type/domain. The user, role, and (optional) security levels are all based on the context of the process that executed srun, so that could vary. For example, if a "staff" user ran srun, the full context might look like:
Whereas a guests context might be:
In this case, all we really care about when determining which process submitted the job is the type, srun_t, which remains constant regardless of who executes it. That is what lets us know if a process is a slurm-related process or not. So the exclusion list should really just contain types. Note that
Note that you'll still want to send From: Chris Dunlap I don't see how MUNGE could manage the context list. It doesn't have any knowledge of Slurm or the client that's calling into From: Steve Lawrence I didn't realize MUNGE could be used for things other than Slurm. I now see why you suggested that slurm send contexts to MUNGE, and that changes things a little bit. However, one concern with this is that it requires some inherent trust in srun to send appropriate contexts. If srun were compromised, it could send contexts in addition to the slurm contexts and result in the wrong context being used for the job. And in our specific use case, the uid check wouldn't even save us, because our processes calling srun run under the same uid but with different contexts. I personally would rather put more trust in MUNGE than in srun, since it seems to be smaller and more security-focused, so I'd prefer it handles getting the list of client types. The added benefit here is clients don't need to make code changes to get SELinux support. Only MUNGE and how it gets the list of client contexts need to be updated, and a config file makes that very easy. We could go complex and make the config file a list of types depending on the client type. Calling the confg file something like
The first type is the type of the client, and the types after the equal sign are parent contexts that should be ignored. So it should be similar logic to what you have now, but you first need to pick the appropriate list based on the client context. This config file wouldn't need to be updated often, so it really only needs to be parsed and loaded once when munge starts. Though, I'm still not entirely convinced we even need the srun/salloc/sbatch context distinctions. If anyone knows if these wrappers really are different than srun from a security perspective, that would be good to know. But I suspect that salloc/sbatch/srun can all run as the same SELinux context, which simplifies both the config file format and, I think, the code. The config would then just contain:
and the code might look something like this:
If the context of the client is one of those listed in the munge_client_contexts file, then we find the first parent process that has a context other than the client context. If the context of the client is not in the list, then we just use the client context. This way is maybe not as generic and wouldn't support wrappers with different contexts, but I'm not convinced that's necessary. And it makes the config file quite a bit simpler -- just a list of types rather than mapping between type and list. I'm not sure if the code is any simpler, but it does avoid having to load and lookup one of multiple lists. From: Chris Dunlap OK, this makes sense. What's the rationale for placing the configuration in From: Steve Lawrence Yes, that's the idea. The types that are assigned to srun are specific to each policy. For example, one policy might using the type "srun_t", where another policy might use the type "srun.process", and another might be something generic like "munge_client_t". Since these types are tied specifically to the policy, the convention is for SELinux-aware applications to get their SELinux-aware configurations from the policy directory. Although not super common, this allows users to easily switch between policies without having to modify the specific configurations of SELinux-aware applications. Most SELinux-aware applications have a config file in The format of the file is completely up to the application, so you could make it very similar to the munge config file and reuse that config parsing logic if you wanted. SELinux has an API function to get the contexts directory:
So you don't need to use any complex logic to figure out which policy directory to use. You just need to call that and append "munge_client_contexts". From: Tim Wickberg Hi Steve - I spent some time chatting with Chris today trying to come up to speed on how best to handle securely transmitting the SELinux context, and it does sound like there are a few subtleties that I had not been aware of that need to be resolved. Specifically, if the salloc/sbatch/srun commands themselves have been placed into different contexts, then MUNGE would not see the context that would ideally be the one restored when launching the users' processes on the compute nodes. Is this case -- where salloc/sbatch/srun have different labels -- an expected one for the LLNL deployment under discussion, or more of a hypothetical issue? I think this is critical to deciding what path we take, and does have significant ramifications for when parts of this may be able to be developed and by whom. At the moment, my working expectation has been that salloc/sbatch/srun are all within whatever the users' "normal" context is, and have not had any different context applied that would lead to a domain transition. In which case, MUNGE could get that context through If they do have a different context, that is when things get tricky. I know Chris outlined a scenario where MUNGE looks up the pid of, e.g., srun, and walks up the process tree to find a usable label, but I'd really rather not see MUNGE tackle that complexity. There are all sorts of subtle race conditions that would expose MUNGE to with how the /proc data is maintained -- Slurm itself has to deal with some of these and they're a huge hassle -- and I'd rather not see MUNGE adopt that complexity both from a security and performance standpoint. There is a potential alternative here if the salloc/sbatch/srun processes do not map directly to the required context: simply rely on some other mechanism to inform Slurm in a secure fashion what the desired context is. This can be accomplished through the use of a "job_submit" plugin, which is a piece of code running inside the slurmctld process, and which could provide the appropriate result based on the uid/gid of the submitting user through some other mechanism. Alternatively, it may be simpler to configure pam_selinux alongside Slurm's From: Steve Lawrence
srun will almost certainly be run under a different label. It's still a hypothetical that salloc/sbatch will run under a different label or if they can just run as the label as whatever executed them. It's possible for security reasons that salloc and sbatch should run as something different (likely the same as srun), but I don't know enough about what capabilities salloc/sbatch need compared to a normal user vs srun. I'm sure srun requires extra privileges (e.g. open network connections to the master). But what I don't know is what salloc and sbatch need or what they really do under the hood. No one has clarified this to me.
I think we need to change this expectation. I believe it will violate security requires if at a minimum srun does not transition to something different, so getting the context via
Understandable. That said, it sounds to me like MUNGE is responsible and trusted to perform the security logic. Hence the reason why Slurm does not do it. It seems logical that it also handle this. But I'm open to alternatives. I'm not all that familiar with slurm/MUNGE.
Unfortunately, we cannot rely on uid/gid. In our particular use case, a user will switch between different security labels, but stay under the same uid/gid. We need slurm/MUNGE/something to determine what security label was used to submit a job and maintain that when executing the job.
For similar reasons above, I'm not sure PAM will work since what the user logs in as isn't necessarily the same as the security label that a job could be executed as. A user can switch to one of many security labels after logging in and submit jobs from different labels. From: Tim Wickberg
salloc/sbatch have very similar requirements to srun -- being able to connect out to the controller and compute nodes, as well as establish ephemeral TCP ports to listen on -- and would need equivalent privileges.
That's unfortunate, and at this point I believe precludes having MUNGE provide this info for us.
This is a problem as well. What you've described here implies the only way to find the "correct" label that the job should be running under is to walk up the process tree, and stop once an appropriate label as been identified. However, building such a process tree representation in a secure manner is very difficult on Linux system, and there are many subtle race conditions that -- in some cases -- cannot be overcome 100%. So modifying MUNGE to provide this is a non-starter as well from my perspective, as I certainly would not want to develop or submit insecure extensions to MUNGE, and I'm sure Chris would not accept such work either. The salloc/sbatch/srun client commands themselves cannot provide this info either. They're inherently untrusted, and there is nothing that stops a user from installing modified versions bypassing any logic we add to handle this. I think that we've thus reached an impasse, and I don't have a workable solution given your current set of requirements. From my perspective, either salloc/sbatch/srun need to run under the context you expect the job to run with on the node, or you need to provide the slurmctld process with some way to securely look up the correct mapping for a given user/group (possibly alongside some additional job annotations such as QOS, account, or WCKey). From: Steve Lawrence
Ok, then is sounds to me like ideally salloc/sbatch would have the same label as srun. This only adds extra complication, since it requires looking up a process tree rather than at just a single parent.
Ok, I'll have to see if there are any alternatives. I have one idea, but it's extremely ugly -- we essentially need a duplicate srun/salloc/sbatch executable file for every different security label that one could transition from, and the user needs to execute the right one. SELinux could enforce that only the right one could be executed, but the user much know which one to execute, or we provide a wrapper script that figures it out. This allows srun to have additional privileges to submit jobs while still have each execution of srun to have a unique label. Out of curiosity, what are the race conditions with calculating parent process ids? I can certainly imagine complexities (e.g. parent processes terminating, various process states to deal with), but I'm not aware of any race conditions with security implications. Maybe something like srun and the parent exiting before munge reads /proc and another processing starting with the same pid?
Agreed. That's why I was hoping MUNGE would be able to handle it.
Are there any annotations that are specific to a process? How are such annotations usually applied? From: Tim Wickberg
That sounds like a complete nightmare to maintain, and I wouldn't expect to do anything in Slurm in support of such a model.
You've hit on the main issue, yes. There is no atomic method to interact with /proc; when you fetch the ppid of a process, there is always a potential race in between reading that value and opening /proc/$ppid/stat to fetch further info on that parent in which the parent has exited and the pid reassigned to a new process.
At job submission and/or step launch time through various arguments. The way in which a given site uses these varies widely, but the But I don't think that helps here. Any of these can be set to arbitrary values by the user -- enforcement happens within our slurmctld process on the control node -- and thus cannot be used for security. To rephrase some of my earlier remarks, if the design for the SELinux contexts on these systems has salloc/sbatch/srun running in some other context and not the desired context for the users' processes to launch in on the compute nodes, I do not know there is anything we can do here. Please understand that anything I build in Slurm needs to be usable on other systems. I cannot build a one-off set of commands, installation mechanisms, or other work that is only of use in this very specific LLNL deployment. That unfortunately seems to be the path we're headed down at this point. From: Steve Lawrence
Agreed. Nothing in slurm would need to change with this. This would all need to be managed by LLNL if we went this route.
I'm surprised no one has written up a reasonable solution to this problem, but I did find mentions of using
I think the only race-y thing here is the original
Completely understood. I don't believe the need to get the parent process context is specific to LLNL's use case. IMHO, any system that cares enough about security to want Slurm SELinux support should also want to have separate labels for srun/salloc/sbatch, and should also want to use the label of the submitting process instead of one of these labels. This follows the principal of least privilege and ensures that processes that can execute srun do not need to have all the privileges that srun requires. From: Chris Dunlap Sorry for the delayed response. I'm open to adding SELinux support to MUNGE -- assuming all potential race conditions and issues of correctness are addressed, obviously. The main issue is a question of when. There are additional complexities that will take time. This code would be running in a munged thread that cannot block. I don't expect these calls to block since they're accessing kernel data structures, but they all have to be checked. If some could potentially block, that adds more complexity. There's the issue of the credential format which needs to be updated to an extensible format in order to support new features. And there's the compatibility code to handle both the new and old formats. There's portability concerns. The SELinux bits have to be properly autoconf'd and tested on multiple platforms, some of them dating back years. MUNGE still has to build on systems that don't support SELinux. Builds that don't support SELinux need to handle credentials from builds that do, and vice-versa. Automated tests need to be added to the test suite. There's the performance impact this will have which becomes a concern at scale. The cost of these additional system calls adds up which will reduce credential throughput. I might be able to mitigate this some by checking if the process start time is monotonically decreasing as you traverse up the process tree, but I'd need to research that more. (There was a recent issue where an attacker could delay the process start time since There's the requisite plumbing to extend the There's the These are just the issues off the top of my head. I haven't had time to do a detailed design. There could be other surprises. In short, this feature adds a lot of complexity that will take time to properly address. Initially, I was told the anticipated time frame for this would be around the end of August. That's not a target I can meet. My plan for the short-term was to have Slurm do the security context lookup and pass the context string to So far, I've largely been gathering requirements. I'd want to allow for at least 6 months development time to address the concerns noted above. That time would have to be negotiated with my management. From: Steve Lawrence Ok. Thanks for the response. I'll talk to Carlos and see if any of the options that require less intrusive MUNGE changes (i.e. just get the context of the socket) can be made to work. The options are pretty ugly and does feel like the "right" way to do it, but it's certainly less complex and is more likely to fit in their time frame. From: Steve Lawrence Thinking more on this, I think I may have a solution that's fairly clean from an SELinux perspective (more complicated and a bit awkward, but nothing too bad), but requires a reliable and specific process tree from slurm. The required changes to MUNGE follow the original approach, in which MUNGE gets the context of the srun socket (no process tree craziness) and a slurm process will use But I need some clarification on how a job is executed to be sure this will work. Can you verify the process tree that leads to the execution of the job once slurmd receives it? As I recall the slurmd daemon spawns a new instance of slurmstepd for every job, and then slurmstepd executes the actual job binary? So a process tree of multiple jobs running might look something like this:
Are the slurmstepd processes always unique for each job, and do not share any resources with any other jobs? If so, we can define the SELinux policy so srun/salloc get a unique type based on what executes it without too much hassle. So every srun instance gets a unique and limited type. Then slurmd can execute slurmstepd with that type. And finally slurmstepd will transition back to the original type when it executes a job. So the whole thing looks something like this: On the job submission node:
MUNGE will then get the context of the srun socket (i.e. projectX_srun_t) and securely pass that to the master slurm node. On the master node: When slurmd executes slurmstepd it will transition (via
This is a bit awkward (and potentially a security violation) because the special *_srun_t type needs the combined privileges of srun/salloc and slurmstepd. Really slurmstepd should be someting like *_slurmstepd_t but we can't do that). There are ways we can split things up in policy so projectX_srun_t has different privileges depending if it's on a master node or not, but it's bit harder to manage. Some questions: 1. Is my understanding of the process tree correct for executing a job? 2. Are any resources shared/reused between different slurmstepd processes? Are they truly one-off and solitary processes per job? 3. Do srun/salloc and slurmstepd require essentially the same permissions, or are they really very different? I assume they're pretty different, but maybe not? Is slurmstepd what actually communicates back to salloc/srun to provide the job output, or is that routed through slurmd? 4. Are there any concerns with these changes to MUNGE and slurmd? I think this is back to the original approach, so hopefully not. From: Chris Dunlap Slurm specifics are outside my bailiwick, so I'll leave those for Tim. To be honest, I don't fully understand the proposed SELinux policy. It seems convoluted to me, and that makes me think this approach is likely to result in a site-specific and workload-specific feature. Also, relying on specific Slurm behavior makes this a resource manager specific solution, not a general-purpose one. Any features I add to MUNGE have to provide a general-purpose solution. While spelunking /proc to locate an appropriate process adds substantial complexity, it also seems to offer a general-purpose solution. Removing that part only removes ~20% of the work involved. The rationale for placing that code in MUNGE is to remove the need for applications to roll their own (potentially incompatible or insecure) solution. MUNGE strives to provide basic user authentication information in a simple, secure, fast, and portable manner. Unfortunately, properly addressing this complexity will take significant time. I don't see a quick path forward for adding this feature to MUNGE. Given the time requirements, you might want to pursue other solutions. From: Steve Lawrence As someone that's done alot of SELinux work, this really isn't anything that convoluted from an SELinux perspective. Ensuring that domain transitions happen with the right domains in the right spot is a good chunk of what gives SELinux is strength, and also what can make it seem complicated. So while this might seem convoluted, it really isn't anything too crazy to any familiar with SELinux. The only thing here the complicates things is that slurm/MUNGE need to be SELinux aware. Which does require understanding how the processes all interact to know the correct place to insert the awareness. There's really no way around this, regardless of which approach is taken. And there are some good security reasons to have srun/alloc transition to something like projecA_srun_t, so regardless if MUNGE just uses the context of srun/salloc or if it looks up the process tree, we'll probably still do that. And the security reasons also apply to slurmstepd too (though, it's still not clear to me how slurmstepd gets information back to salloc/srun). So restoring the process tree context might not actually be the best thing, since it doesn't allow separating slurmstepd. I'm not sure I understand what makes this not general purpose. I believe the way MUNGE works right now is it captures UID/GID information from srun, provides that info to slurmd, and then slurmd restores that at some point. I think we should be able to basically do the same thing but with SELinux. And SELinux policy can can be made as complicated or as simple to take advantage of this. For example of a simple usage, assume we didn't have any of our LLNL project stuff going on, and there was just a single user and we don't care about transition srun. In this case, the SELinux policy would be configured do to this:
So salloc and srun do not have a transitions and they all run in user_t. MUNGE would capture user_t from the srun socket (no proecess tree) and send that to slurmd. Which looks like this:
So slurmd would be SELinux aware to execute slurmstepd as user_t, which would then execute the job. This way, the job is executed exactly as if the user had executed it. So in the non-complicated LLNL case, this is all pretty straightforward. And the LLNL will just add some extra transitions (that MUNGE/slurm do not need to be aware of) to make sure things are a bit more restricted to meet its security requirements. Another question I thought of: 1. Where does the UID/GID get restored by slurm? Is slurmstepd executed as the original UID/GID, or just the job that slurmstepd executes? The other questions for reference: 2. Is my understanding of the process tree correct for executing a job? 3. Are any resources shared/reused between different slurmstepd processes? Are they truly one-off and solitary processes per job? 4. Do srun/salloc and slurmstepd require essentially the same permissions, or are they really very different? I assume they're pretty different, but maybe not? Is slurmstepd what actually communicates back to salloc/srun to provide the job output, or is that routed through slurmd? 5. Are there any concerns with these changes to MUNGE and slurmd? I think this is back to the original approach, so hopefully not. From: Steve Lawrence Just want to add that after talking with some colleagues, they agree that the process tree thing isn't as good as it sounds once we really dug into the implications. It sounds right, but doesn't allow for as much confinement of slurmstepd as one would really like. And as you mentioned, the process tree lookup is complex and potentially racey if not done correctly. I think the best way forward, for both time and capabilities, is for MUNGE to just get the context of the peer socket via A colleague also mentioned that it might make things cleaner if MUNGE also had the ability to do some sort of translation on the context. For example, if I do still have some unknowns about exactly how slurmd/slurmstepd/jobs interact, so those questions need to be answered until I can be 100% confident this solution works. |
Cautionary note since [Linux] /proc/pid/stat parsing bugs
|
The SELinux context can be fetched using the SO_PEERSEC option to getsockopt()... I wouldn't expect MUNGE to do anything unusual with /proc there. |
There's been some discussion in this thread on traversing |
Embed the SELinux security context into the credential metadata. This will benefit resource managers (e.g., Flux and Slurm) that want to confine jobs to a specific SELinux context.
The goal is to allow a user to change into a project context where they can only write to specific directories, have that user launch a job with that security context, and ensure the job runs in that security context in order to prevent cross-contamination of data.
munged can retrieve the context of a peer socket viaGetting the appropriate security context of the peer is going to be tricky.getpeercon(3)
.The text was updated successfully, but these errors were encountered: