Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add number of OpenMP threads to dump file #2868

Merged
merged 19 commits into from
Feb 23, 2024
Merged

Add number of OpenMP threads to dump file #2868

merged 19 commits into from
Feb 23, 2024

Conversation

dschwoerer
Copy link
Contributor

Much nicer than having to parse the log to get the number of threads for e.g. plotting ...

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@dschwoerer
Copy link
Contributor Author

dschwoerer commented Feb 20, 2024 via email

@d7919
Copy link
Member

d7919 commented Feb 20, 2024

On 2/20/24 10:01, David Dickinson wrote: @.**** commented on this pull request. It's a long time since I've worked with BOUT++ internals, but could the |ifdef| be replaced by a check of |options["use_openmp"]|?
I thought the omp_get_max_threads() function would only be definied if we use openmp, thus it might work with an if constexpr, but I am not sure whether that would be C++17? bout::build::use_openmp should be a constexpr, and thus usable in a if constexpr ...

Yes sorry, immediately after posting the comment I realised it was a silly suggestion so deleted it.

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@dschwoerer
Copy link
Contributor Author

Sorry, I was confused why I couldn't see it on the web.

Looking at: https://en.cppreference.com/w/cpp/language/if it seems that would be possible if that was within a function, but does not seem to be possible in normal function.

We could add int omp_get_max_threads() {return 1}; to a header, for the case without openmp. That would avoid #ifdef in 2 or 3 places ...

@ZedThree
Copy link
Member

We could add int omp_get_max_threads() {return 1}; to a header, for the case without openmp. That would avoid #ifdef in 2 or 3 places

This could be generally useful 👍

Does it make sense to actually return 0 when we don't have OpenMP?

@dschwoerer
Copy link
Contributor Author

There is one case where it may be set to 0 without openmp, but in other cases it is 1.
I think also from a logical point of view, no openmp is mostly equivalent to 1 thread.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

ZedThree
ZedThree previously approved these changes Feb 21, 2024
Copy link
Member

@ZedThree ZedThree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely, thanks @dschwoerer ! Always nice to remove some #ifdefs

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@ZedThree
Copy link
Member

There is one case where it may be set to 0 without openmp, but in other cases it is 1. I think also from a logical point of view, no openmp is mostly equivalent to 1 thread.

I'm on the fence about this. In some sense, without OpenMP we have no threads -- there's definitely a measurable difference between 1 thread and no OpenMP, but I guess we do also store whether or not OpenMP is enabled, so we can use that in post-processing if necessary.

@bendudson
Copy link
Contributor

I vote for 0 == no OpenMP ; 1 == OpenMP with one thread

@ZedThree
Copy link
Member

The OpenMP builds are failing because we're missing

#include "bout/build_defines.hxx"

in openmpwrap.hxx in order to bring in the BOUT_USE_OPENMP macro. I'm a bit worried this means the BOUT_OMP macro has been a no-op in several files for some time!

Otherwise the header does not expose the correct definitions
and macros, leading to potential bugs.
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

include/bout/array.hxx Show resolved Hide resolved
include/bout/array.hxx Show resolved Hide resolved
inline int constexpr omp_get_num_threads() { return 1; }
inline int constexpr omp_get_thread_num() { return 0; }
#else
#error OpenMP used but BOUT++ thinks it is disabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: OpenMP used but BOUT++ thinks it is disabled [clang-diagnostic-error]

#error OpenMP used but BOUT++ thinks it is disabled
 ^

src/bout++.cxx Show resolved Hide resolved
src/bout++.cxx Show resolved Hide resolved
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@dschwoerer
Copy link
Contributor Author

The cuda builds are worrying me. They seem to have openmp enabled, but BOUT_USE_OPENMP is disabled. That means that all the thread-safety feature we can enable are not enabled. That might work most of the time (I have not tried) but can lead to race conditions, data corruption (two threads using the same data block) and crashes (random and annoying, but still better then simply wrong results).

I have added a check now, which is why the cuda build fails. One solution would be to just enable openmp for BOUT++ for cuda.
Is that a viable approach? Do we need to options, one for ensuring thread safety, and one for parallelising? Why is openmp enabled for cuda?

The other builds with openmp are failing in the test for bout-config - for some reason -fopenmp does not get added to it, I am confused as to why that fails ...

@ZedThree
Copy link
Member

@ZedThree
Copy link
Member

Also, I noticed has-openmp doesn't appear in the output of bout-config --all

@dschwoerer
Copy link
Contributor Author

So something adds -fopenmp. @ggeorgakoudis did you add that intentionally? Is that needed? If so, I think we should just enable openmp for bout++ as well. Then we get the thread safety. If it turns out for performance reasons we do want to enable openmp but not have bout++ do parallelisation, we can add that option later. This PR is already enormous for adding 1 single integer to the dump file ...

@ZedThree
Copy link
Member

It might be coming from RAJA or UMPIRE, it doesn't appear to be turned on after we detect CUDA: https://github.com/boutproject/BOUT-dev/actions/runs/8002552988/job/21856012784#step:4:64

OpenMP is enabled any way, but if bout is not aware of using openmp, it might
misbehave.
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

dschwoerer and others added 3 commits February 22, 2024 12:33
BOUT_OMP is split in BOUT_OMP_SAFE for cases where we want to ensure
that bout++ behaves correctly in an openmp parallel environement.
BOUT_OMP_PERF on the other hand enables using openmp for parallel
regions.

BOUT_OMP_SAFE is enabled whenever openmp is detected, while
BOUT_OMP_PERF is a user option.
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@ZedThree
Copy link
Member

Thanks for fixing this @dschwoerer, but it feels a bit like overkill now, as you say just to add one flag to the output!

It sort of sidesteps the issue of why the CUDA build is getting OpenMP turned on when we haven't requested it. @ggeorgakoudis Is this coming from the RAJA dependency? Should we actually be forcing OpenMP on in BOUT++ if RAJA is built with OpenMP?

Just because a dependency uses OpenMP doesn't necessarily mean we need to enable it in BOUT++ to ensure thread safety -- that depends on whether we're passing callbacks that include thread unsafe code.

@ggeorgakoudis
Copy link
Collaborator

@ZedThree No, I haven't made any changes to the build system so I do not explicitly set openmp flags in compilation. It may be picked up by including RAJA (since its spack installation in the CI container includes the openmp variant), although I haven't found the exact spot in the cmake file hierarchy where this happens (let me know if you spot it). We can enable openmp in the CUDA CI configuration (https://github.com/boutproject/BOUT-dev/actions/runs/8002552988/job/21856012784#step:4:15) to avoid BOUT++ thinking it builds without openmp when RAJA pulls that in. Makes sense?

@dschwoerer
Copy link
Contributor Author

Thanks for fixing this @dschwoerer, but it feels a bit like overkill now, as you say just to add one flag to the output!

I did not mean overkill, I just meant feature creep. I think there was a bug that hasn't been noticed before and this fixes it.
There was some consideration for parallelising Hermes-3 with OpenMP, and for this we would also need to be thread safe, but not parallelise the for loops. Thus this PR might be interesting for @bendudson as well ...

Just because a dependency uses OpenMP doesn't necessarily mean we need to enable it in BOUT++ to ensure thread safety -- that depends on whether we're passing callbacks that include thread unsafe code.

Why would this be limited to callbacks?
If the function calls with different threats into some function that allocations some memory, and we use our cached malloc pool, that could lead to issues without any callbacks.

I think BOUT++ should always try to be on the safe side. Otherwise we could go back to commit 62f3d2c and ignore all of the other changes ... however I think it is not so bad, compared to getting sometimes wrong results because some memory block is double used and contains the wrong data.

@bendudson bendudson merged commit 5d0a1ad into next Feb 23, 2024
27 of 28 checks passed
@bendudson bendudson deleted the dump-openmp branch February 23, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants