Caliper Results #85
Comments
All points in the email from @pearce8 (Monday, February 12, 2024 at 15:37) satisfied.
New caliper amg with caliper-libs flag
caliper_amg_Noon.tar.gz
OK, here's another small caliper_amg run. You can see the configs in the out.amg... file. I'm including the build logs for amg and caliper too. One thing that might be messing things up is that I have to build Caliper and Adiak with gcc and the apps with Intel. Perhaps I could build amg/hypre with gcc as well, but Branson and Parthenon highly recommend using Intel. Caliper can't be built with Intel on roci (and I really tried; I'm not 100% sure about XRDS) because the Intel compiler doesn't have the filesystem library and headers in the right place. The Intel default is C++14, and even turning on C++17 doesn't get you the right config to make it happen. GNU's default is C++17, and cce builds fine with C++17 turned on. Would building as a shared lib change anything?
Successful Parthenon Caliper run, with a significantly reduced problem size.
Hi @dmageeLANL, looking at the Parthenon logs, it seems the code creates a lot of threads (441 in this run, to be exact). I suspect it's creating and destroying OS threads in a loop instead of using a fixed thread pool. That is a problem because Caliper keeps a fair amount of per-thread data around until flush and/or program exit, which would explain how it runs out of memory. Currently the only way around this is to not put Caliper annotations on the sub threads and to annotate only the main thread instead. A problem here might be MPI if it's called from the sub threads. You can try running without the

I can try to come up with some solutions to the memory issue, but I'd like to understand what is going on in Parthenon a bit better (e.g. is it only the MPI calls on the sub threads, or some of your own annotations as well). As a side note, the code spends a lot of time in
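To make the suspected mechanism concrete, here is a toy model in Python. It does not use Caliper's API: `ToyProfiler`, `spawn_per_task`, and `fixed_pool` are hypothetical names made up for this sketch. The only assumption carried over from the comment above is that the profiler keeps one record per thread that ever called an annotation, and holds it until flush.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyProfiler:
    """Hypothetical stand-in for a profiler that keeps per-thread data until flush."""

    def __init__(self):
        self._local = threading.local()
        self._lock = threading.Lock()
        self.records = []  # one entry per thread seen; kept after the thread exits

    def annotate(self, label):
        # The first annotation on a thread allocates that thread's record.
        rec = getattr(self._local, "rec", None)
        if rec is None:
            rec = {"events": []}
            self._local.rec = rec
            with self._lock:
                self.records.append(rec)
        rec["events"].append(label)

    def flush(self):
        # Release all per-thread records and report how many were held.
        with self._lock:
            held = len(self.records)
            self.records.clear()
        return held


def spawn_per_task(profiler, n_tasks):
    # Create and destroy one OS thread per task (the suspected pattern).
    for i in range(n_tasks):
        t = threading.Thread(target=profiler.annotate, args=(f"task{i}",))
        t.start()
        t.join()


def fixed_pool(profiler, n_tasks, n_workers):
    # Reuse a bounded set of worker threads for all tasks.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for i in range(n_tasks):
            pool.submit(profiler.annotate, f"task{i}")


if __name__ == "__main__":
    p = ToyProfiler()
    spawn_per_task(p, 441)
    print("records held, thread per task:", p.flush())

    p = ToyProfiler()
    fixed_pool(p, 441, 4)
    print("records held, fixed pool:", p.flush())
```

In this model the thread-per-task variant holds one record per task until flush, while the pooled variant never holds more records than it has workers. Whether Parthenon actually follows the first pattern is still the open question raised above.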
Post Caliper Results here for LLNL.