
Minor optimizations on MooseVariableData #29697

Open — wants to merge 18 commits into base: `next`
Conversation

@GiudGiud (Contributor) commented Jan 17, 2025

Locally (M1 Max) I observed an 8% speedup on simple_diffusion.i with -r 7.
I am hoping someone can reproduce this so we can merge.
I doubt the test suite will show a consistent improvement, but let's see with this PR's run.

@GiudGiud GiudGiud self-assigned this Jan 17, 2025
@moosebuild (Contributor) commented Jan 17, 2025

Job Documentation, step Docs: sync website on d7b226f wanted to post the following:

View the site here

This comment will be updated on new commits.

@moosebuild (Contributor) commented Jan 17, 2025

Job Coverage, step Generate coverage on d7b226f wanted to post the following:

Framework coverage

|        | 70e113 Total | #29697 d7b226 Total | +/-    | New    |
| ------ | ------------ | ------------------- | ------ | ------ |
| Rate   | 85.25%       | 85.29%              | +0.05% | 79.28% |
| Hits   | 108182       | 107972              | -210   | 176    |
| Misses | 18725        | 18621               | -104   | 46     |

Diff coverage report

Full coverage report

Modules coverage

Coverage did not change

Full coverage reports

Reports

Warnings

  • framework new line coverage rate 79.28% is less than the suggested 90.0%

This comment will be updated on new commits.

@GiudGiud GiudGiud marked this pull request as ready for review January 27, 2025 14:43
@GiudGiud GiudGiud requested a review from lindsayad as a code owner January 27, 2025 14:43
@moosebuild (Contributor)

Job Controlled app tests on dd74a6c : invalidated by @loganharbour

@moosebuild (Contributor)

Job Disable HDF5 on dd74a6c : invalidated by @loganharbour

@MengnanLi91 (Contributor)

I'll test this on my M1 and see how it goes.

@GiudGiud (Contributor, Author)

Looking at the test suite, at the Framework 1 and 2 recipes and the Modules recipe:
https://civet.inl.gov/branch/6695/

it seems we are running:

  • Framework 1: 5 minutes slower
  • Framework 2: 25 minutes faster
  • Modules: 12 minutes faster

but maybe it's just build machine differences.

@MengnanLi91 (Contributor)

> looking at the test suite, the Framework 1 and 2 recipe and Module
> https://civet.inl.gov/branch/6695/
>
> it seems we are running:
>
>   • Framework 1 5 minutes slower
>   • Framework 2 25 minutes faster
>   • Modules 12 minutes faster
>
> but maybe it's just build machine differences

I tried on my Mac M1 with `mpiexec -n 10 -i simple_diffusion.i -r 7`. There was not much difference in total execution time.

@MengnanLi91 (Contributor)

Not sure if the communication time matters here. I can try a serial run tomorrow.

@GiudGiud (Contributor, Author)

Looking at more recipes (OpenMPI and ARM Mac), there seems to be no change in runtime.

@MengnanLi91 (Contributor)

I ran it with `perf_graph`; the memory usage seems to change with the new commit. I think that may be a direct effect of the changes.

@GiudGiud (Contributor, Author)

For the better or the worse?

@loganharbour (Member)

Profiling reveals a factor-of-2 improvement in `MooseVariableData::computeValues()` with simple diffusion and 7 refinements. This problem is the worst case for testing this, as all we do is fetch vector tags. With more complex problems, the improvement will likely be larger. Testing a transient problem now.

@loganharbour (Member) commented Jan 29, 2025

I did some verification of the inlining of the lambdas with our current GCC and Clang at -O2: https://godbolt.org/z/94ra47jnK

GCC is smart enough to see that the version with the lambda is exactly the same as the one without it, and actually just produces a jump. Cool cool cool
