Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track regions that fields are valid over #2324

Merged
merged 55 commits into from
Jan 18, 2024
Merged

Track regions that fields are valid over #2324

merged 55 commits into from
Jan 18, 2024

Conversation

dschwoerer
Copy link
Contributor

Resolves #2295

I am not sure it is particular useful for 2D - but at least the patches applied cleanly.

Could be extended, to give a speedup in more cases, as with the region tracking, boundaries can be excluded if they are not needed, at the cost of more lookup-maps (that need to be computed once).

I implemented this only for Field3D because otherwise the 2D-3D look-ups are needed, and I only needed it for Field3D.

Copy link
Member

@ZedThree ZedThree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you give a high-level overview of what this is intended to do, as well as how it works? I gather it's for making sure operations are done over consistent regions, but it looks like it does the operations over the intersection of regions. Do we want that behaviour over just checking the regions are consistent?

I think you also need to add docstrings on all the new functions, as well as some low-level comments. I found Mesh::getCommonRegion quite difficult to follow, for instance.

I'm also a bit worried about performance. There's now more map lookups for everything arithmetic operator, even after the initial creation of the lookups.

Please could you add some before/after timing comparisons, at least for an optimised build?

Comment on lines 550 to 558
if (this->size() != other.size()) {
return false;
}
for (auto i1 = this->begin(), i2 = other.begin(); i1 != this->end(); ++i1, ++i2) {
if (i1 != i2) {
return false;
}
}
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (this->size() != other.size()) {
return false;
}
for (auto i1 = this->begin(), i2 = other.begin(); i1 != this->end(); ++i1, ++i2) {
if (i1 != i2) {
return false;
}
}
return true;
return std::equal(begin(), end(), other.begin());

Should probably be implemented as a free function rather than a member

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::equal in the 3 argument version is actually missing the size check:
https://en.cppreference.com/w/cpp/algorithm/equal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, the four argument version should do the trick

include/bout/region.hxx Outdated Show resolved Hide resolved
include/bout/region.hxx Outdated Show resolved Hide resolved
src/mesh/mesh.cxx Outdated Show resolved Hide resolved
tests/unit/include/bout/test_region.cxx Outdated Show resolved Hide resolved
Direct indexing should be faster
Format output as mardown table and print time in micro seconds.
@dschwoerer
Copy link
Contributor Author

Please could you give a high-level overview of what this is intended to do, as well as how it works? I gather it's for making sure operations are done over consistent regions, but it looks like it does the operations over the intersection of regions. Do we want that behaviour over just checking the regions are consistent?

The idea is that each field can store over what region it has valid data. So if a field is initialised from a constant value, it is valid everywhere. A derivative would only be valid in the region without boundaries. If you multiply it with a derivative, it is still only valid in the region without boundaries, thus the intersection is used.

I think you also need to add docstrings on all the new functions, as well as some low-level comments. I found Mesh::getCommonRegion quite difficult to follow, for instance.

Sure, will do 👍

I'm also a bit worried about performance. There's now more map lookups for everything arithmetic operator, even after the initial creation of the lookups.

Thinking about it again, the map can be avoided, and instead a flattend "2D" array could be used.

Please could you add some before/after timing comparisons, at least for an optimised build?

I guess that would depend on the size of the meshes. It should be particular bad for a small mesh. I tried the example/performance/arithmetic for benchmarking, and am a bit confused about the result (all with --enable-optimize=fast and no guard cells)

with map on a mesh with 2 * 2 * 16 with "RGN_NOZ" and "RGN_ALL" set:

TIMING minimum mean maximum
Fields: 0.536 us 0.572 us 11.163 us
C loop: 0.046 us 0.053 us 0.132 us
Templates: 0.201 us 0.209 us 0.681 us
Range For: 0.138 us 0.163 us 10.706 us

map-free on a mesh with 2 * 2 *16:

TIMING minimum mean maximum
Fields: 0.689 us 0.721 us 11.445 us
C loop: 0.026 us 0.037 us 0.235 us
Templates: 0.184 us 0.200 us 0.976 us
Range For: 0.127 us 0.146 us 0.403 us

map-free on a mesh with 2 * 2 *16 with "RGN_NOZ" and "RGN_ALL" set:

TIMING minimum mean maximum
Fields: 0.569 us 0.587 us 1.570 us
C loop: 0.047 us 0.053 us 0.178 us
Templates: 0.208 us 0.218 us 0.762 us
Range For: 0.141 us 0.159 us 0.401 us

current next on a mesh with 2 * 2 * 16:

TIMING minimum mean maximum
Fields: 0.660 us 0.767 us 12.847 us
C loop: 0.027 us 0.040 us 0.160 us
Templates: 0.180 us 0.198 us 0.619 us
Range For: 0.130 us 0.144 us 0.562 us

@ZedThree
Copy link
Member

Oh that arithmetic performance example is maybe a little out of date now. Timings for a full model might be more instructive, like blob2d or elm-pb.

@ZedThree ZedThree changed the title Track region Track regions fields are valid over May 19, 2021
@ZedThree ZedThree changed the title Track regions fields are valid over Track regions that fields are valid over May 19, 2021
@dschwoerer
Copy link
Contributor Author

I tried the blob2d model, but the variation is to large to see any differences.

I am however thinking of extending this, and also setting the region after taking a derivative the region it is valid for. That also requires some code to extend the region, e.g. after communication or appling boundary conditions. However, that would be incompatible with some models, as BCs implemented in different ways would not be captured automatically.

That could allow to scale a bit better towards small grids, as the halo would not always be included in arithmetic operations, but that would require more work - so not sure it is worth it ...

@dschwoerer dschwoerer mentioned this pull request Mar 11, 2022
@bendudson
Copy link
Contributor

Thanks @dschwoerer ! I think this looks pretty good, but have a couple of questions:

  1. Is this something that should always be done, or is it a correctness check that could be disabled if CHECK == 0 for example?
  2. If a user performs an operation "by hand", e.g. outerloop calculations, what happens if they don't set the valid region?

Happy new year!

@dschwoerer
Copy link
Contributor Author

dschwoerer commented Jan 2, 2024

Thanks @dschwoerer ! I think this looks pretty good, but have a couple of questions:

I am happy to also add this to docs, but I am not sure where.

1. Is this something that should always be done, or is it a correctness check that could be disabled if `CHECK == 0` for example?

I think it should also be done. Right not setRegion() is not called in a lot of places, but if it is, it can result in performance improvements, as e.g. boundary regions can be skipped in the computation.

2. If a user performs an operation "by hand", e.g. outerloop calculations, what happens if they don't set the valid region?

As long as setRegion() is not called, nothing changes.

It is not called in a lot of cases, as I was not sure how to handle users writing data. For example, the derivatives could call setRegion() with the region that they have computed the field, but then if the user implements some custom boundary function, the region will not be updated, and things might break.

Maybe in the future we can add this as optional feature, to call setRegion() for the derivatives, and for the build-in boundaries extend that region appropriately. For outerloop it should be fairly straight forward, as the user does everything, and can ensure that if setRegion() is used, it is used correctly.

Happy new year!

Happy new year and thanks for the questions :-)

Comment on lines 787 to 790
BOUT_OMP(critical(mesh_intersection_realloc))
#if BOUT_USE_OPENMP
if (region3Dintersect.size() <= pos)
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another way to write this without the #ifdef guards and repeated conditionals? Maybe BOUT_OMP(single)? If not, then I think this needs a good comment to ensure it doesn't get accidentally removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments. Please let me know if there is still anything unclear.

@@ -174,6 +174,9 @@ public:
/// Return a Region<Ind2D> reference to use to iterate over this field
const Region<Ind2D>& getRegion(REGION region) const;
const Region<Ind2D>& getRegion(const std::string& region_name) const;
const Region<Ind2D>& getDefaultRegion(const std::string& region_name) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding perhaps, but I think the name getDefaultRegion could be confusing: It's getting the field region, using the given region name as the default, rather than getting the default region. Perhaps getRegionWithDefault?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since getRegion gets the region with the given name or ID, perhaps getValidRegionWithDefault is more explicit that it's the valid region that's being requested. Too long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to getValidRegionWithDefault - that is probably easier to understand. The length seems ok to me 👍

Comment on lines 773 to 783
/* Memory layout of indices
* left is lower index, bottom is higher index
* 0 1 2 3
* 0
* 1 0
* 2 1 2
* 3 3 4 5
* 4 6 7 8 9
*
* As we only need half of the square, the indices do not depend on
* the total number of elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation could do with expanding. I think I now understand what's going on, but this diagram is definitely confusing by itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to explain it, but apparently failed to do so.
Would you mind adding a comment, as you understand it now? If not, I can try again, but I am not quite sure how to do it better :-(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function finds the ID of the region corresponding to the intersection of two regions, and caches the result. The cache is a vector, indexed by some function of the two input IDs. Because the intersection of two regions doesn't depend on the order, and the intersection of a region with itself is the identity operation, we can order the IDs numerically and use a generalised triangle number: $[n (n - 1) / 2] + m$ to construct the cache index. This diagram shows the result for the first few numbers.

These indices might be sparse, but presumably we don't expect to store very many intersections so this shouldn't give much overhead.

After calculating the cache index, we look it up in the cache (possibly reallocating to ensure it's large enough). If the index is in the cache, we can just return it as-is, otherwise we need to do a bit more work.

First, we need to fully compute the intersection of the two regions. We then check if this corresponds to an existing region. If so, we cache the ID of that region and return it. Otherwise, we need to store this new region in region3D -- the index in this vector is the ID we need to cache and return here.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

include/bout/field3d.hxx Show resolved Hide resolved
include/bout/field3d.hxx Show resolved Hide resolved
include/bout/mesh.hxx Show resolved Hide resolved
include/bout/mesh.hxx Show resolved Hide resolved
src/mesh/mesh.cxx Show resolved Hide resolved
@dschwoerer
Copy link
Contributor Author

Failure seems to be related to the fact that the --cflags do not include -std=c++17

I am not sure to what extend that was already an issue in the past while compiling against bout++, but it certainly is now required if you include mesh.hxx or field3d.hxx

@dschwoerer
Copy link
Contributor Author

36f152d might be worth backporting to master?

@@ -468,7 +468,7 @@ set_target_properties(bout++ PROPERTIES
# Set some variables for the bout-config script
set(CONFIG_LDFLAGS "${CONFIG_LDFLAGS} -L\$BOUT_LIB_PATH -lbout++")
set(BOUT_INCLUDE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/include")
set(CONFIG_CFLAGS "${CONFIG_CFLAGS} -I\${BOUT_INCLUDE_PATH} -I${CMAKE_CURRENT_BINARY_DIR}/include ${CMAKE_CXX_FLAGS}")
set(CONFIG_CFLAGS "${CONFIG_CFLAGS} -I\${BOUT_INCLUDE_PATH} -I${CMAKE_CURRENT_BINARY_DIR}/include ${CMAKE_CXX_FLAGS} -std=c++17")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to revisit this as this is compiler-dependent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I am not sure how widely that is used, thus I would wait whether anybody ever complains ...

include/bout/region.hxx Outdated Show resolved Hide resolved
Co-authored-by: Peter Hill <[email protected]>
@@ -24,7 +24,7 @@
#include "bout/index_derivs_interface.hxx"
#include "bout/interpolation_xz.hxx"
#include "bout/mesh.hxx"
#include "bout/output.hxx"
//#include "bout/output.hxx"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//#include "bout/output.hxx"

Comment on lines 773 to 783
/* Memory layout of indices
* left is lower index, bottom is higher index
* 0 1 2 3
* 0
* 1 0
* 2 1 2
* 3 3 4 5
* 4 6 7 8 9
*
* As we only need half of the square, the indices do not depend on
* the total number of elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function finds the ID of the region corresponding to the intersection of two regions, and caches the result. The cache is a vector, indexed by some function of the two input IDs. Because the intersection of two regions doesn't depend on the order, and the intersection of a region with itself is the identity operation, we can order the IDs numerically and use a generalised triangle number: $[n (n - 1) / 2] + m$ to construct the cache index. This diagram shows the result for the first few numbers.

These indices might be sparse, but presumably we don't expect to store very many intersections so this shouldn't give much overhead.

After calculating the cache index, we look it up in the cache (possibly reallocating to ensure it's large enough). If the index is in the cache, we can just return it as-is, otherwise we need to do a bit more work.

First, we need to fully compute the intersection of the two regions. We then check if this corresponds to an existing region. If so, we cache the ID of that region and return it. Otherwise, we need to store this new region in region3D -- the index in this vector is the ID we need to cache and return here.

@ZedThree ZedThree merged commit bffb973 into next Jan 18, 2024
1 check passed
@ZedThree ZedThree deleted the track-region branch January 18, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants