🐛 [BUG] - Misallocation of stations to slices #1745
Comments
I believe I found the reason for the problem. find_local_coordinates, called from locate_point, also searches for points slightly outside an element (up to 1 percent) by "extending" it. The point in the element closest to the target (the station's position) can therefore actually lie outside the element, which gives a distance of zero. As a result, two different elements (and hence two different slices) can both report a distance of 0 to the station, and normally the first of them is chosen in locate_MPI_slice. Because of numerical accuracy, another slice is sometimes chosen instead: the search for the closest point inside each slice stops once a point is found whose squared distance is below 10^(-10), while locate_MPI_slice does a straight comparison. The behaviour is therefore somewhat unpredictable. I hope my understanding is correct.
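The mechanism described in this comment can be illustrated with a minimal one-dimensional sketch in Python. It is not the SPECFEM3D routine: the function name `closest_point_1d` is hypothetical, the 1 percent extension is modelled as letting the local coordinate xi range over [-1.01, 1.01], and the 250 m element size assumes a uniform mesh (20000 m / NEX_ETA = 80) as in the reproduction steps.

```python
def closest_point_1d(target, lo, hi, extension=0.01):
    """Closest point to `target` in the element [lo, hi], where the local
    coordinate xi may range over [-1 - extension, 1 + extension].
    Hypothetical helper; the extension model is an assumption."""
    center = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo)
    xi = (target - center) / half
    # Clamp xi to the extended reference interval instead of [-1, 1].
    xi = max(-1.0 - extension, min(1.0 + extension, xi))
    found = center + xi * half
    return found, (found - target) ** 2


# y coordinate of station 53985 from the issue.
y_station = -5999.3412906549866

# Last element row of the lower slice (ends at y = -6000) and first
# element row of the upper slice (starts at y = -6000).
found_a, d2_a = closest_point_1d(y_station, -6250.0, -6000.0)
found_b, d2_b = closest_point_1d(y_station, -6000.0, -5750.0)

print(found_a, d2_a)  # point lies outside [-6250, -6000], distance ~ 0
print(found_b, d2_b)  # point lies inside [-6000, -5750], distance ~ 0
```

Under these assumptions the extension is 1.25 m in physical space, so a station 0.66 m beyond the element boundary is "found" at essentially zero distance by both elements, which leaves the slice choice to whatever tie-breaking the later comparison applies.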
Sounds possible, but this would add additional checks and affect all other simulation setups and their speed, which have been fine so far. It is also still unclear to me why the neighbor slice finds a better position if the points should be within an element of the slice you assume. Could you attach the full output files, i.e.,
error_output.txt
You can see the first behaviour in the error_output.txt file, where the coordinates of the "found" point in slice 4 are actually outside the slice (y_found is -5999.3412906549866, while the maximum y for this slice is -6000). You can also see in the file that, because of the 10^(-10) limit, the distance found for slice 4 is 0, while for slice 9 (the slice where the point actually lies) it is 9*10^(-13). Slice 4 is then chosen, even though it is the wrong slice.
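The selection step described here can be sketched as a plain minimum search over the per-slice distances. This is only an illustration in Python, assuming that locate_MPI_slice keeps the slice with the strictly smallest distance and takes the first slice on ties, as the earlier comment suggests; `pick_slice` is a hypothetical name and the two distances are the values quoted above.

```python
def pick_slice(distances):
    """Return the slice with the strictly smallest distance.
    On ties, the first slice in the list wins (assumed behaviour)."""
    best_slice, best_dist = None, float("inf")
    for slice_id, dist in distances:
        if dist < best_dist:
            best_slice, best_dist = slice_id, dist
    return best_slice


# Distances reported in the comment: slice 4 "finds" the point just
# outside its own boundary with distance 0, while slice 9, which really
# contains the point, stops iterating at 9e-13.
reported = [(4, 0.0), (9, 9e-13)]
print(pick_slice(reported))
```

Both values are far below the 10^(-10) stopping threshold, so the difference between them carries no geometric meaning, yet a straight comparison still lets it decide the owner.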
Thanks. You could check whether the recent devel version with PR #1756 fixes your problem.
It fixes the problem, thanks!
Description
Some stations (or sources for adjoint simulations) are allocated to the wrong core when they are close to a slice boundary. As an example, I printed some station values for core number 4 on the machine I run on. In theory, this core should handle 6000<x<10000 and -10000<y<-6000.
station # 52982 located in slice 4 x: 5999.96 y: -6708.24
station # 53984 located in slice 9 x: 6708.13 y: -6000.09
station # 53985 located in slice 4 x: 6708.79 y: -5999.34
Overall, there are 3 stations that are misallocated here. One station that should be in slice 3 is allocated to slice 4, and another station that should be in slice 9 is allocated to slice 4. There is also one station that should be in slice 4 that is allocated to slice 9.
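The expected owners quoted above can be checked with a short Python sketch. It assumes a regular 5 x 5 decomposition of the 20000 m x 20000 m domain from the reproduction steps and a zero-based numbering slice = ix + NPROC_XI * iy; that numbering is inferred from the range given for slice 4 and is not taken from the SPECFEM3D source. The function name `expected_slice` is hypothetical.

```python
def expected_slice(x, y, xmin=-10000.0, ymin=-10000.0,
                   width=20000.0, nproc_xi=5, nproc_eta=5):
    """Slice that geometrically contains (x, y) in a regular grid of
    nproc_xi x nproc_eta slices (assumed numbering: ix + nproc_xi * iy)."""
    dx = width / nproc_xi
    dy = width / nproc_eta
    # Clamp so points on the upper domain edge fall in the last slice.
    ix = min(int((x - xmin) / dx), nproc_xi - 1)
    iy = min(int((y - ymin) / dy), nproc_eta - 1)
    return ix + nproc_xi * iy


# (station number, slice reported in the issue, x, y)
stations = [
    (52982, 4, 5999.96, -6708.24),
    (53984, 9, 6708.13, -6000.09),
    (53985, 4, 6708.79, -5999.34),
]

for number, reported, x, y in stations:
    print(number, "reported:", reported, "expected:", expected_slice(x, y))
```

With these assumptions the three stations are expected in slices 3, 4 and 9 respectively, matching the description above.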
I encountered the problem when I tried to migrate my code to use SU seismograms (and adjoint sources). As a result of the bug, there is a mismatch between the number of seismograms in the .adj file and the number of receivers allocated to the corresponding slice. This causes the read error in the image.
Affected SPECFEM3D version
Latest Development Version
Your software and hardware environment
gcc & gfortran 9.3.1, intel mpi 2021.1, on CentOS 7
Reproduction steps
I believe that the problem can be reproduced with the following entries in STATIONS_ADJOINT:
ST1 DS -6708.235405744787 5999.964811656147 0 0.0
ST2 DS -6000.086675107924 6708.126406918124 0 0.0
ST3 DS -5999.341290654987 6708.793041840081 0 0.0
With a Mesh_Par_file with the following parameters:
LATITUDE_MIN = -10000
LATITUDE_MAX = 10000
LONGITUDE_MIN = -10000
LONGITUDE_MAX = 10000
DEPTH_BLOCK_KM = 20.d0
UTM_PROJECTION_ZONE = 36
SUPPRESS_UTM_PROJECTION = .true.
NEX_XI = 80
NEX_ETA = 80
NPROC_XI = 5
NPROC_ETA = 5
The number of processors is therefore 25.
Screenshots
No response
Logs
OS
Linux