STACK OVERFLOW 50 #32
Open · mcmingchang opened this issue Dec 30, 2024 · 11 comments

mcmingchang commented Dec 30, 2024

When the code is compiled with compute_80 and sm_80 it does not encounter any issues, but the above error occurs under the default compile parameters.
The problem lies in the call to this function:

    template<typename CandidateList,
             typename data_t,
             typename data_traits>
    inline __device__
    float knn(CandidateList &result,
              const SpatialKDTree<data_t,data_traits> &tree,
              typename data_traits::point_t queryPoint)
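
For context, a minimal sketch of how a query kernel ends up calling this function, based only on the signature quoted above - the header paths, the cukd namespace placement, and the FixedCandidateList type are assumptions that may differ between cudaKDTree versions:

    // Illustrative only: a per-query kernel built around the knn() signature
    // above; header names and the exact candidate-list type are assumptions.
    #include "cukd/builder.h"
    #include "cukd/knn.h"

    template<int K>
    __global__ void queryKNN(cukd::SpatialKDTree<float3> tree,
                             const float3 *queries, int numQueries,
                             float *maxSqrDistOut)
    {
      int tid = blockIdx.x * blockDim.x + threadIdx.x;
      if (tid >= numQueries) return;
      // fixed-size candidate list holding the K current-best neighbors
      cukd::FixedCandidateList<K> result(/*cutOffRadius=*/1e20f);
      // this is the call in which the reported stack overflow occurs
      maxSqrDistOut[tid] = cukd::knn(result, tree, queries[tid]);
    }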

mcmingchang (Author)

The reason is that construction of the KD-tree failed: tree.numPrims = 10000, but tree.numNodes = 2.
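
A quick host-side check right after the build makes this failure mode obvious; a hypothetical sketch using only the two fields quoted above:

    #include <cassert>
    #include <cstdio>

    // Hypothetical post-build sanity check: a tree over 10,000 distinct points
    // should contain far more than 2 nodes; numNodes == 2 means the builder
    // put everything into one huge leaf.
    template<typename Tree>
    void checkBuild(const Tree &tree, int numPoints)
    {
      assert(tree.numPrims == numPoints);
      if (tree.numNodes <= 2)
        fprintf(stderr, "degenerate KD-tree: numPrims=%i, numNodes=%i\n",
                tree.numPrims, tree.numNodes);
    }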

ingowald (Owner)

Leaving for vacation right now, can't look at that until the weekend. Can you send me a Google Drive link with a file containing these 10k points so I can reproduce?

ingowald (Owner) commented Jan 6, 2025

Hey - any chance I could get some more info and/or data to test with?
What's happening here is pretty clear - the builder cannot find any way of splitting your input points into two separate halves, and then creates a single huge leaf - but the question is why. I currently see these options:
a) your input data is 10k copies of the same identical point, in which case the bounding box collapses to a single point and the builder can't find any plane to split;
b) your input data contains some coordinates (NaNs or INFs?) that are messing up the way I compute the bounding box; or
c) something is wrong with how CUDA treats the atomic min/maxes in the bounding box computation.

Given that you explicitly mention "compute_80" vs "default parameters" I kind of assume it's the latter, but some more info would be very useful. Also, please provide the command-line flags you used to compile.
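
Cases (a) and (b) can be ruled out on the host before the build; a hedged sketch, assuming the 10k input points are available on the host as float3 (the issue does not say what the actual point type is):

    #include <cmath>
    #include <cstdio>
    #include <cuda_runtime.h>   // float3

    // Check for case (a) - all points identical - and case (b) - NaN/Inf
    // coordinates - before handing the data to the builder.
    bool inputLooksSane(const float3 *pts, int n)
    {
      bool allIdentical = true;
      for (int i = 0; i < n; i++) {
        if (!std::isfinite(pts[i].x) || !std::isfinite(pts[i].y) || !std::isfinite(pts[i].z)) {
          fprintf(stderr, "point %i has a NaN/Inf coordinate\n", i);     // case (b)
          return false;
        }
        if (pts[i].x != pts[0].x || pts[i].y != pts[0].y || pts[i].z != pts[0].z)
          allIdentical = false;
      }
      if (allIdentical)
        fprintf(stderr, "all %i points are identical - bounding box collapses\n", n);  // case (a)
      return !allIdentical;
    }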

ingowald (Owner) commented Jan 6, 2025

BTW: I'd have to double-check, but I think that starting with either the compute_75 or compute_80 architecture you can no longer run code that was explicitly compiled for earlier archs; in particular, not if your code was explicitly compiled for compute_50, which unfortunately is what some older CMake versions will default to if no arch is set.
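
One way a header-only library can at least catch a too-old target at compile time is a guard on __CUDA_ARCH__; the following is a sketch, not something cudaKDTree actually ships, and the threshold is illustrative:

    // Sketch of a compile-time guard (NOT part of cudaKDTree). During the
    // device compilation pass, __CUDA_ARCH__ names the architecture being
    // targeted, so the old CMake default of 5.2 can be rejected outright.
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600)   // illustrative minimum
    #error "Target arch too old - set CMAKE_CUDA_ARCHITECTURES (e.g. 80) instead of relying on CMake's default of 52"
    #endif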

mcmingchang (Author)

I found that the problems from the low-arch compilation can only be solved by using the newer arch. I then found a low-probability read-write conflict in the code that copies results back from device memory:

    CUDA call (cudaMemcpy(&tree.bounds, savedBounds, sizeof(tree.bounds), cudaMemcpyDefault)) failed with code 700 (line 360): an illegal memory access was encountered

The current version is based on CUDA 12.6 and compiled as compute_80, sm_80.

mcmingchang (Author)

How can this be changed to a direct write instead of a copy?

ingowald (Owner) commented Jan 7, 2025

Re arch version: yes, you have to use a valid CUDA arch when compiling. The problem is that a) some newer GPUs cannot understand older archs, and some of the older archs cannot even do all the atomics I use; and b) CMake defaults to arch 5.2 (really old!) if you don't specify any. And since cudaKDTree is a "header-only" library, I cannot set the arch; the user has to in his or her own program. The only thing I can do (and I will do so right away) is add a check to the CMake file that the CUDA archs are set, and not set to 52, and throw an error if not. Not perfect, but better than nothing.

As to the read-write conflict - I'll have to look at that first before commenting.

ingowald (Owner) commented Jan 7, 2025

Ugh; I just noticed that I have some debug printfs in my spatial-kdtree builder - that probably means I have a slightly different version than you that didn't get pushed, or is/was in a different branch. Apologies. On the upside, the code that I have does not have a read/write conflict: both the saveBounds() kernel and the memcpy live in the same stream 's', and there are explicit cudaStreamSynchronize()s in there, so this should be safe (unless I'm missing something).

To change this to a direct write we'd have to allocate pinned memory on the host and have saveBounds() write to that - but we'd still need the stream synchronize, so this wouldn't change or save anything.
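
For completeness, a sketch of what that pinned-memory variant would look like; the kernel and the box3f struct below are placeholders standing in for the real builder code, and as noted it still needs the same stream synchronization:

    #include <cuda_runtime.h>

    struct box3f { float3 lower, upper; };

    // Placeholder for the real bounds-computing kernel: it writes its result
    // straight through a host pointer mapped into the device address space.
    __global__ void saveBoundsKernel(box3f *out)
    {
      out->lower = make_float3(0.f, 0.f, 0.f);
      out->upper = make_float3(1.f, 1.f, 1.f);
    }

    void computeBoundsDirect(box3f &hostBounds, cudaStream_t s)
    {
      box3f *pinnedBounds = nullptr;
      // pinned + mapped host memory: with unified virtual addressing the same
      // pointer is valid on the device, so the kernel can write to it directly
      cudaHostAlloc((void **)&pinnedBounds, sizeof(*pinnedBounds), cudaHostAllocMapped);
      saveBoundsKernel<<<1,1,0,s>>>(pinnedBounds);
      cudaStreamSynchronize(s);   // still required before the host may read the value
      hostBounds = *pinnedBounds;
      cudaFreeHost(pinnedBounds);
    }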

ingowald (Owner) commented Jan 7, 2025

Just pushed a cleaned-up version, and test-ran it on several tests, none of which caused any issues...

mcmingchang (Author)

The reason I use the spatial KD-tree is that it preserves the constructed tree, so I can create multiple different trees and selectively query them at different times. With cukd::buildTree(data, numData) I can't find how the tree is cached.
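
For what it's worth, with the in-place builder the rearranged device array itself serves as the tree, so "caching" several trees just means keeping several of those arrays around. A hedged sketch of that pattern, assuming cukd::buildTree reorders the array in place (the header name is a guess):

    #include <cuda_runtime.h>
    #include "cukd/builder.h"   // assumed header name

    // Hedged sketch: one reordered device array per tree; build each once,
    // keep the struct around, and query whichever tree is needed later.
    struct CachedTree {
      float3 *points = nullptr;   // device array, reordered by buildTree
      int     count  = 0;
    };

    CachedTree makeTree(const float3 *hostPoints, int n)
    {
      CachedTree t;
      t.count = n;
      cudaMalloc((void **)&t.points, n * sizeof(float3));
      cudaMemcpy(t.points, hostPoints, n * sizeof(float3), cudaMemcpyHostToDevice);
      cukd::buildTree(t.points, t.count);   // reorders t.points in place
      cudaDeviceSynchronize();
      return t;
    }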

mcmingchang (Author)

The parameters for thrust are allocated with cudaMallocManaged. According to the official documentation, calling cudaDeviceSynchronize lets the CPU access those values directly, but in my testing this can cause memory-access conflicts in certain low-probability situations. I therefore replaced all the thrust parameters in the project with CUB and read the values back through cudaMemcpy with cudaMemcpyDeviceToHost. I found that simply looping the demo program does not reveal the problem, but there are many low-probability issues that can occur after deployment in the project.
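
The two access patterns being contrasted, as a self-contained sketch with a trivial stand-in kernel rather than the library's own code:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void writeAnswer(int *out) { *out = 42; }   // stand-in kernel

    int main()
    {
      // Pattern 1: managed memory - the host may read the value directly, but
      // only after cudaDeviceSynchronize() (the pattern found fragile above).
      int *managed = nullptr;
      cudaMallocManaged(&managed, sizeof(int));
      writeAnswer<<<1,1>>>(managed);
      cudaDeviceSynchronize();
      printf("managed:  %i\n", *managed);
      cudaFree(managed);

      // Pattern 2: plain device memory plus an explicit copy back to the host
      // (the pattern the project was switched to).
      int *device = nullptr, host = 0;
      cudaMalloc((void **)&device, sizeof(int));
      writeAnswer<<<1,1>>>(device);
      cudaMemcpy(&host, device, sizeof(int), cudaMemcpyDeviceToHost);  // synchronizes implicitly
      printf("explicit: %i\n", host);
      cudaFree(device);
      return 0;
    }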
