STACK OVERFLOW 50 #32
Open · mcmingchang opened this issue Dec 30, 2024 · 11 comments

mcmingchang commented Dec 30, 2024

When the code is compiled with compute_80 and sm_80 it does not encounter any issues, but the above error occurs under the default compile parameters.
The problem lies in the call to this function:

    template<typename CandidateList,
             typename data_t,
             typename data_traits>
    inline __device__
    float knn(CandidateList &result,
              const SpatialKDTree<data_t,data_traits> &tree,
              typename data_traits::point_t queryPoint)
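
For context, a minimal sketch of how a query kernel ends up calling this function, based only on the signature quoted above - the header paths, the cukd namespace placement, and the FixedCandidateList type are assumptions that may differ between cudaKDTree versions:

    // Illustrative only: a per-query kernel built around the knn() signature
    // above; header names and the exact candidate-list type are assumptions.
    #include "cukd/builder.h"
    #include "cukd/knn.h"

    template<int K>
    __global__ void queryKNN(cukd::SpatialKDTree<float3> tree,
                             const float3 *queries, int numQueries,
                             float *maxSqrDistOut)
    {
      int tid = blockIdx.x * blockDim.x + threadIdx.x;
      if (tid >= numQueries) return;
      // fixed-size candidate list holding the K current-best neighbors
      cukd::FixedCandidateList<K> result(/*cutOffRadius=*/1e20f);
      // this is the call in which the reported stack overflow occurs
      maxSqrDistOut[tid] = cukd::knn(result, tree, queries[tid]);
    }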

mcmingchang (Author)

The reason is that construction of the KD-tree failed: tree.numPrims = 10000, but tree.numNodes = 2.
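
A quick host-side check right after the build makes this failure mode obvious; a hypothetical sketch using only the two fields quoted above:

    #include <cassert>
    #include <cstdio>

    // Hypothetical post-build sanity check: a tree over 10,000 distinct points
    // should contain far more than 2 nodes; numNodes == 2 means the builder
    // put everything into one huge leaf.
    template<typename Tree>
    void checkBuild(const Tree &tree, int numPoints)
    {
      assert(tree.numPrims == numPoints);
      if (tree.numNodes <= 2)
        fprintf(stderr, "degenerate KD-tree: numPrims=%i, numNodes=%i\n",
                tree.numPrims, tree.numNodes);
    }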

ingowald (Owner)

Leaving for vacation right now, can't look at that until the weekend. Can you send me a Google Drive link with a file containing these 10k points so I can reproduce?

ingowald (Owner) commented Jan 6, 2025

Hey - any chance I could get some more info and/or data to test with?
What's happening here is pretty clear - the builder cannot find any way of splitting your input points into two separate halves, and then creates a single huge leaf - but the question is why. I currently see these options:
a) your input data is 10k copies of the same identical point, in which case the bounding box collapses to a single point and the builder can't find any plane to split;
b) your input data contains some coordinates (NaNs or INFs?) that are messing up the way I compute the bounding box; or
c) something is wrong with how CUDA treats the atomic min/maxes in the bounding box computation.

Given that you explicitly mention "compute_80" vs "default parameters" I kind of assume it's the latter, but some more info would be very useful. Also, please provide the command-line flags you used to compile.
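
Cases (a) and (b) can be ruled out on the host before the build; a hedged sketch, assuming the 10k input points are available on the host as float3 (the issue does not say what the actual point type is):

    #include <cmath>
    #include <cstdio>
    #include <cuda_runtime.h>   // float3

    // Check for case (a) - all points identical - and case (b) - NaN/Inf
    // coordinates - before handing the data to the builder.
    bool inputLooksSane(const float3 *pts, int n)
    {
      bool allIdentical = true;
      for (int i = 0; i < n; i++) {
        if (!std::isfinite(pts[i].x) || !std::isfinite(pts[i].y) || !std::isfinite(pts[i].z)) {
          fprintf(stderr, "point %i has a NaN/Inf coordinate\n", i);     // case (b)
          return false;
        }
        if (pts[i].x != pts[0].x || pts[i].y != pts[0].y || pts[i].z != pts[0].z)
          allIdentical = false;
      }
      if (allIdentical)
        fprintf(stderr, "all %i points are identical - bounding box collapses\n", n);  // case (a)
      return !allIdentical;
    }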

ingowald (Owner) commented Jan 6, 2025

BTW: I'd have to double-check, but I think that starting with either the compute_75 or compute_80 architecture you can no longer run code that was explicitly compiled for earlier archs; in particular, not if your code was explicitly compiled for compute_50, which unfortunately is what some older CMake versions will default to if no arch is set.
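
One way a header-only library can at least catch a too-old target at compile time is a guard on __CUDA_ARCH__; the following is a sketch, not something cudaKDTree actually ships, and the threshold is illustrative:

    // Sketch of a compile-time guard (NOT part of cudaKDTree). During the
    // device compilation pass, __CUDA_ARCH__ names the architecture being
    // targeted, so the old CMake default of 5.2 can be rejected outright.
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600)   // illustrative minimum
    #error "Target arch too old - set CMAKE_CUDA_ARCHITECTURES (e.g. 80) instead of relying on CMake's default of 52"
    #endif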

mcmingchang (Author)

I found that the problems from the low-arch compilation can only be solved by using the newer arch. I then found a low-probability read-write conflict in the code that copies results back from device memory:

    CUDA call (cudaMemcpy(&tree.bounds, savedBounds, sizeof(tree.bounds), cudaMemcpyDefault)) failed with code 700 (line 360): an illegal memory access was encountered

The current version is based on CUDA 12.6 and compiled as compute_80, sm_80.

mcmingchang (Author)

How can this be changed to a direct write instead of a copy?

ingowald (Owner) commented Jan 7, 2025

Re arch version: yes, you have to use a valid CUDA arch when compiling. The problem is that a) some newer GPUs cannot understand older archs, and some of the older archs cannot even do all the atomics I use; and b) CMake defaults to arch 5.2 (really old!) if you don't specify any. And since cudaKDTree is a "header-only" library, I cannot set the arch; the user has to in his or her own program. The only thing I can do (and I will do so right away) is add a check to the CMake file that the CUDA archs are set, and not set to 52, and throw an error if not. Not perfect, but better than nothing.

As to the read-write conflict - I'll have to look at that first before commenting.

ingowald (Owner) commented Jan 7, 2025

Ugh; I just noticed that I have some debug printfs in my spatial-kdtree builder - that probably means I have a slightly different version than you that didn't get pushed, or is/was in a different branch. Apologies. On the upside, the code that I have does not have a read/write conflict: both the saveBounds() kernel and the memcpy live in the same stream 's', and there are explicit cudaStreamSynchronize()s in there, so this should be safe (unless I'm missing something).

To change this to a direct write we'd have to allocate pinned memory on the host and have saveBounds() write to that - but we'd still need the stream synchronize, so this wouldn't change or save anything.
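
For completeness, a sketch of what that pinned-memory variant would look like; the kernel and the box3f struct below are placeholders standing in for the real builder code, and as noted it still needs the same stream synchronization:

    #include <cuda_runtime.h>

    struct box3f { float3 lower, upper; };

    // Placeholder for the real bounds-computing kernel: it writes its result
    // straight through a host pointer mapped into the device address space.
    __global__ void saveBoundsKernel(box3f *out)
    {
      out->lower = make_float3(0.f, 0.f, 0.f);
      out->upper = make_float3(1.f, 1.f, 1.f);
    }

    void computeBoundsDirect(box3f &hostBounds, cudaStream_t s)
    {
      box3f *pinnedBounds = nullptr;
      // pinned + mapped host memory: with unified virtual addressing the same
      // pointer is valid on the device, so the kernel can write to it directly
      cudaHostAlloc((void **)&pinnedBounds, sizeof(*pinnedBounds), cudaHostAllocMapped);
      saveBoundsKernel<<<1,1,0,s>>>(pinnedBounds);
      cudaStreamSynchronize(s);   // still required before the host may read the value
      hostBounds = *pinnedBounds;
      cudaFreeHost(pinnedBounds);
    }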

ingowald (Owner) commented Jan 7, 2025

Just pushed a cleaned-up version, and test-ran it on several tests, none of which caused any issues...

mcmingchang (Author)

The reason I use the spatial KD-tree is that it preserves the constructed tree, so I can create multiple different trees and selectively query them at different times. With cukd::buildTree(data, numData) I can't find how the tree is cached.
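
For what it's worth, with the in-place builder the rearranged device array itself serves as the tree, so "caching" several trees just means keeping several of those arrays around. A hedged sketch of that pattern, assuming cukd::buildTree reorders the array in place (the header name is a guess):

    #include <cuda_runtime.h>
    #include "cukd/builder.h"   // assumed header name

    // Hedged sketch: one reordered device array per tree; build each once,
    // keep the struct around, and query whichever tree is needed later.
    struct CachedTree {
      float3 *points = nullptr;   // device array, reordered by buildTree
      int     count  = 0;
    };

    CachedTree makeTree(const float3 *hostPoints, int n)
    {
      CachedTree t;
      t.count = n;
      cudaMalloc((void **)&t.points, n * sizeof(float3));
      cudaMemcpy(t.points, hostPoints, n * sizeof(float3), cudaMemcpyHostToDevice);
      cukd::buildTree(t.points, t.count);   // reorders t.points in place
      cudaDeviceSynchronize();
      return t;
    }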

mcmingchang (Author)

The parameters for thrust are allocated with cudaMallocManaged. According to the official documentation, calling cudaDeviceSynchronize lets the CPU access those values directly, but in my testing this can cause memory-access conflicts in certain low-probability situations. I therefore replaced all the thrust parameters in the project with CUB and read the values back through cudaMemcpy with cudaMemcpyDeviceToHost. I found that simply looping the demo program does not reveal the problem, but there are many low-probability issues that can occur after deployment in the project.
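
The two access patterns being contrasted, as a self-contained sketch with a trivial stand-in kernel rather than the library's own code:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void writeAnswer(int *out) { *out = 42; }   // stand-in kernel

    int main()
    {
      // Pattern 1: managed memory - the host may read the value directly, but
      // only after cudaDeviceSynchronize() (the pattern found fragile above).
      int *managed = nullptr;
      cudaMallocManaged(&managed, sizeof(int));
      writeAnswer<<<1,1>>>(managed);
      cudaDeviceSynchronize();
      printf("managed:  %i\n", *managed);
      cudaFree(managed);

      // Pattern 2: plain device memory plus an explicit copy back to the host
      // (the pattern the project was switched to).
      int *device = nullptr, host = 0;
      cudaMalloc((void **)&device, sizeof(int));
      writeAnswer<<<1,1>>>(device);
      cudaMemcpy(&host, device, sizeof(int), cudaMemcpyDeviceToHost);  // synchronizes implicitly
      printf("explicit: %i\n", host);
      cudaFree(device);
      return 0;
    }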
