STACK OVERFLOW 50 #32
The reason is that construction of the KD tree failed: tree.numPrims = 10000, but tree.numNodes = 2.
Leaving for vacation right now, can't look at that till the weekend. Can you send me a google drive link with a file containing these 10k points to reproduce?
Hey - any chance I could get some more info and/or data to test with? Given that you explicitly mention "compute_80" vs "default parameters" I kind of assume it's the latter, but some more info would be very useful. Also, please provide the cmdline flags you used to compile.
BTW: I'd have to double-check, but I think that starting with the compute_75 or compute_80 architectures you can no longer run code that was explicitly compiled for earlier archs; in particular, not if your code was explicitly compiled for compute_50, which unfortunately is what some older cmake versions will default to if the arch isn't set.
I found that the problems with compiling for the old arch can only be solved by using the newer one. I then found that there is a low-probability read/write conflict in the code that copies results back from device memory:

CUDA call (cudaMemcpy(&tree.bounds, savedBounds, sizeof(tree.bounds), cudaMemcpyDefault)) failed with code 700 (line 360): an illegal memory access was encountered

The current version is built against CUDA 12.6 and compiled for compute_80, sm_80.
How can I change this to a direct write instead of a copy?
Re arch version: yes, you have to use a valid CUDA arch when compiling. The problem is that a) some newer GPUs cannot run code built for older archs, and some of the older archs cannot even do all the atomics I use; and b) cmake defaults to arch 5.2 (really old!) if you don't specify any. And since cudaKDTree is a "header-only" library, I cannot set the arch; the user has to, in his or her own program. The only thing I can do (and I will do so right away) is add some checks to the cmakefile that verify CUDA archs are set, and not set to 52, and throw an error if not. Not perfect, but better than nothing.

As to the read-write conflict - I'll have to look at that first before commenting.
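As an illustration of what such a guard could look like at the source level (a generic CUDA idiom, not something cudaKDTree itself ships; the sm_70 floor is an assumption made up for the example):

```cuda
// Generic CUDA idiom (not part of cudaKDTree): refuse to compile
// device code for an architecture that is too old. __CUDA_ARCH__ is
// only defined during the device-side compilation pass, and holds
// e.g. 520 for compute_52 or 800 for compute_80. The floor of 700
// is an illustrative assumption, not the library's documented minimum.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 700)
#  error "target arch too old - set CMAKE_CUDA_ARCHITECTURES (e.g. 80) instead of relying on cmake's default of 52"
#endif
```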
Ugh; just saw that I have some debug printfs in my spatial-kdtree builder - that probably means that I have a slightly different version than you, one that didn't get pushed or is/was in a different branch. Apologies. On the upside, the code that I have does not have a read/write conflict: both the […]. To change this to a direct write we'd have to allocate pinned memory on the host, and have savedBounds write to that - but we'd still need the stream-synchronize, so this wouldn't change or save anything.
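For reference, a minimal sketch of the pinned-memory pattern described above, with hypothetical kernel and type names (generic CUDA, not the library's actual builder code):

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the tree's bounds type.
struct Bounds { float lower[3], upper[3]; };

// Hypothetical kernel standing in for the part of the builder that
// saves the bounds: it writes straight into a host-visible (pinned)
// location instead of into device memory.
__global__ void saveBoundsKernel(Bounds *out)
{
  if (threadIdx.x == 0 && blockIdx.x == 0)
    *out = Bounds{{0.f,0.f,0.f},{1.f,1.f,1.f}};
}

int main()
{
  // Page-locked host memory; under unified virtual addressing the
  // device can write to this pointer directly.
  Bounds *pinnedBounds = nullptr;
  cudaMallocHost(&pinnedBounds, sizeof(Bounds));

  cudaStream_t stream;
  cudaStreamCreate(&stream);
  saveBoundsKernel<<<1,1,0,stream>>>(pinnedBounds);

  // Still needed: the host must not read until the kernel is done -
  // which is why the direct write saves nothing over the memcpy.
  cudaStreamSynchronize(stream);

  // pinnedBounds is now safe to read on the host.
  cudaStreamDestroy(stream);
  cudaFreeHost(pinnedBounds);
  return 0;
}
```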
Just pushed a cleaned-up version, and test-ran it on several tests, none of which caused any issues ...
The reason why I use the spatial KD-tree is that it preserves the constructed tree, and I can create multiple different trees and selectively query them at different times. With cukd::buildTree(data,numData) I can't find how the tree is cached.
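A sketch of the usage pattern being asked about, under the assumption that the library exposes a builder overload that fills a caller-owned SpatialKDTree (the exact entry point and header path may differ; check cudaKDTree's headers):

```cuda
#include <vector>
#include "cukd/builder.h"   // assumed header path; cudaKDTree is header-only

// Hypothetical sketch: build several spatial k-d trees up front and
// keep the SpatialKDTree objects alive, so each can be queried later
// without rebuilding. The builder overload used here is an assumption;
// check cudaKDTree's headers for the real entry point.
using Tree = cukd::SpatialKDTree<float3>;

std::vector<Tree> buildAllTrees(const std::vector<float3*> &d_points,
                                const std::vector<int>     &counts)
{
  std::vector<Tree> trees(d_points.size());
  for (size_t i = 0; i < d_points.size(); i++)
    // assumed overload that fills trees[i] from device-side points
    cukd::buildTree(trees[i], d_points[i], counts[i]);
  return trees;   // the trees persist; pass any one of them to knn() later
}
```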
The parameters for thrust are allocated in device memory using cudaMallocManaged. According to the official documentation, calling cudaDeviceSynchronize allows the CPU to directly access the values inside. However, in my testing this can cause memory-access conflicts in certain low-probability situations. Therefore, I replaced all thrust parameters in the project with CUB, and read the values back via cudaMemcpy with cudaMemcpyDeviceToHost. I found that simply looping the demo program does not reveal the problem, but many low-probability issues can occur after deployment in a real project.
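For clarity, the two access patterns being contrasted look roughly like this (a generic CUDA sketch with hypothetical names, not the project's actual code):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void someKernel(float *out) { *out = 42.f; }

int main()
{
  // Pattern A: managed memory. The CPU may read 'managed' directly,
  // but only after cudaDeviceSynchronize() has returned.
  float *managed = nullptr;
  cudaMallocManaged(&managed, sizeof(float));
  someKernel<<<1,1>>>(managed);
  cudaDeviceSynchronize();              // required before any host access
  printf("managed: %f\n", *managed);
  cudaFree(managed);

  // Pattern B: explicit device allocation plus an explicit copy
  // (the pattern used after switching away from managed memory).
  float *d_result = nullptr;
  float  h_result = 0.f;
  cudaMalloc(&d_result, sizeof(float));
  someKernel<<<1,1>>>(d_result);
  // cudaMemcpy on the default stream implicitly synchronizes,
  // so h_result is valid as soon as the call returns.
  cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
  printf("copied:  %f\n", h_result);
  cudaFree(d_result);
  return 0;
}
```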
The code compiled with compute_80 and sm_80 did not encounter any issues, but the above error occurred under the default parameters.
The problem lies in calling this function:

```cuda
template<typename CandidateList,
         typename data_t,
         typename data_traits>
inline __device__
float knn(CandidateList &result,
          const SpatialKDTree<data_t,data_traits> &tree,
          typename data_traits::point_t queryPoint)
```
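For context, a hedged sketch of how such a knn overload would typically be invoked from a kernel; cukd::FixedCandidateList, its cut-off-radius constructor argument, and get_pointID() are assumptions about cudaKDTree's API rather than confirmed signatures:

```cuda
// Hypothetical sketch of a query kernel calling the knn() overload
// above. The candidate-list type, its constructor argument, and the
// get_pointID() accessor are assumptions about cudaKDTree's API.
__global__
void queryKernel(cukd::SpatialKDTree<float3> tree,
                 const float3 *queries, int numQueries,
                 int *closestID)
{
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  if (tid >= numQueries) return;

  // candidate list for the k=8 nearest neighbors; the constructor
  // argument is assumed to be the cut-off radius for the search
  cukd::FixedCandidateList<8> result(/*cutOffRadius=*/1e20f);
  knn(result, tree, queries[tid]);

  // store the ID of the nearest of the k candidates (assumed accessor)
  closestID[tid] = result.get_pointID(0);
}
```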