Host Implementation of Histogram APIs #1974
Looks generally good to me. I had to go through it a few times to understand __construct_by_args, but I think I get how it works now and why it's useful.
Just some minor comments from my side.
@@ -4289,6 +4289,86 @@ __pattern_shift_right(_Tag __tag, _ExecutionPolicy&& __exec, _BidirectionalItera
    return __res.base();
}

template <typename _ForwardIterator, typename _IdxHashFunc, typename _RandomAccessIterator, class _IsVector>
Is there a reason why _IsVector is a class when the rest are typenames?
No, it seems the convention in this file is generally for everything to be class (though it's not fully consistent). I'll adjust it to the norm.
include/oneapi/dpl/pstl/omp/util.h
std::uint32_t __count = 0;
std::uint32_t __j = 0;

for (; __j < __thread_specific_storage.size() && __count <= __i; ++__j)
std::uint32_t __count = 0 could probably be moved here to the initialization expression of this loop.
thanks, done.
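For readers following the thread, the suggestion can be sketched as below. This is only an illustration: the loop body is not shown in the review, so the counting logic, the function name index_of_ith_constructed_sketch, and the use of std::optional slots are assumptions, not the code in util.h. The counter moves into the for-init, while the index stays outside because it is read after the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the slot index of the i-th constructed element.
// Precondition (assumed): at least i + 1 slots are constructed.
inline std::size_t
index_of_ith_constructed_sketch(const std::vector<std::optional<int>>& slots, std::uint32_t i)
{
    std::uint32_t j = 0; // still needed after the loop, so it stays outside
    // The counter is only used inside the loop, so it lives in the init expression.
    for (std::uint32_t count = 0; j < slots.size() && count <= i; ++j)
    {
        if (slots[j].has_value())
            ++count;
    }
    assert(j > 0); // follows from the precondition
    return j - 1;  // the loop stops one past the slot that completed the count
}
```

With slots {empty, 5, empty, 7, 9}, the constructed elements sit at slot indices 1, 3, and 4, so asking for element 1 yields slot 3.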
Signed-off-by: Dan Hoeflinger <[email protected]>
for (; __first != __last; ++__first)
{
    std::int32_t __bin = __func.get_bin(*__first);
    if (__bin >= 0)
- Can get_bin return a negative value?
- If yes, what is the correct behavior for __brick_histogram?
Yes, -1 is returned when the input element does not fall within any histogram bin. The correct behavior is to do nothing and skip this input element.
Specification: "Input values that do not map to a defined bin are skipped silently."
I recently looked into expanding the bin helper interface to include a separate function to check bounds, and another to get the bin which assumes it is in bounds. I thought this might provide benefit by reducing the number of branches by 1, but I saw no performance benefit from this change for CPU or GPU. It is still something we could pursue in the future.
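To make the skip rule concrete, here is a minimal serial sketch, assuming an evenly divided integer bin helper. The names even_bin_helper_sketch and brick_histogram_sketch are hypothetical and are not the oneDPL internals; only the "negative bin means skip" behavior comes from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bin helper: num_bins evenly sized bins over [min, max).
struct even_bin_helper_sketch
{
    std::int64_t min;
    std::int64_t max;
    std::int32_t num_bins;

    // Returns the bin index, or -1 when the value lies outside [min, max).
    std::int32_t
    get_bin(std::int64_t value) const
    {
        if (value < min || value >= max)
            return -1;
        return static_cast<std::int32_t>((value - min) * num_bins / (max - min));
    }
};

// Serial histogram: one pass over the input, one branch per element.
template <typename InputIt, typename BinFunc>
std::vector<std::uint64_t>
brick_histogram_sketch(InputIt first, InputIt last, const BinFunc& func, std::int32_t num_bins)
{
    assert(num_bins > 0);
    std::vector<std::uint64_t> bins(static_cast<std::size_t>(num_bins), 0);
    for (; first != last; ++first)
    {
        const std::int32_t bin = func.get_bin(*first);
        if (bin >= 0) // inputs that map to no bin are skipped silently
            ++bins[static_cast<std::size_t>(bin)];
    }
    return bins;
}
```

For the input {-5, 0, 1, 4, 5, 9, 10, 42} with two bins over [0, 10), the values -5, 10, and 42 are skipped and the counts are 3 and 2.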
The implementation LGTM, but we should probably hold off on merging until the RFC is first merged.
I agree with ignoring clang-format for the single difference.
namespace __detail
{

template <typename _ValueType, typename... _Args>
struct __enumerable_thread_local_storage
{
Even though these class templates are now internal, the API description for them is still a part of the backend API - you should know what can be done with the object obtained from the make function. We will therefore need to document it somewhere in sufficient detail - perhaps in the RFC for a start.
Yes, I believe this interface is described in the RFC to some extent already. I can make that more specific and explicit.
I have looked through the changes since my last approval, and it LGTM as well.
Implementation of histogram APIs for host backends.
Implementations are provided for the serial, tbb, and OpenMP backends. We add a generic __thread_enumerable_storage struct to provide thread-local storage (TLS) for our host backends. We use the new TLS with parallel_for to implement histogram. Testing is also added, and some minor adjustments are made to CMake.
Please see the RFC documentation / discussion here for more details.
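The general idea of combining per-thread storage with a parallel loop can be sketched as follows. This is a rough illustration under stated assumptions, not the PR's implementation: it uses plain std::thread instead of the tbb or OpenMP backends, the class enumerable_tls_sketch and the function parallel_histogram_sketch are hypothetical names, and the final reduction is done serially for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an enumerable thread-local storage:
// one lazily constructed slot per worker.
template <typename T>
class enumerable_tls_sketch
{
  public:
    enumerable_tls_sketch(std::size_t num_workers, T initial)
        : slots_(num_workers), initial_(std::move(initial)) {}

    // Each worker touches only its own slot, so no locking is needed.
    T& get_for_worker(std::size_t worker)
    {
        if (!slots_[worker])
            slots_[worker].emplace(initial_);
        return *slots_[worker];
    }

    // Visits only the slots that were actually constructed.
    template <typename F>
    void for_each_constructed(F f) const
    {
        for (const auto& slot : slots_)
            if (slot)
                f(*slot);
    }

  private:
    std::vector<std::optional<T>> slots_;
    T initial_;
};

template <typename BinFunc>
std::vector<std::uint64_t>
parallel_histogram_sketch(const std::vector<int>& input, BinFunc get_bin, std::size_t num_bins,
                          std::size_t num_workers)
{
    assert(num_workers > 0);
    enumerable_tls_sketch<std::vector<std::uint64_t>> tls(num_workers,
                                                          std::vector<std::uint64_t>(num_bins, 0));
    const std::size_t chunk = (input.size() + num_workers - 1) / num_workers;

    // Step 1: each worker accumulates its chunk into a private histogram.
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w)
    {
        const std::size_t begin = std::min(w * chunk, input.size());
        const std::size_t end = std::min(begin + chunk, input.size());
        if (begin == end)
            continue; // idle worker: its slot is never constructed
        workers.emplace_back([&, w, begin, end] {
            auto& local = tls.get_for_worker(w);
            for (std::size_t i = begin; i < end; ++i)
            {
                const std::int32_t bin = get_bin(input[i]);
                if (bin >= 0) // unmapped inputs are skipped
                    ++local[static_cast<std::size_t>(bin)];
            }
        });
    }
    for (auto& t : workers)
        t.join();

    // Step 2: sum the private histograms (serial here for simplicity).
    std::vector<std::uint64_t> result(num_bins, 0);
    tls.for_each_constructed([&](const std::vector<std::uint64_t>& local) {
        for (std::size_t b = 0; b < num_bins; ++b)
            result[b] += local[b];
    });
    return result;
}
```

Giving each worker a private histogram avoids atomic increments on shared bins, at the cost of one extra pass to combine the copies. Constructing slots lazily means the combine step only visits storage that a worker actually used.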