-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update access modifiers for CPU impls #28286
base: master
Are you sure you want to change the base?
Update access modifiers for CPU impls #28286
Conversation
@@ -56,7 +56,7 @@ struct reorder_impl : public typed_primitive_impl<reorder> { | |||
auto output_mem_ptr = instance.output_memory_ptr(); | |||
|
|||
cldnn::mem_lock<uint8_t, mem_lock_type::read> input_lock(input_mem_ptr, stream); | |||
cldnn::mem_lock<uint8_t, mem_lock_type::read> output_lock(output_mem_ptr, stream); | |||
cldnn::mem_lock<uint8_t, mem_lock_type::read_write> output_lock(output_mem_ptr, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output memory will be updated as the reason solved by this PR, could you descript the detail reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who allocate cl_mem output for this node? and why cpu impl is choosen?
btw, this line will break if output is usm_device, it's unsafe to directly change to read_write here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, River, I will add some test case to explain this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usm_device
If it is usm_device, it will do copying operation by mem_lock itself, right? But we need logic to write back to usm_device memory if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WeldonWangwang, could you please update access modifiers for the other CPU impls as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sshlyapn Sure, will add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the memory is used for readonly, why change them to read_write?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riverlijunjie, why are they used for read-only? These buffers are used to store CPU implementations output results, so they should be modifiable
Co-authored-by: River Li <[email protected]>
@@ -39,6 +39,9 @@ struct gpu_buffer : public lockable_gpu_mem, public memory { | |||
assert(0 == _lock_count); | |||
return _buffer; | |||
} | |||
void* buffer_ptr() const override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sergey, it's not needed, this API just used in debug messages here:
GPU_DEBUG_TRACE_DETAIL << internal_name << " with index " << output_idx << " prepare output: " << output_memory->buffer_ptr() << std::endl; |
The ptr would be 0 without this API, so I added it.
such as:
GPU_Debug: sync_infer_request.cpp:989:prepare_output: result:Result_51154 with index 9 prepare output: 0
@@ -56,7 +56,7 @@ struct reorder_impl : public typed_primitive_impl<reorder> { | |||
auto output_mem_ptr = instance.output_memory_ptr(); | |||
|
|||
cldnn::mem_lock<uint8_t, mem_lock_type::read> input_lock(input_mem_ptr, stream); | |||
cldnn::mem_lock<uint8_t, mem_lock_type::read> output_lock(output_mem_ptr, stream); | |||
cldnn::mem_lock<uint8_t, mem_lock_type::read_write> output_lock(output_mem_ptr, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WeldonWangwang, could you please update access modifiers for the other CPU impls as well?
Details:
When use HETERO:GPU.0,GPU.1 pipeline parallel split the llama_v2 model, there is a
__module.model.layers.0.self_attn/prim::ListConstruct_3
node is marked in shape_of subgraph, and it's result will pass to next device, thus this node and it's result node can only get CPU implementations.In inference stage, when prepare outputs for the Result_44438, B580 dGPU will only use cl_mem in
openvino/src/plugins/intel_gpu/src/plugin/sync_infer_request.cpp
Line 548 in d757efd
So, we have two solutions:
Tickets: