[Snippets][CPU][Port to 2025.0] Disable MHA tokenization in LLM (#28611)
### Details:
- *The second inference in an LLM is usually single-token inference. This means the `M` dimension of the MatMuls in the SDPA pattern will have the value `1` (during model compilation this dimension is dynamic, i.e. unknown). Snippets cannot provide efficient execution for single-token inference, so we decided to disable MHA tokenization by Snippets in the CPU Plugin for LLMs. We consider the presence of a `ScaledDotProductAttentionWithKVCache` op in the model as a sign that the model is an LLM.*
- *Cherry-picked from #28601*

### Tickets:
- *160634*
- *160978 (contains performance validation results)*
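The detection heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the actual CPU Plugin code: the `Op` and `Model` classes are hypothetical stand-ins for the real OpenVINO graph types, and the function names are invented for this example.

```python
# Hypothetical sketch of the heuristic from the PR description: a model is
# treated as an LLM if it contains a ScaledDotProductAttentionWithKVCache
# operation, and MHA tokenization by Snippets is disabled for such models.
# Op and Model are minimal stand-ins, not the real OpenVINO classes.

class Op:
    def __init__(self, type_name):
        self.type_name = type_name

    def get_type_name(self):
        return self.type_name


class Model:
    def __init__(self, ops):
        self.ops = ops

    def get_ops(self):
        return self.ops


def is_llm(model):
    """Presence of an SDPA-with-KV-cache op marks the model as an LLM."""
    return any(op.get_type_name() == "ScaledDotProductAttentionWithKVCache"
               for op in model.get_ops())


def enable_mha_tokenization(model):
    """Skip MHA tokenization for LLMs: the second (single-token, M == 1)
    inference cannot be executed efficiently by Snippets."""
    return not is_llm(model)


llm_model = Model([Op("MatMul"), Op("ScaledDotProductAttentionWithKVCache")])
cnn_model = Model([Op("Convolution"), Op("MatMul")])
print(enable_mha_tokenization(llm_model))  # False: tokenization disabled
print(enable_mha_tokenization(cnn_model))  # True: tokenization stays on
```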