[Snippets][CPU] Disable MHA tokenization in LLM (#28601)
### Details:
- *The second inference in an LLM is usually single-token inference. This means the `M` dimension of the MatMuls in the SDPA pattern will have the value `1` (during model compilation this dimension is dynamic, i.e. unknown). Snippets cannot provide efficient execution for single-token inference, so we decided to disable MHA tokenization by Snippets in the CPU plugin for LLMs. We consider the presence of a `ScaledDotProductAttentionWithKVCache` op in the model as a sign that the model is an LLM.*

### Tickets:
- *160634*
- *160978*

### TODO:
- [x] Performance validation on LLMs (the results are in the ticket CVS-160978)
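The detection heuristic described above can be sketched roughly as follows. This is a hypothetical Python stand-in, not the actual C++ plugin code: the model is represented simply as a list of op type names, and the function names (`is_llm`, `should_tokenize_mha`) are illustrative, not part of the OpenVINO API.

```python
# Hypothetical sketch of the heuristic from this PR: a model is treated as
# an LLM if it contains a ScaledDotProductAttentionWithKVCache op, and in
# that case Snippets MHA tokenization is skipped.

def is_llm(op_type_names):
    """Return True if the model (given as op type names) looks like an LLM."""
    return "ScaledDotProductAttentionWithKVCache" in op_type_names

def should_tokenize_mha(op_type_names):
    # Second-token inference in an LLM has M == 1, which Snippets cannot
    # execute efficiently, so tokenization is disabled for LLMs.
    return not is_llm(op_type_names)

llm_ops = ["MatMul", "ScaledDotProductAttentionWithKVCache", "Add"]
cnn_ops = ["Convolution", "Relu", "MatMul"]
print(should_tokenize_mha(llm_ops))  # False: LLM detected, skip MHA tokenization
print(should_tokenize_mha(cnn_ops))  # True: not an LLM, tokenization allowed
```

In the real plugin this check would run once over the compiled model's ops before the Snippets tokenization pass, rather than on a list of strings.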