fill prompt for sampler analysis with real tokens in VLM pipeline #1247
Conversation
sbalandi commented on Nov 22, 2024 (edited)
- add the missed token if the previous generation finished because the maximum length was reached (see the sketch below)
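A minimal sketch of that fix, written against plain std::vector rather than the pipeline's actual types; all names here are placeholders, not the real identifiers:

```cpp
#include <cstdint>
#include <vector>

// Sketch of the description above: if the previous generation stopped because
// the length limit was hit, the last sampled token never made it into the
// stored history, so append it before the next turn starts.
void append_missed_token(std::vector<int64_t>& tokenized_history,
                         int64_t last_sampled_token,
                         bool prev_stopped_on_max_length) {
    if (prev_stopped_on_max_length) {
        tokenized_history.push_back(last_sampled_token);
    }
}
```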
The review thread below is on this hunk in the VLM pipeline:

```cpp
std::fill_n(prompt_ids.data<int64_t>(), prompt_ids.get_size(), 0);

auto chat_history = m_inputs_embedder->get_tokenized_chat_history();
size_t chat_history_size = std::max(chat_history.get_shape().at(1), history_size + inputs_embeds_size);
```
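For illustration, a hedged sketch of what the hunk above amounts to, using plain std::vector in place of ov::Tensor; this is not the PR's literal code:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of "filling the prompt with real tokens": instead of zero-filling the
// buffer handed to the sampler (the old std::fill_n call), copy the most
// recent token ids from the tokenized chat history into it.
void fill_prompt_with_real_tokens(const std::vector<int64_t>& tokenized_chat_history,
                                  std::vector<int64_t>& prompt_ids) {
    const size_t n = std::min(prompt_ids.size(), tokenized_chat_history.size());
    std::copy(tokenized_chat_history.end() - static_cast<std::ptrdiff_t>(n),
              tokenized_chat_history.end(),
              prompt_ids.begin());
}
```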
It looks like we have the same case as for LLMs, where decode(encode(X)) yields a smaller value than X? In that case we need to partially re-compute the history.
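A rough sketch of the check being described, with illustrative names only; the decode/encode round trip is assumed to have already produced `retokenized_history`:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the "partially re-compute the history" idea: compare the stored
// token ids with the ids obtained from the decode/encode round trip. Tokens
// past the common prefix are the part of the history whose KV-cache entries
// would have to be recomputed.
size_t common_prefix_length(const std::vector<int64_t>& stored_history,
                            const std::vector<int64_t>& retokenized_history) {
    const size_t limit = std::min(stored_history.size(), retokenized_history.size());
    size_t i = 0;
    while (i < limit && stored_history[i] == retokenized_history[i]) {
        ++i;
    }
    return i;  // everything from index i onward needs re-computation
}
```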
In general, I would consider merging the VLM and LLM pipelines' generate functions to keep all this history magic in one place. Or at least creating a helper function similar to get_lm_encoded_results.
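Purely as an illustration of that suggestion, a hypothetical shared helper that both generate() paths could call; the name, struct, and parameters below are invented for this sketch and are not the repository's get_lm_encoded_results:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Invented outline: one place that applies the history fix-ups (missed-token
// append plus building the real-token prompt) for both the LLM and VLM paths.
struct SamplerPromptState {
    std::vector<int64_t> prompt_ids;  // real tokens handed to the sampler
    size_t history_size = 0;          // tokens already covered by the KV cache
};

SamplerPromptState prepare_sampler_prompt(std::vector<int64_t> tokenized_history,
                                          bool prev_stopped_on_max_length,
                                          int64_t last_sampled_token,
                                          size_t history_size) {
    if (prev_stopped_on_max_length) {
        tokenized_history.push_back(last_sampled_token);
    }
    return SamplerPromptState{std::move(tokenized_history), history_size};
}
```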
I'll try to merge some parts of this after #1215.
Force-pushed from 15fdc3c to 2b26160
rebased on #1254
Force-pushed from c6b1907 to 53cd2f7
Force-pushed from 22101ad to a8b866c
Force-pushed from 71a7cd2 to 8a8e513