-
Notifications
You must be signed in to change notification settings - Fork 27.8k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797)
* Fix StopStringCriteria to handle tokens above len(tokenizer) This fixes #35244 by clipping token IDs to be within the tokenizer's vocabulary size before performing the embedding lookup. This prevents index errors when model.config.vocab_size > len(tokenizer). The fix: 1. Adds a clamp operation to ensure token IDs are within bounds 2. Adds a test case to verify the behavior * Use self.stop_strings instead of stop_strings * Handle clipping correctly * make fixup * Update test to the new embedding vecs * Use much bigger values in the mismatch test * Typo fix * Slight simplification --------- Co-authored-by: openhands <[email protected]>
- Loading branch information
1 parent
28f73bc
commit 4563ba2
Showing
2 changed files
with
27 additions
and
10 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters