First off, I love this project. THANK YOU for your time here.
I'm seeing some instances of hallucinations (if that's even the right word for it here), even on simple text like "hi there" with the large model (rev 9). It gives me 20 seconds of nightmare-fuel audio: slow, whispered, repeating words in low quality. If I run the same string through the mini model it works, though the pronunciation could be better.
I have a large amount of text I want to break into small chunks, generate speech for each, and then stitch the audio back together. My plan is to try each chunk with the large model first, then re-generate with the mini model if it fails, roughly as sketched below.
My question is: how can I programmatically detect a failure? How can I test for hallucinations?
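Roughly what I have in mind, as a sketch (generate_large, generate_mini, and is_failure are placeholder names; is_failure is exactly the piece I don't know how to write):

def synthesize(chunks):
    clips = []
    for chunk in chunks:
        audio = generate_large(chunk)     # try the large model first
        if is_failure(audio, chunk):      # the detection step this issue is about
            audio = generate_mini(chunk)  # fall back to the mini model
        clips.append(audio)
    return clips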
In my limited experience so far, garbled, unintelligible audio comes down to the padding scheme and max_new_tokens being set too low.
If you increase max_new_tokens in your args config, generation works better, but slower.
generation = model.generate(
    input_ids=inputs.input_ids,          # tokenized description/conditioning text
    attention_mask=inputs.attention_mask,
    prompt_input_ids=prompt.input_ids,   # tokenized text to be spoken
    min_new_tokens=10,                   # avoid zero-length clips
    max_new_tokens=2580,                 # hard cap on generated audio tokens
    pad_token_id=1024,                   # model-specific pad token id
    do_sample=True,
    temperature=0.8,  # 1.0 more diverse, 0.0 more deterministic - smaller values take longer to gen
)
I've added max_new_tokens, but I have no clue what it should be. I took this number from an example I found, but really, who knows.
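One way to pick that number instead of guessing: size the token budget from how long the clip should be. This sketch assumes roughly 15 characters per second of speech and roughly 86 audio-codec frames per second; both numbers are assumptions to tune for your model, not anything this project documents:

CHARS_PER_SECOND = 15     # rough speaking rate (assumption, tune it)
FRAMES_PER_SECOND = 86    # rough codec frame rate (assumption, check your model)

def estimate_max_new_tokens(text, headroom=1.5):
    # Budget enough audio tokens for the expected duration, with headroom
    # so generation isn't cut off mid-sentence.
    expected_seconds = len(text) / CHARS_PER_SECOND
    return max(10, int(expected_seconds * FRAMES_PER_SECOND * headroom))

print(estimate_max_new_tokens("hi there"))  # small budget for a short line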
I manually cap the max length of the prompt text input using the length of prompt.input_ids[0]; about 35 tokens seems to be where this thing starts to die.
After that, I run some audio tests on the output to see if it failed, but it seems like there should be a better way to tell how confident we are in the output.
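One programmatic check that seems workable is an ASR round trip: transcribe the generated clip and compare the transcript against the text you asked for; a high word error rate is a decent hallucination flag. A minimal sketch using openai-whisper and jiwer (neither is part of this project, and the 0.5 threshold is a guess to tune on real failures):

import whisper          # pip install openai-whisper
from jiwer import wer   # pip install jiwer

asr = whisper.load_model("base")

def looks_hallucinated(wav_path, expected_text, threshold=0.5):
    # Transcribe the clip and flag it if the transcript drifts too far
    # from the requested text.
    transcript = asr.transcribe(wav_path)["text"]
    return wer(expected_text.lower().strip(), transcript.lower().strip()) > threshold

# e.g. if looks_hallucinated("chunk_042.wav", "hi there"): fall back to the mini model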
Should padding be set to something other than True?
Also, sometimes the last few words are cut off. I'm assuming this has something to do with the truncation setting, but I don't know how.
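If the tokenizer is being called with truncation enabled, anything past max_length gets silently dropped before generation ever starts, which would explain missing trailing words. A quick sanity check using the standard Hugging Face tokenizer call (the 35-token cap is just the limit I mentioned above):

# Tokenize without truncation to see the true length, then compare it
# against the cap actually used for generation.
full = tokenizer(text, return_tensors="pt")
n_tokens = full.input_ids.shape[-1]
if n_tokens > 35:
    print(f"prompt is {n_tokens} tokens; anything past 35 may be cut off")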