How to fix the issue when I build vocab from the filters_vocab_gen_util.ipynb file? #16

Open
ITHealer opened this issue Oct 24, 2023 · 9 comments

Comments

@ITHealer

https://github.com/usefulsensors/openai-whisper/blob/main/notebooks/filters_vocab_gen_util.ipynb

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/content/whisper/whisper/assets/gpt2'. Use repo_type argument if needed.

I did not change anything in your code!!

Please help me,
Thanks
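
Note: this error is typically raised when the string passed to a Hugging Face `from_pretrained` call is not an existing local directory, so it is then treated as a Hub repo id and fails validation. A likely cause here is that `/content/whisper/whisper/assets/gpt2` does not exist in the installed Whisper checkout. Below is a minimal sketch of a guard that makes this failure explicit, assuming the notebook loads the tokenizer from that path. `require_local_dir` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path


def require_local_dir(path: str) -> str:
    """Return `path` if it is an existing directory, else raise a clear error.

    Hypothetical helper: it only checks the filesystem, so a missing
    directory is reported as such instead of being mistaken for a repo id.
    """
    p = Path(path)
    if not p.is_dir():
        raise FileNotFoundError(
            f"Tokenizer directory not found: {p}. "
            "Check that the assets were installed at this location."
        )
    return str(p)


# Assumed usage in the notebook (requires the transformers package):
#   from transformers import GPT2TokenizerFast
#   tokenizer = GPT2TokenizerFast.from_pretrained(
#       require_local_dir("/content/whisper/whisper/assets/gpt2"))
```

If the guard raises, the fix is to point the notebook at a directory that actually contains the tokenizer files, or to use a notebook that matches the installed Whisper version.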

@ITHealer ITHealer changed the title How to fix issue when i build filters_vocab_gen_util.ipnb? How to fix issue when i build vocab from filters_vocab_gen_util.ipnb file? Oct 24, 2023
@nyadla-sys
Owner

nyadla-sys commented Oct 25, 2023

Use the Colab notebook below to generate vocab.bin. I have changed the magic number, so please follow it:
https://colab.research.google.com/github/nyadla-sys/whisper.tflite/blob/main/models/tflt_vocab_mel.ipynb

Please also refer to:
https://github.com/nyadla-sys/whisper.tflite/tree/main/models

@ITHealer
Author

I was able to run...
Thank you very much!

@nyadla-sys
Owner

// tflt change in minimal
if (magic != 0x74666C74) {
    printf("Invalid vocab file (bad magic)\n");
    return 0;
}

@ITHealer
Author

Yes, I noticed that and fixed it.
Thanks!

@ITHealer
Author

One thing I don't know: if I use another model, will the value 0x74666C74 have to change?
What is it, and how do I identify it?

@nyadla-sys
Owner

You can comment out this code. It is just a kind of authentication step to make sure you are using our vocab file.
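
For reference, the four bytes of 0x74666C74 read as ASCII spell "tflt", which suggests the constant identifies the vocab file format rather than any particular model. Below is a minimal sketch of the same check in Python, assuming the magic is stored in the first four bytes of the file as a little-endian 32-bit integer. The byte order and the helper names are assumptions, not confirmed in this thread.

```python
import struct

# Magic value taken from the C check quoted in this thread.
MAGIC = 0x74666C74


def magic_as_text(magic: int) -> str:
    """Decode a 32-bit magic into ASCII, most significant byte first."""
    return magic.to_bytes(4, "big").decode("ascii")


def has_expected_magic(header: bytes, expected: int = MAGIC) -> bool:
    """Check the first four bytes of a file header against the magic.

    Assumes a little-endian uint32 at offset 0 (an assumption about the
    file layout, not a documented format).
    """
    if len(header) < 4:
        return False
    (value,) = struct.unpack("<I", header[:4])
    return value == expected


print(magic_as_text(MAGIC))  # prints "tflt"
```

To inspect an existing file, read its first four bytes with `open(path, "rb").read(4)` and pass them to `has_expected_magic`. If the check fails, the file was most likely produced by a different generator than the one the C code expects.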

@ITHealer
Author

ITHealer commented Oct 25, 2023

Excuse me, if I want to build for another language, for example Vietnamese, do I need to provide the vocabulary and mel spectrogram from the dataset I use for training, or can I use an existing vocab and mel set for that language that does not come from my dataset?

Is it true that each vocabulary is mapped differently for each different voice and frequency?

I'm new to AI, so I may not be stating some of this correctly.

@ITHealer
Author

Currently I only have a model that has been fine-tuned on Vietnamese, and I need to create a bin file containing the vocab and mel data like you did. Can you guide me on what to keep in mind when creating it?

Thanks!

@nyadla-sys
Owner

You need to generate a multilingual vocab file based on the fine-tuned PyTorch model.
