
Generating samples from generated Mel-spectrograms #13

Closed
francislata opened this issue Nov 9, 2019 · 3 comments
francislata commented Nov 9, 2019

@bshall - First of all, thank you for this implementation. In this issue, you pointed out that you generated a sample audio from a Mel-spectrogram produced by the VQ-VAE. It sounds pretty good.

My question is: how would one go about generating audio from a Mel-spectrogram? Do we need to preprocess the Mel-spectrogram if that's the only thing we're given?

francislata changed the title from "Generating samples generated Mel-spectrograms" to "Generating samples from generated Mel-spectrograms" on Nov 9, 2019
bshall (Owner) commented Nov 11, 2019

Hi @francislata,

The generate.py script does generate audio from Mel-spectrograms: if you look at the code, it converts the raw audio into a Mel-spectrogram and then feeds that to the vocoder. If you want to use spectrograms created by another process (like Tacotron, for example), they need to be computed with the same parameters I've used. You can find the parameters in config.json and the preprocessing steps I used in preprocess.py.
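For context, here is a rough sketch (not the repository's actual code) of the kind of librosa-based mel extraction the preprocessing performs. The config key names and fallback values below are assumptions and should be checked against config.json and preprocess.py:

```python
import json
import librosa
import numpy as np

# Read the analysis parameters from config.json so externally produced mels
# can be matched against them. Key names and defaults are assumptions.
with open("config.json") as f:
    cfg = json.load(f)

sr = cfg.get("sample_rate", 16000)              # assumed key name
wav, _ = librosa.load("example.wav", sr=sr)

mel = librosa.feature.melspectrogram(
    y=wav,
    sr=sr,
    n_fft=cfg.get("n_fft", 2048),               # assumed key name
    hop_length=cfg.get("hop_length", 200),      # assumed key name
    win_length=cfg.get("win_length", 800),      # assumed key name
    n_mels=cfg.get("n_mels", 80),               # assumed key name
)
# preprocess.py may use a different floor, scaling, or normalization step.
log_mel = np.log(np.maximum(mel, 1e-5))
```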

francislata (Author) commented Nov 12, 2019

@bshall - Can you be more specific about which parameters need to match?

If the Mel-spectrogram given to me is generated by an arbitrary TTS system, can I not just take it and run it through the vocoder?

When I follow the padding of the Mel-spectrogram in preprocess.py, the generated audio is silent throughout. So I'm wondering how you preprocessed the Mel-spectrogram you sampled here to produce sound without having the reference waveform at all.

bshall (Owner) commented Nov 14, 2019

Hi @francislata, sorry about the delay.

I used librosa to generate the Mel-spectrograms, and the specific parameters (hop_length, win_length, etc.) can be found here. If you've got mels from a TTS system, the best approach would be to retrain the vocoder. To do that, replace the steps in preprocess.py with the exact steps used to preprocess the mels for the TTS system (but include the padding step).

Unfortunately, different preprocessing does have a big effect, so it's very important that the preprocessing pipelines for the TTS system and the vocoder line up.
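As a rough sanity check before retraining, something along these lines can catch mismatched analysis parameters. The config key names and the TTS-side values here are only placeholders, not taken from either codebase:

```python
import json

# Compare the vocoder's analysis parameters (config.json in this repo) with
# the ones the TTS system used to produce its mels. Key names are assumptions.
with open("config.json") as f:
    vocoder_cfg = json.load(f)

# Placeholder values standing in for the TTS system's configuration.
tts_cfg = {"sample_rate": 22050, "hop_length": 256, "win_length": 1024, "n_mels": 80}

for key in ("sample_rate", "hop_length", "win_length", "n_mels"):
    if vocoder_cfg.get(key) != tts_cfg.get(key):
        print(f"Mismatch on {key}: vocoder={vocoder_cfg.get(key)}, tts={tts_cfg.get(key)}")
```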
