
Why the embedding layer instead of the one-hot audio vector? #20

Open

ivancarapinha opened this issue Nov 1, 2020 · 1 comment

@ivancarapinha

Hello,

In the original implementation of this model, the authors employed a one-hot audio vector of dimension 1024. Unfortunately, the authors did not say much about this one-hot vector in the paper and did not explain its purpose in the model. Given that its dimension is 1024 = 2^10, and that the authors use 10-bit audio samples, I assume this vector is related to the prediction of each bit in each audio sample. But that's just a guess.

So, I have two (actually three) questions:

  1. What is the purpose of the one-hot audio vector in the original implementation?
  2. Why did you replace the one-hot vector with an embedding layer? What changed in the model behavior with this replacement?

Thank you very much

@bshall
Owner

bshall commented Nov 12, 2020

Hi @ivancarapinha,

Sorry about the delay!

Yeah, the paper is very vague about the model details. You're correct that the one-hot representation is related to the 10-bit audio. Basically, they apply mu-law companding to the original 16-bit audio, quantizing each sample to one of 1024 levels. Then you form a one-hot representation for each sample, where the 1 is at the index given by the companded sample value. This is then fed into the autoregressive part of the model.
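For concreteness, here's a minimal sketch of that encoding in PyTorch. The mu-law constants and scaling are the textbook versions, not necessarily the exact ones from the original implementation:

```python
import math

import torch
import torch.nn.functional as F

def mulaw_encode(x: torch.Tensor, bits: int = 10) -> torch.Tensor:
    """Mu-law compand a waveform in [-1, 1] into integer classes [0, 2**bits - 1]."""
    mu = 2 ** bits - 1
    # Compress the amplitude, then quantize to 2**bits uniform bins.
    y = torch.sign(x) * torch.log1p(mu * x.abs()) / math.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).long()

wav = torch.randn(16000).clamp(-1, 1)          # stand-in for 16-bit audio scaled to [-1, 1]
labels = mulaw_encode(wav)                     # (T,) integers in [0, 1023]
one_hot = F.one_hot(labels, num_classes=1024)  # (T, 1024) one-hot vectors fed to the AR model
```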

I used an embedding layer just to make the model a bit more efficient. The first operation in a GRU is a matrix multiplication with the input, so a one-hot input simply picks out a column of that matrix (which is exactly what an embedding layer does). I just separated out the embedding operation and used a smaller embedding dimension, which hopefully sped up training a little. It should work fine if you go with the original approach, though.
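To make the equivalence concrete, here's a small PyTorch sketch; the sizes are illustrative, not the ones used in this repo. With the weight laid out as (classes, dim) the one-hot picks out a row rather than a column, but it's the same selection an embedding lookup performs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, dim = 1024, 256           # illustrative sizes
weight = torch.randn(num_classes, dim)

labels = torch.randint(0, num_classes, (8,))
one_hot = F.one_hot(labels, num_classes).float()

# One-hot path: the matmul just selects one row of `weight` per sample.
via_matmul = one_hot @ weight          # (8, dim)

# Embedding path: the lookup performs the same selection without the matmul.
embedding = nn.Embedding(num_classes, dim)
with torch.no_grad():
    embedding.weight.copy_(weight)
via_lookup = embedding(labels)         # (8, dim)

assert torch.allclose(via_matmul, via_lookup)
```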

Hope that helps.
