Can Wav2vecDS be made more general? #22
Comments
Yes, thanks for the suggestion. I tried doing so mainly to support Chinese; however, the mapping became more complex and the output features weren't always convincing, as seen in the resulting lip-sync.
But please feel free to open a PR if you have worked on this and managed to get better results.
Got it, thanks.
I see that the mapping network Wav2VecDS is a simple MLP. Maybe mapping such complex features to the DeepSpeech output calls for a more modern architecture, such as a transformer? I'm also curious whether you could share some training details, e.g. the datasets used and the SyncNet configuration. I could try to train a network in my spare time.
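As a rough illustration of the idea above, a small transformer encoder could replace the MLP mapper while keeping the same (m, 29)-to-(m, 29) interface. This is a minimal sketch, not the repository's code; the class name `TransformerMapper` and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class TransformerMapper(nn.Module):
    """Hypothetical sketch: map per-frame HuBERT/wav2vec features (m, 29)
    to DeepSpeech-style features (m, 29) with self-attention over time."""
    def __init__(self, feat_dim: int = 29, d_model: int = 64,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=128,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.out_proj = nn.Linear(d_model, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, m, 29) -> (batch, m, 29)
        return self.out_proj(self.encoder(self.in_proj(x)))

mapper = TransformerMapper()
out = mapper(torch.randn(1, 100, 29))  # same shape in and out
```

Unlike a per-frame MLP, the attention layers can use temporal context, which may help where the frame-wise mapping is ambiguous.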
I found that torchaudio.pipelines.HUBERT_ASR_LARGE outputs audio features of shape (m, 29), while DeepSpeech v0.1 outputs features of shape (n, 29), where m and n are close. Wav2vecDS maps the (m, 29) features to (n, 29).
I'm wondering whether Wav2vecDS could be made more general, supporting the mapping of features of arbitrary dimension to (n, 29), similar to AudioNet in ER-NeRF/nerf_triplane/network.py. That way, any model that supports Chinese could be used to extract speech features, for example chinese-wav2vec2-large or chinese-hubert-large.
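A generalized mapper along these lines could project features of any input width down to 29 channels and then resample the time axis from m frames to the n frames DeepSpeech would produce. The sketch below is only an assumption of how this might look; the class name `GeneralAudioMapper`, the hidden size, and the 1024-dim input (the width of chinese-hubert-large hidden states) are illustrative, not part of the project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneralAudioMapper(nn.Module):
    """Hypothetical sketch: map (m, in_dim) features from any audio
    encoder to DeepSpeech-style (n, 29) features."""
    def __init__(self, in_dim: int, out_dim: int = 29, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))

    def forward(self, x: torch.Tensor, n_frames: int) -> torch.Tensor:
        y = self.mlp(x)            # (m, in_dim) -> (m, 29)
        y = y.t().unsqueeze(0)     # (1, 29, m) for 1-D interpolation
        y = F.interpolate(y, size=n_frames, mode="linear",
                          align_corners=False)
        return y.squeeze(0).t()    # (n_frames, 29)

mapper = GeneralAudioMapper(in_dim=1024)
out = mapper(torch.randn(120, 1024), n_frames=115)  # (115, 29)
```

Linear interpolation over time handles the small m-versus-n mismatch; the per-frame projection handles the dimension mismatch, so the same class covers wav2vec2, HuBERT, or any other encoder.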