
Can Wav2vecDS be made more general? #22

Open

tailangjun opened this issue Apr 2, 2024 · 4 comments

Comments

@tailangjun

I noticed that the audio features output by torchaudio.pipelines.HUBERT_ASR_LARGE have shape (m, 29), while those output by DeepSpeech v0.1 have shape (n, 29), with m and n fairly close. Wav2vecDS maps the (m, 29) features to (n, 29).

I'm wondering whether Wav2vecDS could be made more general, so that features of any dimension could be mapped to (n, 29), similar to AudioNet in ER-NeRF/nerf_triplane/network.py. That way one could freely pick a model with Chinese support to extract speech features, e.g. chinese-wav2vec2-large or chinese-hubert-large.
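For illustration, here is a rough sketch of the kind of generalized mapper I have in mind. This is just my own sketch, not the repo's actual Wav2vecDS code; the class name, layer sizes, and the in_dim=1024 example are assumptions:

```python
import torch
import torch.nn as nn

class GenericAudioMapper(nn.Module):
    """Frame-wise map from (T, in_dim) encoder features to (T, 29) DeepSpeech-style features."""

    def __init__(self, in_dim: int, hidden_dim: int = 256, out_dim: int = 29):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, in_dim), e.g. in_dim=1024 for chinese-hubert-large hidden states
        return self.net(x)

# Example: mapper = GenericAudioMapper(in_dim=1024)
```

Note that this maps only the feature dimension; since m and n frame counts still differ slightly, the sequences would also need temporal resampling (e.g. torch.nn.functional.interpolate) to align.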

@Elsaam2y
Owner

Yes, thanks for the recommendation. I tried doing so, mainly to support Chinese; however, the mapping became more complex and the output features weren't always convincing, as seen in the resulting lip-sync.

@Elsaam2y
Owner

But please feel free to open a PR if you work on this and manage to get better results.

@tailangjun
Author

> Yes, thanks for the recommendation. I tried doing so, mainly to support Chinese; however, the mapping became more complex and the output features weren't always convincing, as seen in the resulting lip-sync.

Got it, thanks.

@PengYicong

I see that the mapping network Wav2vecDS is a simple MLP. Perhaps mapping complex features to the DeepSpeech output calls for a more modern structure such as a transformer? I'm curious whether you could share some training details regarding the datasets and the SyncNet configuration; I can try to train such a network in my spare time.
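To make the idea concrete, here is a rough sketch of the transformer-style mapper I have in mind. It is purely illustrative; the class name and all hyperparameters are assumptions, not anything from this repo:

```python
import torch
import torch.nn as nn

class TransformerAudioMapper(nn.Module):
    """Sequence-level mapper: (B, T, in_dim) -> (B, T, out_dim) with temporal attention."""

    def __init__(self, in_dim: int = 29, d_model: int = 128, out_dim: int = 29,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.proj_in = nn.Linear(in_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.proj_out = nn.Linear(d_model, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unlike a per-frame MLP, self-attention lets each output frame
        # draw on temporal context from the whole sequence.
        return self.proj_out(self.encoder(self.proj_in(x)))
```

The main design difference from the MLP is that each output frame can attend to neighboring frames, which may matter for lip-sync since visemes depend on phonetic context.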
