An issue of inconsistent length in voice conversion. #4033

wzr0108 · 2024-10-22T02:46:56Z

When I run voice conversion, the converted audio comes out slightly shorter than the original. My audio is long, and converting it in one pass exceeds CUDA memory, so I split it into chunks, convert each chunk, and merge the results. The more chunks I use, the worse the mismatch gets. Looking at the code, I found that the conv1d layers in the WavLM feature extractor don't use padding (is this typical in the speech field?), so the number of feature frames is smaller than the input length divided by the 320x downsampling factor, which in turn makes the generated audio shorter. Is there a way to keep the length of the converted audio unchanged?
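To make the effect concrete, here is a rough sketch of the length arithmetic. It assumes the usual wav2vec 2.0-style layer layout for the WavLM feature extractor (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2, no padding) and assumes the vocoder emits 320 samples per feature frame; I have not checked either against the FreeVC code in this repo. The helper names (`num_frames`, `samples_lost`, `pad_needed`) are just for illustration.

```python
# Assumed (kernel, stride) per conv layer, no padding; the strides multiply to 320.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
HOP = 320  # assumed number of output samples generated per feature frame


def num_frames(num_samples: int) -> int:
    """Feature frames produced by the unpadded conv stack for a given input length."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0  # input too short for this layer
        length = (length - kernel) // stride + 1
    return length


def samples_lost(total_samples: int, num_chunks: int) -> int:
    """Samples missing from the merged output when the input is split into chunks."""
    chunk = total_samples // num_chunks
    sizes = [chunk] * (num_chunks - 1) + [total_samples - chunk * (num_chunks - 1)]
    return total_samples - sum(num_frames(size) * HOP for size in sizes)


def pad_needed(num_samples: int) -> int:
    """Smallest number of extra input samples that yields ceil(num_samples / HOP) frames."""
    target = -(-num_samples // HOP)  # ceiling division
    pad = 0
    while num_frames(num_samples + pad) < target:
        pad += 1
    return pad


if __name__ == "__main__":
    total = 160_000  # 10 s at 16 kHz
    for chunks in (1, 2, 5, 10):
        print(chunks, "chunk(s):", samples_lost(total, chunks), "samples lost")
    print("padding needed for a 16000-sample chunk:", pad_needed(16_000))
```

Under these assumptions every chunk loses roughly one frame, so the loss grows with the number of chunks. One possible workaround would be to pad each chunk by `pad_needed(len(chunk))` samples before conversion (for example with reflection padding) and then trim or pad the converted chunk back to its original length, but that is only a guess at a fix, not something I have tested.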

eginhard · 2024-10-23T11:07:11Z

Just to clarify, you mean voice conversion with the FreeVC model, right? Maybe check whether the same thing happens in the original FreeVC repo, or whether it has been discussed there. Coqui just integrated that code.
