An issue of inconsistent length in voice conversion. #4033
Unanswered
wzr0108
asked this question in
General Q&A
Replies: 1 comment
Just to clarify, you mean voice conversion with the FreeVC model, right? It may be worth checking whether the original FreeVC repo behaves the same way, or whether this has already been discussed there, since Coqui only integrated that code.
When I perform voice conversion, the converted audio comes out slightly shorter than the original. My audio is long, and converting it in one pass exceeds CUDA memory, so I split it into chunks, convert each chunk, and merge the results. The more chunks I use, the worse the shortfall becomes. Checking the code, I found that the conv1d layers in WavLM's feature extractor use no padding (is this typical in the speech field?), so the number of feature frames is smaller than the input length divided by 320, the nominal downsampling factor, and the generated audio is correspondingly shorter. Is there a way to keep the converted audio the same length as the input?
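To make the shortfall concrete, here is a rough sketch of the length arithmetic. It is not taken from the FreeVC or Coqui code. It assumes the usual WavLM/wav2vec 2.0 feature-extractor layout of seven unpadded conv1d layers (kernels 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2), and it assumes the decoder emits 320 samples per feature frame. The function names `num_frames` and `pad_for_full_length` are made up for illustration.

```python
# Assumed WavLM feature-extractor layout: (kernel, stride) per conv1d layer, no padding.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
HOP = 320              # product of the strides; assumed samples generated per frame
RECEPTIVE_FIELD = 400  # input samples needed to produce the first frame


def num_frames(num_samples: int) -> int:
    """Frames produced by the unpadded conv stack for num_samples input samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0  # input too short to yield any frame
        length = (length - kernel) // stride + 1
    return length


def pad_for_full_length(num_samples: int) -> int:
    """Extra input samples needed so that num_frames * HOP >= num_samples."""
    target_frames = -(-num_samples // HOP)  # ceiling division
    needed = HOP * (target_frames - 1) + RECEPTIVE_FIELD
    return max(0, needed - num_samples)


one_second = 16000  # 1 s at 16 kHz
print(num_frames(one_second) * HOP)     # output samples for a single 1 s chunk
print(pad_for_full_length(one_second))  # padding needed to reach 50 frames
```

Under these assumptions, a 1 s chunk yields 49 frames (15680 samples) instead of 50, so each chunk loses between 80 and 399 samples depending on its length, and the loss adds up with every split. Ten 1 s chunks would lose 3200 samples in total, while a single 10 s pass would lose only 320. One possible workaround is to pad each chunk by `pad_for_full_length(len(chunk))` samples before conversion (for example with reflection padding) and trim the converted audio back to the original chunk length. Whether the padding affects quality near chunk boundaries would need to be checked by listening.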