-
What do you think?
Replies: 2 comments
-
Blue did some comparison tests over in #275. Personally, I avoid dio and harvest, since in my opinion they're not as good as the other ones. parselmouth seems to have a pretty solid speed/quality balance from what I've noticed, though it should be mentioned that it struggles with certain parts. So all in all, a mix of crepe and parselmouth is probably a good solution (if you edit the results together in an audio editing program afterwards).
-
I always test them all, because each one performs differently depending on the model.

DIO (Distributed Inline-filter Operation) is a fast fundamental frequency (F0) estimator from the WORLD vocoder. It low-pass filters the signal at a range of cutoff frequencies, derives F0 candidates from the spacing of zero crossings and peaks in each filtered signal, and keeps the most reliable candidate.

CREPE (Convolutional Representation for Pitch Estimation) is a deep-learning pitch tracker: a convolutional neural network (CNN) that estimates pitch directly from the time-domain waveform.

Harvest is another F0 estimator from WORLD (the name is not an acronym). It collects F0 candidates with a bank of band-pass filters, then refines them and connects them into a smooth contour. It is typically slower than DIO.

Parselmouth is a Python library for Praat, a software tool commonly used in phonetics research. It gives Python code access to Praat's functionality, including analyzing and synthesizing speech signals and extracting features like pitch, formants, and spectrograms.
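For anyone curious what "low-pass filter first, then measure the period" looks like in practice, here is a toy sketch of that idea. To be clear, this is not the actual DIO implementation (that lives in WORLD and is far more careful about candidates and reliability); the function names, the moving-average filter, and the test tone are all made up for illustration.

```python
import math


def synth_tone(freq_hz, sr=16000, dur_s=0.5, interferer_amp=0.3):
    """Hypothetical test signal: a fundamental plus a weaker 3100 Hz component."""
    n = int(sr * dur_s)
    return [
        math.sin(2 * math.pi * freq_hz * i / sr)
        + interferer_amp * math.sin(2 * math.pi * 3100.0 * i / sr)
        for i in range(n)
    ]


def moving_average(x, width):
    """Crude low-pass filter: running mean over the last `width` samples."""
    out = []
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= width:
            acc -= x[i - width]
        out.append(acc / min(i + 1, width))
    return out


def f0_from_zero_crossings(x, sr):
    """Estimate F0 from the average spacing of upward zero crossings."""
    crossings = []
    for i in range(1, len(x)):
        if x[i - 1] < 0.0 <= x[i]:
            # Linear interpolation gives a sub-sample crossing position.
            frac = -x[i - 1] / (x[i] - x[i - 1])
            crossings.append(i - 1 + frac)
    if len(crossings) < 2:
        return 0.0  # not enough periods: treat as unvoiced
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sr / mean_period


if __name__ == "__main__":
    sr = 16000
    tone = synth_tone(220.0, sr=sr)
    # Without filtering, the 3100 Hz component adds spurious crossings.
    print("raw estimate:     ", f0_from_zero_crossings(tone, sr))
    # A 32-sample moving average suppresses it before the period is measured.
    print("filtered estimate:", f0_from_zero_crossings(moving_average(tone, 32), sr))
```

The filtered estimate should land very close to the 220 Hz fundamental, while the raw one gets thrown off by the high-frequency component. That is roughly why the filtering step comes first, and it also hints at why the simple estimators can behave so differently from one voice or model to the next.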