pitch estimation comparison #275

BlueAmulet · 2023-04-09T20:48:18Z

BlueAmulet
Apr 9, 2023
Maintainer

I just came across this repository, which I find interesting. It presents a model that is considerably lighter than CREPE, and it includes several examples comparing it against CREPE and pYIN that highlight various failure cases. There is also another discussion, #251, about f0 mean filtering as a possible way to improve the resulting quality further.

I might try to develop a tool that uses this fork to compare the results of various f0 algorithms plotted over a spectrogram, similar to the examples in the hf0 repository.

BlueAmulet · 2023-04-09T22:23:29Z

BlueAmulet
Apr 9, 2023
Maintainer Author

Borrowing code from the librosa pyin documentation:
https://librosa.org/doc/main/generated/librosa.pyin.html
I was able to make the following figure:

What surprises me is how unstable harvest is. This sample is from the slt voice in CMU ARCTIC, but the same behavior shows up across multiple datasets. Apart from that, all the other available f0 methods seem to be of reasonable quality.

I did find this example from another dataset. Again, harvest is all over the place, but both crepe methods also fail to follow the pitch down near the beginning of the audio clip. I've noticed this happening on a few other audio clips from my dataset as well.

My very limited, take-it-with-a-grain-of-salt opinion of the f0 methods so far:
dio: sometimes completely misses various voiced sections
harvest: unstable
parselmouth: seems good, with one odd glitch in an unvoiced section in the first image
crepe-tiny: not worth the accuracy tradeoff
crepe: seems decent

dio and harvest: sometimes return an f0 around 60 Hz on lower-quality datasets.
both crepe methods: sometimes fail to follow the pitch properly
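
For anyone who wants to put a rough number on "unstable", here is a minimal sketch. It is not part of f0_view; the function name jump_rate and the 6-semitone threshold are assumptions made for illustration. It counts how often an f0 track jumps by more than a given interval between two consecutive voiced frames, treating f0 <= 0 (or NaN) as unvoiced.

```python
import numpy as np


def jump_rate(f0, max_semitones=6.0):
    """Fraction of voiced-to-voiced frame transitions whose pitch change
    exceeds max_semitones. Frames with f0 <= 0 or NaN count as unvoiced.

    Hypothetical helper; the threshold is an arbitrary starting point.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = np.nan_to_num(f0) > 0
    # Only compare frame pairs where both frames are voiced.
    both = voiced[1:] & voiced[:-1]
    if not both.any():
        return 0.0
    # Use a dummy value for unvoiced frames so log2 stays finite;
    # those pairs are masked out by `both` anyway.
    safe = np.where(voiced, f0, 1.0)
    semitones = 12.0 * np.abs(np.log2(safe[1:] / safe[:-1]))
    return float(np.mean(semitones[both] > max_semitones))
```

Running this on the output of each method for the same clip gives one number per method. A higher value means more large frame-to-frame jumps, which is only a proxy for what the plots show.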

1 reply

BlueAmulet Apr 10, 2023
Maintainer Author

Here is the tool I came up with, in case anyone else wants to investigate how the f0 methods behave on their own dataset:
f0_view.zip
Unpack the zip file and run it as: python f0_view.py some_audio_clip.wav
Click a color in the top right to hide the corresponding line on the spectrogram.

Lordmau5 · 2023-04-10T13:08:49Z

Lordmau5
Apr 10, 2023
Maintainer

Would this also affect pre-hubert?

I know the default was changed from crepe to dio, but I'm not sure how different the results would be if a dataset is prepared with e.g. parselmouth.

1 reply

BlueAmulet Apr 10, 2023
Maintainer Author

This is all in the context of training, so pre-hubert is what this affects. I've been experimenting with the idea of a crepe-biased nanmedian stacking (with f0=0 replaced by NaN) of the results from pyin, dio, parselmouth, and crepe. Biased in the sense that if there is only data from crepe and one of the other methods, crepe's data is preferred.

(I forgot to include the replace-0-with-NaN step and wasted 9 hours on an overnight training test.)
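
A minimal sketch of one way to read the stacking idea above, not the code actually used in the experiment. It assumes all tracks have already been resampled to the same frame grid, that 0 marks an unvoiced frame, and that "prefer crepe" applies to frames where exactly two methods, one of them crepe, have data. The function name crepe_biased_nanmedian is hypothetical.

```python
import warnings

import numpy as np


def crepe_biased_nanmedian(crepe_f0, other_f0s):
    """Merge f0 tracks with a per-frame nanmedian, biased toward crepe.

    crepe_f0:  1-D array of f0 values, 0 = unvoiced.
    other_f0s: sequence of 1-D arrays (e.g. pyin, dio, parselmouth),
               all the same length as crepe_f0 (assumed, not checked here).
    Returns a 1-D array with 0 where no method reported a pitch.
    """
    crepe = np.array(crepe_f0, dtype=float)
    others = np.array(other_f0s, dtype=float)

    # The step that is easy to forget: zeros must become NaN,
    # otherwise they drag the median toward 0.
    crepe[crepe == 0] = np.nan
    others[others == 0] = np.nan

    stacked = np.vstack([crepe[None, :], others])

    # nanmedian warns on frames where every method is NaN; that is expected.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        merged = np.nanmedian(stacked, axis=0)

    # Bias: with only crepe plus one other method, the median would be
    # their average, so take crepe's value instead.
    n_valid = np.sum(~np.isnan(stacked), axis=0)
    prefer_crepe = ~np.isnan(crepe) & (n_valid == 2)
    merged[prefer_crepe] = crepe[prefer_crepe]

    # Convert remaining NaN (no method voiced) back to 0.
    return np.where(np.isnan(merged), 0.0, merged)
```

With three or four methods agreeing on a frame, the plain nanmedian already discards a single outlier, so the bias only matters in the two-method case.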
