Can not recognize Silence, less stable than yamnet #6

Honghe · 2021-01-28T03:48:32Z

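Hi qiuqiangkong, thanks for your great work.
Recently, I tested panns_inference with the following wav audio:
silence.zip <https://github.com/qiuqiangkong/panns_inference/files/5884423/silence.zip>
The wav is generated audio consisting of "little noise, silence, little noise".
[image: image] <https://user-images.githubusercontent.com/1092722/106086684-9c799a80-615d-11eb-95ca-efdf54903873.png>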
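
For readers without the attachment, a comparable "little noise, silence, little noise" signal can be sketched as follows. This is not the script used to create silence.zip: the 32 kHz sample rate, the segment durations, the noise amplitude, and the use of uniform white noise are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random


def make_noise_silence_noise(sample_rate=32000, noise_sec=2.0, silence_sec=4.0,
                             noise_amplitude=0.01, seed=0):
    """Return float samples: low-level noise, digital silence, low-level noise.

    All defaults are illustrative assumptions, not values from the attached file.
    """
    rng = random.Random(seed)
    n_noise = int(sample_rate * noise_sec)
    n_silence = int(sample_rate * silence_sec)

    def noise(n):
        # Uniform white noise; the noise type in the original wav is unknown.
        return [rng.uniform(-noise_amplitude, noise_amplitude) for _ in range(n)]

    return noise(n_noise) + [0.0] * n_silence + noise(n_noise)


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


signal = make_noise_silence_noise()
head_level = rms(signal[:64000])          # first noise segment
middle_level = rms(signal[64000:192000])  # silent segment
```

The RMS of the middle segment is exactly zero, which gives a simple reference against which a classifier's "Silence" probability can be compared.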

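The panns_inference output is shown below. It cannot recognize Silence, and the gap in the Pink noise probability between the head and the tail of the wav is rather large.
[image: image] <https://user-images.githubusercontent.com/1092722/106083000-b82d7280-6156-11eb-8109-62adff25b3d3.png>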

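In contrast, the yamnet output is more reasonable, as follows.
[image: image] <https://user-images.githubusercontent.com/1092722/106085778-cc27a300-615b-11eb-82ce-19d1b18c7185.png>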

The panns_inference code I used was 013c0f6

Sincerely!


qiuqiangkong · 2021-01-29T07:20:18Z

Hi Jack,

Thank you very much for the feedback! I ran panns_inference on the wav you attached and got the following result:

[image: image.png]

The panns_inference version is 0.0.7. Note that the number of frames here is 800.

For this example, Yamnet performs better than PANNs at detecting silence. Here are two possible reasons:

1. Yamnet is trained on 1-second segments, while PANNs are trained on 10-second segments with weak labels to obtain better audio tagging performance.
2. PANNs apply mixup to improve the detection of other sound events, but mixup lowers the performance on silence.

This feedback is very useful for us! We would be happy to hear about more comparisons between Yamnet and PANNs if you have any.

Best wishes,
Qiuqiang
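
The second reason (mixup) can be illustrated with a toy example. This is a minimal sketch of generic mixup, not the PANNs training code: the two-class label space, the sample values, and the fixed mixing weight are hypothetical. It shows one plausible reading of why mixup may hurt silence detection: mixing a silent clip with another clip yields a waveform that is just an attenuated copy of the other clip, while the target still assigns weight to "Silence".

```python
def mixup(x1, y1, x2, y2, lam):
    """Blend two examples and their targets with weight lam in [0, 1]."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Hypothetical two-class label space: [Silence, Speech]
silence_wave = [0.0, 0.0, 0.0, 0.0]
speech_wave = [0.2, -0.4, 0.6, -0.2]  # stand-in samples, not real audio
silence_target = [1.0, 0.0]
speech_target = [0.0, 1.0]

mixed_wave, mixed_target = mixup(
    silence_wave, silence_target, speech_wave, speech_target, lam=0.5
)

# The mixed waveform is the speech clip at half amplitude; nothing in it is
# silent, yet the target asks for "Silence" with weight 0.5.
```

Under this reading, the model sees many inputs labeled partly as "Silence" that contain no silent audio at all, which could blur what the class means.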


Honghe · 2021-01-29T07:29:33Z

@qiuqiangkong Thank you for your reply, but it seems your pic upload failed.

qiuqiangkong · 2021-02-02T03:49:18Z

@Honghe Sorry! See prediction figure attached:

Honghe changed the title ~~Can not recognize Silence, and it seems different to yamnet~~ Can not recognize Silence, less stable than yamnet Jan 28, 2021
