HenryJunW/ESSumm


ESSumm: Extractive Speech Summarization from Untranscribed Meeting

The code base for ESSumm: Extractive Speech Summarization from Untranscribed Meeting
Jun Wang

NEWS

  • [22-06-15] 🔥 ESSumm is accepted at INTERSPEECH 2022.

Abstract

In this paper, we propose ESSumm, a novel architecture for direct extractive speech-to-speech summarization: an unsupervised model that does not depend on intermediate transcribed text. Unlike previous methods that rely on a text representation, we aim to generate a summary directly from speech without transcription. First, a set of smaller speech segments is extracted based on the acoustic features of the speech signal. For each candidate speech segment, a distance-based summarization confidence score is designed to measure its latent speech representation. Specifically, we leverage an off-the-shelf self-supervised convolutional neural network to extract deep speech features from raw audio. Our approach automatically predicts the optimal sequence of speech segments that captures the key information within a target summary length. Extensive results on two well-known meeting datasets (the AMI and ICSI corpora) show that our direct speech-based method improves summarization quality on untranscribed data. We also observe that our unsupervised speech-based method performs on par with recent transcript-based summarization approaches, which require additional speech recognition.
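
The abstract does not spell out the scoring function, so the following is only a minimal sketch of the general idea, not the ESSumm implementation. It assumes each candidate segment already has a fixed-size embedding (for example, pooled wav2vec 2.0 features), scores a segment by its Euclidean closeness to the centroid of all segments, and greedily fills a target duration. The names `confidence_scores` and `select_summary`, the centroid-based score, and the greedy selection are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# (start_sec, end_sec, embedding); the embedding is assumed to be precomputed.
Segment = Tuple[float, float, Sequence[float]]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def confidence_scores(segments: Sequence[Segment]) -> List[float]:
    """Assumed score: segments closer to the centroid score higher."""
    center = centroid([emb for _, _, emb in segments])
    return [-euclidean(emb, center) for _, _, emb in segments]


def select_summary(segments: Sequence[Segment], target_seconds: float) -> List[int]:
    """Greedily pick the best-scoring segments that fit the duration budget.

    Returns the chosen segment indices in chronological (input) order.
    """
    scores = confidence_scores(segments)
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    total = 0.0
    for i in ranked:
        duration = segments[i][1] - segments[i][0]
        if total + duration <= target_seconds:
            chosen.append(i)
            total += duration
    return sorted(chosen)


if __name__ == "__main__":
    # Toy 2-D embeddings stand in for real speech features.
    toy = [
        (0.0, 2.0, [0.0, 0.0]),
        (2.0, 5.0, [1.0, 1.0]),
        (5.0, 6.0, [10.0, 10.0]),
        (6.0, 8.0, [1.0, 0.0]),
    ]
    print(select_summary(toy, target_seconds=5.0))
```

Returning indices in chronological order keeps the selected audio segments in their original meeting order when they are concatenated into a spoken summary.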


[Figure: ESSumm network]

Features

  • The first automatic speech summarization system built on wav2vec 2.0.
  • Supports the major meeting datasets: AMI and ICSI.

Installation

See installation instructions.

Getting Started

See Getting Started with ESSumm.

Citation

Please cite our work if you find it useful:

@inproceedings{wang22n_interspeech,
  author={Jun Wang},
  title={{ESSumm: Extractive Speech Summarization from Untranscribed Meeting}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3243--3247},
  doi={10.21437/Interspeech.2022-945}
}

License

This project is released under the Apache 2.0 license.

Acknowledgement

The source code of ESSumm is based on CoreRank and wav2vec 2.0.
