Kyumin-Park/aihub_multimodal_speech

AI Hub Dialogue Speech

Speech dataset generator built from the AI Hub multimodal video dataset.

Dataset Download

Request access to the Multimodal Video dataset from AI Hub.

Convert to Speech Dataset

  1. Place the video dataset in the following structure:
data
└── 0001-0400
    ├── clip_1
    │   ├── clip_1.json
    │   └── clip_1.mp4
    └── clip_2
        ├── clip_2.json
        └── clip_2.mp4
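The layout above can be walked with a short script that pairs each clip's annotation file with its video. The following is a minimal sketch, assuming every clip folder sits exactly two levels below data and names its files after the folder; find_clips is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import List, Tuple


def find_clips(data_root: str) -> List[Tuple[Path, Path]]:
    """Return (json_path, mp4_path) pairs for every clip directory.

    Hypothetical helper. Assumes the layout
    data_root/<range>/<clip>/<clip>.json and data_root/<range>/<clip>/<clip>.mp4.
    Clip folders missing either file are skipped.
    """
    pairs = []
    # Each clip folder sits two levels below the data root (e.g. 0001-0400/clip_1).
    for clip_dir in sorted(Path(data_root).glob("*/*")):
        if not clip_dir.is_dir():
            continue
        json_path = clip_dir / f"{clip_dir.name}.json"
        mp4_path = clip_dir / f"{clip_dir.name}.mp4"
        if json_path.is_file() and mp4_path.is_file():
            pairs.append((json_path, mp4_path))
    return pairs
```

Calling find_clips("data") on the tree above would return the clip_1 and clip_2 pairs in sorted order. Incomplete clip folders are silently skipped, which is a choice of this sketch rather than documented behavior.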
  2. Install the requirements

Requirements:

  • moviepy==1.0.3
  • librosa==0.8.0
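The pinned versions can be installed with pip (pip install moviepy==1.0.3 librosa==0.8.0). To confirm the environment matches the pins before running the scripts, a check like the following could be used; check_pins is a hypothetical helper and is not shipped with the repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Version pins copied from the requirements listed above.
PINNED = {"moviepy": "1.0.3", "librosa": "0.8.0"}


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every pin that is not satisfied.

    Hypothetical helper. installed_version is None when the package is missing.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in check_pins(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {installed}")
```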
  3. Create the audio files
python create_audio.py [--convert_video] [--sample_rate SR]

Options:

  • convert_video: convert the mp4 videos to wav files first
  • sample_rate: sampling rate in Hz (default: 22050)
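One possible shape for the conversion step is sketched below, under stated assumptions: the argument parser only mirrors the two documented options, and mp4_to_wav and convert_all are hypothetical helpers that extract each clip's audio track with moviepy 1.0.3, setting the output sampling rate through the fps argument of write_audiofile. The actual create_audio.py may resample with librosa instead, and it presumably also uses the JSON annotations, whose schema this README does not describe, so they are not handled here.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the two documented options; the real create_audio.py may differ.
    parser = argparse.ArgumentParser(description="Create audio files from video clips")
    parser.add_argument("--convert_video", action="store_true",
                        help="convert mp4 video into wav form first")
    parser.add_argument("--sample_rate", type=int, default=22050,
                        help="sampling rate in Hz")
    return parser


def mp4_to_wav(mp4_path: Path, sample_rate: int) -> Path:
    """Write the audio track of mp4_path to a .wav file beside it (hypothetical helper)."""
    # Imported lazily so the parser works even without moviepy installed.
    from moviepy.editor import VideoFileClip

    wav_path = mp4_path.with_suffix(".wav")
    clip = VideoFileClip(str(mp4_path))
    try:
        if clip.audio is None:
            raise ValueError(f"{mp4_path} has no audio track")
        # fps sets the output sampling rate; the .wav suffix makes moviepy write PCM audio.
        clip.audio.write_audiofile(str(wav_path), fps=sample_rate, logger=None)
    finally:
        clip.close()
    return wav_path


def convert_all(data_root: str, sample_rate: int) -> None:
    # Assumes the data/<range>/<clip>/<clip>.mp4 layout described in step 1.
    for mp4 in sorted(Path(data_root).glob("*/*/*.mp4")):
        mp4_to_wav(mp4, sample_rate)
```

For example, build_parser().parse_args(["--convert_video", "--sample_rate", "16000"]) yields convert_video=True and sample_rate=16000; with no arguments the documented default of 22050 applies.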
  4. Split the filelist into train/dev/test sets
python split.py [--path FILELIST_PATH] [--ratio RATIO] [--seed SEED]

Options:

  • path: path to the filelist. The train/dev/test filelists are created in the same directory.
  • ratio: split ratio in the form train_ratio:dev_ratio:test_ratio. The three ratios must be separated by ':'.
  • seed: random seed for shuffling
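The split can be sketched as a seeded shuffle followed by slicing. This is a minimal sketch, assuming the filelist has one entry per line; the default ratio 8:1:1, the default seed 1234, and the output names <stem>_train, <stem>_dev and <stem>_test are hypothetical choices, not necessarily what split.py uses.

```python
import random
from pathlib import Path
from typing import List, Tuple


def parse_ratio(ratio: str) -> List[float]:
    """Turn 'train:dev:test' (e.g. '8:1:1') into fractions that sum to 1."""
    parts = [float(x) for x in ratio.split(":")]
    if len(parts) != 3 or any(p < 0 for p in parts) or sum(parts) == 0:
        raise ValueError(f"expected three non-negative ratios separated by ':', got {ratio!r}")
    total = sum(parts)
    return [p / total for p in parts]


def split_lines(lines: List[str], ratio: str, seed: int) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle lines with a fixed seed and cut them into train/dev/test lists."""
    train_frac, dev_frac, _ = parse_ratio(ratio)
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # the remainder goes to test
    return train, dev, test


def split_filelist(path: str, ratio: str = "8:1:1", seed: int = 1234) -> None:
    """Write train/dev/test filelists next to the input filelist.

    Hypothetical defaults and output names; split.py may differ.
    """
    src = Path(path)
    lines = src.read_text(encoding="utf-8").splitlines()
    for name, subset in zip(("train", "dev", "test"), split_lines(lines, ratio, seed)):
        out = src.with_name(f"{src.stem}_{name}{src.suffix}")
        out.write_text("".join(line + "\n" for line in subset), encoding="utf-8")
```

Flooring the train and dev sizes and giving the remainder to test guarantees that every line lands in exactly one subset, and a private random.Random instance keeps the shuffle reproducible without touching the global random state.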
