This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Commit

[src,script,egs] the gop_speechocean762 recipe (kaldi-asr#4441)
jimbozhang authored Feb 4, 2021
1 parent 6359c90 commit b9890a9
Showing 28 changed files with 1,223 additions and 234 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -48,11 +48,15 @@ GSYMS
# Python compiled bytecode files.
*.pyc

# Python virtual environment
venv/

# Make dependencies.
.depend.mk

# Some weird thing that macOS creates.
*.dSYM
.DS_Store

# Windows executable, symbol and some weird files.
*.exe
@@ -61,6 +65,7 @@ GSYMS
*.manifest
/kaldiwin_vs*
.vscode
.idea

# /src/
/src/.short_version
12 changes: 0 additions & 12 deletions egs/gop/s5/local/make_testcase.sh

This file was deleted.

102 changes: 0 additions & 102 deletions egs/gop/s5/run.sh

This file was deleted.

17 changes: 16 additions & 1 deletion egs/gop/README.md → egs/gop_speechocean762/README.md
@@ -94,5 +94,20 @@ We guess the HMM topo of chain model may not fit for GOP.

The nnet3 TDNN (non-chain) model performs well in GOP computation, so this recipe uses it.

## The `speechocean762` corpus

This corpus aims to provide a free public dataset for the pronunciation scoring task.

This corpus consists of 5000 English sentences.
All the speakers are non-native and their mother tongue is Mandarin.
Half of the speakers are children and the others are adults.
Age and gender information is provided.

The scores were given by five experts. To avoid subjective bias, each expert scored independently under the same metric.
The experts scored at three levels: phoneme-level, word-level and sentence-level.

This recipe illustrates automatic phoneme-level scoring.

## Acknowledgement
The author of this recipe would like to thank Xingyu Na for his works of model tuning and his helpful suggestions.
The author of this recipe would like to thank Speechocean for providing the corpus,
and Xingyu Na for his work on model tuning and his helpful suggestions.
26 changes: 26 additions & 0 deletions egs/gop_speechocean762/s5/RESULT
@@ -0,0 +1,26 @@
In the `speechocean762` corpus, each phoneme-level score takes one of three values:
2: pronunciation is correct
1: pronunciation is right but has a heavy accent
0: pronunciation is incorrect or missed

First, we treat the scoring as a regression task
and compute MSE (mean squared error) and Corr (correlation coefficient):

MSE: 0.15
Corr: 0.42
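A minimal, dependency-free sketch of the two regression metrics above. It assumes Corr is the Pearson correlation coefficient between the human and predicted scores; the recipe's own evaluation script is not part of this excerpt. The score lists are made-up toy values, not corpus data.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between reference and predicted scores."""
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance normalised by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy values only: human scores in {0, 1, 2} vs. continuous predictions.
    human = [2, 2, 1, 0, 2]
    predicted = [1.9, 1.7, 1.2, 0.4, 2.0]
    print(f"MSE: {mse(human, predicted):.2f}")
    print(f"Corr: {pearson_corr(human, predicted):.2f}")
```

MSE penalises the size of each error, while the correlation only measures whether predictions rise and fall with the human scores, which is why both are reported.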

Then we round the continuous predicted scores to the nearest of 0, 1 and 2 to treat the scoring
as a classification task.
Classification metrics such as precision, recall, and F1-score are then computed
and printed by `sklearn.metrics.classification_report`:


              precision    recall  f1-score   support

           0       0.46      0.17      0.25      1339
           1       0.16      0.37      0.22      1828
           2       0.96      0.93      0.95     44079

    accuracy                           0.89     47246
   macro avg       0.53      0.49      0.47     47246
weighted avg       0.92      0.89      0.90     47246
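The rounding step and the per-class numbers in the table can be sketched without scikit-learn as below. This is an illustration, not the recipe's code: it assumes round-half-up followed by clamping into [0, 2], and it reports 0.0 where precision or recall is undefined. "macro avg" is the unweighted mean over the three classes and "weighted avg" weights each class by its support.

```python
import math
from typing import Dict, List, Sequence, Tuple

LEVELS = (0, 1, 2)


def to_class(score: float, low: int = 0, high: int = 2) -> int:
    """Round half up to the nearest integer, then clamp into [low, high]."""
    return min(high, max(low, math.floor(score + 0.5)))


def per_class_report(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int] = LEVELS
) -> Dict[int, Tuple[float, float, float, int]]:
    """Map each label to (precision, recall, f1, support); 0.0 where undefined."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)
        support = sum(1 for t in y_true if t == c)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1, support)
    return report


def averages(report: Dict[int, Tuple[float, float, float, int]]) -> Tuple[List[float], List[float]]:
    """Macro average (unweighted over labels) and support-weighted average."""
    rows = list(report.values())
    total = sum(r[3] for r in rows)
    macro = [sum(r[i] for r in rows) / len(rows) for i in range(3)]
    weighted = [sum(r[i] * r[3] for r in rows) / total for i in range(3)]
    return macro, weighted


if __name__ == "__main__":
    # Toy values only, not taken from the corpus.
    human = [2, 2, 1, 0, 2, 1]
    predicted = [1.9, 1.4, 1.2, 0.4, 2.6, 0.3]
    classes = [to_class(s) for s in predicted]
    accuracy = sum(t == p for t, p in zip(human, classes)) / len(human)
    report = per_class_report(human, classes)
    for label, (p, r, f, n) in report.items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={n}")
    macro, weighted = averages(report)
    print(f"accuracy={accuracy:.2f} macro_f1={macro[2]:.2f} weighted_f1={weighted[2]:.2f}")
```

The large gap between the macro and weighted rows in the table follows from this weighting: class 2 carries 44079 of the 47246 phones, so it dominates the weighted average, while the weaker classes 0 and 1 pull the macro average down.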
File renamed without changes.
10 changes: 10 additions & 0 deletions egs/gop_speechocean762/s5/conf/mfcc_hires.conf
@@ -0,0 +1,10 @@
# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false # use average of log energy, not energy.
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
# there might be some information at the low end.
--high-freq=-400 # high cutoff frequency, relative to Nyquist of 8000 (=7600)
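A sketch of how `--low-freq`, `--high-freq` and `--num-mel-bins` appear to combine, written from the behaviour of Kaldi's mel filterbank as understood here, not copied from Kaldi. It assumes Kaldi's default 16 kHz sampling rate (this config does not set `--sample-frequency`), that a non-positive `--high-freq` is an offset from the Nyquist frequency, and the mel scale 1127 ln(1 + f/700) with triangular bins evenly spaced on that scale.

```python
import math
from typing import List, Tuple


def mel(freq_hz: float) -> float:
    """Mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def inverse_mel(mel_value: float) -> float:
    """Inverse of mel(): convert a mel value back to Hz."""
    return 700.0 * (math.exp(mel_value / 1127.0) - 1.0)


def resolve_high_freq(high_freq: float, sample_freq: float) -> float:
    """A non-positive high cutoff is taken as an offset from Nyquist."""
    nyquist = 0.5 * sample_freq
    return high_freq if high_freq > 0 else nyquist + high_freq


def mel_bin_edges(num_bins: int, low_freq: float, high_freq: float,
                  sample_freq: float) -> List[Tuple[float, float, float]]:
    """(left, center, right) edges in Hz of each triangular mel bin."""
    mel_low = mel(low_freq)
    mel_high = mel(resolve_high_freq(high_freq, sample_freq))
    delta = (mel_high - mel_low) / (num_bins + 1)  # num_bins + 2 evenly spaced points
    edges = []
    for b in range(num_bins):
        left = mel_low + b * delta
        center = mel_low + (b + 1) * delta
        right = mel_low + (b + 2) * delta
        edges.append((inverse_mel(left), inverse_mel(center), inverse_mel(right)))
    return edges


if __name__ == "__main__":
    edges = mel_bin_edges(num_bins=40, low_freq=20, high_freq=-400, sample_freq=16000)
    print(f"effective high cutoff: {resolve_high_freq(-400, 16000):.0f} Hz")
    print(f"first bin: {edges[0][0]:.1f}-{edges[0][2]:.1f} Hz")
    print(f"last bin: {edges[-1][0]:.1f}-{edges[-1][2]:.1f} Hz")
```

With `--num-ceps` equal to `--num-mel-bins` (40 and 40), the DCT that follows these bins keeps every coefficient, which is what the "no dimensionality reduction" comment refers to.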
20 changes: 20 additions & 0 deletions egs/gop_speechocean762/s5/local/check_dependencies.sh
@@ -0,0 +1,20 @@
#!/usr/bin/env bash

# Copyright 2015 Johns Hopkins University (Author: Jan Trmal <[email protected]>)
# 2021 Xiaomi Corporation (Author: Junbo Zhang)
# Apache 2.0

[ -f ./path.sh ] && . ./path.sh

command -v python3 >&/dev/null \
  || { echo >&2 "python3 not found on PATH. You will have to install Python3, preferably >= 3.6"; exit 1; }

for package in kaldi_io sklearn imblearn; do
  python3 -c "import ${package}" 2> /dev/null
  if [ $? -ne 0 ] ; then
    echo >&2 "This recipe needs the package ${package} installed. Exit."
    exit 1
  fi
done

exit 0
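The same dependency check can be sketched from inside Python with `importlib.util.find_spec`, which locates a package without importing it. That is a deliberate difference from the shell loop: `python3 -c "import ..."` also catches a package that is installed but fails on import, which `find_spec` does not. The function names below are illustrative only.

```python
import importlib.util
import sys
from typing import Iterable, List

# Packages the shell script above checks for.
REQUIRED = ("kaldi_io", "sklearn", "imblearn")


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be located on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def main() -> int:
    """Report each missing package on stderr; 0 if all found, else 1."""
    missing = missing_packages(REQUIRED)
    for name in missing:
        print(f"This recipe needs the package {name} installed.", file=sys.stderr)
    return 1 if missing else 0
```

To use it as a standalone checker, call `sys.exit(main())` from a `__main__` guard so the exit status matches the shell script's convention.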
30 changes: 30 additions & 0 deletions egs/gop_speechocean762/s5/local/data_prep.sh
@@ -0,0 +1,30 @@
#!/usr/bin/env bash

# Copyright 2020-2021 Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang)
# Apache 2.0

if [ "$#" -ne 2 ]; then
  echo "Usage: $0 <src-dir> <dst-dir>"
  echo "e.g.: $0 /home/storage07/zhangjunbo/data/speechocean762/test data/test"
  exit 1
fi

src=$1
dst=$2

[ ! -d $src ] && echo "$0: no such directory $src" && exit 1;
[ ! -d $src/../WAVE ] && echo "$0: no wav directory" && exit 1;

wavedir=`realpath $src/../WAVE`

[ -d $dst ] || mkdir -p $dst || exit 1;

cp -Rf $src/* $dst/ || exit 1;

sed -i.ori "s#WAVE#${wavedir}#" $dst/wav.scp || exit 1

utils/validate_data_dir.sh --no-feats $dst || exit 1;

echo "$0: successfully prepared data in $dst"

exit 0
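The key step in `data_prep.sh` is the `sed` call, which replaces the first `WAVE` placeholder on each line of `wav.scp` with the absolute path of `<src-dir>/../WAVE` (the `-i.ori` flag edits the file in place and keeps the original as `wav.scp.ori`). A minimal Python sketch of that substitution follows; the `wav.scp` line in the demo is hypothetical and the real corpus layout may differ.

```python
import os
from typing import Iterable, List


def resolve_wave_dir(src: str) -> str:
    """Absolute, symlink-resolved path of <src>/../WAVE, like `realpath`."""
    return os.path.realpath(os.path.join(src, os.pardir, "WAVE"))


def rewrite_wav_scp(lines: Iterable[str], wave_dir: str,
                    placeholder: str = "WAVE") -> List[str]:
    """Replace the first `placeholder` on each line, like sed 's#WAVE#dir#' (no /g)."""
    return [line.replace(placeholder, wave_dir, 1) for line in lines]


if __name__ == "__main__":
    # Hypothetical wav.scp line: "<utterance-id> <relative wav path>".
    demo = ["000010011 WAVE/SPEAKER0001/000010011.WAV"]
    print(rewrite_wav_scp(demo, resolve_wave_dir("speechocean762/test")))
```

Only the first occurrence is replaced, so a file name that itself contains `WAVE` (such as the `.WAV` extension, which differs in case anyway) is left alone.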
86 changes: 86 additions & 0 deletions egs/gop_speechocean762/s5/local/download_and_untar.sh
@@ -0,0 +1,86 @@
#!/usr/bin/env bash

# Copyright 2014 Johns Hopkins University (author: Daniel Povey)
# 2020-2021 Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang)
# Apache 2.0

set -e

remove_archive=false
if [ "$1" == --remove-archive ]; then
  remove_archive=true
  shift
fi

if [ $# -ne 2 ]; then
  echo "Usage: $0 [--remove-archive] <url-base> <data-base>"
  echo "e.g.: $0 www.openslr.org/resources/101 /home/storage07/zhangjunbo/data"
  echo "With --remove-archive it will remove the archive after successfully un-tarring it."
  exit 1
fi

url=$1
data=$2
[ -d $data ] || mkdir -p $data

corpus_name=speechocean762

if [ -z "$url" ]; then
  echo "$0: empty URL base."
  exit 1;
fi

if [ -f $data/$corpus_name/.complete ]; then
  echo "$0: data part $corpus_name was already successfully extracted, nothing to do."
  exit 0;
fi

# Check the size of the archive file in bytes
ref_size=520810923
if [ -f $data/$corpus_name.tar.gz ]; then
  size=$(/bin/ls -l $data/$corpus_name.tar.gz | awk '{print $5}')
  if [ $ref_size != $size ]; then
    echo "$0: removing existing file $data/$corpus_name.tar.gz because its size in bytes $size"
    echo "does not equal the expected size $ref_size."
    rm $data/$corpus_name.tar.gz
  else
    echo "$data/$corpus_name.tar.gz exists and appears to be complete."
  fi
fi

# If you have permission to access Xiaomi's server, you would not need to
# download it from OpenSLR
path_on_mi_server=/home/storage06/wangyongqing/share/data/$corpus_name.tar.gz
if [ -f $path_on_mi_server ]; then
  cp $path_on_mi_server $data/$corpus_name.tar.gz
fi

if [ ! -f $data/$corpus_name.tar.gz ]; then
  if ! which wget >/dev/null; then
    echo "$0: wget is not installed."
    exit 1;
  fi
  full_url=$url/$corpus_name.tar.gz

  echo "$0: downloading data from $full_url. This may take some time, please be patient."
  if ! wget -c --no-check-certificate $full_url -O $data/$corpus_name.tar.gz; then
    echo "$0: error executing wget $full_url"
    exit 1;
  fi
fi

cd $data
if ! tar -xvzf $corpus_name.tar.gz; then
  echo "$0: error un-tarring archive $data/$corpus_name.tar.gz"
  exit 1;
fi

touch $corpus_name/.complete
cd -

echo "$0: Successfully downloaded and un-tarred $data/$corpus_name.tar.gz"

if $remove_archive; then
  echo "$0: removing $data/$corpus_name.tar.gz file since --remove-archive option was supplied."
  rm $data/$corpus_name.tar.gz
fi
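The archive check in `download_and_untar.sh` compares the file size in bytes with `ref_size` and deletes a mismatching file so that it is fetched again. A minimal Python sketch of that logic follows; the function names are illustrative, and the demo uses a throwaway temporary file rather than the real archive.

```python
import os
import tempfile

REF_SIZE = 520810923  # expected archive size in bytes, taken from the script above


def archive_is_complete(path: str, ref_size: int = REF_SIZE) -> bool:
    """True if `path` is an existing file whose size equals `ref_size`."""
    return os.path.isfile(path) and os.path.getsize(path) == ref_size


def discard_if_incomplete(path: str, ref_size: int = REF_SIZE) -> bool:
    """Delete `path` if its size differs from `ref_size`.

    Returns True only when the archive is present and complete afterwards.
    """
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) != ref_size:
        os.remove(path)
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "speechocean762.tar.gz")
        with open(path, "wb") as f:
            f.write(b"\0" * 10)
        print(discard_if_incomplete(path, ref_size=10))  # size matches: file kept
        print(discard_if_incomplete(path, ref_size=11))  # size differs: file removed
        print(os.path.exists(path))
```

A size comparison catches truncated downloads but not corrupted bytes of the right length; a checksum would be the stricter choice.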
42 changes: 42 additions & 0 deletions egs/gop_speechocean762/s5/local/feat_to_score_eval.py
@@ -0,0 +1,42 @@
# Copyright 2021 Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang)
# Apache 2.0

# This script does phone-level pronunciation scoring by GOP-based features.

import sys
import argparse
import pickle
import kaldi_io
from utils import round_score


def get_args():
    parser = argparse.ArgumentParser(
        description='Phone-level scoring.',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('model', help='Input the model file')
    parser.add_argument('feature_scp',
                        help='Input gop-based feature file, in Kaldi scp')
    parser.add_argument('output', help='Output the predicted file')
    sys.stderr.write(' '.join(sys.argv) + "\n")
    args = parser.parse_args()
    return args


def main():
    args = get_args()

    with open(args.model, 'rb') as f:
        model_of = pickle.load(f)

    with open(args.output, 'wt') as f:
        for ph_key, feat in kaldi_io.read_vec_flt_scp(args.feature_scp):
            ph = int(feat[0])
            feat = feat[1:].reshape(1, -1)
            score = model_of[ph].predict(feat).reshape(1)[0]
            score = round_score(score, 1)
            f.write(f'{ph_key}\t{score:.1f}\t{ph}\n')


if __name__ == "__main__":
    main()
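The scoring loop in `feat_to_score_eval.py` reads each feature vector as `[phone_id, gop_feature_1, ...]`, picks the regressor trained for that phone, and writes `key<TAB>score<TAB>phone`. The dependency-free sketch below mirrors that dispatch under stated assumptions: plain lists stand in for the NumPy arrays that `kaldi_io` yields, a toy `MeanModel` stands in for the pickled per-phone models, and `round_score_sketch` is a hypothetical stand-in for `utils.round_score`, whose source is not in this excerpt (assumed here to clamp to [0, 2] and snap to a multiple of its second argument).

```python
import math
from typing import Dict, Iterable, List, Protocol, Sequence, Tuple


class Regressor(Protocol):
    """Anything with a scikit-learn-style predict(X) -> sequence of floats."""

    def predict(self, x: Sequence[Sequence[float]]) -> Sequence[float]: ...


def round_score_sketch(score: float, step: float = 1.0,
                       lo: float = 0.0, hi: float = 2.0) -> float:
    """Hypothetical stand-in for utils.round_score: clamp to [lo, hi],
    then snap to the nearest multiple of `step` (rounding half up)."""
    clamped = min(hi, max(lo, score))
    return math.floor(clamped / step + 0.5) * step


def score_phones(feats: Iterable[Tuple[str, Sequence[float]]],
                 model_of: Dict[int, Regressor]) -> List[str]:
    """Score each [phone_id, features...] vector with that phone's own model."""
    lines = []
    for ph_key, feat in feats:
        ph = int(feat[0])
        x = [list(feat[1:])]  # a single sample, i.e. shape (1, n_features)
        score = round_score_sketch(model_of[ph].predict(x)[0], step=1.0)
        lines.append(f"{ph_key}\t{score:.1f}\t{ph}")
    return lines


class MeanModel:
    """Toy regressor for the demo: predicts the mean of each row's features."""

    def predict(self, x: Sequence[Sequence[float]]) -> List[float]:
        return [sum(row) / len(row) for row in x]


if __name__ == "__main__":
    # Hypothetical keys and feature values; phone id 5 is arbitrary.
    demo = [("utt1.3", [5.0, 1.6, 1.8]), ("utt1.4", [5.0, -3.0, 0.2])]
    for line in score_phones(demo, {5: MeanModel()}):
        print(line)
```

Keeping one model per phone lets each regressor learn how GOP features map to human scores for that phone alone, at the cost of needing enough training samples for every phone.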
