This repository has been archived by the owner on Aug 23, 2023. It is now read-only.
forked from kaldi-asr/kaldi
[src,script,egs] the gop_speechocean762 recipe (kaldi-asr#4441)
1 parent 6359c90, commit b9890a9
Showing 28 changed files with 1,223 additions and 234 deletions.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
In the `speechocean762` corpus, each phoneme-level score takes one of three values: | ||
2: pronunciation is correct | ||
1: pronunciation is right but has a heavy accent | ||
0: pronunciation is incorrect or missed | ||
|
||
First, we can treat the scoring as a regression task, | ||
so MSE (Mean Squared Error) and Corr (cross-correlation) are computed: | ||
|
||
MSE: 0.15 | ||
Corr: 0.42 | ||
|
||
Then we round the continuous predicted scores to the nearest value in [0, 1, 2] to treat the scoring | ||
as a classification task. | ||
Classification metrics such as precision, recall, and f1-score are then computed | ||
and printed by `sklearn.metrics.classification_report`: | ||
|
||
|
||
              precision    recall  f1-score   support | ||
|
||
           0       0.46      0.17      0.25      1339 | ||
           1       0.16      0.37      0.22      1828 | ||
           2       0.96      0.93      0.95     44079 | ||
|
||
    accuracy                           0.89     47246 | ||
   macro avg       0.53      0.49      0.47     47246 | ||
weighted avg       0.92      0.89      0.90     47246 |
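The two evaluation views described in this README can be sketched in plain Python. This is only an illustration on made-up toy data, not the recipe's evaluation code: it assumes that "Corr" refers to the Pearson correlation coefficient, and the helper names (`mse`, `pearson_corr`, `round_to_level`, `precision_recall`) are hypothetical.

```python
import math


def mse(labels, preds):
    """Mean squared error between reference and predicted scores."""
    return sum((l - p) ** 2 for l, p in zip(labels, preds)) / len(labels)


def pearson_corr(labels, preds):
    """Pearson correlation coefficient (assumed meaning of 'Corr')."""
    n = len(labels)
    mean_l = sum(labels) / n
    mean_p = sum(preds) / n
    cov = sum((l - mean_l) * (p - mean_p) for l, p in zip(labels, preds))
    var_l = sum((l - mean_l) ** 2 for l in labels)
    var_p = sum((p - mean_p) ** 2 for p in preds)
    return cov / math.sqrt(var_l * var_p)


def round_to_level(score, min_val=0, max_val=2):
    """Clip a continuous score to [min_val, max_val] and round half up."""
    clipped = max(min_val, min(max_val, score))
    return int(math.floor(clipped + 0.5))


def precision_recall(labels, preds, cls):
    """Precision and recall for one class, given integer predictions."""
    tp = sum(1 for l, p in zip(labels, preds) if l == cls and p == cls)
    pred_pos = sum(1 for p in preds if p == cls)
    actual_pos = sum(1 for l in labels if l == cls)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


# Toy data, invented for illustration only.
labels = [2, 2, 1, 0, 2]
preds = [1.8, 2.3, 1.2, 0.6, 1.4]

regression_mse = mse(labels, preds)
regression_corr = pearson_corr(labels, preds)
rounded = [round_to_level(p) for p in preds]
class2_precision, class2_recall = precision_recall(labels, rounded, 2)
```

The regression metrics use the continuous predictions directly, while the per-class metrics are computed only after rounding, which is why the two views can tell different stories on heavily imbalanced labels such as the ones reported above.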
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# config for high-resolution MFCC features, intended for neural network training | ||
# Note: we keep all cepstra, so it has the same info as filterbank features, | ||
# but MFCC is more easily compressible (because less correlated) which is why | ||
# we prefer this method. | ||
--use-energy=false # use average of log energy, not energy. | ||
--num-mel-bins=40 # similar to Google's setup. | ||
--num-ceps=40 # there is no dimensionality reduction. | ||
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so | ||
# there might be some information at the low end. | ||
--high-freq=-400 # high cutoff frequency, relative to Nyquist of 8000 (=7600) |
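The comment on `--high-freq` relies on the convention that a non-positive value is read as an offset from the Nyquist frequency. A minimal sketch of that arithmetic, assuming a 16 kHz sampling rate (implied by the Nyquist of 8000 in the comment); the function name `effective_high_freq` is hypothetical:

```python
def effective_high_freq(sample_rate_hz, high_freq):
    """Resolve a high cutoff option to an absolute frequency in Hz.

    A positive value is used as-is; zero or a negative value is treated
    as an offset from the Nyquist frequency (half the sampling rate).
    """
    nyquist = sample_rate_hz / 2.0
    return high_freq if high_freq > 0 else nyquist + high_freq


# 16 kHz audio is assumed here, giving a Nyquist frequency of 8000 Hz.
cutoff = effective_high_freq(16000, -400)
```

With these inputs the result matches the 7600 Hz stated in the config comment.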
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Copyright 2015 Johns Hopkins University (Author: Jan Trmal <[email protected]>) | ||
# 2021 Xiaomi Corporation (Author: Junbo Zhang) | ||
# Apache 2.0 | ||
|
||
[ -f ./path.sh ] && . ./path.sh | ||
|
||
command -v python3 >&/dev/null \ | ||
|| { echo >&2 "python3 not found on PATH. You will have to install Python3, preferably >= 3.6"; exit 1; } | ||
|
||
for package in kaldi_io sklearn imblearn; do | ||
python3 -c "import ${package}" 2> /dev/null | ||
if [ $? -ne 0 ] ; then | ||
echo >&2 "This recipe needs the package ${package} installed. Exit." | ||
exit 1 | ||
fi | ||
done | ||
|
||
exit 0 |
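The same dependency check can be sketched from within Python. This is an illustration rather than part of the recipe; the helper name `missing_packages` is hypothetical. Note one difference from the shell loop: `importlib.util.find_spec` only tests whether a top-level package can be located, whereas `python3 -c "import ..."` also catches packages that are found but fail while importing.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


# The three packages the recipe's shell script checks for.
required = ["kaldi_io", "sklearn", "imblearn"]
not_installed = missing_packages(required)
for name in not_installed:
    print(f"This recipe needs the package {name} installed.")
```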
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Copyright 2020-2021 Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang) | ||
# Apache 2.0 | ||
|
||
if [ "$#" -ne 2 ]; then | ||
echo "Usage: $0 <src-dir> <dst-dir>" | ||
echo "e.g.: $0 /home/storage07/zhangjunbo/data/speechocean762/test data/test" | ||
exit 1 | ||
fi | ||
|
||
src=$1 | ||
dst=$2 | ||
|
||
[ ! -d $src ] && echo "$0: no such directory $src" && exit 1; | ||
[ ! -d $src/../WAVE ] && echo "$0: no wav directory" && exit 1; | ||
|
||
wavedir=`realpath $src/../WAVE` | ||
|
||
[ -d $dst ] || mkdir -p $dst || exit 1; | ||
|
||
cp -Rf $src/* $dst/ || exit 1; | ||
|
||
sed -i.ori "s#WAVE#${wavedir}#" $dst/wav.scp || exit 1 | ||
|
||
utils/validate_data_dir.sh --no-feats $dst || exit 1; | ||
|
||
echo "$0: successfully prepared data in $dst" | ||
|
||
exit 0 |
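The `sed` step in this script replaces the first occurrence of `WAVE` on each line of `wav.scp` with the absolute path of the corpus's `WAVE` directory. A rough Python equivalent is sketched below; the sample lines and the `/data/speechocean762/WAVE` path are invented for illustration and do not reflect the actual corpus layout.

```python
def rewrite_wav_scp(lines, wavedir):
    """Replace the first 'WAVE' on each line with an absolute directory.

    This mirrors `sed "s#WAVE#${wavedir}#"`, which has no `g` flag and
    therefore substitutes only the first match per line.
    """
    return [line.replace("WAVE", wavedir, 1) for line in lines]


# Hypothetical wav.scp entries: "<utterance-id> <relative wav path>".
sample = [
    "utt1 WAVE/spk1/utt1.wav",
    "utt2 WAVE/spk2/utt2.wav",
]
rewritten = rewrite_wav_scp(sample, "/data/speechocean762/WAVE")
```

Because only the first match is replaced, an utterance ID that itself contained the string `WAVE` would be rewritten instead of the path, so the approach depends on the IDs not containing that string.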
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Copyright 2014 Johns Hopkins University (author: Daniel Povey) | ||
# 2020-2021 Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang) | ||
# Apache 2.0 | ||
|
||
set -e | ||
|
||
remove_archive=false | ||
if [ "$1" == --remove-archive ]; then | ||
remove_archive=true | ||
shift | ||
fi | ||
|
||
if [ $# -ne 2 ]; then | ||
echo "Usage: $0 [--remove-archive] <url-base> <data-base>" | ||
echo "e.g.: $0 www.openslr.org/resources/101 /home/storage07/zhangjunbo/data" | ||
echo "With --remove-archive it will remove the archive after successfully un-tarring it." | ||
exit 1 | ||
fi | ||
|
||
url=$1 | ||
data=$2 | ||
[ -d $data ] || mkdir -p $data | ||
|
||
corpus_name=speechocean762 | ||
|
||
if [ -z "$url" ]; then | ||
echo "$0: empty URL base." | ||
exit 1; | ||
fi | ||
|
||
if [ -f $data/$corpus_name/.complete ]; then | ||
echo "$0: data part $corpus_name was already successfully extracted, nothing to do." | ||
exit 0; | ||
fi | ||
|
||
# Check the archive file size in bytes | ||
ref_size=520810923 | ||
if [ -f $data/$corpus_name.tar.gz ]; then | ||
  size=$(/bin/ls -l $data/$corpus_name.tar.gz | awk '{print $5}') | ||
  if [ $ref_size != $size ]; then | ||
    echo "$0: removing existing file $data/$corpus_name.tar.gz because its size in bytes $size" | ||
    echo "does not equal the expected size $ref_size." | ||
    rm $data/$corpus_name.tar.gz | ||
  else | ||
    echo "$data/$corpus_name.tar.gz exists and appears to be complete." | ||
  fi | ||
fi | ||
|
||
# If you have permission to access Xiaomi's server, you would not need to | ||
# download it from OpenSLR | ||
path_on_mi_server=/home/storage06/wangyongqing/share/data/$corpus_name.tar.gz | ||
if [ -f $path_on_mi_server ]; then | ||
cp $path_on_mi_server $data/$corpus_name.tar.gz | ||
fi | ||
|
||
if [ ! -f $data/$corpus_name.tar.gz ]; then | ||
  if ! command -v wget >/dev/null; then | ||
    echo "$0: wget is not installed." | ||
    exit 1; | ||
  fi | ||
  full_url=$url/$corpus_name.tar.gz | ||
|
||
  echo "$0: downloading data from $full_url. This may take some time, please be patient." | ||
  if ! wget -c --no-check-certificate $full_url -O $data/$corpus_name.tar.gz; then | ||
    echo "$0: error executing wget $full_url" | ||
    exit 1; | ||
  fi | ||
fi | ||
|
||
cd $data | ||
if ! tar -xvzf $corpus_name.tar.gz; then | ||
echo "$0: error un-tarring archive $data/$corpus_name.tar.gz" | ||
exit 1; | ||
fi | ||
|
||
touch $corpus_name/.complete | ||
cd - | ||
|
||
echo "$0: Successfully downloaded and un-tarred $data/$corpus_name.tar.gz" | ||
|
||
if $remove_archive; then | ||
echo "$0: removing $data/$corpus_name.tar.gz file since --remove-archive option was supplied." | ||
rm $data/$corpus_name.tar.gz | ||
fi |
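The script decides whether an existing archive can be reused by comparing its size in bytes against a reference value. That check can be sketched in Python as follows; the function name `archive_looks_complete` and the example path are hypothetical, while the reference size is the one hard-coded in the script.

```python
import os

# Reference size in bytes, taken from the download script.
REF_SIZE = 520810923


def archive_looks_complete(path, ref_size=REF_SIZE):
    """Return True if the file exists and has exactly the expected size."""
    return os.path.isfile(path) and os.path.getsize(path) == ref_size


# Hypothetical location; a missing file simply yields False.
ok = archive_looks_complete("data/speechocean762.tar.gz")
```

A size comparison is cheap but weaker than a checksum: it detects truncated downloads, not corrupted bytes in a file of the right length.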
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Copyright 2021 Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang) | ||
# Apache 2.0 | ||
|
||
# This script does phone-level pronunciation scoring by GOP-based features. | ||
|
||
import sys | ||
import argparse | ||
import pickle | ||
import kaldi_io | ||
from utils import round_score | ||
|
||
|
||
def get_args(): | ||
    parser = argparse.ArgumentParser( | ||
        description='Phone-level scoring.', | ||
        formatter_class=argparse.ArgumentDefaultsHelpFormatter) | ||
    parser.add_argument('model', help='Input the model file') | ||
    parser.add_argument('feature_scp', | ||
                        help='Input gop-based feature file, in Kaldi scp') | ||
    parser.add_argument('output', help='Output the predicted file') | ||
    sys.stderr.write(' '.join(sys.argv) + "\n") | ||
    args = parser.parse_args() | ||
    return args | ||
|
||
|
||
def main(): | ||
    args = get_args() | ||
|
||
    with open(args.model, 'rb') as f: | ||
        model_of = pickle.load(f) | ||
|
||
    with open(args.output, 'wt') as f: | ||
        for ph_key, feat in kaldi_io.read_vec_flt_scp(args.feature_scp): | ||
            ph = int(feat[0]) | ||
            feat = feat[1:].reshape(1, -1) | ||
            score = model_of[ph].predict(feat).reshape(1)[0] | ||
            score = round_score(score, 1) | ||
            f.write(f'{ph_key}\t{score:.1f}\t{ph}\n') | ||
|
||
|
||
if __name__ == "__main__": | ||
    main() |
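The per-phone lookup in `main()` can be illustrated without Kaldi or a trained model. In the sketch below, `ConstantRegressor` is a hypothetical stand-in for whatever regressor is stored in the pickled dictionary, plain lists replace NumPy arrays, and `round_score` is an assumed re-implementation (clip to [0, 2], then round to a multiple of the step), since the real helper in `utils` is not shown in this diff.

```python
class ConstantRegressor:
    """Hypothetical stand-in for a trained per-phone regressor."""

    def __init__(self, value):
        self.value = value

    def predict(self, rows):
        # One prediction per input row, as a scikit-learn style model would.
        return [self.value for _ in rows]


def round_score(score, floor=1, min_val=0, max_val=2):
    """Assumed behavior: clip the score, then round to a multiple of floor."""
    clipped = max(min_val, min(max_val, score))
    return round(clipped / floor) * floor


def predict_line(ph_key, feat, model_of):
    """Build one output line: key, rounded score, phone id (tab-separated)."""
    ph = int(feat[0])            # first element holds the phone id
    row = feat[1:]               # remaining elements are the GOP features
    score = model_of[ph].predict([row])[0]
    score = round_score(score, 1)
    return f'{ph_key}\t{score:.1f}\t{ph}'


# Toy inputs, invented for illustration.
model_of = {7: ConstantRegressor(1.7)}
line = predict_line("utt1.3", [7.0, -0.5, 0.2], model_of)
```

Keeping one model per phone, keyed by the integer stored in the first feature element, lets each regressor specialize on a single phone's feature distribution.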