-
Notifications
You must be signed in to change notification settings - Fork 1.5k
/
Copy pathphoneme.names
226 lines (157 loc) · 8.01 KB
/
phoneme.names
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
########################
# The phoneme database #
########################
1. Sources:
Dominique VAN CAPPEL (33) 92 96 45 44
THOMSON-SINTRA,
525 route des Dolines, BP157,
F-06903 Sophia Antipolis Cedex, France
This database was in use in the European ESPRIT 5516 project: ROARS.
The aim of this project is the development and the implementation of
a real time analytical system for French and Spannish speech recognition.
2. Past Usage:
Alinat, P.,
Periodic Progress Report 4,
ROARS Project ESPRIT II- Number 5516,
February 1993, Thomson report TS. ASM 93/S/EGS/NC/079
Guerin-Dugue, A. and others,
Deliverable R3-B4-P - Task B4: Benchmarks, Technical report,
Elena-NervesII "Enhanced Learning for Evolutive Neural Architecture",
ESPRIT-Basic Research Project Number 6891,
June 1995
Verleysen, M. and Voz, J.L. and Thissen, P. and Legat, J.D.,
A statistical Neural Network for high-dimensional vector classification,
ICNN'95 - IEEE International Conference on Neural Networks,
November 1995, Perth, Western Australia.
Voz J.L., Verleysen M., Thissen P. and Legat J.D.,
Suboptimal Bayesian classification by vector quantization with small clusters
ESANN95-European Symposium on Artificial Neural Networks,
April 1995, M. Verleysen editor, D facto publications, Brussels, Belgium.
Voz J.L., Verleysen M., Thissen P. and Legat J.D.,
A practical view of suboptimal Bayesian classification,
IWANN95-Proceedings of the International Workshop on Artificial Neural Networks,
June 1995, Mira, Cabestany, Prieto editors,
Springer-Verlag Lecture Notes in Computer Sciences, Malaga, Spain
3. Relevant information
Most of the already existing speech recognition systems are global
systems (typically Hidden Markov Models and Time Delay Neural Networks)
which recognises signals and do not really use the speech
specificities. On the contrary, analytical systems take into account
the articulatory process leading to the different phonemes of a given
language, the idea being to deduce the presence of each of the
phonetic features from the acoustic observation.
The main difficulty of analitycal systems is to obtain acoustical
parameters sufficiantly reliable. These acoustical measurements must :
- contain all the information relative to the concerned
phonetic feature.
- being speeker independant.
- being context independant.
- being more or less robust to noise.
The primary acoustical observation is always voluminous
(spectrum x N different observation moments) and classification cannot been
processed directly.
In ROARS, the initial database is provided by a cochlear spectra, which
may be seen as the output of a filters bank having a constant
DeltaF/F0, where the central frequencies are distributed on a
logarithmic scale (MEL type) to simulate the frequency answer of the
auditory nerves. The filters outputs are taken every 2 or 8 msec
(integration on 4 or 16 msec) depending on the type of phoneme
observated (stationnary or transitory).
The aim of the present database is to distinguish between nasal and
oral vowels. There are thus two different classes:
Class 0 : Nasals
Class 1 : Orals
This database contains vowels coming from 1809 isolated syllables (for
example: pa, ta, pan,...). Five different attributes were chosen to
characterize each vowel: they are the amplitudes of the five first
harmonics AHi, normalised by the total energy Ene (integrated on all the
frequencies): AHi/Ene. Each harmonic is signed: positive when it
corresponds to a local maximum of the spectrum and negative otherwise.
Three observation moments have been kept for each vowel to obtain
5427 different instances:
- the observation corresponding to the maximum total energy Ene.
- the observations taken 8 msec before and 8 msec after
the observation corresponding to this maximum total energy.
From these 5427 initial values, 23 instances for which the amplitude
of the 5 first harmonics was zero were removed, leading to the 5404
instances of the present database.
The patterns are presented in a random order.
4. Summary Statistics:
Attribute Min Max Mean Standard
deviation
1 -1.70 4.11 0.82 0.86
2 -1.33 4.38 1.26 0.85
3 -1.82 3.20 0.76 0.93
4 -1.58 2.83 0.40 0.80
5 -1.28 2.72 0.08 0.58
Class Distribution: number of instances per class
class 0 3818 - 70.65%
class 1 1586 - 29.35%
Correlation matrix:
{{ 1.00,-0.10,-0.32,-0.19,-0.05},
{-0.10, 1.00,-0.25,-0.21,-0.07},
{-0.32,-0.25, 1.00, 0.02, 0.01},
{-0.19,-0.21, 0.02, 1.00,-0.04},
{-0.05,-0.07, 0.01,-0.04, 1.00}}
Attributes maximum precision: 15 bits.
The database resulting from the centering and reduction by attribute of the phoneme
database is on the ftp server in the `REAL/phoneme/phoneme_CR.dat.Z' file.
5. Confusion matrix obtained with the k_NN classifier
(test with the Leave_One_Out method).
k was set to 1 to reach the minimum error rate : 8.97 +/- 1.1%)
{{0, 0 , 1 },
{0, 95.0, 5.0},
{1, 18.5, 81.5}}
With k set to 1, this result is a bit optimistic because of the fact
that the database is composed of the same phonemes taken at 3
differents moments. Setting k=20, will permit to avoid this influence
and will provide more realistic results:
{{0, 0 , 1 },
{0, 91.4, 8.6},
{1, 27.8, 72.2}}
In this case, the total error rate is: 14.2%.
7. Result of the Principal Component Analysis:
The Principal Components Analysis is a very classical method in pattern
recognition [Duda73].
PCA reduces the sample dimension in a linear way for the best
representation in lower dimensions keeping the maximum of inertia. The
best axe for the representation is however not necessary the best axe
for the discrimination. After PCA, features are selected according to
the percentage of initial inertia which is covered by the different
axes and the number of features is determined according to the
percentage of initial inertia to keep for the classification process.
This selection method has been applied on the phoneme_CR database.
When quasi-linear correlations exists between some initial features,
these redundant dimensions are removed by PCA and this preprocessing is
then recommended. In this case, before a PCA, the determinant of the
data covariance matrix is near zero; this database is thus badly
conditioned for all process which use this information (the quadratic
classifier for example).
The following files are available for the phoneme database:
- ``phoneme_PCA.dat.Z'', the projection of the ``phoneme_CR'' database on its
principal components (sorted in a decreasing order of the related
inertia percentage; so, if you desire to work on the database projected on
its x first principal components you only have to keep the x first attributes
of the phoneme_PCA.dat database and the class labels (last attribute)).
- ``phoneme_corr_circle.ps'', a graphical representation of the
correlation between the initial attributes and the two first
principal components,
- ``phoneme_proj_PCA.ps'', a graphical representation of the
projection of the initial database on the two first principal
components,
Table here below provides the inertia percentages associated to the
eigenvalues corresponding to the principal component axis sorted in
the decreasing order of their associated inertia percentage.
Eigen Value Inertia Cumulated
value percentage inertia
1 1.46471 29.30 29.30
2 1.10934 22.19 51.49
3 1.02830 20.57 72.06
4 0.94158 18.83 90.89
5 0.45516 9.10 100.00
It is thus clear that any dimensionality reduction based on PCA would
lead to an important loss of pertinent data
[Duda73]
Duda, R.O. and Hart, P.E.,
Pattern Classification and Scene Analysis,
John Wiley & Sons, 1973.