Have some way to model stress in Japanese MFA #810
A typical example where stress is important in Japanese is 橋 (bridge, read as はし, hashi) vs 箸 (chopsticks, also read as はし, hashi). The former (橋) has stress on the second syllable, while the latter (箸) has stress on the first syllable.

Currently, the Japanese MFA G2P model seems to produce "h a ɕ i" for both. The Japanese MFA v3.0.0 dictionary is similar, although it lists the last phoneme of 箸 as voiceless in its most likely pronunciation ("h a ɕ i̥"). The first syllable, which is the one being stressed, is the same for both, though.

Is there any way to support modelling these differences in stress in the generated IPA phonemes, or in some other way? I see there are primary and secondary stress symbols in IPA, but I don't know enough to judge whether this would be a good approach.
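One quick way to reproduce this comparison is to read pronunciations straight out of the dictionary file. Below is a minimal sketch, assuming japanese_mfa.dict is plain text with the word in the first tab-separated field and the space-separated phone string in the last (intervening fields, which hold pronunciation probabilities in some releases, are skipped):

```python
from collections import defaultdict
from pathlib import Path

def load_pronunciations(dict_path: str) -> dict[str, list[str]]:
    """Map each word to all of its listed phone strings."""
    prons: dict[str, list[str]] = defaultdict(list)
    for line in Path(dict_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # first field = word, last field = phones; middle fields (if any)
        # hold pronunciation/silence probabilities and are ignored here
        prons[fields[0]].append(fields[-1])
    return prons

prons = load_pronunciations("japanese_mfa.dict")
for word in ("箸", "橋"):
    print(word, prons.get(word))  # both expected to come out as variants of "h a ɕ i"
```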
So it's a bit tricky to really do this in Japanese, since it's a pitch-accent language. The difference between the two words is which syllable has higher versus lower pitch, but each word will have the same loudness/length for the two syllables (taking into account things like speaker, speech rate, word frequency, focus, etc.). In a stress language like English, stress has large effects on vowel quality, with vowels that only appear in stressed syllables and vowels that only appear in unstressed syllables, so word pairs like "proJECT" (i.e. to project an image on a screen) vs "PROJect" (i.e. a project that you work on) are going to have different vowels in the first syllable, in addition to the syllable-level differences in length, loudness, and pitch.

With that said, the MFA acoustic models don't currently use any pitch/voicing features, though they do use devoiced vowels, as you've mentioned. Those are generated through phonological rules at acoustic model training time (see the Japanese phonological rule config), with the G2P model generating more "citation" forms. You can use the Wiktionary entries for 箸 and 橋 to get the relevant pitch accent, and then apply the devoicing rules to the non-high-pitch syllables. I don't think I've seen any literature saying that high vowel devoicing depends on pitch accent, but it makes sense that if a syllable has high pitch, it wouldn't be devoiced. Additionally, you can see the calculated probability of application from the devoicing rules from the
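As a toy illustration of that suggestion (and of the hypothesis that high-pitch syllables resist devoicing), the sketch below derives a Tokyo-type H/L pattern from a Wiktionary-style accent class (0 = heiban, 1 = initial accent, N > 1 = downstep after mora N) and then devoices high vowels only in low-pitch moras flanked by voiceless material. The phone sets and rule context are simplified stand-ins, not MFA's actual phonological rules:

```python
VOICELESS = {"k", "s", "ɕ", "t", "tɕ", "ts", "h", "ç", "ɸ", "p"}
HIGH_VOWELS = {"i", "ɯ", "u"}

def pitch_pattern(n_moras: int, accent: int) -> list[str]:
    """Tokyo-type accent: 0 = heiban (LHH...), 1 = atamadaka (HLL...),
    N > 1 = high up to mora N, then a downstep (LH...HL...)."""
    if accent == 0:
        return ["L"] + ["H"] * (n_moras - 1)
    if accent == 1:
        return ["H"] + ["L"] * (n_moras - 1)
    return ["L"] + ["H"] * (accent - 1) + ["L"] * (n_moras - accent)

def apply_devoicing(moras: list[tuple[str, str]], pattern: list[str]) -> list[str]:
    """Devoice a high vowel in a low-pitch mora when its onset is voiceless
    and the next onset is voiceless (or the word ends)."""
    phones: list[str] = []
    for i, ((onset, vowel), pitch) in enumerate(zip(moras, pattern)):
        next_onset = moras[i + 1][0] if i + 1 < len(moras) else None
        devoice = (
            pitch == "L"
            and vowel in HIGH_VOWELS
            and onset in VOICELESS
            and (next_onset is None or next_onset in VOICELESS)
        )
        # "\u0325" is the combining ring below, the IPA voiceless diacritic
        phones += [onset, vowel + "\u0325" if devoice else vowel]
    return phones

hashi = [("h", "a"), ("ɕ", "i")]
print(apply_devoicing(hashi, pitch_pattern(2, 1)))  # 箸, accent 1 -> ['h', 'a', 'ɕ', 'i̥']
print(apply_devoicing(hashi, pitch_pattern(2, 2)))  # 橋, accent 2 -> ['h', 'a', 'ɕ', 'i']
```

With accent class 1 (箸) the final vowel devoices, matching the most likely pronunciation quoted in the issue, while the high pitch of accent class 2 (橋) blocks it.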
If your end goal is to analyze differences in pronunciation between the different pitch accent patterns, it might be worth generating a dedicated resource from Wiktionary that maps words to their pattern, though you'd have to make some modifications to a scraping script like wikipron, since the IPA transcription doesn't contain any pitch accent information.
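As a rough starting point for such a resource, the sketch below pulls an entry's raw wikitext through the MediaWiki API and greps out accent classes. It assumes the entry's Japanese section marks pitch accent with {{ja-pron|...|acc=N}} templates, which matches current English-Wiktionary pages but isn't a guaranteed-stable convention:

```python
import json
import re
import urllib.parse
import urllib.request

API = "https://en.wiktionary.org/w/api.php"

def fetch_wikitext(title: str) -> str:
    """Return the raw wikitext of a Wiktionary page via the MediaWiki API."""
    query = urllib.parse.urlencode(
        {"action": "parse", "page": title, "prop": "wikitext", "format": "json"}
    )
    # Wikimedia asks for a descriptive User-Agent; the name here is made up
    req = urllib.request.Request(
        f"{API}?{query}", headers={"User-Agent": "pitch-accent-scraper/0.1 (research)"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["parse"]["wikitext"]["*"]

def accent_classes(title: str) -> list[int]:
    """Pull acc=N parameters out of {{ja-pron}} templates (assumed convention)."""
    wikitext = fetch_wikitext(title)
    return [int(n) for n in re.findall(r"\{\{ja-pron\|[^}]*?\bacc=(\d+)", wikitext)]

if __name__ == "__main__":
    for word in ("箸", "橋"):
        print(word, accent_classes(word))  # e.g. 箸 -> [1], 橋 -> [2]
```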
Thank you very much for the very detailed answer! It seems that, at the very least, I have been mixing up the concepts of pitch accent and stress. My end goal would be to get different phoneme representations for these two cases out of alignment results and dictionaries, and to evaluate alternatives if that isn't possible. That's why I was thinking about IPA stress symbols. But if I understood you correctly, for Japanese this would rather be pitch accent, which has no IPA representation. Thanks for the WikiPron link; it sounds useful for gathering pronunciation data from Wiktionary. Maybe I can combine this information with the IPA phonemes to get what I need, although I would need to figure out exactly how to map and combine these two pieces of information (e.g., ha̠ɕi and háꜜshì). Feel free to close this feature request if you don't have any further comments.
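For that mapping question, here is one possible sketch: segment the dictionary's segmental phones into moras (crudely approximated as "optional onset consonants plus one vowel"; moraic ɴ, geminates, and two-symbol long vowels are ignored) and carry an H/L accent pattern over as IPA tone letters on each vowel. This is illustrative glue code, not an established MFA convention:

```python
VOWEL_LETTERS = "aeiouɯ"

def moraify(phones: list[str]) -> list[list[str]]:
    """Naive mora split: each vowel closes a mora."""
    moras: list[list[str]] = []
    current: list[str] = []
    for phone in phones:
        current.append(phone)
        if phone[0] in VOWEL_LETTERS:  # "i̥" and "a̠" still start with a vowel letter
            moras.append(current)
            current = []
    if current:  # trailing consonants: tack them onto the last mora
        if moras:
            moras[-1].extend(current)
        else:
            moras.append(current)
    return moras

def tag_pitch(phones: list[str], pattern: list[str]) -> list[str]:
    """Append IPA tone letters to each mora's vowel: H -> ˥, L -> ˩."""
    mark = {"H": "\u02e5", "L": "\u02e9"}
    tagged: list[str] = []
    for mora, pitch in zip(moraify(phones), pattern):
        tagged += mora[:-1] + [mora[-1] + mark[pitch]]
    return tagged

print(tag_pitch(["h", "a", "ɕ", "i"], ["H", "L"]))  # 箸 -> ['h', 'a˥', 'ɕ', 'i˩']
print(tag_pitch(["h", "a", "ɕ", "i"], ["L", "H"]))  # 橋 -> ['h', 'a˩', 'ɕ', 'i˥']
```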
By the way, are other languages in MFA that use stress, like English, currently representing this information in the IPA phonemes they use in dictionaries and returned alignments (the PROject vs proJECT case you mentioned)?