inventory of corpus with pronunciation variants #792
Suggested UI: if a corpus has no variants, grey out the 'phonetic' and 'total' tabs.
Internally, we already have inventory categorization functionality, so I think we can apply the same functionality to variant pronunciations and create a 'phonetic' table (I'm being optimistic). As for the 'total' table, we could combine 'phonological' and 'phonetic' and remove duplicates. (And what is this? It seems alternative inventories were tried in 2016?) See CorpusTools/corpustools/corpus/classes/lexicon.py, lines 2759 to 2782 in f1ba665.
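The proposal above (a 'phonetic' table built from the variants, and a 'total' table as the de-duplicated union of the two) can be sketched with Python sets. This is a minimal sketch under stated assumptions: `Word` and the inventory functions are hypothetical names, not PCT's actual classes or the code in lexicon.py, and transcriptions are assumed to be already split into lists of segment strings.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical stand-in for a lexicon entry (not PCT's actual class)."""
    spelling: str
    transcription: list[str]  # canonical pronunciation, one string per segment
    variants: list[list[str]] = field(default_factory=list)  # pronunciation variants


def phonological_inventory(words):
    """Segments found in canonical pronunciations (the current inventory)."""
    return {seg for w in words for seg in w.transcription}


def phonetic_inventory(words):
    """Segments found in any pronunciation variant."""
    return {seg for w in words for variant in w.variants for seg in variant}


def total_inventory(words):
    """Union of both tables; a set drops duplicate segments automatically."""
    return phonological_inventory(words) | phonetic_inventory(words)


def variant_only_segments(words):
    """Segments that a canonical-only inventory would miss."""
    return phonetic_inventory(words) - phonological_inventory(words)


# Hypothetical example data modelled on 'writing' vs. 'riding'
words = [
    Word("writing", ["r", "aɪ", "t", "ɪ", "ŋ"], [["r", "aɪ", "ɾ", "ɪ", "ŋ"]]),
    Word("riding", ["r", "aɪ", "d", "ɪ", "ŋ"], [["r", "aɪ", "ɾ", "ɪ", "ŋ"]]),
]
print(sorted(variant_only_segments(words)))  # ['ɾ']
```

Here the 'phonetic' table lists every segment that occurs in a variant, including ones also found in canonical forms; listing only the variant-only segments would be an equally plausible choice.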
Also see #560, especially #560 (comment).
It seems intentional that the main inventory chart contains only default segments. Should the alternative inventory chart be non-editable but accessible from analysis functions?
For now, this just adds all variant segments. Eventually, we need separate tables for the phonetic, phonological, and total inventories (but after the release).
Two .txt files have been added to Dropbox. You can find them in Phonological_CorpusTools_Public/example_files/variants/variants in inventory.
This lets the user know that phonological search does not work with segments that are only found in pronunciation variants.
Our current interim solution shows all symbols (phonetic or phonological) in a 'master' inventory table. Searches based on these symbols return 0, but there is a note to that effect in the search dialogue box. However, there are two further problems: (1) All analyses have the same issue as searches. For example, if you try to calculate functional load based on minimal pairs for [t] / [ɾ] in the above corpora ('writing' vs. 'riding'), the result is 0, and this is likely true of ALL analyses. (2) If you pull up the 'corpus summary' inventory and click on a symbol that occurs only in phonetic variants (e.g. [ɾ] or [kʰ] in the above corpora), PCT crashes outright with no error message, instead of giving either the actual type / token count or 0. Given these issues, I think we should 'roll back' the commit that added symbols appearing only in pronunciation variants to the total inventory, and simply clarify in the documentation that, currently, only canonical pronunciations are used to populate the inventory, and hence only those symbols can be used in searches and analyses. @stannam, maybe we could instead add a note to the 'corpus summary' dialogue box that says "Note that this inventory is based on only the symbols that occur in canonical pronunciations. PCT does not include symbols from pronunciation variants in the inventory, and such symbols cannot currently be directly searched for or used in analyses." Thanks, and sorry for the hassle! :(
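The zero counts described above are what you would expect if frequencies are tallied from canonical transcriptions only. The sketch below illustrates that, plus one defensive pattern for a summary lookup: `collections.Counter` returns 0 for a missing key instead of raising `KeyError`. It is a hypothetical illustration only: the `Word` class, the function names, and the counting scheme (a word counts once per segment type, weighted by its token frequency) are assumptions, and the thread does not say what actually causes the crash.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical stand-in for a lexicon entry (not PCT's actual class)."""
    spelling: str
    transcription: list[str]  # canonical pronunciation
    frequency: int  # token frequency of the word
    variants: list[list[str]] = field(default_factory=list)  # ignored below


def canonical_counts(words):
    """Type and token counts per segment, from canonical transcriptions only."""
    type_counts, token_counts = Counter(), Counter()
    for w in words:
        for seg in set(w.transcription):  # each word counts once per segment
            type_counts[seg] += 1
            token_counts[seg] += w.frequency
    return type_counts, token_counts


def segment_summary(type_counts, token_counts, segment):
    # Indexing a Counter with a missing key returns 0 (without inserting it),
    # so a variant-only segment yields (0, 0) rather than raising KeyError.
    return type_counts[segment], token_counts[segment]


# Hypothetical example data modelled on 'writing' vs. 'riding'
words = [
    Word("writing", ["r", "aɪ", "t", "ɪ", "ŋ"], 10, [["r", "aɪ", "ɾ", "ɪ", "ŋ"]]),
    Word("riding", ["r", "aɪ", "d", "ɪ", "ŋ"], 5, [["r", "aɪ", "ɾ", "ɪ", "ŋ"]]),
]
types, tokens = canonical_counts(words)
print(segment_summary(types, tokens, "t"))  # (1, 10)
print(segment_summary(types, tokens, "ɾ"))  # (0, 0)
```

Returning (0, 0) matches the 'or 0' fallback mentioned above; reporting the actual counts for [ɾ] would additionally require tallying the variants.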
Reverting because the changes raise problems. Re: #792
Hmm. I see that the corpus summary window has been updated with the suggested note. But it looks like PCT is still pulling in the inventory from pronunciation variants, so it's not quite rolled back. E.g.: load the 'variant_inventory_ilg' corpus. If you go to Corpus > Phonological search, [ɾ] again occurs in the inventory, and searching for it returns a count of 0. (Basically the same thing happens in variant_inventory_csv, though of course there the [ɾ] is in the phonetic transcription column rather than stored as a pronunciation variant.)
That is strange. On my end, the chart contains only canonical segments in the summary window and elsewhere, including the analysis functions and Features > Manage inventory chart. Can you try loading the .txt files again? I used the two files in example_files/variants/variants in inventory.
Ah! Yes, again, silly on my part. I was reloading the existing corpora instead of creating them from scratch. Looks good, thank you.
The inventory of a corpus is always based on the canonical pronunciations, not the full set of sounds in ANY pronunciation variants. So, e.g., if your English canonical pronunciations contain /t/ and /d/ but your variants contain [ɾ], you can't search for [ɾ] or include it in your analyses. We need to allow inventories to be built from: