Document metadata.json format #78

Open
pleonard212 opened this issue Jun 11, 2019 · 7 comments
@pleonard212

Just a quick thought -- having a minimal example of metadata.json could help folks who aren't in a Philo4 environment load in their texts...

@clovis
Contributor

clovis commented Jun 12, 2019

Hey Peter!

There should be no need to generate your own metadata.json file if you use the included XML parser, since it generates the file for you. In fact, that is the default behavior (it does not assume Philo4). Generating your own metadata.json file is certainly doable, but I've never done it separately from the existing code base. Do you have a particular use case in mind?

Now, I certainly take your point about documentation... I was hoping to find time for this lately, but sadly this has not happened. I should be making a new development push in the fall, so I'll definitely clear this up then.

I'm going to leave this open since documentation reminders can't hurt... :-)

@jamesgawley

jamesgawley commented Jul 27, 2021

Hi Clovis,

I think I'm having similar issues to what Peter was dealing with. Basically, textpair's xml parser doesn't recognize the metadata in my target XML files. It does recognize the metadata in my source xml files, however, and as far as I can tell the document structures are the same. Maybe it would help to know what fields textpair is looking for, when it searches for metadata?

I'll paste a precis of one of the problematic XML files at the bottom of this comment, in case the issue leaps out at you. Thanks for your help; I hope you don't mind my making this support request here. I figured that someone else struggling with the same problem might end up in the same place.

<?xml version='1.0' encoding='utf-8'?>
<TEI xml:lang="fr">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>L'HIVER, COMÉDIE.</title>
                <author>SOULAS d'ALLAINVAL, Abbé Léonor Jean Christine</author>
            </titleStmt>
        </fileDesc>
    </teiHeader>
    <text>
        <front>
            <docTitle>
                <titlePart type="main">L'HIVER</titlePart>
                <titlePart type="sub">COMÉDIE en TROIS ACTES avec un PROLOGUE</titlePart>
            </docTitle> 
            <docDate value="1733">M. DCC. XXXIII.</docDate>
            <docAuthor id="Abbé d'ALLAINVAL" bio="allainval">par M. d'ALLAINVAL</docAuthor>
        </front>
        <body>
            <div1 type="acte" n="1"><head>ACTE I </head>
                <div2 type="scene" n="1"><head>SCÈNE PREMIÈRE.</head>
                <sp who="L-HIVER"><speaker>L'HIVER seul, en habit fourré site un manchon.</speaker>
                    <l id="1">Des vrais plaisirs, unique asile ;</l>
                    <l id="2">Paris, c'est l'Hiver que tu vois :</l>
                    <l id="3">Las de régner au Nord, il vient, heureuse Ville,</l>
                    <l id="4">Dans tes murs enchanteurs, se délasser trois mois.</l>
                    <l id="5">Ne tremble point à voir mes neiges et mes glaces,</l>
                    <l id="6">Au rôle de Vieillard le fort m'a condamné,</l>
                    <l id="7">Mais le Printemps, malgré sa jeunesse et ses grâces,</l>
                    <l id="8">N'en est pas moins mon frère aîné.</l>
                    <l id="9">Bacchus, les Ris, les Jeux, sont toujours sur mes traces,</l>
                    <ref type="note" target="philo_note_1" n="1" />
                    <l id="10">Et sous cet attirail barbon,</l>
                    <ref type="note" target="philo_note_2" n="2" />
                    <l id="11">J'ai le coeur vert-galant, enjoué, vif, aimable ;</l>
                    <l id="12">J'ai toujours bon vin, bonne table,</l>
                    <l id="13">Et je n'ai pas toujours les mains dans mon manchon.</l>
                </sp>
                </div2>
            </div1>
        </body>
    </text>
</TEI>

@clovis
Contributor

clovis commented Jul 28, 2021

Hi James,

Hmm.. odd... the file you sent looks good to me... Which metadata were you hoping to get picked up by the parser?

@jamesgawley

Basically this stuff, which I converted to JSON manually to generate a metadata.json file (which also didn't work):

{
    "1": {
        "filename": "/data/Target_db copy/ALLAINVAL_HIVER.tei",
        "author": "SOULAS d'ALLAINVAL, Abbé Léonor Jean Christine",
        "title": "L'HIVER, COMÉDIE.",
        "options": {"metadata_xpaths": []}
    }
}
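
For anyone who wants to produce the same structure with a script rather than by hand, here is a minimal Python sketch. The keys (`filename`, `author`, `title`, `options`, `metadata_xpaths`) are copied from the hand-written example above; the helper name `build_metadata` is hypothetical, and nothing in this thread confirms that this is the shape TextPAIR expects (the hand-written file did not work either).

```python
import json


def build_metadata(entries):
    """Build a dict shaped like the hand-written example above.

    `entries` is a list of (filename, author, title) tuples.
    Documents are keyed "1", "2", ... as strings, mirroring the example;
    whether TextPAIR accepts this layout is an assumption.
    """
    metadata = {}
    for doc_id, (filename, author, title) in enumerate(entries, start=1):
        metadata[str(doc_id)] = {
            "filename": filename,
            "author": author,
            "title": title,
            "options": {"metadata_xpaths": []},
        }
    return metadata


records = build_metadata(
    [
        (
            "/data/Target_db copy/ALLAINVAL_HIVER.tei",
            "SOULAS d'ALLAINVAL, Abbé Léonor Jean Christine",
            "L'HIVER, COMÉDIE.",
        )
    ]
)

# ensure_ascii=False keeps accented characters readable in the output.
text = json.dumps(records, ensure_ascii=False, indent=4)
print(text)
```

Serializing with `json.dumps` at least guarantees the file is valid JSON, which rules out quoting or comma mistakes as the cause of a load failure.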

Thanks for getting back to me so quickly, though.

@clovis
Contributor

clovis commented Jul 28, 2021

OK... could you possibly email me one source and one target file so I can debug things on my end? Thanks!

@jamesgawley

Sent! Thanks again.

@clovis
Contributor

clovis commented Aug 4, 2021

Hi James,

Apologies for not getting back to you sooner! The TEI parser built into TextPAIR can only extract document-level metadata. Even that is somewhat restricted, since only a couple of XPaths are defined for each field (and those are themselves restricted).
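
To illustrate what "a couple of XPaths per field" means in practice, here is a rough Python sketch of document-level extraction from a TEI header. The `HEADER_XPATHS` table and the function name are hypothetical and are not TextPAIR's actual configuration; the sketch also assumes a file without a TEI namespace, like the sample posted earlier in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical field -> candidate XPaths table (NOT TextPAIR's real one).
# The first path that yields non-empty text wins.
HEADER_XPATHS = {
    "title": ["./teiHeader/fileDesc/titleStmt/title"],
    "author": ["./teiHeader/fileDesc/titleStmt/author"],
}


def extract_header_metadata(xml_text):
    """Return document-level fields found in the TEI header."""
    root = ET.fromstring(xml_text)
    metadata = {}
    for field, xpaths in HEADER_XPATHS.items():
        for xpath in xpaths:
            node = root.find(xpath)
            if node is not None and node.text and node.text.strip():
                metadata[field] = node.text.strip()
                break  # stop at the first XPath that matches
    return metadata


SAMPLE = """<TEI xml:lang="fr">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>L'HIVER, COMÉDIE.</title>
                <author>SOULAS d'ALLAINVAL, Abbé Léonor Jean Christine</author>
            </titleStmt>
        </fileDesc>
    </teiHeader>
    <text><body/></text>
</TEI>"""

print(extract_header_metadata(SAMPLE))
```

A file whose header puts the title or author somewhere the table does not cover (or that declares an XML namespace on `<TEI>`) would come back with missing fields under this approach, which is one way two seemingly identical files can behave differently.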

Supporting lower-level metadata parsing (and metadata parsing configurability) would require some non-trivial changes. The alternative (and the reason why this has never happened) is to load the texts under PhiloLogic (which allows you to specify, within some limits, the metadata fields you want to store) and use the PhiloLogic index to retrieve the metadata you want.

So if you want ACT or SCENE metadata information for your alignments, you would want to go through PhiloLogic. Another upside of leveraging PhiloLogic is that you can link your alignments to PhiloLogic to get back to the context of the alignment. Hope that helps, and let me know if you need any pointers.
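
For reference, "lower-level" metadata here means the attributes carried by the `div1` (act) and `div2` (scene) elements in the sample file. The sketch below only shows what that information looks like when read with Python's standard library; it is not how PhiloLogic or TextPAIR index it, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def list_divisions(xml_text):
    """List each scene with its act number, scene number and verse-line count."""
    root = ET.fromstring(xml_text)
    rows = []
    for act in root.iter("div1"):
        for scene in act.iter("div2"):
            rows.append(
                {
                    "act": act.get("n"),
                    "scene": scene.get("n"),
                    "lines": len(scene.findall(".//l")),
                }
            )
    return rows


SAMPLE = """<TEI>
    <text>
        <body>
            <div1 type="acte" n="1"><head>ACTE I</head>
                <div2 type="scene" n="1"><head>SCÈNE PREMIÈRE.</head>
                    <sp who="L-HIVER">
                        <l id="1">Des vrais plaisirs, unique asile ;</l>
                        <l id="2">Paris, c'est l'Hiver que tu vois :</l>
                    </sp>
                </div2>
            </div1>
        </body>
    </text>
</TEI>"""

print(list_divisions(SAMPLE))
```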

C
