creates PREMIS CSV implementation scripts #205

Open
wants to merge 33 commits into
base: master

kieranjol
Owner

Not ready to merge, but sending a pull request for visibility. I added a lot of information to the README and to the docstrings within the functions, so here's a copy-paste of the docstring output as generated by pydoc:

Help on module premisobjects:

NAME
    premisobjects

FILE
    ifigit/ifiscripts/premisobjects.py

DESCRIPTION
    Creates a somewhat PREMIS-compliant CSV file describing objects in a package.
    A separate script will need to be written in order to transform these
    CSV files into XML.
    As the flat CSV structure prevents maintaining some of the complex
    relationships between units, some semantic units have been merged. For example,
    relation_structural_includes is really a combination of the
    relationshipType and relationshipSubType units, which have the values
    Structural and Includes respectively.
    
    todo:
    Document identifier assignment for files and IE. Probably in events sheet?
    Allow for derivation to be entered
    Link with events sheet
    Link mediainfo xml in /metadata to the objectCharacteristicsExtension field.
    
    
    Assumptions for now: representation UUID already exists as part of the
    SIP/AIP folder structure. Find a way to supply this, probably via argparse.
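
To make the merged-heading idea concrete, here is a rough sketch of how a later CSV-to-XML script might expand relation_structural_includes back into separate relationshipType and relationshipSubType values. This is not code from this PR: the `MERGED_UNITS` mapping, the `expand_merged_unit` helper and the pipe-delimited cell format are all assumptions made for illustration.

```python
# Hypothetical mapping: flat CSV heading -> (relationshipType, relationshipSubType).
# Only relation_structural_includes is named in the docstring; the mapping
# structure itself is an assumption.
MERGED_UNITS = {
    "relation_structural_includes": ("Structural", "Includes"),
}


def expand_merged_unit(heading, cell_value, delimiter="|"):
    """Expand one flat CSV cell into PREMIS-style relationship records.

    Assumes the cell holds related object identifiers separated by
    `delimiter`; that cell format is not taken from the scripts.
    """
    relationship_type, relationship_subtype = MERGED_UNITS[heading]
    records = []
    for related_id in cell_value.split(delimiter):
        related_id = related_id.strip()
        if not related_id:
            continue  # Ignore empty fragments such as a trailing delimiter.
        records.append({
            "relationshipType": relationship_type,
            "relationshipSubType": relationship_subtype,
            "relatedObjectIdentifierValue": related_id,
        })
    return records
```

Keeping the type and subtype in the heading means each CSV cell only has to carry identifiers, at the cost of needing one heading per type and subtype pair.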

FUNCTIONS
    file_description(source, manifest, representation_uuid)
        Generate PREMIS descriptions for items and write to CSV.
    
    find_representation_uuid(source)
        This extracts the representation UUID from a directory name.
        This should be moved to ififuncs as it can be used by other scripts.
    
    get_checksum(manifest, filename)
        Extracts checksum from manifest, rather than generating a fresh one.
    
    intellectual_entity_description()
        Generate PREMIS descriptions for Intellectual Entities and write to CSV.
    
    main()
        Launches all the other functions when run from the command line.
    
    make_skeleton_csv()
        Generates a CSV with PREMIS-esque headings. Currently it's just called
        'cle.csv' but it will probably be called:
        UUID_premisobjects.csv
        and sit in the metadata directory.
    
    representation_description(representation_uuid, item_ids)
        Generate PREMIS descriptions for a representation and write to CSV.
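
For readers who want a feel for the two lookup helpers listed above, the following is a minimal sketch, not the implementation in premisobjects.py. It assumes the manifest uses md5sum-style lines (`<checksum>  <relative/path>`) and that the representation UUID appears as a complete directory name somewhere in the source path; both are assumptions.

```python
import os
import uuid


def get_checksum(manifest, filename):
    """Return the checksum recorded for filename, or None if it is not listed.

    Assumes md5sum-style lines: '<checksum>  <relative/path>'.
    """
    with open(manifest, "r") as manifest_handle:
        for line in manifest_handle:
            parts = line.rstrip("\n").split(None, 1)
            if len(parts) != 2:
                continue  # Skip blank or malformed lines.
            checksum, path = parts
            if os.path.basename(path.strip()) == filename:
                return checksum
    return None


def find_representation_uuid(source):
    """Return the deepest path component of source that is a canonical UUID."""
    components = os.path.normpath(source).split(os.sep)
    for component in reversed(components):
        try:
            candidate = str(uuid.UUID(component))
        except ValueError:
            continue  # Not a UUID; try the parent directory name.
        # Only accept the hyphenated form, so bare 32-character hex
        # directory names are not mistaken for UUIDs.
        if candidate == component.lower():
            return candidate
    return None
```

Reading the checksum from the manifest avoids re-hashing large files, but it trusts that the manifest is current. Matching on basename alone would be ambiguous if two files in the package share a name.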


Help on module premiscsv:

NAME
    premiscsv

FILE
    ifigit/ifiscripts/premiscsv.py

DESCRIPTION
    Extracts preservation events from an IFI plain text log file and converts
    them to a CSV using the PREMIS data dictionary.

FUNCTIONS
    find_events(logfile)
        A very hacky attempt to extract the relevant preservation events from our
        log files.
    
    main()
        Launches all the other functions when run from the command line.
    
    make_events_csv()
        Generates a CSV with PREMIS-esque headings. Currently it's just called
        'bla.csv' but it will probably be called:
        UUID_premisevents.csv
        and sit in the metadata directory.
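
As an illustration of the log-to-CSV idea, here is a small sketch of pulling events out of log lines and writing them under PREMIS-style headings. It is not the code in premiscsv.py: the log line layout, the `LINE_PATTERN` regex, the `events_to_csv` helper and the choice of three headings are all hypothetical, since the real IFI log format is not shown here.

```python
import csv
import io
import re

# Hypothetical log line layout (NOT taken from the IFI scripts):
#   2017-07-30T14:01:00, EVENT = eventType=fixity check, eventOutcome=pass
LINE_PATTERN = re.compile(r"^(?P<when>[^,]+),\s*EVENT\s*=\s*(?P<fields>.+)$")

# PREMIS event semantic units used here as illustrative CSV headings.
HEADINGS = ["eventDateTime", "eventType", "eventOutcome"]


def find_events(log_lines):
    """Yield one dict per log line that matches the assumed EVENT layout."""
    for line in log_lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # Not an event line; skip ordinary log messages.
        event = {"eventDateTime": match.group("when").strip()}
        for pair in match.group("fields").split(","):
            key, _, value = pair.partition("=")
            if key.strip() in HEADINGS:
                event[key.strip()] = value.strip()
        yield event


def events_to_csv(events):
    """Serialise events to CSV text, leaving missing units blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=HEADINGS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for event in events:
        writer.writerow(event)
    return buffer.getvalue()
```

Calling `events_to_csv(find_events(open_log_file))` with an open file handle would produce the CSV text in one pass. Logging events as explicit key=value pairs, if the log format allows it, would make this kind of extraction much less fragile than pattern-matching free text.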


kieranjol added 30 commits July 30, 2017 14:01