- Logging in
- Start an ingest
- Archivematica server monitoring
- Fixity checking and repair
- Dropping MySQL and ES data in pipelines
- Reindexing AIPs in Archival Storage indexes
- Deleting AIPs
- Cleaning up after successful Automation Tools ingests
- Responding to failed Automation Tools ingests
- Checking Automation Tools logs
- Adding and switching AIP Store locations
- Updating Automation Tools processing configuration
- Clearing space when local disk is nearly full
- Restarting services
- Log of changes to default Archivematica FPR
- Archivematica custom settings
- Archivematica updates and dependencies
For administrators to log into the Archivematica servers, they will need to ask IT to make them an account. The command to access the server is ssh [email protected]
where user
is your username and the string of numbers is the URL for the pipeline or Storage Service. Enter your password when prompted.
To log out, use Ctrl+D
.
Confirm that the digital processing archivists have uploaded their SIPs to the appropriate pipeline. Once you've logged into the server for that pipeline, do:
cd /mnt/incoming/transfers/UPLOADED_SIP_FOLDER
sudo mv * /mnt/incoming/auto-transfers/
This moves the SIPs into a watched folder. Archivematica will start ingesting these SIPs at the next five minute interval. Next confirm the folder is empty using the ls
command. Then delete the empty folder:
cd ..
rm -rf /UPLOADED_SIP_FOLDER/
Fixity checks of all AIPs are conducted on a quarterly basis using the Fixity application. Fixity's scanall
function is run via the cca-fixity scripts installed at /var/archivematica/cca-fixity
on the Storage Service VM and called automatically on the second Friday of each January, April, July, and October via the crontab. Logs are saved on the Archivematica Storage Service server at /var/log/cca-fixity
and results are emailed to the relevant administrators.
To manually conduct a fixity check of a single AIP:
- Connect to VSP-AMSS-01
- Load environmental variables:
source /etc/profile.d/fixity.sh
- Run fixity
scan
function:fixity scan <AIP UUID>
If you must run a scanall
fixity check manually, use nohup to ensure that the process runs to completion in the background, even if your terminal session ends
- Connect to VSP-AMSS-01
- Load environmental variables:
source /etc/profile.d/fixity.sh
- Run fixity
scanall
function:nohup fixity scanall &
When AIP corruption is detected, notify IT and restore the AIP from backups according to the Storage Service's Recovery procedures.
We will periodically drop the Elasticsearch indexes and rows from the MySQL databases on each of the processing pipelines to minimize resource drain and keep performance quick. This should be done every few months and only after all ingests have been successfully QAed.
Artefactual Support has been asked to create an estimate for a script to do this cleanup for pipeline MySQL databases.
To drop the ES index on a pipeline:
curl -XDELETE http://<pipeline.ip.address>:9200/aips
When the ES index is dropped, backups and logs from /srv/am-est-backups
and /var/log/elasticsearch
may also be deleted. This will help to save space on the local disk.
To re-index an AIP in an Archival Storage index on a given pipeline, you can use the rebuild-aip-index.py
and reindex-aip.sh
scripts from the Archivematica folder of the CCA scripts repo on the pipeline where you would like the AIPs to be re-indexed.
The bash script reindex-aip.sh
reindexes an AIP using the Archivematica devtools.
The Python script rebuild-aip-index.py
allows you to specify the UUIDs of the AIPs you would like to index by adding them as strings to the list "aip_list", then calls reindex-aip.sh
for each of the AIPs.
Before using the scripts:
- Clone the
archivematica-devtools
repo to your home folder on the pipeline server - Change paths in the scripts to point to correct locations
To send a deletion request for an AIP that is indexed on an Archivematica pipeline, follow these instructions.
To send a deletion request for an AIP that is not currently indexed on an Archivematica pipeline, ssh into the Storage Service VM and run the following curl command, replacing all values in < > with the appropriate information:
curl -X POST -H "Content-Type: application/json" -H "Authorization: ApiKey <SS user>:<SS user API key>" -d '{"event_reason":"<reason for deletion>","pipeline":"<UUID to pipeline to send deletion from>","user_id":<integer pipeline user primary key>,"user_email":"<email to send deletion notice to>"}' <SS URL>/api/v2/file/<AIP UUID>/delete_aip/
After ingests started via Automation Tools finish successfully, they must be deleted from the Transfer Source (in this case, /mnt/incoming/auto-transfers/
) manually. This must be done by a user with sudo
permissions, e.g. sudo rm -rf /path/to/transfer/
. Be careful when deleting files with sudo, as it is very easy to accidentally delete files you did not intend to delete.
Tessa wrote some code for the Automation Tools repository that will automatically delete transfer source files after a successful ingest if the transfer-script.sh
script is run with a --delete
flag. This code is currently being reviewed - if the pull request is accepted and merged, it would remove the need for manually cleaning up after successful Automation Tools ingests.
When an ingest started via Automation Tools fails, the first order of business is to determine why the ingest failed and either fix the transfer or submit a support ticket with Artefactual.
When you are ready to re-start the transfer/ingest after correcting the problem, you will notice that simply putting the transfer in the watched folder will not work. This is because the transfers.db
SQLite database that the Automation Tools use to keep track of which transfers have already been started will already include the transfer (determined based on the folder name), preventing it from being started.
The solution is to delete the transfers.db
database and to manually re-start the transfer-script.sh
with sudo
permissions and as user archivematica
:
sudo rm -f /var/archivematica/automation-tools/transfers.db
sudo -u archivematica /etc/archivematica/automation-tools/transfer-script.sh
You only need to manually call transfer-script.sh
once - after that, the crontab will call the script every 5 minutes, per the recommended installation instructions.
Note that deleting transfers.db
means that the Automation Tools will no longer have a record of which transfers in the /mnt/incoming/auto-transfers
directory have already been started, so you will want to make sure only transfers you intend to start are present in that directory before deleting the database.
If you run into unexpected behavior from Automation Tools (e.g. transfers not starting or not approving, DIPs not being created for new AIPs), the first place to check is the logs.
Relevant logs:
- Automation Tools transfers:
/var/log/archivematica/automation-tools/transfers.log
- Create DIP Job:
/var/log/archivematica/automation-tools/create-dip-jobs.log
Each storage location configured in the Storage Service is 5 TB in size (this was done in order to make backups manageable for IT). As storage locations fill, new locations will need to be added to the Storage Service and assigned as the default values for pipelines. Locations should contain only AIPs for which DIPs will be produced (e.g. processed digital archives) or AIPs for which DIPs will not be produced (e.g. digitization masters), not mixed.
Steps to start using a new AIP Store location:
- Ensure directory to add is mounted on Storage Service VM (read/write) and pipeline VMs (read-only). Ask CCA sysadmin to configure this if not true.
- Configure as a Space and AIP storage Location in the Storage Service GUI (or ask Artefactual support to do this for you).
- Set new AIP storage location as the default value in appropriate pipelines.
- Replace the defaultProcessingMCP.xml file used with Automation Tools transfers at
/opt/archivematica/automation-tools/transfers/pre-transfer/defaultProcessingMCP.xml
with an updated configuration. - (If new storage location will contain AIPs for which we want to generate DIPs) Modify the
--location-uuid
value in/etc/archivematica/automation-tools/create_dips_job_script.sh
to the value for the new AIP Store location. If it is necessary to monitor and create DIPs for more than one AIP Store location, create a second script (e.g.create_dips_job_script_1.sh
) and add this additional script to the crontab under userarchivematica
(to edit the crontab, usesudo crontab -u archivematica -e
).
These AIP Store locations are already configured in the Archivematica Storage Service:
AIP Store | Currently in Use? | Configured with Automation Tools DIP creation script? | Notes |
---|---|---|---|
DARK_ARCHIVE_001 | Yes | Yes | Current default AIP Store location for Pipeline 2 (processed digital archives) |
DARK_ARCHIVE_002 | Yes | No | Current default AIP Store location for Pipeline 1 (accessions and A/V) |
DARK_ARCHIVE_003 | No | n/a | n/a |
DARK_ARCHIVE_004 | No | n/a | n/a |
DARK_ARCHIVE_005 | No | n/a | n/a |
When updating the processing configuration, go to the "Administration" tab of the appropriate pipeline dashboard, select the "Default" configuration, and adjust settings as needed. Then click "Save" and download the updated congfiguration file from the following page.
This must then be put into Automation Tools in Archivematica. Send the configuration file to the appropriate pipeline using the "send_to_archivematica.py" script on the BitCurator machines. Then move the file from /mnt/incoming/transfer to Automation Tools.
When local disk space on one of the Archivematica pipelines is almost full, IT will send an alert. To clear space, you may delete some logs as well as older MySQL and Elasticsearch backups, namely:
/srv/am-db-backups
: Keep latest backup; all others can be deleted/srv/am-es-backups
: Keep latest backup (check file size to ensure it's a real backup); all others can be deleted/var/log/elasticsearch
: Delete any logs more than a month old
To stop services:
sudo stop archivematica-mcp-server
sudo stop archivematica-mcp-client
sudo stop archivematica-dashboard
sudo service nginx stop
sudo /etc/init.d/gearman-job-server stop
sudo service mysql stop
sudo /etc/init.d/elasticsearch stop
To start services:
sudo /etc/init.d/elasticsearch start
sudo service mysql start
sudo /etc/init.d/gearman-job-server start
sudo start archivematica-dashboard
sudo service nginx start
sudo start archivematica-mcp-server
sudo start archivematica-mcp-client
To restart only the Archivematica server/client/dashboard:
sudo bash -c 'for i in dashboard mcp-client mcp-server; do service archivematica-$i restart; done'
- FITS command replaced with bash script to prevent non-XML outputs from raising error (fixed in AM 1.7)
- FITS set as a characterization tool for all file formats -- note that this is crucial, as it means there is an indexed last modified date for all files ingested. See METSFlask documentation for details on how to ensure FITS characterization for all files in AM 1.6
- Extraction rules for RAW, ISO, AFF, and E01 disk images are disabled so that disk images can be stored alongside files (disk image extraction occurs in BitCurator prior to Archivematica ingest in our local workflows)
- Ghostscript PDF->PDF/A command replaced with a bash script that prevents errors when output filepaths are longer than 255 characters
- Normalization rules for Adobe Flash, Macromedia Flash, and Generic SWF are disabled (by default, Archivematica attempts to normalize these formats to ffv1/Matroska video, which failed nearly 100% of the time)
- Libreoffice and commands for transcoding to docx, odt, xlsx, ods, and pptx added
- Elasticsearch request_timeout value changed from 10 to 60 seconds
- Storage Service will be set to use all available cores for bagit-python fixity checking (default is 1; should be user-configurable in SS/0.11)
Artefactual also documented more atypical aspects of the CCA's Archivematica set-up:
- The automation tools are installed in /usr/lib/archivematica/automation-tools/
- The script located at /etc/archivematica/automation-tools/transfer-script.sh runs each 5 minutes as system user archivematica, and uses the Archivematica user autobot.
- The configuration file at /etc/archivematica/automation-tools/transfers.confdefines logfile and database are at /var/log/archivematica/automation-tools/ , and the lock file at /var/archivematica/automation-tools/transfers-pid.lck. This file might need to be removed in the automation-tools stop working.
- Automated ingests use folder /mnt/incoming/auto-transfers as transfer source, and the AIPs are stored in DARK_ARCHIVE_002 aipstore for VSP-AMPL1, and DARK_ARCHIVE_001 for VSP-AMPL2
Version updates for Archivematica should be done by Artefactual, and are covered by our service contract with them. However, because Archivematica has a number of dependencies, the Digital Archivist should assess the capacity of both local IT and Artefactual support prior to upgrading.
Dependencies which may affect the success of upgrades include:
- Automation tools on the BitCurator machines: Any change that results in a change to the Archivematica Pipeline server security keys will result in the create_dip.sh script failing. The script needs to be updated to reflect the new key.
- TMS API: The TMS API was affected by the upgrade to the 2019 version of Archivematica. More details forthcoming.
- SCOPE