The disk is filling up during content harvesting. I think the file removal statements in this bit of code got dropped during the recent caching implementation:
rikolti/content_harvester/by_record.py
Lines 149 to 156 in 9f7be23
The code in this PR removes source files and derivative files from local disk once they're no longer needed. I tried to find a neater way to remove the files all at once, i.e. at the end of harvest_record_content, but it proved tricky.

I also made a change to derivatives.subprocess_exception_handler so that it raises an error instead of returning None when a subprocess fails. As it was, the DAG was succeeding even when, for example, the pdf_to_thumb subprocess was failing. It seemed like the return value of None was not meant to be permanent, given that raise(e) was commented out?
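
To illustrate the two changes, here is a minimal sketch and not the actual rikolti code. The names raise_on_subprocess_failure, run_step, and process_then_remove are hypothetical, and the real derivatives.subprocess_exception_handler may have a different signature. It shows a handler that logs and re-raises rather than returning None, and a local file that is removed once its processing step finishes, whether or not the step succeeded.

```python
import functools
import os
import subprocess
import sys
import tempfile


def raise_on_subprocess_failure(func):
    """Hypothetical handler: log a failed subprocess call, then re-raise.

    Re-raising (instead of returning None) lets the caller, e.g. a DAG task,
    fail visibly when the subprocess fails.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except subprocess.CalledProcessError as e:
            print(
                f"{func.__name__} failed with exit code {e.returncode}",
                file=sys.stderr,
            )
            raise
    return wrapper


@raise_on_subprocess_failure
def run_step(cmd):
    """Hypothetical stand-in for a derivative step such as pdf_to_thumb."""
    # check=True makes subprocess.run raise CalledProcessError on non-zero exit
    subprocess.run(cmd, check=True)
    return True


def process_then_remove(path, cmd):
    """Run a step that needs the local file at `path`, then delete the file.

    The finally block frees the disk space even if the step raises.
    """
    try:
        return run_step(cmd)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

With this shape, a failing subprocess propagates as subprocess.CalledProcessError to the caller, while the local file is still cleaned up. Removing each file right after its last use, rather than all at once at the end, is the trade-off described above.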