Refactor auto-archiver to use a modular structure for feeders/extractors/enrichers etc. #185
Open
pjrobertson wants to merge 57 commits into main from load_modules
+3,689
−1,646
Loading configs now works
(two simple helper functions to convert between dot and dict notation)
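As a rough illustration of what converting between dot and dict notation can look like, here is a minimal sketch. The function names `to_dot_notation` and `from_dot_notation` are hypothetical and are not taken from this PR; the actual helpers may differ in naming and edge-case handling.

```python
from typing import Any


def to_dot_notation(nested: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten {'a': {'b': 1}} into {'a.b': 1} (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts; empty dicts are kept as leaf values
            flat.update(to_dot_notation(value, dotted))
        else:
            flat[dotted] = value
    return flat


def from_dot_notation(flat: dict[str, Any]) -> dict[str, Any]:
    """Expand {'a.b': 1} into {'a': {'b': 1}} (hypothetical helper)."""
    nested: dict[str, Any] = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            # Create intermediate dicts on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

With these, a nested config such as `{"cli_feeder": {"urls": [...]}}` flattens to a `cli_feeder.urls` key and expands back to the same nested dict. The sketch assumes no key is used both as a leaf and as a parent.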
# Conflicts: # src/auto_archiver/databases/__init__.py
# Conflicts: # src/auto_archiver/core/orchestrator.py
…ig (e.g. cli_feeder.urls). Use 'do_not_store': True in the config settings to apply this. Also: fix up generic archiver dropins loading + local_storage defaults (same as in the example orchestration)
… values, it also validates them
… isn't installed by default on most machines)
1. Allow loading modules from --module_paths=/extra/path/here
2. Improved unit tests for module loading
3. Further small tidy ups/clean ups
…ependencies' -> simpler/easier to remember
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
* Removes (partly) the ArchivingOrchestrator
* Removes the cli_feeder module, and makes it the 'default', allowing you to pass URLs directly on the command line, without having to use the cumbersome --cli_feeder.urls. Just do: auto-archiver https://my.url.com
* More unit tests
* Improved error handling
Context for a specific url/item is now passed around via the metadata: metadata.set_context('key', 'val') and metadata.get_context('key', default='something'). The only other thing that was passed around in ArchivingContext was the storage info, which is now accessible via self.config.
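To make the per-item context idea concrete, here is a minimal sketch. The `Metadata` class below is a hypothetical stand-in written for illustration, not the project's real class; only the `set_context` / `get_context` call shapes are taken from the commit message, and the `extract` function and `folder` key are assumptions.

```python
from typing import Any


class Metadata:
    """Hypothetical stand-in for the project's metadata object."""

    def __init__(self, url: str) -> None:
        self.url = url
        self._context: dict[str, Any] = {}

    def set_context(self, key: str, val: Any) -> None:
        # Store a value that travels with this specific url/item
        self._context[key] = val

    def get_context(self, key: str, default: Any = None) -> Any:
        # Read a value back, falling back to `default` if it was never set
        return self._context.get(key, default)


def extract(item: Metadata) -> str:
    """Illustrative later step that reads what an earlier step stored."""
    folder = item.get_context("folder", default="unsorted")
    return f"{folder}/{item.url.rsplit('/', 1)[-1]}"
```

Because the context lives on the item itself rather than in a shared global, each step only needs the metadata object it is already handed.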
Also:
auto-archiver --help
--extractor=new_temporary_extractor
-s or --store flag on the command line
* You can now log directly to files using the logging: file option. Set the logging level as well using logging: level
* module_paths=/my/own/modules/ to allow you to easily extend auto archiver with new modules. Simply create a new module, place it in that folder, then pass the folder path on the command line or save it in your orchestration.yaml
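The general idea of picking up modules from an extra folder can be sketched with the standard library alone. This is not the loader used in this PR: the function name `load_modules_from` is hypothetical, and the sketch assumes each module is a single top-level `.py` file, which may not match the project's actual module layout.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_modules_from(folder: str) -> dict[str, ModuleType]:
    """Import every top-level .py file in `folder`, keyed by file stem.

    Hypothetical helper: assumes one module per .py file.
    """
    loaded: dict[str, ModuleType] = {}
    for path in sorted(Path(folder).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue  # not importable as a Python source file
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module's top-level code
        loaded[path.stem] = module
    return loaded
```

Calling `load_modules_from("/my/own/modules/")` would return a mapping from file stem to imported module, which a caller could then look up by the name given on the command line.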