diff --git a/docs/design/caching.md b/docs/design/caching.md
deleted file mode 100644
index be3547e..0000000
--- a/docs/design/caching.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Caching
-**Status:** In consideration
-
-## Goals
-* Enable tracking achievements over time, by accelerating re-processing the same branch at a later
-  date
-* **Question:** Should the caching still provide acceleration when new achievements are added?
-
-## Approaches
-
-No matter what, the caching should be done per-remote. That is, there should not be a distinct cache
-file for each remote.
-
-### Cache file
-The cache should probably be a sqlite database (perhaps the same db that achievements are stored
-in?). Or maybe it should be separate, for simplicity, and ease of blowing away the cache?
-
-You should be able to pass `--clear-cache`, `--cache `, `--no-cache` CLI options.
-
-The cache should be loaded into memory, so that processing a repository minimizes file I/O.
-
-Alternatively, we have a directory structure like
-```
-aa/
-    00/
-        aa001122334455667788.txt
-```
-Where the text file contains the committer date, the rules that were processed on this commit, and
-maybe the repository remote?
-
-### How to identify a repository?
-* By remote URL?
-    * How to pick the right remote?
-    * How to handle repositories that might not have remotes?
-* By filesystem path (if we're not pointed at a remote, but a work tree or bare repo instead)?
-    * Maybe the initial commit could be the ID?
-    * Disallow moving / renaming local checkouts?
-
-### Store each processed rule
-Load the processed rule ID's into an ordered set. This should be done regardless of the commit
-caching strategy.
-
-### Strategy 1: Cache the last commit processed
-
-Probably works best in a linear history. May result in unnecessary re-processing. Easiest and
-simplest to implement.
-
-### Strategy 2: Store each processed commit
-Load the processed commits in an ordered set. Order could be hash, or committer date.
diff --git a/docs/design/data-storage.md b/docs/design/data-storage.md
deleted file mode 100644
index ab78c2a..0000000
--- a/docs/design/data-storage.md
+++ /dev/null
@@ -1,6 +0,0 @@
-# Data storage
-**Status:** In consideration
-
-A goal of the project is to keep track of achievements over time. So achievements should be stored
-in a useful format for consumers to display. Or maybe the storage should be deferred to the consumer
-entirely.
diff --git a/docs/design/persistence.md b/docs/design/persistence.md
new file mode 100644
index 0000000..634da56
--- /dev/null
+++ b/docs/design/persistence.md
@@ -0,0 +1,68 @@
+# Data storage
+**Status:** In consideration
+
+## Use cases
+
+There are four things that need to be stored:
+
+### 1. Remember which repositories/branches to process
+
+**Why?**
+* Enables running Herostratus as a scheduled job
+* Simplifies the CLI invocation(s)
+
+User preference would probably be a TOML config file rather than stuffing it in a database.
+
+Things that need to be stored:
+* Path to checkout (either a bare repo that Herostratus cloned, or some other path)
+* Reference to process
+* Remote URL to fetch
+* HTTPS / SSH authentication information
+* User-contrib rules
+* Rule filtering
+* Commit filtering
+* Mailmap settings
+
+### 2. Remember which commits/rules have been processed for each repository/branch
+
+**Why?** Performance improvement. It can take quite a long time to process large-ish repositories
+like Linux and Git.
+
+Strategies:
+1. For each commit, store which rules have been run on it
+2. Maintain a mapping of `Set` -> `Set`, where after processing, the mapping
+   contains only one `Set` of every possible rule, and it maps to all processed commits.
+3. Stamp the `HEAD` commit with a "checkpoint" that indicates which rules have been processed on all
+   commits reachable from `HEAD`
+
+From an edge-case and "purity" perspective, option #3 is the worst. But from a simplicity and
+common-case perspective, it's the best (least data storage, simplest, easiest to understand, easiest
+to implement).
+
+### 3. Granted Achievements
+
+**Why?**
+* Avoid granting duplicates
+* Enable easier access to Herostratus data by the user (enable them to build whatever they want on
+  top).
+* Enable easier integration implementations
+
+### 4. Mapping from each possible Herostratus achievement to its corresponding GitLab achievement ID
+
+**Why?** When you create a GitLab achievement, it returns an ID for each created achievement. So you
+need to store them, so that you can grant them to users. And Herostratus (at least the GitLab
+integration part) will need to store them, so that it can map between Herostratus achievements and
+GitLab achievements.
+
+## Design
+
+Use CLI subcommands to separate stateful from stateless operations in a way that's intuitive to the
+user.
+
+| Command                                  | Stateful? | Notes                                                                  |
+|------------------------------------------|-----------|------------------------------------------------------------------------|
+| `herostratus [check] [reference]`        | stateless | For testing. Process the repository at the given path.                 |
+| `herostratus add [branch]`               | stateful  | Add the given repository to the config so that it can be checked later |
+| `herostratus check-all`                  | stateful  | Fetch and check all configured repositories                            |
+| `herostratus fetch-all`                  | stateful  | Fetch all configured repositories without checking them                |
+| `herostratus remove [reference]`         | stateful  | Remove the given repository / branch from the config                   |
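
The checkpoint strategy (option #3 in `persistence.md`) can be illustrated with a minimal sketch. This is not Herostratus code: the `Checkpoint` record, the `plan_work` function, and the rule IDs such as `"H1"` are all hypothetical, and the sketch assumes a linear history listed newest-first. It shows which rules would still need to run on which commits when a checkpoint exists and a new rule has since been added.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record stamped on a HEAD commit after processing."""
    commit: str            # hash of HEAD when the checkpoint was written
    rules: frozenset       # rule IDs already run on every commit reachable from `commit`


def plan_work(history: list, checkpoint: Optional[Checkpoint], enabled_rules) -> list:
    """Return (commit, rules_to_run) pairs, newest first.

    `history` is assumed to be a linear list of commit hashes, newest first.
    """
    enabled = frozenset(enabled_rules)
    covered = frozenset()  # rules already run on the commit being visited
    plan = []
    for commit in history:
        # From the checkpointed commit onward (older), its rules are already covered.
        if checkpoint is not None and commit == checkpoint.commit:
            covered = checkpoint.rules
        todo = enabled - covered
        if todo:
            plan.append((commit, todo))
    return plan


if __name__ == "__main__":
    cp = Checkpoint(commit="c2", rules=frozenset({"H1"}))
    for commit, rules in plan_work(["c4", "c3", "c2", "c1"], cp, {"H1", "H2"}):
        print(commit, sorted(rules))
```

Under these assumptions, commits newer than the checkpoint get every enabled rule, while the checkpointed commit and its ancestors only get rules added since the checkpoint was written. A non-linear history would need a reachability check rather than a list position, which is one of the edge cases that makes this option less "pure".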