Update design doc for persistence
Closes #41.
Notgnoshi committed May 9, 2024
1 parent 937ad38 commit 4c14315
Showing 3 changed files with 68 additions and 55 deletions.
49 changes: 0 additions & 49 deletions docs/design/caching.md

This file was deleted.

6 changes: 0 additions & 6 deletions docs/design/data-storage.md

This file was deleted.

68 changes: 68 additions & 0 deletions docs/design/persistence.md
@@ -0,0 +1,68 @@
# Data storage
**Status:** In consideration

## Use cases

There are four things that need to be stored:

### 1. Remember which repositories/branches to process

**Why?**
* Enables running Herostratus as a scheduled job
* Simplifies the CLI invocation(s)

Users would probably prefer a TOML config file for this, rather than having it stuffed into a database.

Things that need to be stored:
* Path to checkout (either a bare repo that Herostratus cloned, or some other path)
* Reference to process
* Remote URL to fetch
* HTTPS / SSH authentication information
* User-contrib rules
* Rule filtering
* Commit filtering
* Mailmap settings
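
As a rough illustration of the list above, one configured repository could be modeled as a single record. This is only a sketch: every type, field, and method name below is hypothetical, rule filtering is assumed to be a simple exclusion list, and commit filtering is left out because its shape isn't decided here.

```rust
use std::path::PathBuf;

/// Hypothetical authentication settings; the variants are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum Auth {
    Https { username: String },
    Ssh { private_key: PathBuf },
}

/// Hypothetical config entry for one repository/reference pair.
#[derive(Debug, Clone, PartialEq)]
pub struct RepositoryConfig {
    /// Path to the checkout (a bare clone made by Herostratus, or some other path).
    pub path: PathBuf,
    /// Reference to process, e.g. a branch name.
    pub reference: String,
    /// Remote URL to fetch; `None` for a checkout that is never fetched.
    pub remote_url: Option<String>,
    /// HTTPS / SSH authentication information, if the remote needs it.
    pub auth: Option<Auth>,
    /// Locations of user-contrib rules.
    pub contrib_rules: Vec<PathBuf>,
    /// Rule filtering, assumed here to be a list of excluded rule names.
    pub excluded_rules: Vec<String>,
    /// Mailmap settings, assumed here to be a path to a mailmap file.
    pub mailmap: Option<PathBuf>,
}

impl RepositoryConfig {
    /// A repository only needs fetching if it has a remote URL.
    pub fn needs_fetch(&self) -> bool {
        self.remote_url.is_some()
    }

    /// A rule is enabled unless it appears in the exclusion list.
    pub fn is_rule_enabled(&self, rule: &str) -> bool {
        !self.excluded_rules.iter().any(|r| r == rule)
    }
}
```

A TOML file would then hold one such entry per configured repository.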

### 2. Remember which commits/rules have been processed for each repository/branch

**Why?** Performance improvement. It can take quite a long time to process large-ish repositories
like Linux and Git.

Strategies:
1. For each commit, store which rules have been run on it
2. Maintain a mapping of `Set<RuleId>` -> `Set<CommitHash>`. Once processing completes, the mapping
contains a single key, the `Set<RuleId>` of every possible rule, and it maps to all processed commits.
3. Stamp the `HEAD` commit with a "checkpoint" that indicates which rules have been processed on all
commits reachable from `HEAD`
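
Strategy 2 might look something like the following sketch. It assumes rule IDs are plain integers and commit hashes are strings, and `ProcessedMap` with its methods is a hypothetical name, not existing Herostratus code. Each commit lives in exactly one group, keyed by the set of rules that have been run on it.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Assumed representations, for illustration only.
type RuleId = u32;
type CommitHash = String;

/// Groups commits by the exact set of rules that has been run on them.
#[derive(Default)]
pub struct ProcessedMap {
    groups: BTreeMap<BTreeSet<RuleId>, BTreeSet<CommitHash>>,
}

impl ProcessedMap {
    /// Find the key of the group that currently contains `commit`, if any.
    fn key_of(&self, commit: &str) -> Option<BTreeSet<RuleId>> {
        self.groups
            .iter()
            .find(|(_, commits)| commits.contains(commit))
            .map(|(rules, _)| rules.clone())
    }

    /// Record that `rules` were run on `commit`, merging with earlier runs.
    pub fn record(&mut self, commit: &str, rules: &BTreeSet<RuleId>) {
        let mut merged = rules.clone();
        if let Some(old_key) = self.key_of(commit) {
            merged.extend(old_key.iter().copied());
            // Move the commit out of its old group, dropping the group if empty.
            let now_empty = match self.groups.get_mut(&old_key) {
                Some(commits) => {
                    commits.remove(commit);
                    commits.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.groups.remove(&old_key);
            }
        }
        self.groups.entry(merged).or_default().insert(commit.to_string());
    }

    /// The rules already run on `commit` (empty if it was never processed).
    pub fn rules_for(&self, commit: &str) -> BTreeSet<RuleId> {
        self.key_of(commit).unwrap_or_default()
    }

    /// Number of distinct rule sets; 1 once every commit has had every rule run.
    pub fn group_count(&self) -> usize {
        self.groups.len()
    }
}
```

The linear scan in `key_of` keeps the sketch short; a real implementation would likely also want a reverse index from commit to group.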

From an edge-case and "purity" perspective, option #3 is the worst. But from a simplicity and
common-case perspective, it's the best (least data storage, simplest, easiest to understand, easiest
to implement).
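
To make the trade-off concrete, here is a minimal sketch of how a checkpoint could drive a run. It assumes a linear, newest-first history given as a slice (real histories are DAGs, which is where the edge cases come from), and `Checkpoint` and `plan` are hypothetical names.

```rust
use std::collections::BTreeSet;

// Assumed representation, for illustration only.
type RuleId = u32;

/// Hypothetical checkpoint: `rules` have been run on `commit` and everything
/// reachable from it.
pub struct Checkpoint {
    pub commit: String,
    pub rules: BTreeSet<RuleId>,
}

/// Decide which rules to run on which commits.
///
/// `history` is assumed to be linear and ordered newest-first.
pub fn plan(
    history: &[String],
    checkpoint: Option<&Checkpoint>,
    enabled: &BTreeSet<RuleId>,
) -> Vec<(String, BTreeSet<RuleId>)> {
    let mut work = Vec::new();
    let mut reached = false;
    for commit in history {
        if let Some(cp) = checkpoint {
            if *commit == cp.commit {
                reached = true;
            }
        }
        // At and below the checkpoint, only rules it does not cover still apply.
        let rules: BTreeSet<RuleId> = match checkpoint {
            Some(cp) if reached => enabled.difference(&cp.rules).copied().collect(),
            _ => enabled.clone(),
        };
        if rules.is_empty() {
            // Everything older is already covered, so stop walking.
            break;
        }
        work.push((commit.clone(), rules));
    }
    work
}
```

In the common case (same rules, a few new commits) the walk stops at the checkpoint. Adding a rule forces a walk of the whole history for that rule, and a checkpoint commit that is no longer in the history (for example, after a force push) degrades to reprocessing everything.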

### 3. Granted Achievements

**Why?**
* Avoid granting duplicates
* Enable easier access to Herostratus data by the user (enable them to build whatever they want on
top).
* Enable easier integration implementations
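
Duplicate avoidance could be as simple as a set of past grants. The sketch below assumes a grant is identified by an achievement name plus the commit that earned it; that key, and the `Grant` / `GrantLog` names, are assumptions rather than anything this document has settled.

```rust
use std::collections::HashSet;

/// Hypothetical identity of a granted achievement.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Grant {
    pub achievement: String,
    pub commit: String,
}

/// In-memory record of everything granted so far.
#[derive(Default)]
pub struct GrantLog {
    granted: HashSet<Grant>,
}

impl GrantLog {
    /// Record a grant. Returns `false` if it was already recorded, in which
    /// case the caller should not grant it again.
    pub fn grant(&mut self, grant: Grant) -> bool {
        self.granted.insert(grant)
    }

    /// Number of distinct grants recorded.
    pub fn len(&self) -> usize {
        self.granted.len()
    }
}
```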

### 4. Mapping from each possible Herostratus achievement to its corresponding GitLab achievement ID

**Why?** Creating a GitLab achievement returns an ID for the created achievement, and that ID is
needed later to grant the achievement to users. Herostratus (at least the GitLab integration part)
therefore needs to store these IDs, so that it can map between Herostratus achievements and GitLab
achievements.
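
A sketch of that mapping, assuming Herostratus achievements are identified by name and treating the GitLab ID as an opaque string (its real format isn't specified here). `GitLabAchievementIds` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical lookup table from Herostratus achievement name to the ID
/// GitLab returned when the achievement was created.
#[derive(Default)]
pub struct GitLabAchievementIds {
    ids: HashMap<String, String>,
}

impl GitLabAchievementIds {
    /// Remember the ID returned by GitLab for a newly created achievement.
    pub fn register(&mut self, achievement: &str, gitlab_id: &str) {
        self.ids.insert(achievement.to_string(), gitlab_id.to_string());
    }

    /// Look up the GitLab ID needed to grant `achievement` to a user.
    /// `None` means the achievement has not been created on GitLab yet.
    pub fn lookup(&self, achievement: &str) -> Option<&str> {
        self.ids.get(achievement).map(String::as_str)
    }
}
```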

## Design

Use CLI subcommands to separate stateful from stateless operations in a way that's intuitive to the
user.

| Command | Stateful? | Notes |
|------------------------------------------|-----------|------------------------------------------------------------------------|
| `herostratus [check] <path> [reference]` | stateless | For testing. Process the repository at the given path. |
| `herostratus add <URL/PATH> [branch]` | stateful | Add the given repository to the config so that it can be checked later |
| `herostratus check-all` | stateful | Fetch and check all configured repositories |
| `herostratus fetch-all` | stateful | Fetch all configured repositories without checking them |
| `herostratus remove <path> [reference]` | stateful | Remove the given repository / branch from the config |
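
The table above could map onto a command type along these lines. This is a hand-rolled, std-only sketch for illustration; the `Command` enum, `parse`, and `positional` are hypothetical, and a real CLI would more likely use an argument-parsing library. `args` is assumed to exclude the program name.

```rust
/// One variant per row of the command table.
#[derive(Debug, PartialEq)]
pub enum Command {
    Check { path: String, reference: Option<String> },
    Add { target: String, branch: Option<String> },
    CheckAll,
    FetchAll,
    Remove { path: String, reference: Option<String> },
}

impl Command {
    /// Only `check` is stateless; everything else reads or writes the config.
    pub fn is_stateful(&self) -> bool {
        !matches!(self, Command::Check { .. })
    }
}

/// Parse one required and one optional positional argument.
fn positional(args: &[&str]) -> Option<(String, Option<String>)> {
    match args {
        [required] => Some(((*required).to_owned(), None)),
        [required, optional] => Some(((*required).to_owned(), Some((*optional).to_owned()))),
        _ => None,
    }
}

/// Parse the arguments after the program name. Returns `None` on bad usage.
pub fn parse(args: &[&str]) -> Option<Command> {
    let (first, rest) = args.split_first()?;
    match *first {
        "check-all" => if rest.is_empty() { Some(Command::CheckAll) } else { None },
        "fetch-all" => if rest.is_empty() { Some(Command::FetchAll) } else { None },
        "add" => positional(rest).map(|(target, branch)| Command::Add { target, branch }),
        "remove" => positional(rest).map(|(path, reference)| Command::Remove { path, reference }),
        "check" => positional(rest).map(|(path, reference)| Command::Check { path, reference }),
        // `check` is optional, so anything else is treated as a path.
        _ => positional(args).map(|(path, reference)| Command::Check { path, reference }),
    }
}
```

One consequence of making `check` optional is that a repository path that happens to be named like a subcommand (say, `add`) must be written as `check add` or `./add`.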
