Misc project cleanups (#65)

Generated #63 and #64
Notgnoshi · Nov 30, 2024 · 7a838cf
2 parents 2b12c43 + 96af822

Showing 18 changed files with 411 additions and 269 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,3 @@
 /target/
 /data/
+/mutants*/
diff --git a/docs/design/data-model.md b/docs/design/data-model.md
@@ -1,13 +1,87 @@
 # Data model
-**Status:** In consideration
-
-**TODO:** What _is_ an achievement? It should probably hold:
-* achievement id
-* title
-* description
-* level (copper, silver, gold, platinum)
-* number of instances
-* user id (email? how to handle .mailmap?)
-* the generating commits (how to do this in a way that's linkable in the representation layer?)
-
-How to handle .mailmap files? Should that be done before or after achievement creation?
+
+# Status
+
+**PROPOSAL**
+
+# Scope
+
+This document answers the following questions
+* What data does an achievement contain or reference?
+* What inputs are required for a Rule engine?
+
+# Achievements
+
+## Achievement uniqueness
+
+1. Repeatable. E.g., swear in a commit message.
+2. Unique. E.g., longest/shortest commit message.
+
+## Achievement contents
+
+* Achievement ID
+
+  This can be used to look up the title, description, art, etc., or this data can be embedded in the
+  achievement.
+* User ID
+
+  This needs to be .mailmap aware, and may need to have committer/author distinction?
+* What repository the achievement is associated with
+* What commit(s) the achievement is associated with
+
+  There will always be a "primary" commit, but in the case of e.g. revert commits, there might be
+  additional "context" commits.
+* Achievement uniqueness
+
+  A consumer of the rule engine will consume this, and determine whether it needs to revoke the
+  achievement from one user to grant it to another.
+
+# Rules
+
+## Mailmap
+
+Rule generation does need to be mailmap aware, because there might be some rules like "be most
+prolific contributor" that would change based on mailmaps.
+
+## Rule initialization
+
+Some rules might require (or be more efficient with) an initialization phase:
+
+```rust
+fn init(&mut self, repository: &git2::Repository, config: &RulesConfig) {}
+```
+
+## Caching concerns
+
+Some rules might not work well if previous runs are cached. For example, stateful rules like
+"longest commit message" may either require rejecting the cache acceleration, or may require adding
+rule-specific data to the cache.
+
+## Rule variants
+
+1. Context-free. E.g., swear in a commit message
+    1. Commit message
+    2. Commit message + diff
+    3. Commit message + diff + submodule
+2. Contextual
+    1. User aware. E.g., be the most prolific contributor
+    2. Commit history aware. E.g., revert a previous commit within 30min, or revert the same commit
+       multiple times
+
+## Rule configuration
+
+There might be global configuration shared between rules like
+* Exclude commits from these users
+* Exclude commit messages matching these hashes or regexes
+
+But there will also be rule-specific configuration like
+* When calculating the shortest subject line, exclude any subject lines longer than 10 characters
+
+## Rule inputs
+
+* The repository
+    * Any submodules of the repository
+* The reference being processed
+* The `&Config`
+* The commit itself
+* User mailmap
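
Review note on `docs/design/data-model.md`: the "Achievement contents" list and the revoke-on-grant behaviour for unique achievements could be sketched roughly as below. This is a minimal sketch and not the project's code. `Uniqueness`, the `Achievement` fields, and `record_grant` are hypothetical names, IDs are plain strings for illustration, and matching unique achievements on the achievement ID alone (rather than per repository) is an assumption.

```rust
/// Whether an achievement can be granted many times or has a single holder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Uniqueness {
    /// E.g., swear in a commit message.
    Repeatable,
    /// E.g., longest/shortest commit message.
    Unique,
}

/// Hypothetical achievement record; every field name is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Achievement {
    achievement_id: String,
    /// Identity after .mailmap resolution.
    user_id: String,
    repository: String,
    /// There is always a primary commit.
    primary_commit: String,
    /// Extra commits, e.g. the commit that a revert undoes.
    context_commits: Vec<String>,
    uniqueness: Uniqueness,
}

/// Consumer-side bookkeeping: store `new`, and if it is unique, revoke the
/// existing grant of the same achievement. Returns the revoked grant, if any.
fn record_grant(granted: &mut Vec<Achievement>, new: Achievement) -> Option<Achievement> {
    let mut revoked = None;
    if new.uniqueness == Uniqueness::Unique {
        let existing = granted
            .iter()
            .position(|a| a.achievement_id == new.achievement_id);
        if let Some(pos) = existing {
            revoked = Some(granted.remove(pos));
        }
    }
    granted.push(new);
    revoked
}
```

Keeping the uniqueness flag on the achievement itself lets the consumer decide about revocation without knowing anything about the rule that produced it.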
diff --git a/docs/design/integrations.md b/docs/design/integrations.md
@@ -1,13 +1,19 @@
 # Integrations
-**Status:** In consideration
 
-## GitLab
+# Status
+
+**DRAFT**
+
+# GitLab
 
 GitLab provides an [Achievements API](https://docs.gitlab.com/ee/user/profile/achievements.html).
 It appears it internally manages a database of achievements, and provides them their own ID when
 they are created. So we'll have to maintain a mapping of our achievement IDs to the GitLab
 instance's IDs.
 
+This implies that there needs to be a stateful database that is specific to the GitLab integration
+(or to integrations in general).
+
 It does not appear that you can link an achievement to the commit that generated it.
 
 It'd be sweet to run this through a pipeline, although that'd require finding some way to have

diff --git a/docs/design/parallelism.md b/docs/design/parallelism.md
@@ -1,7 +1,12 @@
 # Parallelism
-**Status:** In consideration
 
-One of my complains with <https://github.com/someteam/acha> is how slow and resource heavy it is.
+# Status
+
+**DRAFT**
+
+# Goal
+
+One of my complaints with <https://github.com/someteam/acha> is how slow and resource heavy it is.
 And also that it's unmaintained and difficult to get running.
 
 In _theory_ it shouldn't be expensive to process all of the commits on a given branch (typically the
@@ -12,6 +17,13 @@ parallelism should also be a good way to speed it up.
 
 However, there are several approaches to parallelism, and the right choice depends on ???
 
+# Constraints
+
+My expectation is that achievement processing is likely I/O constrained, and thus I'd want to spool
+up more tasks than cores.
+
+# Approaches
+
 ## Repository level parallelism
 Each repository is processed in serial, but multiple repositories can be processed at once.
 
@@ -69,7 +81,7 @@ flowchart TD
     r3 -.-> achievements
 ```
 
-## The approach to use
+# Proposal
 
 I think I'll start by doing things in serial (repository level parallelism) so see if processing the
 rules is too expensive (after caching).
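
Review note on `docs/design/parallelism.md`: the "more tasks than cores" constraint could look roughly like the sketch below. It swaps in plain OS threads from the standard library in place of whatever task runtime the project ends up using. `WORKERS_PER_CORE`, `worker_count`, and `process_all` are hypothetical names, and the factor of 4 is an untested guess that would need measuring.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical oversubscription factor for I/O-bound work.
const WORKERS_PER_CORE: usize = 4;

/// Number of workers to spawn: more than the number of cores.
fn worker_count() -> usize {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    cores * WORKERS_PER_CORE
}

/// Run `process` over every item using `workers` threads that pull from a
/// shared queue. `process` stands in for "run the rules on one repository".
/// Results come back in no particular order.
fn process_all<F>(items: Vec<String>, workers: usize, process: F) -> Vec<String>
where
    F: Fn(&str) -> String + Send + Sync + 'static,
{
    let queue = Arc::new(Mutex::new(VecDeque::from(items)));
    let process = Arc::new(process);
    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let process = Arc::clone(&process);
            thread::spawn(move || {
                let mut done = Vec::new();
                loop {
                    // Hold the lock only long enough to pop one item.
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        Some(item) => done.push(process(&item)),
                        None => break,
                    }
                }
                done
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect()
}
```

Passing `workers = 1` gives the serial baseline from the proposal, so the same code path can be used to measure whether oversubscription helps at all.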

diff --git a/docs/design/persistence.md b/docs/design/persistence.md
@@ -1,11 +1,12 @@
 # Data storage
-**Status:** In consideration
 
-## Use cases
+# Use cases
 
 There are four things that need to be stored
 
-### 1. Remember which repositories/branches to process
+## 1. Remember which repositories/branches to process
+
+**IMPLEMENTED**
 
 **Why?**
 * Enables running Herostratus as a scheduled job
@@ -23,7 +24,9 @@ Things that need to be stored:
 * Commit filtering
 * Mailmap settings
 
-### 2. Remember which commits/rules have been processed for each repository/branch
+## 2. Remember which commits/rules have been processed for each repository/branch
+
+**DRAFT**
 
 **Why?** Performance improvement. It can take quite long to process large-ish repositories like
 Linux and Git.
@@ -39,22 +42,31 @@ From an edge-case and "purity" perspective, option #3 is the worst. But from a s
 common-case perspective, it's the best (least data storage, simplest, easiest to understand, easiest
 to implement).
 
-### 3. Granted Achievements
+**NOTE:** Some rules (like "longest commit message") require either rejecting the cache, or caching
+rule-specific data.
+
+## 3. Granted Achievements
+
+**DRAFT**
 
 **Why?**
 * Avoid granting duplicates
 * Enable easier access to Herostratus data by the user (enable them to build whatever they want on
   top).
 * Enable easier integration implementations
 
-### 4. Mapping from each possible Herostratus achievement to their corresponding GitLab achievement IDs
+## 4. Mapping from each possible Herostratus achievement to their corresponding GitLab achievement IDs
+
+**DRAFT**
 
 **Why?** When you create a GitLab achievement, it returns an ID for each created achievement. So you
 need to store them, so that you can grant them to users. And Herostratus (at least the GitLab
 integration part) will need to store them, so that it can map between Herostratus achievements and
 GitLab achievements.
 
-## Design
+# Design
+
+**IMPLEMENTED**
 
 Use CLI subcommands to separate stateful from stateless operations in ways that's intuitive to the
 user.
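
Review note on `docs/design/persistence.md`: the new NOTE about rules like "longest commit message" suggests that the commit cache carries rule-specific data. A minimal sketch of that option follows, assuming a per-branch cache entry with an opaque string of state per rule. `BranchCache`, `Longest`, `RULE`, `process`, and the `len:commit` encoding are all hypothetical, and commits are plain `(id, message)` pairs instead of `git2` objects.

```rust
use std::collections::HashMap;

/// Hypothetical per-branch cache entry.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct BranchCache {
    /// Last commit processed, so a later run can skip everything before it.
    last_processed: Option<String>,
    /// Opaque rule-specific state, keyed by rule name.
    rule_state: HashMap<String, String>,
}

/// What the "longest commit message" rule needs to remember between runs.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Longest {
    commit: String,
    len: usize,
}

impl Longest {
    fn encode(&self) -> String {
        format!("{}:{}", self.len, self.commit)
    }

    fn decode(s: &str) -> Option<Self> {
        let (len, commit) = s.split_once(':')?;
        Some(Self { commit: commit.to_string(), len: len.parse().ok()? })
    }
}

const RULE: &str = "longest-commit-message";

/// Process only the new commits, starting from the cached record holder.
fn process(cache: &mut BranchCache, commits: &[(&str, &str)]) -> Option<Longest> {
    let mut best = cache.rule_state.get(RULE).and_then(|s| Longest::decode(s));
    for (id, message) in commits {
        // Strictly longer wins, so the earliest commit keeps a tie.
        if best.as_ref().map_or(true, |b| message.len() > b.len) {
            best = Some(Longest { commit: id.to_string(), len: message.len() });
        }
        cache.last_processed = Some(id.to_string());
    }
    if let Some(b) = &best {
        cache.rule_state.insert(RULE.to_string(), b.encode());
    }
    best
}
```

Without the `rule_state` entry, a second run that sees only new commits would forget the earlier record holder, which is the failure mode the NOTE describes.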

diff --git a/docs/design/test-data.md b/docs/design/test-data.md
@@ -1,19 +1,31 @@
 # Test data
-**Status:** In consideration
+
+# Status
+
+**IMPLEMENTED**
+
+# Repository sources
 
 The primary CLI tool should be able to process repositories in the following forms:
 1. Existing on-disk worktrees
 2. Existing on-disk bare repositories
 3. HTTP, HTTPS, SSH remote clone URLs
     1. These should be cloned into something like `~/.cache/herostratus/git/`
 
-Or maybe, it _just_ consumes remotes, and you pass the on-disk work trees and bare repositories _as_
-local remotes that herostratus can fetch from? That might make for a more consistent, and easier to
-test application?
+It should not require the branch be checked out.
 
-Perhaps the primary CLI tool should _only_ look at on-disk repositories, and the cloning should be
-handled by a wrapper?
+# Test data
 
-It should not require the branch be checked out.
+There should be orphan branches containing test commits in this repository. These will be prefixed
+with `test/`. They can be made like
+
+```sh
+git checkout --orphan test/simple
+git rm -rf .
+for i in `seq 0 4`; do
+    git commit --allow-empty -m "test/simple: $i"
+done
+```
 
-There should be orphan branches containing test commits in this repository
+See existing test branches here:
+<https://github.com/Notgnoshi/herostratus/branches/all?query=test%2F>
diff --git a/docs/design/user-contributed-rules.md b/docs/design/user-contributed-rules.md
@@ -1,20 +1,44 @@
 # User contributed rules
-**Status:** In consideration
-
-The method for consuming user-provided rules depends on the language used in the primary
-implementation. I'm leaning towards either Rust or Python. Python would be far easier to implement
-user-contributed rules, but my language preference is Rust.
-
-* Plugins:
-    * If in Rust, this could be dylibs. This gets challenging because of the lack of a stable ABI.
-      There are many approaches to providing dylib plugins, which is a pretty interesting topic for
-      me personally, but would be quite a lot of work.
-    * If in Python, this could be similar to
-      <https://jorisroovers.com/gitlint/latest/rules/user_defined_rules/> where you import a
-      `herostratus.rules.AchievementRule` interface, and then implement it.
-* Consume executables that take the commits from `stdin`, and write the achievements (in JSON?) to
-  `stdout`
-    * Require the scripts consume a stream of commits? Or a single commit? Probably a single commit,
-      so that herostratus can provide them the contents of `git show`.
-    * There should be a way to tell herostratus that it shouldn't provide the full diff output to
-      the script
+
+# Status
+
+**PROPOSAL**
+
+# Goal
+
+Enable users to run their own rules
+
+# Approaches
+
+## Force users to fork the project
+
+Don't provide a mechanism for users to add their own rules. Force them to fork the project, and
+write their own rules.
+
+## Make it easy for users to contribute their own rules
+
+Make it easy enough to contribute new rules, that users feel they can do so. This may require
+maintaining a set of default and non-default rules. It may also require toning down the
+[contribution standards](../../CONTRIBUTING.md)
+
+## Wrap scripts
+
+Define a `stdin`/`stdout` JSON API, and let users write their own achievement generation tools.
+
+## Plugins
+
+### dylib
+
+Challenging because Rust doesn't provide a stable ABI, even between invocations of the same compiler
+version (due to type layout randomization).
+
+### WASM
+
+WASM seems like it's the plugin mechanism of choice in Rust land. I personally find it awkward,
+because it's offloading the stable ABI concerns from the language and OS to the
+users. But it seems easier than dylibs.
+
+# Proposal
+
+*If* I get around to implementing user-contrib rules before I burn out, WASM plugins seem like the
+way to go.
diff --git a/herostratus-tests/src/fixtures/repository.rs b/herostratus-tests/src/fixtures/repository.rs
@@ -55,7 +55,7 @@ pub fn with_empty_commits(messages: &[&str]) -> eyre::Result<TempRepository> {
 #[cfg(test)]
 mod tests {
     use git2::{Index, Odb, Repository};
-    use herostratus::git::{rev_parse, rev_walk};
+    use herostratus::git;
 
     use super::*;
 
@@ -79,8 +79,8 @@ mod tests {
     fn test_new_repository() {
         let temp_repo = simplest().unwrap();
 
-        let rev = rev_parse("HEAD", &temp_repo.repo).unwrap();
-        let commits: Vec<_> = rev_walk(rev, &temp_repo.repo)
+        let rev = git::rev::parse("HEAD", &temp_repo.repo).unwrap();
+        let commits: Vec<_> = git::rev::walk(rev, &temp_repo.repo)
             .unwrap()
             .map(|oid| temp_repo.repo.find_commit(oid.unwrap()).unwrap())
             .collect();

diff --git a/herostratus/src/achievement/mod.rs b/herostratus/src/achievement/mod.rs
@@ -2,8 +2,6 @@
 #[allow(clippy::module_inception)]
 mod achievement;
 mod process_rules;
-#[cfg(test)]
-mod test_process_rules;
 
 pub use achievement::{Achievement, LoggedRule, Rule, RuleFactory};
 pub use process_rules::{grant, grant_with_rules};