Commit

Misc project cleanups (#65)
Generated #63 and #64
Notgnoshi authored Nov 30, 2024
2 parents 2b12c43 + 96af822 commit 7a838cf
Showing 18 changed files with 411 additions and 269 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
/target/
/data/
/mutants*/
98 changes: 86 additions & 12 deletions docs/design/data-model.md
@@ -1,13 +1,87 @@
# Data model
**Status:** In consideration

**TODO:** What _is_ an achievement? It should probably hold:
* achievement id
* title
* description
* level (copper, silver, gold, platinum)
* number of instances
* user id (email? how to handle .mailmap?)
* the generating commits (how to do this in a way that's linkable in the representation layer?)

How to handle .mailmap files? Should that be done before or after achievement creation?

# Status

**PROPOSAL**

# Scope

This document answers the following questions:
* What data does an achievement contain or reference?
* What inputs are required for a Rule engine?

# Achievements

## Achievement uniqueness

1. Repeatable. E.g., swear in a commit message.
2. Unique. E.g., longest/shortest commit message.

## Achievement contents

* Achievement ID

This can be used to look up the title, description, art, etc., or this data can be embedded in the
achievement.
* User ID

This needs to be .mailmap aware, and may need a committer/author distinction.
* What repository the achievement is associated with
* What commit(s) the achievement is associated with

There will always be a "primary" commit, but in the case of e.g. revert commits, there might be
additional "context" commits.
* Achievement uniqueness

A consumer of the rule engine will consume this, and determine whether it needs to revoke the
achievement from one user in order to grant it to another.
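
A minimal sketch of how the contents listed above could be carried in one type. Every type, field,
and method name here is hypothetical, and plain integers and strings stand in for whatever ID and
commit types the real implementation would use.

```rust
/// Whether an achievement can be granted many times, or is held by one user at a time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Uniqueness {
    /// E.g., swear in a commit message.
    Repeatable,
    /// E.g., longest/shortest commit message.
    Unique,
}

/// Hypothetical shape of a granted achievement; all field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Achievement {
    /// Key used to look up the title, description, art, etc.
    pub achievement_id: u32,
    /// The user identity, assumed to be resolved through .mailmap already.
    pub user_id: String,
    /// The repository the achievement is associated with.
    pub repository: String,
    /// The "primary" commit hash, as a hex string.
    pub primary_commit: String,
    /// Additional "context" commits, e.g. the commit that a revert undoes.
    pub context_commits: Vec<String>,
    /// Lets a consumer decide whether a previous grant has to be revoked.
    pub uniqueness: Uniqueness,
}

impl Achievement {
    /// A unique achievement may have to be revoked from its previous holder.
    pub fn may_require_revocation(&self) -> bool {
        self.uniqueness == Uniqueness::Unique
    }
}
```

Embedding the uniqueness in the achievement keeps the revocation decision in the consumer rather
than in the rule engine.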

# Rules

## Mailmap

Rule generation does need to be mailmap aware, because there might be some rules like "be most
prolific contributor" that would change based on mailmaps.

## Rule initialization

Some rules might require (or be more efficient with) an initialization phase:

```rust
fn init(&mut self, repository: &git2::Repository, config: &RulesConfig) {}
```

## Caching concerns

Some rules might not work well if previous runs are cached. For example, stateful rules like
"longest commit message" may either require rejecting the cache acceleration, or may require adding
rule-specific data to the cache.
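
As an illustration of the second option, a stateful rule could expose the small piece of state it
would need stored alongside the cache. This is only a sketch: `LongestMessageState` and `observe`
are hypothetical names, and message length is assumed to be measured in characters.

```rust
/// Hypothetical rule-specific state that would need to be cached between runs.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct LongestMessageState {
    /// Hash and message length of the current record holder, if any.
    pub longest: Option<(String, usize)>,
}

impl LongestMessageState {
    /// Feed one commit to the rule; returns true if it became the new record holder.
    pub fn observe(&mut self, commit_hash: &str, message: &str) -> bool {
        let len = message.chars().count();
        let is_new_record = match &self.longest {
            Some((_, best)) => len > *best,
            None => true,
        };
        if is_new_record {
            self.longest = Some((commit_hash.to_string(), len));
        }
        is_new_record
    }
}
```

If this state were persisted, a cached run could resume from the previous record holder instead of
re-walking every commit.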

## Rule variants

1. Context-free. E.g., swear in a commit message
   1. Commit message
   2. Commit message + diff
   3. Commit message + diff + submodule
2. Contextual
   1. User aware. E.g., be the most prolific contributor
   2. Commit history aware. E.g., revert a previous commit within 30min, or revert the same commit
      multiple times

## Rule configuration

There might be global configuration shared between rules like
* Exclude commits from these users
* Exclude commit messages matching these hashes or regexes

But there will also be rule-specific configuration like
* When calculating the shortest subject line, exclude any subject lines longer than 10 characters
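
The split between global and rule-specific configuration might look roughly like the sketch below.
All names are hypothetical, and regex-based message exclusion is left out to keep the example free
of external crates.

```rust
use std::collections::HashSet;

/// Hypothetical global configuration shared between rules.
#[derive(Debug, Default)]
pub struct GlobalConfig {
    /// Commits from these users are excluded.
    pub excluded_users: HashSet<String>,
    /// Commits with these hashes are excluded.
    pub excluded_commits: HashSet<String>,
}

/// Hypothetical rule-specific configuration for a "shortest subject line" rule.
#[derive(Debug)]
pub struct ShortestSubjectConfig {
    /// Subject lines longer than this are not considered.
    pub max_length: usize,
}

impl Default for ShortestSubjectConfig {
    fn default() -> Self {
        Self { max_length: 10 }
    }
}

/// Apply the global filters first, then the rule-specific one.
pub fn is_candidate(
    global: &GlobalConfig,
    rule: &ShortestSubjectConfig,
    user: &str,
    commit_hash: &str,
    subject: &str,
) -> bool {
    !global.excluded_users.contains(user)
        && !global.excluded_commits.contains(commit_hash)
        && subject.chars().count() <= rule.max_length
}
```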

## Rule inputs

* The repository
* Any submodules of the repository
* The reference being processed
* The `&Config`
* The commit itself
* User mailmap
10 changes: 8 additions & 2 deletions docs/design/integrations.md
@@ -1,13 +1,19 @@
# Integrations
**Status:** In consideration

## GitLab
# Status

**DRAFT**

# GitLab

GitLab provides an [Achievements API](https://docs.gitlab.com/ee/user/profile/achievements.html).
It appears to manage an internal database of achievements, and to assign each one its own ID when
it is created. So we'll have to maintain a mapping from our achievement IDs to the GitLab
instance's IDs.

This implies that there needs to be a stateful database that is specific to the GitLab integration
(or to integrations in general).
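
A minimal in-memory sketch of that mapping, assuming string IDs on the Herostratus side and numeric
IDs on the GitLab side. `GitLabIdMap` and its methods are hypothetical, and a real implementation
would have to persist the map between runs.

```rust
use std::collections::HashMap;

/// Hypothetical integration-specific state: Herostratus achievement ID -> GitLab achievement ID.
#[derive(Debug, Default)]
pub struct GitLabIdMap {
    ids: HashMap<String, u64>,
}

impl GitLabIdMap {
    /// Record the ID that the GitLab instance assigned when the achievement was created.
    pub fn record(&mut self, herostratus_id: &str, gitlab_id: u64) {
        self.ids.insert(herostratus_id.to_string(), gitlab_id);
    }

    /// Look up the GitLab ID needed to grant the achievement to a user, if it was created.
    pub fn gitlab_id(&self, herostratus_id: &str) -> Option<u64> {
        self.ids.get(herostratus_id).copied()
    }
}
```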

It does not appear that you can link an achievement to the commit that generated it.

It'd be sweet to run this through a pipeline, although that'd require finding some way to have
18 changes: 15 additions & 3 deletions docs/design/parallelism.md
@@ -1,7 +1,12 @@
# Parallelism
**Status:** In consideration

One of my complains with <https://github.com/someteam/acha> is how slow and resource heavy it is.
# Status

**DRAFT**

# Goal

One of my complaints with <https://github.com/someteam/acha> is how slow and resource heavy it is.
And also that it's unmaintained and difficult to get running.

In _theory_ it shouldn't be expensive to process all of the commits on a given branch (typically the
@@ -12,6 +17,13 @@ parallelism should also be a good way to speed it up.

However, there are several approaches to parallelism, and the right choice depends on ???

# Constraints

My expectation is that achievement processing is likely I/O constrained, and thus I'd want to spool
up more tasks than cores.
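
One way to express "more tasks than cores" is to scale the detected core count by a multiplier. This
is a sketch only: the function name and the multiplier are assumptions, and nothing here measures
whether the workload really is I/O constrained.

```rust
use std::thread;

/// Pick a worker count for I/O-bound work by oversubscribing the available cores.
/// `oversubscription` is a tunable guess; values below 1 are treated as 1.
pub fn io_bound_worker_count(oversubscription: usize) -> usize {
    // Fall back to a single core if the parallelism cannot be queried.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    cores * oversubscription.max(1)
}
```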

# Approaches

## Repository level parallelism
Each repository is processed in serial, but multiple repositories can be processed at once.
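
A rough sketch of this approach using scoped threads from the standard library, with one thread per
repository. `process_repositories` is a hypothetical name, and the `process` callback stands in for
walking a repository's commits in serial and counting the achievements granted.

```rust
use std::thread;

/// Process every repository on its own thread; commits within a repository stay serial.
/// Returns one result per repository, in the same order as the input.
pub fn process_repositories<F>(repositories: &[String], process: F) -> Vec<usize>
where
    F: Fn(&str) -> usize + Sync,
{
    thread::scope(|scope| {
        // Spawn all workers first so that the repositories are processed concurrently.
        let handles: Vec<_> = repositories
            .iter()
            .map(|repo| {
                let process = &process;
                scope.spawn(move || process(repo.as_str()))
            })
            .collect();

        // Join in spawn order so the results line up with the input.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Spawning one thread per repository is the simplest form; a bounded pool would be the natural next
step if the number of repositories is large.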

@@ -69,7 +81,7 @@ flowchart TD
r3 -.-> achievements
```

## The approach to use
# Proposal

I think I'll start by doing things in serial (repository level parallelism) to see if processing the
rules is too expensive (after caching).
26 changes: 19 additions & 7 deletions docs/design/persistence.md
@@ -1,11 +1,12 @@
# Data storage
**Status:** In consideration

## Use cases
# Use cases

There are four things that need to be stored

### 1. Remember which repositories/branches to process
## 1. Remember which repositories/branches to process

**IMPLEMENTED**

**Why?**
* Enables running Herostratus as a scheduled job
@@ -23,7 +24,9 @@ Things that need to be stored:
* Commit filtering
* Mailmap settings

### 2. Remember which commits/rules have been processed for each repository/branch
## 2. Remember which commits/rules have been processed for each repository/branch

**DRAFT**

**Why?** Performance improvement. It can take quite a long time to process large-ish repositories like
Linux and Git.
@@ -39,22 +42,31 @@ From an edge-case and "purity" perspective, option #3 is the worst. But from a s
common-case perspective, it's the best (least data storage, simplest, easiest to understand, easiest
to implement).

### 3. Granted Achievements
**NOTE:** Some rules (like "longest commit message") require either rejecting the cache, or caching
rule-specific data.

## 3. Granted Achievements

**DRAFT**

**Why?**
* Avoid granting duplicates
* Enable easier access to Herostratus data by the user (enable them to build whatever they want on
top).
* Enable easier integration implementations

### 4. Mapping from each possible Herostratus achievement to their corresponding GitLab achievement IDs
## 4. Mapping from each possible Herostratus achievement to their corresponding GitLab achievement IDs

**DRAFT**

**Why?** When you create a GitLab achievement, it returns an ID for each created achievement. So you
need to store them, so that you can grant them to users. And Herostratus (at least the GitLab
integration part) will need to store them, so that it can map between Herostratus achievements and
GitLab achievements.

## Design
# Design

**IMPLEMENTED**

Use CLI subcommands to separate stateful from stateless operations in a way that's intuitive to the
user.
28 changes: 20 additions & 8 deletions docs/design/test-data.md
@@ -1,19 +1,31 @@
# Test data
**Status:** In consideration

# Status

**IMPLEMENTED**

# Repository sources

The primary CLI tool should be able to process repositories in the following forms:
1. Existing on-disk worktrees
2. Existing on-disk bare repositories
3. HTTP, HTTPS, SSH remote clone URLs
   1. These should be cloned into something like `~/.cache/herostratus/git/`

Or maybe, it _just_ consumes remotes, and you pass the on-disk work trees and bare repositories _as_
local remotes that herostratus can fetch from? That might make for a more consistent, and easier to
test application?
It should not require the branch be checked out.

Perhaps the primary CLI tool should _only_ look at on-disk repositories, and the cloning should be
handled by a wrapper?
# Test data

It should not require the branch be checked out.
There should be orphan branches containing test commits in this repository. These will be prefixed
with `test/`. They can be made like

```sh
git checkout --orphan test/simple
git rm -rf .
for i in `seq 0 4`; do
    git commit --allow-empty -m "test/simple: $i"
done
```

There should be orphan branches containing test commits in this repository
See existing test branches here:
<https://github.com/Notgnoshi/herostratus/branches/all?query=test%2F>
62 changes: 43 additions & 19 deletions docs/design/user-contributed-rules.md
@@ -1,20 +1,44 @@
# User contributed rules
**Status:** In consideration

The method for consuming user-provided rules depends on the language used in the primary
implementation. I'm leaning towards either Rust or Python. Python would be far easier to implement
user-contributed rules, but my language preference is Rust.

* Plugins:
* If in Rust, this could be dylibs. This gets challenging because of the lack of a stable ABI.
There are many approaches to providing dylib plugins, which is a pretty interesting topic for
me personally, but would be quite a lot of work.
* If in Python, this could be similar to
<https://jorisroovers.com/gitlint/latest/rules/user_defined_rules/> where you import a
`herostratus.rules.AchievementRule` interface, and then implement it.
* Consume executables that take the commits from `stdin`, and write the achievements (in JSON?) to
`stdout`
* Require the scripts consume a stream of commits? Or a single commit? Probably a single commit,
so that herostratus can provide them the contents of `git show`.
* There should be a way to tell herostratus that it shouldn't provide the full diff output to
the script

# Status

**PROPOSAL**

# Goal

Enable users to run their own rules.

# Approaches

## Force users to fork the project

Don't provide a mechanism for users to add their own rules. Force them to fork the project, and
write their own rules.

## Make it easy for users to contribute their own rules

Make it easy enough to contribute new rules that users feel they can do so. This may require
maintaining a set of default and non-default rules. It may also require toning down the
[contribution standards](../../CONTRIBUTING.md).

## Wrap scripts

Define a `stdin`/`stdout` JSON API, and let users write their own achievement generation tools.

## Plugins

### dylib

Challenging because Rust doesn't provide a stable ABI, even between invocations of the same compiler
version (due to type layout randomization).

### WASM

WASM seems like it's the plugin mechanism of choice in Rust land. I personally find it awkward,
because it's offloading the stable ABI concerns from the language and OS to the
users. But it seems easier than dylibs.

# Proposal

*If* I get around to implementing user-contrib rules before I burn out, WASM plugins seem like the
way to go.
6 changes: 3 additions & 3 deletions herostratus-tests/src/fixtures/repository.rs
@@ -55,7 +55,7 @@ pub fn with_empty_commits(messages: &[&str]) -> eyre::Result<TempRepository> {
#[cfg(test)]
mod tests {
use git2::{Index, Odb, Repository};
use herostratus::git::{rev_parse, rev_walk};
use herostratus::git;

use super::*;

@@ -79,8 +79,8 @@ mod tests {
fn test_new_repository() {
let temp_repo = simplest().unwrap();

let rev = rev_parse("HEAD", &temp_repo.repo).unwrap();
let commits: Vec<_> = rev_walk(rev, &temp_repo.repo)
let rev = git::rev::parse("HEAD", &temp_repo.repo).unwrap();
let commits: Vec<_> = git::rev::walk(rev, &temp_repo.repo)
.unwrap()
.map(|oid| temp_repo.repo.find_commit(oid.unwrap()).unwrap())
.collect();
2 changes: 0 additions & 2 deletions herostratus/src/achievement/mod.rs
@@ -2,8 +2,6 @@
#[allow(clippy::module_inception)]
mod achievement;
mod process_rules;
#[cfg(test)]
mod test_process_rules;

pub use achievement::{Achievement, LoggedRule, Rule, RuleFactory};
pub use process_rules::{grant, grant_with_rules};

0 comments on commit 7a838cf
