A project to rate the quality of GitHub repositories the way we vote for cool posts on Hacker News.
This document is ongoing work on what the project could be and how to get there. Feel free to fork it and open a pull request.
We meet every Monday at 2pm ET over Skype. If you are interested in joining us, feel free to ask!
There are tons of open-source projects out there, especially in the JavaScript community, and it is difficult to choose between projects that do the same thing.
Part of the difficulty is the lack of actionable information about key aspects of a project:
- Code Quality
- Community size
- Support
- Quality of dependencies
- Documentation
The way we do things today is to search for the kind of thing we want, check out reviews and comparisons written by peers, or read the documentation to really assess it. We may even try out several shortlisted projects before deciding.
That's tedious.
A solution would be to rank repositories according to criteria that reflect their quality. There are already metrics for the popularity of a project (stars, commits, pull requests, issues, downloads) and tools that run automatic code quality checks (CodeClimate, Codacy, Coveralls ...).
The problem with those automatic tools is that the metrics they produce do not guarantee quality. You have to use a project and test it in the real world to get that kind of information. Besides, code style is a matter of opinion.
Sometimes the best way to get things done is to have humans do them.
A good illustration of that is Hacker News. The idea is that the community votes for the best news posted by peers. Peers whose posts get upvoted see their karma improve. The top posts are ranked by a combination of upvotes and freshness. That way the top news keeps changing and represents a quality sample of what's being published out there, without any complicated machine learning in the background.
A solution can be to apply this peer rating principle to the open source world: peers score repositories against defined criteria, and their votes are weighted by their Karma. A user's Karma is only as good as a combination of the scores of the repositories they have contributed to.
We can then add other automatic metrics to the computation of a repository score once we have determined which ones are relevant.
- Your Karma gets better when you contribute to a well-rated project.*
- Your Karma gets lower when projects you contribute to are badly rated.*
- Looking back, people who initially rated a project accurately should be rewarded with Karma.
- (OPTIONAL) Karma could also be augmented by Stack Overflow reputation.
*: How much a project influences your Karma should be proportional to your overall contribution to it.
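The Karma rules above can be sketched as a contribution-weighted average of repository scores. This is only one possible reading, not the project's actual algorithm: the names `Contribution` and `karma`, the use of changed lines as the measure of contribution, and the neutral default of 3.0 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Contribution:
    """A user's contribution to one repository (hypothetical model)."""
    repo_score: float   # current score of the repository, on the 1-5 scale
    lines_changed: int  # lines added + removed; assumed proxy for contribution size


def karma(contributions: Iterable[Contribution], neutral: float = 3.0) -> float:
    """Average the repository scores, weighted by contribution size.

    A user with no contributions gets the (assumed) neutral value.
    """
    contributions = list(contributions)
    total = sum(c.lines_changed for c in contributions)
    if total == 0:
        return neutral
    return sum(c.repo_score * c.lines_changed for c in contributions) / total
```

With this sketch, a large contribution to a well-rated project pulls Karma up more than a small contribution to a badly rated one pulls it down, which matches the proportionality note above.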
- A Karma change should not affect the scores of the repositories that the user voted for.
- Once a user votes for a repository, the score of this repository is updated and the Karma of all the contributors to this repository is updated as well. There is no propagation from these contributors to the repositories they have voted for.
- The score is based on defined criteria evaluated by other users.
- Score can also be influenced by some key automatic metrics.
- A user cannot vote for a repository they contribute to.
- Weight the score by the user's usage of the repository. If we can detect that a user has used a project, then we should give more weight to their rating.
- We should also ask users to review their rating 2 months or so after rating a project for the first time, so that they can give a more insightful opinion on it.
- The score of a repository should also be affected by the scores of its dependencies in the system. A repository with low-scoring dependencies should not be rated well.
- The score of a repo isn't changed if the Karma of one of the people who voted for it changes.
- For a repo, the distribution of scores for each criterion should be displayed.
- A repository score change affects all contributors immediately.
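As a rough illustration of the scoring rules above, here is a minimal sketch of a Karma-weighted repository score. The names `Vote`, `repo_score`, and `USAGE_BOOST`, as well as the boost value itself, are hypothetical. Dependency scores and automatic metrics are left out because the document does not yet define how they are combined.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Vote:
    """One user's rating of a repository (hypothetical model)."""
    voter: str
    rating: float          # 1 to 5
    karma_at_vote: float   # snapshot, so later Karma changes do not alter the score
    used_project: bool = False

# Assumed multiplier for voters known to have used the project.
USAGE_BOOST = 2.0


def repo_score(votes: Iterable[Vote], contributors: Set[str]) -> Optional[float]:
    """Karma-weighted mean rating, ignoring votes from the repo's own contributors."""
    weighted_sum = 0.0
    total_weight = 0.0
    for vote in votes:
        if vote.voter in contributors:
            continue  # a user cannot vote for a repository they contribute to
        weight = vote.karma_at_vote * (USAGE_BOOST if vote.used_project else 1.0)
        weighted_sum += weight * vote.rating
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None
```

Storing the voter's Karma at voting time is one way to satisfy the rule that a later Karma change does not alter the score; recomputing contributor Karma after each new score would then be a separate, one-way step.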
Each criterion is scored from 1 to 5. While scoring, a user can justify their rating for each criterion in a comment.
- Documentation
- Design
- Maturity
- Support
The UI should also let the owner of a repository see in detail why the repository has the score it has. They should also be able to read any reviews people left when rating the repo.
Right now the technologies in the pipeline for the implementation are:
Platform
- Scala
- Neo4j
- Play Framework for Scala
- Semantic-UI
- Silhouette for authentication with GitHub
Use of an actor model to propagate the score between users and repos
We should mention that for the moment we do not keep any history of the user Karma and the repository score. Should the need arise, we would add a Postgres database to handle time series.
Furthermore, contributions should be merged so that only the timestamp of the last contribution and the total number of lines added and removed by the user on the project are kept.
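The merge described above could look like the following minimal sketch. The types `Commit` and `MergedContribution` and the function `merge_contributions` are hypothetical names chosen for illustration; the real data model lives in the graph database and may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Commit:
    """One contribution by a user to a project (hypothetical model)."""
    timestamp: datetime
    lines_added: int
    lines_removed: int


@dataclass(frozen=True)
class MergedContribution:
    """What is kept per user and project after merging."""
    last_contribution: datetime
    lines_added: int
    lines_removed: int


def merge_contributions(commits: Iterable[Commit]) -> MergedContribution:
    """Keep only the latest timestamp and the line totals."""
    commits = list(commits)
    if not commits:
        raise ValueError("nothing to merge")
    return MergedContribution(
        last_contribution=max(c.timestamp for c in commits),
        lines_added=sum(c.lines_added for c in commits),
        lines_removed=sum(c.lines_removed for c in commits),
    )
```

Merging this way keeps one small record per user and project, at the cost of losing the per-contribution history.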
Store score/karma history outside the user/repo node.
Use Shields.io-style models to provide SVG badges.
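One possible way to get such a badge, assuming the public Shields.io static badge endpoint is used rather than a self-hosted renderer, is to build a URL of the form `https://img.shields.io/badge/<label>-<message>-<color>`. The helper name `badge_url` and the label, message, and color values are illustrative assumptions, not part of the project.

```python
from urllib.parse import quote


def badge_url(label: str, message: str, color: str) -> str:
    """Build a static badge URL (hypothetical helper).

    In the static badge path, a literal dash is written as "--" and a
    literal underscore as "__"; other special characters are percent-encoded.
    """
    def escape(text: str) -> str:
        return quote(text.replace("-", "--").replace("_", "__"), safe="")

    return f"https://img.shields.io/badge/{escape(label)}-{escape(message)}-{color}"
```

For example, `badge_url("gitrank", "4.4", "green")` yields a URL that could be embedded as an image in a repository README.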
- This one is the general repository project. It is where we discuss general matters of the project.
- Gitrank-web: implementation
- Docker-builds: builds of the implementation (deprecated)
- Notebooks: algorithm prototyping with IPython Notebook
- Github-indexer: Akka module for indexing GitHub information in Elasticsearch.