Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structure of rules in this repo #3

Open
hazanasec opened this issue Jul 31, 2020 · 4 comments
Open

Structure of rules in this repo #3

hazanasec opened this issue Jul 31, 2020 · 4 comments

Comments

@hazanasec
Copy link
Contributor

Need to agree on a structure of the rules i.e. /v6/java/6.2.5.yaml or /java/v6/6.2.5.yaml or /v6/6.2.5/java/rule.yaml

I think it makes sense to maybe go with language first? What will people look for when they first see the repo?

@clintgibler clintgibler changed the title Structure of rules Structure of rules in this repo Aug 1, 2020
@clintgibler
Copy link
Collaborator

This is a great question. My thoughts:

Top Folder Name

Language makes sense to be top level directory.

Why? Usually when you're running Semgrep on a repo you probably care more about running the rules for the languages contained in that repo, vs. "let me run all output encoding checks on this repo that's only Python."

Mid Folder Name

I'd prefer something like:

java/
  * v5_validation_sanitization_encoding/
  * v6_cryptography/
python/
  * v5_validation_sanitization_encoding/
  * v6_cryptography/

Personally I kind of like having the second level folder name include some context about what it is, so when you've navigating the tree structure it's more explicit than just "v6".

The user may not known what ASVS section corresponds to what off the top of their head, and having to reference another resource while they're navigating this repo or click into every directory seems unnecessarily high friction.

Rule Name

So language/vN_description/???.yml.

But what about after that? I'm not sure if we can just do 6.2.5.yaml, because it's likely that there will be some (many?) cases where there will need to be many rules, even with one language, to capture the same control.

For example, capturing how one can disable output encoding in Flask, Django, Jinja, etc. (all Python). Also, there may be various third-party libraries that provide similar functionality (e.g. crypto), and we want to flag the same issue in all of them (e.g. MD5).

Of course, we could try to put a series of patterns for the same control into the same rule (e.g. finding MD5 in every popular crypto library + stdlib), but that may get complicated, have what the rule does be less intuitive, as the check is less self contained (this rule finds exactly just this issue).

Maybe something like:

  • language/vN_description/6.2.5_short_description.yml or
  • language/vN_description/6.2.5/short_description_{a|b}.yml?

These are just some initial thoughts, happy to discuss 👍

Open Questions

One potential future issue I can foresee is that if we name every rule based on which ASVS control it corresponds to, when there's a new version of ASVS, we may have to rename things a bunch of places. For example, a chapter gets added, existing chapters get re-ordered, controls within a chapter get added/renamed/re-ordered, etc.

Not sure how to handle this in a way that makes future us minimally sad :P

@clintgibler
Copy link
Collaborator

Another potential structure is something like:

python/
  flask/
    v5_validation_sanitization_encoding/
    v6_cryptography/
  django/
    v5_validation_sanitization_encoding/
    v6_cryptography/

java/
  spring/
    v5_validation_sanitization_encoding/
    v6_cryptography/

What's nice about this is that as a consumer, it's structured very much by what you care about: I have these languages and these web app / frontend frameworks, so I'm going to use the rules in those folders. Potentially makes rule pack creation/maintenance easier.

The downside is that we'd be repeating the ASVS sections a bunch of times (for every language/framework combination, where the control is relevant) in the repo.

But that doesn't make our lives too much harder, and might make it easier for contributors to know where to put the rules they're writing.

@hazanasec
Copy link
Contributor Author

Looks good, I agree the structure in the second comment looks very nice for the consumer, and totally agree with the /short_description_{a|b}.yml

I don't think there's too much we could do at the moment to future proof it at the moment unfortunately!

Perhaps we need to add this structure and the message structure into a readme?

@danielcuthbert
Copy link

We will try and make sure that the numbering doesn't change drastically between 4.x and 5.x, but /short_description_{a|b}.yml works pretty well I feel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants