Include layer information to stanford.asdb AS categories #116

Closed
JustinLoye opened this issue Jan 26, 2024 · 3 comments · Fixed by #126

@JustinLoye
Contributor

Explain the dataset you want to add and how it would contribute to the Internet Yellow Pages.

stanford.asdb classifies ASes into categories (layer 1) and sub-categories (layer 2).
However, IYP currently does not store the layer information.
It would be useful to have it for several reasons:

  • It enables querying a specific layer.
  • It fixes ambiguities: currently, if an AS is tagged with the label Other, it is unclear whether this refers to the layer 1 Other or to one of the many layer 2 Other categories (e.g. there is no distinction between Other and Computer and Information Technology -> Other).
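
To illustrate the ambiguity, here is a minimal Python sketch. The rows, the ASNs (taken from the documentation range) and the helper names are all made up for illustration and are not taken from the IYP code base; the sketch only assumes that each category name ends up as one tag per AS.

```python
# Hypothetical (ASN, layer 1, layer 2) rows; layer 2 may be missing.
rows = [
    (64496, "Other", None),
    (64497, "Computer and Information Technology", "Other"),
]


def flat_links(rows):
    """One (asn, label) pair per category name, without layer information."""
    links = set()
    for asn, layer1, layer2 in rows:
        links.add((asn, layer1))
        if layer2 is not None:
            links.add((asn, layer2))
    return links


def layered_links(rows):
    """Same links, but each one also carries the layer it came from."""
    links = set()
    for asn, layer1, layer2 in rows:
        links.add((asn, layer1, 1))
        if layer2 is not None:
            links.add((asn, layer2, 2))
    return links


# Without the layer, both ASes match the label "Other".
other_flat = sorted(asn for asn, label in flat_links(rows) if label == "Other")

# With the layer, only the AS whose layer 1 category is "Other" matches.
other_layer1 = sorted(
    asn
    for asn, label, layer in layered_links(rows)
    if label == "Other" and layer == 1
)

print(other_flat, other_layer1)
```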

If possible describe how you would like to model the dataset in the Yellow Pages

  • From my understanding of the IYP querying pattern, I suggest adding a layer property to the links -[r:CATEGORIZED {reference_name:"stanford.asdb"}]-
  • To preserve the category hierarchy, allow tags to have a PART_OF link to other tags
  • There might be an edge case to cover: some ASes are, for example, simply registered as Computer and Information Technology (no layer 2 information) or as Computer and Information Technology -> Other (layer 2 information, but not really informative). Should we consider dropping the Other layer 2 categories?
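
A query against the proposed layer property could look roughly like the sketch below. It only builds a parameterized Cypher string in Python and does not talk to a database. The helper name layer_query, the (:AS) node label and the asn property are assumptions for illustration; the CATEGORIZED relationship, the reference_name value and the layer property come from the proposal above.

```python
REFERENCE_NAME = "stanford.asdb"


def layer_query(layer):
    """Return a parameterized Cypher query and its parameters for one layer."""
    if layer not in (1, 2):
        raise ValueError("layer must be 1 or 2")
    # Assumption: ASes are (:AS {asn}) nodes and tags are (:Tag {label}) nodes.
    query = (
        "MATCH (a:AS)-[r:CATEGORIZED {reference_name: $reference_name}]-(t:Tag) "
        "WHERE r.layer = $layer "
        "RETURN a.asn AS asn, t.label AS label"
    )
    return query, {"reference_name": REFERENCE_NAME, "layer": layer}


query, params = layer_query(1)
print(query)
print(params)
```

The returned pair could then be passed to whatever Neo4j client is in use, which keeps the layer value out of the query string itself.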
@m-appel
Member

m-appel commented Jan 26, 2024

We have two options:

  1. Keep the Other and model around it or
  2. Ignore all (layer 1 + 2) Other categories.

While I prefer option 2, it means ignoring data from the original dataset, which we normally do not want unless the data is obviously broken.
Option 2 would be easy to implement: we would just skip pushing the Other node and all relationships to it.

For option 1, I would prefer not to introduce (:Tag)-[:PART_OF]->(:Tag) relationships, since this seems to overcomplicate the graph for a niche (and imho almost useless) edge case. Instead, I would propose to concatenate the layer 1 name and the layer 2 name, like you wrote, and push the node as, e.g., (:Tag {label: 'Computer and Information Technology - Other'})
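
As a rough sketch of that concatenation (the function name tag_label is hypothetical, and applying the prefix only to layer 2 Other is an assumption about how option 1 would be scoped):

```python
def tag_label(layer1, layer2=None):
    """Build the tag label for an AS category.

    Only a layer 2 "Other" gets the layer 1 name as a prefix; every other
    layer 2 name is kept as-is, and a missing layer 2 falls back to layer 1.
    """
    if layer2 is None:
        return layer1
    if layer2 == "Other":
        return f"{layer1} - {layer2}"
    return layer2


print(tag_label("Computer and Information Technology", "Other"))
print(tag_label("Other"))
```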

In any case it is a good idea to add a layer property to the CATEGORIZED relationship.

@romain-fontugne any opinions?

@romain-fontugne
Member

I also prefer option 2. Yes, we should avoid cleaning imported datasets, but in this case I don't really see the difference between a level 2 Other and not having that information. I think the level 1 Other tag still tells us that the AS appears in ASdb but that the classification is not conclusive. So maybe we keep the level 1 Other?

Also, before adding the PART_OF relationship between tags, we should double-check that a level 2 tag won't be part of two level 1 tags. If this happens, we should think of a smart way to handle it.

@JustinLoye
Contributor Author

I double-checked whether any layer 2 tags are part of several layer 1 tags.
This is not the case, EXCEPT for the layer 2 tag Metal, Glass, Wood, and Paper Manufacturing, which is part of both layer 1 Other (for 6 ASes) and layer 1 Manufacturing (for 666 ASes).
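
A check of this kind can be sketched as follows. This is not the script actually used; the function name is hypothetical, the pairs are illustrative (the Manufacturing -> Other row in particular is made up), and skipping layer 2 Other is an assumption, since that name is expected under many layer 1 categories.

```python
from collections import defaultdict

METAL = "Metal, Glass, Wood, and Paper Manufacturing"


def layer2_with_several_parents(pairs, ignore=("Other",)):
    """Map each layer 2 name to its layer 1 parents if it has more than one."""
    parents = defaultdict(set)
    for layer1, layer2 in pairs:
        # Skip entries without layer 2 information and ignored names.
        if layer2 is None or layer2 in ignore:
            continue
        parents[layer2].add(layer1)
    return {l2: sorted(l1s) for l2, l1s in parents.items() if len(l1s) > 1}


# Illustrative (layer 1, layer 2) pairs, one per observed combination.
pairs = [
    ("Manufacturing", METAL),
    ("Other", METAL),
    ("Computer and Information Technology", "Other"),
    ("Manufacturing", "Other"),  # made-up row to exercise the ignore list
    ("Computer and Information Technology", None),
]

conflicts = layer2_with_several_parents(pairs)
print(conflicts)
```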

The 6 ASes with (Metal, Glass, Wood, and Paper Manufacturing) -[:PART_OF]-> (Other) are probably an error in stanford.asdb, so I suggest a manual correction?
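
One possible shape for such a manual correction is a small override table applied before pushing the data. This is only a sketch: the names CORRECTIONS and apply_correction are hypothetical, and it assumes the intended parent is Manufacturing.

```python
METAL_TAG = "Metal, Glass, Wood, and Paper Manufacturing"

# Assumption: the (Other, METAL_TAG) combination should map to Manufacturing.
CORRECTIONS = {
    ("Other", METAL_TAG): ("Manufacturing", METAL_TAG),
}


def apply_correction(layer1, layer2):
    """Return the corrected (layer 1, layer 2) pair, or the input unchanged."""
    return CORRECTIONS.get((layer1, layer2), (layer1, layer2))


print(apply_correction("Other", METAL_TAG))
print(apply_correction("Other", None))
```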

Note that Metal, Glass, Wood, and Paper Manufacturing does not appear at all in the category list
https://asdb.stanford.edu/data/NAICSlite.csv
So we can only assume it's an error.

JustinLoye added a commit to JustinLoye/internet-yellow-pages that referenced this issue Feb 16, 2024
@m-appel m-appel linked a pull request Feb 16, 2024 that will close this issue
6 tasks
m-appel pushed a commit that referenced this issue Feb 16, 2024
…126)

* Issue #116 Include layer information to stanford.asdb AS categories