
solved #83 #120

Closed

Conversation

MAVRICK-1
Contributor

Description

#83 solved

Added new function to add relationship property

@m-appel can you review my PR :-)

@MAVRICK-1
Contributor Author

@m-appel All checks passed, can you review the PR? :-)

@m-appel self-requested a review February 7, 2024 05:03
Member

@m-appel left a comment

Hi, have you tried to run this code? For me the add_relationship_properties function produces invalid Cypher queries, but you do not need this function anyway.

The basic outline of this crawler would be:

  1. Fetch all data first and keep track of unique ASes and to which tag they should be connected
  2. Fetch/create all AS nodes using batch_get_nodes_by_single_prop
  3. Fetch/create the two tag nodes with get_node
  4. Create links and push them with batch_add_links

Like I wrote in the issue, this crawler will be very similar to the bgpkit.pfx2asn crawler. So please look at that first, understand how it works and then try again :)

Thanks!

P.S.: I noticed that the RoVista API does not seem to return all results, I will follow up with the authors.

from iyp import BaseCrawler, RequestStatusError

URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'
ORG = 'ROV'
Member

Let's call the organization RoVista (and rename the folder to rovista)


URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'
ORG = 'ROV'
NAME = 'rov.rovista'
Member

and the script validating_rov, so NAME = 'rovista.validating_rov'

@MAVRICK-1
Contributor Author

Hi @m-appel, I made a mistake in the Cypher query in the add_relationship_properties function.
Anyway, I have made some changes and used the existing functions like you said, but I would like a review before committing.
I am sharing my code snippet below; could you review it?

import logging

import requests

from iyp import BaseCrawler, RequestStatusError

URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'
ORG = 'RoVista'
NAME = 'rovista.validating_rov'


class Crawler(BaseCrawler):

    def run(self):
        """Get RoVista data from their API."""
        batch_size = 1000  # Adjust batch size as needed
        offset = 0
        entries = []
        asns = set()

        while True:
            # Make a request with the current offset
            response = requests.get(URL, params={'offset': offset, 'count': batch_size})
            if response.status_code != 200:
                raise RequestStatusError('Error while fetching RoVista data')

            data = response.json().get('data', [])
            for entry in data:
                asns.add(entry['asn'])
                if entry['ratio'] > 0.5:
                    entries.append({'asn': entry['asn'], 'ratio': entry['ratio'], 'label': 'Validating RPKI ROV'})
                else:
                    entries.append({'asn': entry['asn'], 'ratio': entry['ratio'], 'label': 'Not Validating RPKI ROV'})

            # Move to the next page
            offset += batch_size
            # Break the loop if there's no more data
            if len(data) < batch_size:
                break

        logging.info('Pushing nodes to neo4j...\n')
        # Get ASN and Tag IDs
        self.asn_id = self.iyp.batch_get_nodes_by_single_prop('AS', 'asn', asns)
        tag_id_not_vali = self.iyp.get_node('Tag', '{label:"Not Validating RPKI ROV"}', create=False)
        tag_id_vali = self.iyp.get_node('Tag', '{label:"Validating RPKI ROV"}', create=False)
        # Compute links
        links = []
        for entry in entries:
            asn_qid = self.asn_id[entry['asn']]
            if entry['ratio'] > 0.5:
                links.append({'src_id': asn_qid, 'dst_id': tag_id_vali, 'props': [self.reference, entry]})
            else:
                links.append({'src_id': asn_qid, 'dst_id': tag_id_not_vali, 'props': [self.reference, entry]})

        logging.info('Pushing links to neo4j...\n')
        # Push all links to IYP
        self.iyp.batch_add_links('CATEGORIZED', links)

Is it correct?

@m-appel
Member

m-appel commented Feb 7, 2024

Yes this looks better, but you can just commit, then I can use the GitHub interface to give easier feedback. (I will also squash all commits of the PR into one, so no worries about polluting the tree or something)

The properties for get_node should be an actual dict, not a string representing a dict. And you should set create=True (or rather delete create=False since it is the default), because the Not Validating RPKI ROV Tag node does not exist.

I also understood now why the API is weird, the offset parameter is actually not an offset, but a page parameter, i.e., it should be incremented by 1 instead of batch_size.
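Taken together, the two fixes above could be sketched as follows. This is a minimal sketch only: fetch_all_pages and fetch_page are hypothetical helper names used for illustration, not existing IYP code.

```python
import requests

URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'


def fetch_all_pages(fetch_page, batch_size=1000):
    """Collect entries across all pages.

    The API's 'offset' parameter behaves like a page index, so it is
    incremented by 1 per iteration, not by batch_size.
    """
    page = 0
    entries = []
    while True:
        data = fetch_page(page, batch_size)
        entries.extend(data)
        if len(data) < batch_size:
            break
        page += 1  # page counter, not a row offset
    return entries


def fetch_page(page, count):
    """Fetch one page of RoVista results."""
    resp = requests.get(URL, params={'offset': page, 'count': count})
    resp.raise_for_status()
    return resp.json().get('data', [])


# The get_node calls then take an actual dict, and create=True is the
# default, so the keyword can simply be dropped:
# tag_vali = self.iyp.get_node('Tag', {'label': 'Validating RPKI ROV'})
# tag_not_vali = self.iyp.get_node('Tag', {'label': 'Not Validating RPKI ROV'})
```

Splitting the pagination loop from the HTTP call also makes the loop testable against a stub fetcher without hitting the API.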

Btw. for future pull requests, please provide a better name and description. I believe there is also a template displayed when you create a new PR; please do not just delete everything, there are some useful checks to be aware of like "How did you test your code" and "Did you update the documentation".

If you run and test your code, you can be more confident about whether it is correct :)
Feel free to ask questions when you are stuck, but please do not submit a PR with code that simply does not execute and ask if it is ok.

@MAVRICK-1
Contributor Author

Thanks @m-appel for the guidance and feedback. I am making the changes as you requested :-)

…operty

DNS remodeling (InternetHealthReport#119)

* update url2domain to url2hostname

* remove iana root zone file and dns hierarchy from config file

* Atlas measurement targets are now hostnames

* update openintel crawlers to the new DNS model

* umbrella now ranks a mix of DomainName and HostName nodes and should be run after openintel.umbrella1m

* Add explanation for cloudflare DNS modeling

* lower umbrella crawler in config file

* update READMEs with the new DNS modeling

* add (:Service {name:'DNS'}) node and link it to authoritative name servers

* Nodes do not have reference properties

* Normalize IPv6 addresses

* Fix wrong crawler name

* Typos and formatting

* Remove infra_mx crawler since it does not do anything at the moment

* Update Cisco Umbrella crawler

- Batch create new nodes (happens more often than expected)
- Add logging output
- Do not use builtins as variable names

* Remove redundant set and parameters

* Remove Service node for now

We could not decide on a name, so we will deal with this later.

---------

Co-authored-by: Malte Tashiro <[email protected]>

Add OpenINTEL DNS dependency crawler

Integrate with existing files and remove some unnecessary stuff.

Co-authored-by: Raffaele Sommese <[email protected]>

precommit error rectified

Update __init__.py
@MAVRICK-1
Contributor Author

@m-appel I modified the code and checked it, but I am still getting this pre-commit error.

@MAVRICK-1 closed this Feb 7, 2024
@MAVRICK-1 deleted the RovDetection#83 branch February 7, 2024 21:07