
solved #83 #120

Closed

Conversation

MAVRICK-1
Contributor

Description

#83 solved

Added new function to add relationship property

@m-appel can you review my PR :-)

@MAVRICK-1
Contributor Author

@m-appel All checks passed, can you review the PR? :-)

@m-appel self-requested a review February 7, 2024 05:03
Member

@m-appel left a comment

Hi, have you tried to run this code? For me the add_relationship_properties function produces invalid Cypher queries, but you do not need this function anyway.

The basic outline of this crawler would be:

  1. Fetch all data first and keep track of unique ASes and to which tag they should be connected
  2. Fetch/create all AS nodes using batch_get_nodes_by_single_prop
  3. Fetch/create the two tag nodes with get_node
  4. Create links and push them with batch_add_links

Like I wrote in the issue, this crawler will be very similar to the bgpkit.pfx2asn crawler. So please look at that first, understand how it works and then try again :)

Thanks!

P.S.: I noticed that the RoVista API does not seem to return all results, I will follow up with the authors.

from iyp import BaseCrawler, RequestStatusError

URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'
ORG = 'ROV'
Member

Let's call the organization RoVista (and rename the folder to rovista)


URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'
ORG = 'ROV'
NAME = 'rov.rovista'
Member

and the script validating_rov, so NAME = 'rovista.validating_rov'

@MAVRICK-1
Contributor Author

Hi @m-appel, I made a mistake in the Cypher query in the add_relationship_properties function.
Anyway, I have made some changes and used the existing functions like you said, but I would like a review before committing.
I am sharing my code snippet below; could you review it?

import logging

import requests

from iyp import BaseCrawler, RequestStatusError

URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'
ORG = 'RoVista'
NAME = 'rovista.validating_rov'


class Crawler(BaseCrawler):

    def run(self):
        """Get RoVista data from their API."""
        batch_size = 1000  # Adjust batch size as needed
        offset = 0
        entries = []
        asns = set()

        while True:
            # Make a request with the current offset
            response = requests.get(URL, params={'offset': offset, 'count': batch_size})
            if response.status_code != 200:
                raise RequestStatusError('Error while fetching RoVista data')

            data = response.json().get('data', [])
            for entry in data:
                asns.add(entry['asn'])
                if entry['ratio'] > 0.5:
                    entries.append({'asn': entry['asn'], 'ratio': entry['ratio'], 'label': 'Validating RPKI ROV'})
                else:
                    entries.append({'asn': entry['asn'], 'ratio': entry['ratio'], 'label': 'Not Validating RPKI ROV'})

            # Move to the next page
            offset += batch_size
            # Break the loop if there's no more data
            if len(data) < batch_size:
                break

        logging.info('Pushing nodes to neo4j...\n')
        # Get ASN and Tag IDs
        self.asn_id = self.iyp.batch_get_nodes_by_single_prop('AS', 'asn', asns)
        tag_id_not_vali = self.iyp.get_node('Tag', '{label:"Not Validating RPKI ROV"}', create=False)
        tag_id_vali = self.iyp.get_node('Tag', '{label:"Validating RPKI ROV"}', create=False)
        # Compute links
        links = []
        for entry in entries:
            asn_qid = self.asn_id[entry['asn']]
            if entry['ratio'] > 0.5:
                links.append({'src_id': asn_qid, 'dst_id': tag_id_vali, 'props': [self.reference, entry]})
            else:
                links.append({'src_id': asn_qid, 'dst_id': tag_id_not_vali, 'props': [self.reference, entry]})

        logging.info('Pushing links to neo4j...\n')
        # Push all links to IYP
        self.iyp.batch_add_links('CATEGORIZED', links)

Is it correct?

@m-appel
Member

m-appel commented Feb 7, 2024

Yes this looks better, but you can just commit, then I can use the GitHub interface to give easier feedback. (I will also squash all commits of the PR into one, so no worries about polluting the tree or something)

The properties for get_node should be an actual dict, not a string representing a dict. And you should set create=True (or rather delete create=False since it is the default), because the Not Validating RPKI ROV Tag node does not exist.

I also understood now why the API is weird, the offset parameter is actually not an offset, but a page parameter, i.e., it should be incremented by 1 instead of batch_size.
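Taken together, the two fixes above could be sketched as follows. This is a minimal sketch only: fetch_all_pages and fetch_page are hypothetical helper names used for illustration, not existing IYP code.

```python
import requests

URL = 'https://api.rovista.netsecurelab.org/rovista/api/overview'


def fetch_all_pages(fetch_page, batch_size=1000):
    """Collect entries across all pages.

    The API's 'offset' parameter behaves like a page index, so it is
    incremented by 1 per iteration, not by batch_size.
    """
    page = 0
    entries = []
    while True:
        data = fetch_page(page, batch_size)
        entries.extend(data)
        if len(data) < batch_size:
            break
        page += 1  # page counter, not a row offset
    return entries


def fetch_page(page, count):
    """Fetch one page of RoVista results."""
    resp = requests.get(URL, params={'offset': page, 'count': count})
    resp.raise_for_status()
    return resp.json().get('data', [])


# The get_node calls then take an actual dict, and create=True is the
# default, so the keyword can simply be dropped:
# tag_vali = self.iyp.get_node('Tag', {'label': 'Validating RPKI ROV'})
# tag_not_vali = self.iyp.get_node('Tag', {'label': 'Not Validating RPKI ROV'})
```

Splitting the pagination loop from the HTTP call also makes the loop testable against a stub fetcher without hitting the API.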

Btw. for future pull requests, please provide a better name and description. I believe there is also a template displayed when you create a new PR; please do not just delete everything, there are some useful checks to be aware of like "How did you test your code" and "Did you update the documentation".

If you run and test your code, you can be more confident about whether it is correct :)
Feel free to ask questions when you are stuck, but please do not submit a PR with code that simply does not execute and ask if it is ok.

@MAVRICK-1
Contributor Author

Thanks @m-appel for the guidance and feedback. I am making the changes as you requested :-)

…operty

DNS remodeling (InternetHealthReport#119)

* update url2domain to url2hostname

* remove iana root zone file and dns hierarchy from config file

* Atlas measurement targets are now hostnames

* update openintel crawlers to the new DNS model

* umbrella now ranks a mix of DomainName and HostName nodes and should be run after openintel.umbrella1m

* Add explanation for cloudflare DNS modeling

* lower umbrella crawler in config file

* update READMEs with the new DNS modeling

* add (:Service {name:'DNS'}) node and link it to authoritative name servers

* Nodes do not have reference properties

* Normalize IPv6 addresses

* Fix wrong crawler name

* Typos and formatting

* Remove infra_mx crawler since it does not do anything at the moment

* Update Cisco Umbrella crawler

- Batch create new nodes (happens more often than expected)
- Add logging output
- Do not use builtins as variable names

* Remove redundant set and parameters

* Remove Service node for now

We could not decide on a name, so we will deal with this later.

---------

Co-authored-by: Malte Tashiro <[email protected]>

Add OpenINTEL DNS dependency crawler

Integrate with existing files and remove some unnecessary stuff.

Co-authored-by: Raffaele Sommese <[email protected]>

precommit error rectified

Update __init__.py
@MAVRICK-1
Contributor Author

@m-appel I modified the code and checked it, but I am still getting this pre-commit error.

@MAVRICK-1 closed this Feb 7, 2024
@MAVRICK-1 deleted the RovDetection#83 branch February 7, 2024 21:07