Showing 10 changed files with 371 additions and 186 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,152 +1,111 @@ | ||
# Internet Yellow Pages | ||
|
||
The Internet Yellow Pages (IYP) is a knowledge database that gathers information about Internet resources (for example ASNs, IP prefixes, and domain names). | ||
The Internet Yellow Pages (IYP) is a knowledge database that gathers information about | ||
Internet resources (for example ASNs, IP prefixes, and domain names). | ||
|
||
## Public IYP prototype | ||
|
||
Visit http://iyp.iijlab.net to try our online prototype. No password is required, just click the 'connect' button to get started. Don't know how to use IYP ? You'll find a guide after clicking the 'connect' button, see also examples [here](https://github.com/InternetHealthReport/internet-yellow-pages/blob/main/documentation/gallery.md). | ||
Visit <https://iyp.iijlab.net> to try our online prototype. You will find instructions | ||
on how to connect to the prototype and some example queries there. For even more | ||
examples, check out the [IYP | ||
gallery](documentation/gallery.md). | ||
|
||
## Deploying a local IYP instance | ||
## Deploy a local IYP instance | ||
|
||
We describe the basic process of deploying a local IYP instance below. For more advanced | ||
commands see the [database documentation](documentation/database-management.md). | ||
|
||
### Prerequisites | ||
|
||
- [Curl](https://curl.se/download.html) | ||
- [Docker](https://www.docker.com/) | ||
- [Docker Compose](https://docs.docker.com/compose/install/) | ||
- about 30GB of free disk space | ||
- about 50GB of free disk space | ||
|
||
### Downloading the Database dump | ||
### Download the database dump | ||
|
||
#### Explore and Download Dumps | ||
|
||
Visit the database dumps repository at: | ||
``` | ||
https://ihr-archive.iijlab.net/ihr/iyp/ | ||
``` | ||
|
||
#### Specific Dump Format | ||
Visit the [database dump repository](https://ihr-archive.iijlab.net/ihr/iyp/). | ||
|
||
Dumps are organized by year, month, and day in this format: | ||
``` | ||
|
||
```text | ||
https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump | ||
``` | ||
|
||
Replace `YYYY`, `MM`, and `DD` in the URL with the desired date to access a specific database dump. | ||
|
||
#### Download Instructions | ||
|
||
1. **Create a Directory:** | ||
Replace `YYYY`, `MM`, and `DD` in the URL with the desired date to access a specific | ||
database dump. | ||
|
||
Execute the following command to create a `dumps` directory in your current working directory: | ||
``` | ||
mkdir dumps | ||
``` | ||
The dump file needs to be called `neo4j.dump` and needs to be put in a folder called | ||
`dumps` (`dumps/neo4j.dump`). | ||
To create the folder and download a dump with `curl`: | ||
|
||
2. **Download the Database Dump:** | ||
|
||
Use `curl` to download the database dump and save it in the `dumps/neo4j.dump` path: | ||
``` | ||
curl https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump -o dumps/neo4j.dump | ||
``` | ||
```bash | ||
mkdir dumps | ||
curl https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump -o dumps/neo4j.dump | ||
``` | ||
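The date substitution described above can be scripted. The following is a minimal sketch, not part of IYP: the function names `iyp_dump_url` and `download_iyp_dump` are made up for illustration, and it assumes a dump actually exists for the date you pass in.

```bash
# Illustrative helpers (hypothetical names, not provided by IYP).

# Print the dump URL for a date given as YYYY-MM-DD.
iyp_dump_url() {
    local date="$1"
    local yyyy rest mm dd
    case "$date" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *)
            echo "expected a date like YYYY-MM-DD, got: $date" >&2
            return 1
            ;;
    esac
    # Split the date into its three components.
    yyyy="${date%%-*}"
    rest="${date#*-}"
    mm="${rest%%-*}"
    dd="${rest#*-}"
    echo "https://ihr-archive.iijlab.net/ihr/iyp/${yyyy}/${mm}/${dd}/iyp-${date}.dump"
}

# Download the dump for the given date to dumps/neo4j.dump.
download_iyp_dump() {
    local url
    url="$(iyp_dump_url "$1")" || return 1
    mkdir -p dumps
    # --fail makes curl exit with an error on HTTP failures (e.g. no dump
    # for that date) instead of saving the error page as neo4j.dump.
    curl --fail --location "$url" -o dumps/neo4j.dump
}
```

After sourcing these functions, `download_iyp_dump YYYY-MM-DD` (with a real date filled in) would fetch the dump to the expected path.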
|
||
Remember to replace `YYYY`, `MM`, and `DD` in the download command with the specific date you require. | ||
### Set up IYP | ||
|
||
### Setting up IYP | ||
To uncompress the dump and start the database, run the following commands: | ||
|
||
```bash | ||
mkdir -p data | ||
UID="$(id -u)" GID="$(id -g)" docker compose --profile local up | ||
``` | ||
docker compose --profile local up | ||
``` | ||
This creates a `data` directory containing the database. | ||
This initial setup needs be done only once. | ||
It won't work if this directory already contains a database. | ||
|
||
Afterwards, you can simply [start/stop](#startstop-iyp) IYP to use it. | ||
To update the database with a new dump see [Updating an existing database](#updating-an-existing-database). | ||
This creates a `data` directory containing the database, loads the database dump, and | ||
starts the local IYP instance. This initial setup needs to be done only once. It won't work | ||
if this directory already contains a database. | ||
|
||
This setup keeps the database instance running in the foreground. It can be stopped with | ||
`Ctrl+C`. Afterwards, you can simply [start/stop](#startstop-iyp) IYP in the background | ||
to use it. To update the database with a new dump see [Update existing | ||
database](documentation/database-management.md#update-existing-database). | ||
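Because the initial setup fails when `data` already holds a database, it can help to check the directory first. This is only a sketch: `data_dir_is_empty` is a hypothetical helper, and it assumes that any file in the directory indicates an existing database.

```bash
# Hypothetical helper, not part of IYP.
# Succeeds (exit status 0) if the directory is missing or empty,
# fails (exit status 1) if it contains anything.
data_dir_is_empty() {
    local dir="${1:-data}"
    if [ ! -d "$dir" ]; then
        return 0
    fi
    # ls -A lists all entries, including hidden ones, except . and ..
    [ -z "$(ls -A "$dir")" ]
}
```

For example, `data_dir_is_empty data && echo "safe to run the initial setup"` prints the message only when no earlier database files are present.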
|
||
### Start/Stop IYP | ||
To stop the database, run the following command: | ||
``` | ||
docker stop iyp | ||
``` | ||
|
||
To restart the database, run the following command: | ||
``` | ||
To start the database, run the following command: | ||
|
||
```bash | ||
docker start iyp | ||
``` | ||
|
||
To stop the database, run the following command: | ||
|
||
```bash | ||
docker stop iyp | ||
``` | ||
|
||
### Querying the database | ||
### Query the database | ||
|
||
Open http://localhost:7474 in your favorite browser. To connect the interface to the database give | ||
Open <http://localhost:7474> in your favorite browser. To connect the interface to the database, enter | ||
the default login and password: `neo4j` and `password` respectively. Then enter your query in the top input field. | ||
|
||
For example, this query finds the IXPs where IIJ (AS2497) is a member, along with their countries: | ||
|
||
```cypher | ||
MATCH (iij:AS {asn:2497})-[:MEMBER_OF]-(ix:IXP)--(cc:Country) | ||
RETURN iij, ix, cc | ||
``` | ||
|
||
![Countries of IXPs where AS2497 is present](/documentation/assets/gallery/as2497ixpCountry.svg) | ||
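The same query can also be sent from the command line instead of the browser. The sketch below assumes that the local instance exposes Neo4j's transactional HTTP endpoint on port 7474 for the default `neo4j` database and uses the default credentials given above. The names `IYP_HTTP_ENDPOINT`, `iyp_query_payload`, and `iyp_query` are illustrative only.

```bash
# Assumption: Neo4j HTTP API endpoint of the local instance (default database "neo4j").
IYP_HTTP_ENDPOINT="http://localhost:7474/db/neo4j/tx/commit"

# Wrap a single-line Cypher statement in the JSON body expected by the endpoint.
# Only backslashes and double quotes are escaped; multi-line queries or
# control characters would need a proper JSON encoder.
iyp_query_payload() {
    local escaped
    escaped="$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
    printf '{"statements":[{"statement":"%s"}]}' "$escaped"
}

# Send the statement to the database and print the JSON response.
iyp_query() {
    curl --silent --fail \
        --user neo4j:password \
        --header 'Content-Type: application/json' \
        --header 'Accept: application/json' \
        --data "$(iyp_query_payload "$1")" \
        "$IYP_HTTP_ENDPOINT"
}
```

With the database running, `iyp_query 'MATCH (iij:AS {asn:2497})-[:MEMBER_OF]-(ix:IXP)--(cc:Country) RETURN iij, ix, cc'` should return the same nodes as JSON rather than as a graph.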
|
||
### IYP gallery | ||
|
||
See more query examples in [IYP gallery](/documentation/gallery.md) | ||
|
||
### Save modified database | ||
## Contributing | ||
|
||
If you modify the database and want to make a new dump, use the following command. Run the following command for updating an existing database. **Note: This command writes the dump to `backups/neo4j.dump` and overwrites this file if it exists.** | ||
``` | ||
docker compose run -it iyp_loader neo4j-admin database dump neo4j --to-path=/backups --verbose --overwrite-destination | ||
``` | ||
|
||
### Updating an existing database | ||
|
||
To update the database with a new dump remove the existing `data` directory and | ||
reload a dump with the following commands: | ||
``` | ||
docker stop iyp | ||
sudo rm -rf data | ||
docker start iyp_loader -i | ||
``` | ||
|
||
### Viewing Neo4j logs | ||
To view the logs of the Neo4j container, use the following command: | ||
``` | ||
docker compose logs -f iyp | ||
``` | ||
|
||
|
||
## Creating a new dump from scratch | ||
|
||
Clone this repository. | ||
``` | ||
git clone https://github.com/InternetHealthReport/internet-yellow-pages.git | ||
cd internet-yellow-pages | ||
``` | ||
|
||
Create python environment and install python libraries: | ||
``` | ||
python3 -m venv . | ||
source bin/activate | ||
pip install -r requirements.txt | ||
``` | ||
|
||
Configuration file, rename example file and add API keys: | ||
``` | ||
cp config.json.example config.json | ||
# Edit as needed | ||
``` | ||
|
||
Create and populate a new database: | ||
``` | ||
python3 create_db.py | ||
``` | ||
This will take a couple of hours to download all datasets and push them to neo4j. | ||
Want to [propose a new dataset](documentation/README.md#add-new-datasets) or [implement | ||
a crawler](documentation/writing-a-crawler.md)? Check out the | ||
[documentation](documentation/README.md) for more info. | ||
|
||
## Changelog | ||
|
||
See: https://github.com/InternetHealthReport/internet-yellow-pages/releases | ||
See: <https://github.com/InternetHealthReport/internet-yellow-pages/releases> | ||
|
||
## External links | ||
- Public instance of IYP: https://iyp.iijlab.net | ||
- RIPE86 presentation: https://ripe86.ripe.net/archives/video/1073/ | ||
- APNIC blog article: https://blog.apnic.net/2023/09/06/understanding-the-japanese-internet-with-the-internet-yellow-pages/ | ||
|
||
- [Public instance of IYP](https://iyp.iijlab.net) | ||
- [RIPE86 presentation](https://ripe86.ripe.net/archives/video/1073/) | ||
- [APNIC blog article](https://blog.apnic.net/2023/09/06/understanding-the-japanese-internet-with-the-internet-yellow-pages/) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,48 +1,39 @@ | ||
# IYP Documentation | ||
# IYP documentation | ||
|
||
## IYP Ontology | ||
## Ontology | ||
|
||
The lists of node and relationship types defined for IYP are available at: | ||
- [Node types](./node_types.md) | ||
- [Relationship types](./relationship_types.md) | ||
|
||
## IYP Data Sources | ||
- [Node types](./node-types.md) | ||
- [Relationship types](./relationship-types.md) | ||
|
||
## Data sources | ||
|
||
The list of all datasets imported in IYP is available [here](data-sources.md). | ||
The datasets licence are available the [IYP acknowledgments](../ACKNOWLEDGMENTS.md). | ||
|
||
## IYP Gallery | ||
|
||
The [IYP gallery](./gallery.md) provides example queries to help user browse the database. | ||
|
||
## Importing a new dataset | ||
### Python crawler | ||
To import a new dataset in IYP, you should write a crawler for that dataset. | ||
The main tasks of a crawler are to fetch data, parse it, model it with IYP | ||
ontology, and push it to the IYP database. Most of these tasks are assisted by | ||
the [IYP python library](../iyp/__init__.py). See the [example crawler](../iyp/crawlers/example/crawler.py) or [existing crawlers](../iyp/crawlers/) for getting started. | ||
See also the [IHR contributing guidelines](../CONTRIBUTING.md) and [best practices for writing crawlers](https://github.com/InternetHealthReport/internet-yellow-pages/discussions/128). | ||
|
||
### README | ||
Each crawler should be accompanied by a README.md file. This is the main documentation | ||
for the crawler, it should contain: | ||
- a short description of the dataset, | ||
- any specificities related to the way the data is imported (e.g. time span, data cleaning), | ||
- examples of how the data is modeled, | ||
- dependencies to other crawlers (e.g. if the crawler requires data from another one). | ||
|
||
### Adding a crawler to IYP main branch | ||
If you wish your crawler to be part of the IYP weekly dumps, you can submit a [Pull Request](https://github.com/InternetHealthReport/internet-yellow-pages/pulls) | ||
to include the crawler to IYP's github repository main branch. | ||
|
||
Along with the python code and README, the addition of new datasets should also | ||
be reflected in the following files: | ||
- the list of [imported datasets](./data-sources.md), | ||
- the [IYP acknowledgments](../ACKNOWLEDGMENTS.md) file should list the licence of all imported dataset. | ||
|
||
Changes to the ontology should be discussed in advance, either on [github discussion](https://github.com/InternetHealthReport/internet-yellow-pages/discussions) or by reaching [IYP maintainers](mailto:[email protected]), | ||
so that a consensus is reached before the ontology is updated. | ||
**Any change to the ontology should be reflected in the documentation** ([Node types](./node_types.md) and [Relationship types](./relationship_types.md)). | ||
|
||
You can also consider adding example queries to the [IYP gallery](./gallery.md), | ||
and organizations providing data to the [IYP frontpage](). | ||
The dataset licenses are available in the [acknowledgments](../ACKNOWLEDGMENTS.md). | ||
|
||
## Gallery | ||
|
||
The [IYP gallery](./gallery.md) provides example queries to help users browse the | ||
database. | ||
|
||
## Add new datasets | ||
|
||
### Propose a new dataset | ||
|
||
Have an idea for a dataset that should be integrated into IYP? Feel free to propose it | ||
by opening a new [discussion](). You should describe the dataset, why it is potentially | ||
useful, and, if possible, provide some initial idea for modeling the data. | ||
|
||
The discussion is used to decide if we want to integrate the dataset and how to model | ||
it. So feel free to propose a dataset even if you have no concrete model in mind. | ||
|
||
### Import a new dataset | ||
|
||
If it was decided that the dataset should be integrated into IYP, we will convert the | ||
discussion into a [GitHub issue](). At this stage it is open to anyone who wants to | ||
implement a crawler for the dataset. | ||
|
||
For a detailed description of how to write your first crawler and contribute to IYP, take | ||
a look at the [IHR contributing guidelines](../CONTRIBUTING.md) and the [crawler | ||
instructions](writing-a-crawler.md). |