Rework readmes and documentation
m-appel committed Nov 2, 2024
1 parent 9348f3a commit d47d1b5
Showing 10 changed files with 371 additions and 186 deletions.
157 changes: 58 additions & 99 deletions README.md
@@ -1,152 +1,111 @@
# Internet Yellow Pages

The Internet Yellow Pages (IYP) is a knowledge database that gathers information about
Internet resources (for example ASNs, IP prefixes, and domain names).

## Public IYP prototype

Visit <https://iyp.iijlab.net> to try our online prototype. You will find instructions
on how to connect to the prototype and some example queries there. For even more
examples, check out the [IYP
gallery](documentation/gallery.md).

## Deploy a local IYP instance

We describe the basic process of deploying a local IYP instance below. For more advanced
commands, see the [database documentation](documentation/database-management.md).

### Prerequisites

- [Curl](https://curl.se/download.html)
- [Docker](https://www.docker.com/)
- [Docker Compose](https://docs.docker.com/compose/install/)
- about 50GB of free disk space

### Download the database dump

Visit the [database dump repository](https://ihr-archive.iijlab.net/ihr/iyp/).

Dumps are organized by year, month, and day in this format:

```text
https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump
```

Replace `YYYY`, `MM`, and `DD` in the URL with the desired date to access a specific
database dump.
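The substitution can also be done programmatically. The following minimal Python sketch builds a dump URL for a given date, assuming that `MM` and `DD` are zero-padded two-digit fields, as the `YYYY-MM-DD` placeholder suggests. `dump_url` is a hypothetical helper, and it only constructs the URL: it does not check whether a dump was actually published for that date.

```python
from datetime import date

# Archive root, taken from the URL format above.
ARCHIVE = "https://ihr-archive.iijlab.net/ihr/iyp"


def dump_url(day: date) -> str:
    """Build the dump URL for `day` following the
    YYYY/MM/DD/iyp-YYYY-MM-DD.dump layout (hypothetical helper)."""
    # date.__format__ delegates to strftime, which zero-pads %m and %d.
    return f"{ARCHIVE}/{day:%Y/%m/%d}/iyp-{day:%Y-%m-%d}.dump"
```

For example, `dump_url(date(2024, 11, 2))` yields the URL of the dump dated 2 November 2024.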

The dump file must be named `neo4j.dump` and placed in a folder called `dumps`
(i.e., `dumps/neo4j.dump`). To create the folder and download a dump with `curl`, run:

```bash
mkdir -p dumps
curl https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump -o dumps/neo4j.dump
```
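If `curl` is not available, Python's standard library can perform the same download. The following is a minimal sketch. `download_dump` is a hypothetical helper that writes to the `dumps/neo4j.dump` path required above, and it streams the response to disk instead of holding the whole dump in memory.

```python
import shutil
import urllib.request
from pathlib import Path


def download_dump(url: str, dumps_dir: str = "dumps") -> Path:
    """Download an IYP dump from `url` into <dumps_dir>/neo4j.dump
    (hypothetical helper mirroring the curl command above)."""
    target_dir = Path(dumps_dir)
    # Equivalent of `mkdir -p dumps`.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "neo4j.dump"
    # Copy the response to disk in chunks; dumps are tens of GB.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

One difference from the plain `curl` command: `urlopen` raises `urllib.error.HTTPError` for a missing dump (for example, a 404), while `curl` without `--fail` would save the server's error page as `dumps/neo4j.dump`.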

### Set up IYP

To load the dump and start the database, run the following commands:

```bash
mkdir -p data
# Run via `env`: bash treats UID as read-only, so a plain `UID=... docker ...` prefix fails
env UID="$(id -u)" GID="$(id -g)" docker compose --profile local up
```

This creates a `data` directory containing the database, loads the database dump, and
starts the local IYP instance. This initial setup needs to be done only once. It won't
work if the `data` directory already contains a database.

This setup keeps the database instance running in the foreground. It can be stopped with
`Ctrl+C`. Afterwards, you can simply [start/stop](#startstop-iyp) IYP in the background
to use it. To update the database with a new dump, see [Update existing
database](documentation/database-management.md#update-existing-database).
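The initial setup can also be scripted. The following is a minimal Python sketch. It assumes a POSIX host (`os.getuid`/`os.getgid` do not exist on Windows) and the `docker-compose.yaml` from this commit, which substitutes `UID` and `GID` from the environment into `user: "${UID}:${GID}"`. The helper names are hypothetical. Treating any non-empty `data` directory as "already contains a database" is a simplifying assumption.

```python
import os
import subprocess
from pathlib import Path


def compose_env() -> dict:
    """Copy of the current environment with UID/GID set to the current user,
    for docker compose to substitute into `user: "${UID}:${GID}"`."""
    env = dict(os.environ)
    env["UID"] = str(os.getuid())
    env["GID"] = str(os.getgid())
    return env


def data_dir_is_empty(path: str = "data") -> bool:
    """True if `path` is missing or empty (assumption: anything inside
    it is treated as an existing database)."""
    p = Path(path)
    return not p.exists() or not any(p.iterdir())


def initial_setup(data_dir: str = "data") -> None:
    """Hypothetical wrapper around `mkdir -p data` + `docker compose --profile local up`."""
    if not data_dir_is_empty(data_dir):
        raise RuntimeError(f"{data_dir!r} is not empty; update the database instead")
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    # Runs in the foreground like the shell command; Ctrl+C stops it.
    subprocess.run(["docker", "compose", "--profile", "local", "up"],
                   env=compose_env(), check=True)
```

Passing the variables through `subprocess.run(..., env=...)` avoids the shell entirely, so shell-specific handling of `UID` does not come into play.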

### Start/Stop IYP

To start the database, run the following command:

```bash
docker start iyp
```

To stop the database, run the following command:

```bash
docker stop iyp
```

### Query the database

Open <http://localhost:7474> in your favorite browser. To connect the interface to the database, use
the default username `neo4j` and password `password`. Then type your query in the top input field.

For example, the following query finds the IXPs where IIJ (AS2497) is a member, together with the countries of those IXPs:

```cypher
MATCH (iij:AS {asn:2497})-[:MEMBER_OF]-(ix:IXP)--(cc:Country)
RETURN iij, ix, cc
```
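To run the same query from a script instead of the browser, the following sketch uses the official Neo4j Python driver (version 5, `pip install neo4j`). Several assumptions apply. The local instance is assumed to expose Neo4j's default Bolt port 7687, because the port mapping is not shown here. The `name` and `country_code` properties of `IXP` and `Country` nodes are also assumed rather than taken from this README. The query returns scalar values instead of whole nodes and passes the ASN as the `$asn` parameter.

```python
# Assumes the official Neo4j Python driver, version 5: pip install neo4j
# Same pattern as the query above; `name` and `country_code` are assumed property names.
QUERY = """
MATCH (iij:AS {asn: $asn})-[:MEMBER_OF]-(ix:IXP)--(cc:Country)
RETURN ix.name AS ixp, cc.country_code AS country
"""


def ixp_countries(asn, uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Return (IXP name, country code) pairs for the given ASN."""
    # Imported here so that defining this helper does not require the driver.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        # execute_query forwards extra keyword arguments as query parameters.
        records, _summary, _keys = driver.execute_query(QUERY, asn=asn, database_="neo4j")
    return [(record["ixp"], record["country"]) for record in records]
```

With the local instance running, `ixp_countries(2497)` returns the pairs for IIJ.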

![Countries of IXPs where AS2497 is present](/documentation/assets/gallery/as2497ixpCountry.svg)

### IYP gallery

See more query examples in the [IYP gallery](/documentation/gallery.md).

## Contributing

Want to [propose a new dataset](documentation/README.md#add-new-datasets) or [implement
a crawler](documentation/writing-a-crawler.md)? Check out the
[documentation](documentation/README.md) for more info.

## Changelog

See: <https://github.com/InternetHealthReport/internet-yellow-pages/releases>

## External links

- [Public instance of IYP](https://iyp.iijlab.net)
- [RIPE86 presentation](https://ripe86.ripe.net/archives/video/1073/)
- [APNIC blog article](https://blog.apnic.net/2023/09/06/understanding-the-japanese-internet-with-the-internet-yellow-pages/)
16 changes: 10 additions & 6 deletions docker-compose.yaml
@@ -1,22 +1,20 @@
```yaml
volumes:
  caddy_data:
  caddy_config:
services:
  iyp_loader:
    image: neo4j/neo4j-admin:5.21.2
    profiles: ["local", "public_tls", "public_notls"]
    user: "${UID}:${GID}"
    container_name: iyp_loader
    tty: true
    stdin_open: true
    volumes:
      - ./data:/data
      - ./dumps:/dumps
      - ./backups:/backups
    command: neo4j-admin database load neo4j --from-path=/dumps --verbose

  iyp:
    image: neo4j:5.21.2
    profiles: ["local"]
    user: "${UID}:${GID}"
    container_name: iyp
    restart: unless-stopped
    ports:
```
@@ -33,6 +31,7 @@ services:
```yaml
  iyp_readonly_tls:
    image: neo4j:5.21.2
    profiles: ["public_tls"]
    user: "${UID}:${GID}"
    container_name: iyp
    restart: unless-stopped
    ports:
```
@@ -52,6 +51,7 @@ services:
```yaml
  iyp_readonly_notls:
    image: neo4j:5.21.2
    profiles: ["public_notls"]
    user: "${UID}:${GID}"
    container_name: iyp
    restart: unless-stopped
    ports:
```
@@ -69,12 +69,13 @@ services:

```yaml
  caddy:
    image: caddy:latest
    profiles: ["caddy"]
    user: "${UID}:${GID}"
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
      - "2019:2019"
    environment:
      - CADDY_ADMIN=0.0.0.0:2019
```
@@ -83,4 +84,7 @@ services:
```yaml
      - caddy_data:/data
      - caddy_config:/config
    command: /usr/bin/caddy run --resume


volumes:
  caddy_data:
  caddy_config:
```
75 changes: 33 additions & 42 deletions documentation/README.md
@@ -1,48 +1,39 @@
# IYP documentation

## Ontology

The node and relationship types defined for IYP are listed in:
- [Node types](./node-types.md)
- [Relationship types](./relationship-types.md)

## Data sources

The list of all datasets imported into IYP is available [here](data-sources.md).
The dataset licenses are listed in the [acknowledgments](../ACKNOWLEDGMENTS.md).

## Gallery

The [IYP gallery](./gallery.md) provides example queries to help users browse the
database.

## Add new datasets

### Propose a new dataset

Have an idea for a dataset that should be integrated into IYP? Feel free to propose it
by opening a new [discussion](). You should describe the dataset, why it is potentially
useful, and, if possible, provide some initial ideas for modeling the data.

The discussion is used to decide whether to integrate the dataset and how to model it,
so feel free to propose a dataset even if you have no concrete model in mind.

### Import a new dataset

If we decide that the dataset should be integrated into IYP, we will convert the
discussion into a [GitHub issue](). At this stage, the issue is open to anyone who wants
to implement a crawler for the dataset.

For a detailed description of how to write your first crawler and contribute to IYP, take
a look at the [IHR contributing guidelines](../CONTRIBUTING.md) and the [crawler
instructions](writing-a-crawler.md).