Rework readmes and documentation

InternetHealthReport · Nov 2, 2024 · d47d1b5
1 parent 9348f3a
commit d47d1b5

Showing 10 changed files with 371 additions and 186 deletions.
diff --git a/README.md b/README.md
@@ -1,152 +1,111 @@
 # Internet Yellow Pages
 
-The Internet Yellow Pages (IYP) is a knowledge database that gathers information about Internet resources (for example ASNs, IP prefixes, and domain names). 
+The Internet Yellow Pages (IYP) is a knowledge database that gathers information about
+Internet resources (for example ASNs, IP prefixes, and domain names).
 
 ## Public IYP prototype
 
-Visit http://iyp.iijlab.net to try our online prototype. No password is required, just click the 'connect' button to get started. Don't know how to use IYP ? You'll find a guide after clicking the 'connect' button, see also examples [here](https://github.com/InternetHealthReport/internet-yellow-pages/blob/main/documentation/gallery.md).
+Visit <https://iyp.iijlab.net> to try our online prototype. You will find instructions
+on how to connect to the prototype and some example queries there. For even more
+examples, check out the [IYP
+gallery](documentation/gallery.md).
 
-## Deploying a local IYP instance
+## Deploy a local IYP instance
+
+We describe the basic process of deploying a local IYP instance below. For more advanced
+commands see the [database documentation](documentation/database-management.md).
 
 ### Prerequisites
+
 - [Curl](https://curl.se/download.html)
 - [Docker](https://www.docker.com/)
 - [Docker Compose](https://docs.docker.com/compose/install/)
-- about 30GB of free disk space
+- about 50GB of free disk space
 
-### Downloading the Database dump
+### Download the database dump
 
-#### Explore and Download Dumps
-
-Visit the database dumps repository at:
-```
-https://ihr-archive.iijlab.net/ihr/iyp/
-```
-
-#### Specific Dump Format
+Visit the [database dump repository](https://ihr-archive.iijlab.net/ihr/iyp/).
 
 Dumps are organized by year, month, and day in this format:
-```
+
+```text
 https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump
 ```
 
-Replace `YYYY`, `MM`, and `DD` in the URL with the desired date to access a specific database dump.
-
-#### Download Instructions
-
-1. **Create a Directory:**
+Replace `YYYY`, `MM`, and `DD` in the URL with the desired date to access a specific
+database dump.
 
-   Execute the following command to create a `dumps` directory in your current working directory:
-   ```
-   mkdir dumps
-   ```
+The dump file needs to be called `neo4j.dump` and needs to be put in a folder called
+`dumps` (`dumps/neo4j.dump`).
+To create the folder and download a dump with `curl`:
 
-2. **Download the Database Dump:**
-
-   Use `curl` to download the database dump and save it in the `dumps/neo4j.dump` path:
-   ```
-   curl https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump -o dumps/neo4j.dump
-   ```
+```bash
+mkdir dumps
+curl https://ihr-archive.iijlab.net/ihr/iyp/YYYY/MM/DD/iyp-YYYY-MM-DD.dump -o dumps/neo4j.dump
+```
 
-Remember to replace `YYYY`, `MM`, and `DD` in the download command with the specific date you require.
+### Set up IYP
 
-### Setting up IYP
 To uncompress the dump and start the database run the following command:
+
+```bash
+mkdir -p data
+UID="$(id -u)" GID="$(id -g)" docker compose --profile local up
 ```
-docker compose --profile local up
-```
-This creates a `data` directory containing the database. 
-This initial setup needs be done only once. 
-It won't work if this directory already contains a database.
 
-Afterwards, you can simply [start/stop](#startstop-iyp) IYP to use it. 
-To update the database with a new dump see [Updating an existing database](#updating-an-existing-database).
+This creates a `data` directory containing the database, loads the database dump, and
+starts the local IYP instance. This initial setup needs to be done only once. It won't
+work if this directory already contains a database.
 
+This setup keeps the database instance running in the foreground. It can be stopped with
+`Ctrl+C`. Afterwards, you can simply [start/stop](#startstop-iyp) IYP in the background
+to use it. To update the database with a new dump see [Update existing
+database](documentation/database-management.md#update-existing-database).
 
 ### Start/Stop IYP
-To stop the database, run the following command:
-```
-docker stop iyp
-```
 
-To restart the database, run the following command:
-```
+To start the database, run the following command:
+
+```bash
 docker start iyp
 ```
 
+To stop the database, run the following command:
+
+```bash
+docker stop iyp
+```
 
-### Querying the database
+### Query the database
 
-Open http://localhost:7474 in your favorite browser. To connect the interface to the database give
+Open <http://localhost:7474> in your favorite browser. To connect the interface to the database give
 the default login and password: `neo4j` and `password` respectively. Then enter your query in the top input field.
 
 For example, this finds the IXPs and corresponding country codes where IIJ (AS2497) is:
+
 ```cypher
 MATCH (iij:AS {asn:2497})-[:MEMBER_OF]-(ix:IXP)--(cc:Country)
 RETURN iij, ix, cc
 ```
+
 ![Countries of IXPs where AS2497 is present](/documentation/assets/gallery/as2497ixpCountry.svg)
 
 ### IYP gallery
 
 See more query examples in [IYP gallery](/documentation/gallery.md)
 
-### Save modified database
+## Contributing
 
-If you modify the database and want to make a new dump, use the following command. Run the following command for updating an existing database. **Note: This command writes the dump to `backups/neo4j.dump` and overwrites this file if it exists.** 
-```
-docker compose run -it iyp_loader neo4j-admin database dump neo4j --to-path=/backups --verbose --overwrite-destination
-```
-
-### Updating an existing database
-
-To update the database with a new dump remove the existing `data` directory and 
-reload a dump with the following commands:
-```
-docker stop iyp
-sudo rm -rf data
-docker start iyp_loader -i
-```
-
-### Viewing Neo4j logs
-To view the logs of the Neo4j container, use the following command:
-```
-docker compose logs -f iyp
-```
-
-
-## Creating a new dump from scratch
-
-Clone this repository.
-```
-git clone https://github.com/InternetHealthReport/internet-yellow-pages.git
-cd internet-yellow-pages
-```
-
-Create python environment and install python libraries:
-```
-python3 -m venv .
-source bin/activate
-pip install -r requirements.txt
-```
-
-Configuration file, rename example file and add API keys:
-```
-cp config.json.example config.json
-# Edit as needed
-```
-
-Create and populate a new database:
-```
-python3 create_db.py
-```
-This will take a couple of hours to download all datasets and push them to neo4j.
+Want to [propose a new dataset](documentation/README.md#add-new-datasets) or [implement
+a crawler](documentation/writing-a-crawler.md)? Check out the
+[documentation](documentation/README.md) for more info.
 
 ## Changelog
 
-See: https://github.com/InternetHealthReport/internet-yellow-pages/releases
+See: <https://github.com/InternetHealthReport/internet-yellow-pages/releases>
 
 ## External links
-- Public instance of IYP: https://iyp.iijlab.net
-- RIPE86 presentation: https://ripe86.ripe.net/archives/video/1073/
-- APNIC blog article: https://blog.apnic.net/2023/09/06/understanding-the-japanese-internet-with-the-internet-yellow-pages/
+
+- [Public instance of IYP](https://iyp.iijlab.net)
+- [RIPE86 presentation](https://ripe86.ripe.net/archives/video/1073/)
+- [APNIC blog article](https://blog.apnic.net/2023/09/06/understanding-the-japanese-internet-with-the-internet-yellow-pages/)
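The dump URL scheme described in the reworked README (`YYYY/MM/DD/iyp-YYYY-MM-DD.dump`) can be sketched as a small shell helper. This is only an illustration and not part of the commit: the function name `iyp_dump_url` is hypothetical, and the date in the usage note is a placeholder, so it is an assumption that a dump exists for any given day.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository): build the dump URL
# for a date given as YYYY-MM-DD, following the URL format in the README.
iyp_dump_url() {
    local date="$1"
    # Reject anything that is not in YYYY-MM-DD form.
    if [[ ! "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "usage: iyp_dump_url YYYY-MM-DD" >&2
        return 1
    fi
    # Split the date into the path components used by the archive.
    local year="${date:0:4}" month="${date:5:2}" day="${date:8:2}"
    echo "https://ihr-archive.iijlab.net/ihr/iyp/${year}/${month}/${day}/iyp-${date}.dump"
}
```

With such a helper, the download step from the README would look like `mkdir -p dumps && curl "$(iyp_dump_url 2024-11-01)" -o dumps/neo4j.dump`, with the date replaced by one that is actually listed in the dump repository.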
diff --git a/docker-compose.yaml b/docker-compose.yaml
@@ -1,22 +1,20 @@
-volumes:
-  caddy_data:
-  caddy_config:
 services:
   iyp_loader:
     image: neo4j/neo4j-admin:5.21.2
     profiles: ["local", "public_tls", "public_notls"]
+    user: "${UID}:${GID}"
     container_name: iyp_loader
     tty: true
     stdin_open: true
     volumes:
       - ./data:/data
       - ./dumps:/dumps
-      - ./backups:/backups
     command: neo4j-admin database load neo4j --from-path=/dumps --verbose
 
   iyp:
     image: neo4j:5.21.2
     profiles: ["local"]
+    user: "${UID}:${GID}"
     container_name: iyp
     restart: unless-stopped
     ports:
@@ -33,6 +31,7 @@ services:
   iyp_readonly_tls:
     image: neo4j:5.21.2
     profiles: ["public_tls"]
+    user: "${UID}:${GID}"
     container_name: iyp
     restart: unless-stopped
     ports:
@@ -52,6 +51,7 @@ services:
   iyp_readonly_notls:
     image: neo4j:5.21.2
     profiles: ["public_notls"]
+    user: "${UID}:${GID}"
     container_name: iyp
     restart: unless-stopped
     ports:
@@ -69,12 +69,13 @@ services:
 
   caddy:
     image: caddy:latest
+    profiles: ["caddy"]
+    user: "${UID}:${GID}"
     container_name: caddy
     restart: unless-stopped
     ports:
       - "80:80"
       - "443:443"
-      - "443:443/udp"
       - "2019:2019"
     environment:
       - CADDY_ADMIN=0.0.0.0:2019
@@ -83,4 +84,7 @@ services:
       - caddy_data:/data
       - caddy_config:/config
     command: /usr/bin/caddy run --resume
-
+
+volumes:
+  caddy_data:
+  caddy_config:
diff --git a/documentation/README.md b/documentation/README.md
@@ -1,48 +1,39 @@
-# IYP Documentation
+# IYP documentation
 
-## IYP Ontology
+## Ontology
 
 The list of node and relationship types defined for IYP is available at:
-- [Node types](./node_types.md)
-- [Relationship types](./relationship_types.md)
 
-## IYP Data Sources
+- [Node types](./node-types.md)
+- [Relationship types](./relationship-types.md)
+
+## Data sources
 
 The list of all datasets imported in IYP is available [here](data-sources.md).
-The datasets licence are available the [IYP acknowledgments](../ACKNOWLEDGMENTS.md).
-
-## IYP Gallery
-
-The [IYP gallery](./gallery.md) provides example queries to help user browse the database.
-
-## Importing a new dataset
-### Python crawler
-To import a new dataset in IYP, you should write a crawler for that dataset.
-The main tasks of a crawler are to fetch data, parse it, model it with IYP
-ontology, and push it to the IYP database. Most of these tasks are assisted by
-the [IYP python library](../iyp/__init__.py). See the [example crawler](../iyp/crawlers/example/crawler.py) or [existing crawlers](../iyp/crawlers/) for getting started.
-See also the [IHR contributing guidelines](../CONTRIBUTING.md) and [best practices for writing crawlers](https://github.com/InternetHealthReport/internet-yellow-pages/discussions/128). 
-
-### README
-Each crawler should be accompanied by a README.md file. This is the main documentation
-for the crawler, it should contain:
-- a short description of the dataset, 
-- any specificities related to the way the data is imported (e.g. time span, data cleaning), 
-- examples of how the data is modeled,
-- dependencies to other crawlers (e.g. if the crawler requires data from another one).
-
-### Adding a crawler to IYP main branch
-If you wish your crawler to be part of the IYP weekly dumps, you can submit a [Pull Request](https://github.com/InternetHealthReport/internet-yellow-pages/pulls)
-to include the crawler to IYP's github repository main branch. 
-
-Along with the python code and README, the addition of new datasets should also 
-be reflected in the following files:
-- the list of [imported datasets](./data-sources.md),
-- the [IYP acknowledgments](../ACKNOWLEDGMENTS.md) file should list the licence of all imported dataset.
-
-Changes to the ontology should be discussed in advance, either on [github discussion](https://github.com/InternetHealthReport/internet-yellow-pages/discussions) or by reaching [IYP maintainers](mailto:[email protected]),
-so that a consensus is reached before the ontology is updated. 
-**Any change to the ontology should be reflected in the documentation** ([Node types](./node_types.md) and [Relationship types](./relationship_types.md)).
-
-You can also consider adding example queries to the [IYP gallery](./gallery.md),
-and organizations providing data to the [IYP frontpage]().
+The dataset licenses are available in the [acknowledgments](../ACKNOWLEDGMENTS.md).
+
+## Gallery
+
+The [IYP gallery](./gallery.md) provides example queries to help users browse the
+database.
+
+## Add new datasets
+
+### Propose a new dataset
+
+Have an idea for a dataset that should be integrated into IYP? Feel free to propose it
+by opening a new [discussion](). You should describe the dataset, why it is potentially
+useful, and, if possible, provide some initial idea for modeling the data.
+
+The discussion is used to decide if we want to integrate the dataset and how to model
+it. So feel free to propose a dataset even if you have no concrete model in mind.
+
+### Import a new dataset
+
+If it was decided that the dataset should be integrated into IYP, we will convert the
+discussion into a [GitHub issue](). At this stage it is open to anyone who wants to
+implement a crawler for the dataset.
+
+For a detailed description on how to write your first crawler and contribute to IYP take
+a look at the [IHR contributing guidelines](../CONTRIBUTING.md) and the [crawler
+instructions](writing-a-crawler.md).