diff --git a/documentation/README.md b/documentation/README.md
new file mode 100644
index 0000000..cfb767d
--- /dev/null
+++ b/documentation/README.md
@@ -0,0 +1,48 @@
+# IYP Documentation
+
+## IYP Ontology
+
+The node and relationship types defined for IYP are listed at:
+- [Node types](./node_types.md)
+- [Relationship types](./relationship_types.md)
+
+## IYP Data Sources
+
+The list of all datasets imported into IYP is available [here](data-sources.md).
+The dataset licences are listed in the [ACKNOWLEDGMENTS file](../ACKNOWLEDGMENTS.md).
+
+## IYP Gallery
+
+The [IYP gallery](./gallery.md) provides example queries to help users browse the database.
+
+## Importing a new dataset
+### Python crawler
+To import a new dataset into IYP, you should write a crawler for that dataset.
+The main tasks of a crawler are to fetch data, parse it, model it with the IYP
+ontology, and push it to the IYP database. Most of these tasks are assisted by
+the [IYP Python library](../iyp/__init__.py). See the [example crawler](../iyp/crawlers/example/crawler.py) or [existing crawlers](../iyp/crawlers/) to get started.
+See also the [IHR contributing guidelines](../CONTRIBUTING.md) and [best practices for writing crawlers](https://github.com/InternetHealthReport/internet-yellow-pages/discussions/128).
+
+### README
+Each crawler should be accompanied by a README.md file. This is the main documentation
+for the crawler and should contain:
+- a short description of the dataset,
+- any specifics of the way the data is imported (e.g. time span, data cleaning),
+- examples of how the data is modeled,
+- dependencies on other crawlers (e.g. if the crawler requires data from another one).
+
+### Adding a crawler to IYP main branch
+If you wish your crawler to be part of the IYP weekly dumps, you can submit a [Pull Request](https://github.com/InternetHealthReport/internet-yellow-pages/pulls)
+to include the crawler in the main branch of IYP's GitHub repository.
+
+Along with the Python code and README, the addition of new datasets should also
+be reflected in the following files:
+- The list of [imported datasets](./data-sources.md).
+- The [ACKNOWLEDGMENTS.md](../ACKNOWLEDGMENTS.md) file, which should list the licences of all imported datasets.
+
+Furthermore, **any change to the ontology should be reflected in the documentation** ([Node types](./node_types.md) and [Relationship types](./relationship_types.md)).
+Changes to the ontology should be discussed in advance, either in a [GitHub discussion](https://github.com/InternetHealthReport/internet-yellow-pages/discussions)
+or by contacting the [IYP maintainers](mailto:iyp@ihr.live), so that a consensus is reached before the ontology is updated.
+
+You can also consider adding example queries to the [IYP gallery](./gallery.md),
+and organizations providing data to the [IYP frontpage]().