First draft of documentation readme

InternetHealthReport · Aug 15, 2024 · 1298c5e
1 parent adfc33f
Showing 1 changed file with 48 additions and 0 deletions.
diff --git a/documentation/README.md b/documentation/README.md
@@ -0,0 +1,48 @@
+# IYP Documentation
+
+## IYP Ontology
+
+The node and relationship types defined for IYP are listed in:
+- [Node types](./node_types.md)
+- [Relationship types](./relationship_types.md)
+
+## IYP Data Sources
+
+The list of all datasets imported into IYP is available [here](data-sources.md).
+The dataset licences are listed in the [ACKNOWLEDGMENTS file](../ACKNOWLEDGMENTS.md).
+
+## IYP Gallery
+
+The [IYP gallery](./gallery.md) provides example queries to help users browse the database.
+
+## Importing a new dataset
+### Python crawler
+To import a new dataset into IYP, you should write a crawler for that dataset.
+The main tasks of a crawler are to fetch data, parse it, model it with the IYP
+ontology, and push it to the IYP database. Most of these tasks are assisted by
+the [IYP Python library](../iyp/__init__.py). See the [example crawler](../iyp/crawlers/example/crawler.py) or [existing crawlers](../iyp/crawlers/) to get started.
+See also the [IHR contributing guidelines](../CONTRIBUTING.md) and [best practices for writing crawlers](https://github.com/InternetHealthReport/internet-yellow-pages/discussions/128).
+
+### README
+Each crawler should be accompanied by a README.md file. This is the main documentation
+for the crawler and should contain:
+- a short description of the dataset,
+- any specifics of how the data is imported (e.g. time span, data cleaning),
+- examples of how the data is modeled,
+- dependencies on other crawlers (e.g. if the crawler requires data from another one).
+
+### Adding a crawler to the IYP main branch
+If you want your crawler to be part of the IYP weekly dumps, you can submit a [Pull Request](https://github.com/InternetHealthReport/internet-yellow-pages/pulls)
+to include the crawler in the main branch of IYP's GitHub repository.
+
+Along with the Python code and README, the addition of a new dataset should also
+be reflected in the following files:
+- The list of [imported datasets](./data-sources.md).
+- The [ACKNOWLEDGMENTS.md](../ACKNOWLEDGMENTS.md) file, which should list the licence of every imported dataset.
+
+Furthermore, **any change to the ontology should be reflected in the documentation** ([Node types](./node_types.md) and [Relationship types](./relationship_types.md)).
+Changes to the ontology should be discussed in advance, either on [GitHub Discussions](https://github.com/InternetHealthReport/internet-yellow-pages/discussions) or by contacting the [IYP maintainers](mailto:[email protected]),
+so that a consensus is reached before the ontology is updated.
+
+You can also consider adding example queries to the [IYP gallery](./gallery.md),
+and adding organizations that provide data to the [IYP frontpage]().