Commit
1 parent
adfc33f
commit 1298c5e
Showing 1 changed file with 48 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# IYP Documentation | ||
|
||
## IYP Ontology | ||
|
||
The lists of node and relationship types defined for IYP are available at: | ||
- [Node types](./node_types.md) | ||
- [Relationship types](./relationship_types.md) | ||
|
||
## IYP Data Sources | ||
|
||
The list of all datasets imported in IYP is available [here](data-sources.md). | ||
The dataset licences are listed in the [ACKNOWLEDGMENTS file](../ACKNOWLEDGMENTS.md). | ||
|
||
## IYP Gallery | ||
|
||
The [IYP gallery](./gallery.md) provides example queries to help users browse the database. | ||
|
||
## Importing a new dataset | ||
### Python crawler | ||
To import a new dataset into IYP, you should write a crawler for that dataset. | ||
The main tasks of a crawler are to fetch data, parse it, model it with the IYP | ||
ontology, and push it to the IYP database. Most of these tasks are assisted by | ||
the [IYP Python library](../iyp/__init__.py). To get started, see the [example crawler](../iyp/crawlers/example/crawler.py) or the [existing crawlers](../iyp/crawlers/). | ||
See also the [IHR contributing guidelines](../CONTRIBUTING.md) and the [best practices for writing crawlers](https://github.com/InternetHealthReport/internet-yellow-pages/discussions/128). | ||
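The four crawler tasks described above (fetch, parse, model, push) can be sketched as a small pipeline. This is only an illustration of the workflow, not the IYP library API: `Link`, `fetch`, `parse`, `model`, `push`, and `run_crawler` are hypothetical names, the CSV columns `asn` and `name` are invented for the example, and a plain Python list stands in for the IYP database. The labels `AS`, `Name`, and `NAME` are assumed for illustration and should be checked against the node and relationship type documentation. For real code, follow the example crawler.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One relationship between two nodes (hypothetical structure)."""
    src_label: str
    src_value: object
    rel_type: str
    dst_label: str
    dst_value: object


def fetch(source):
    """Fetch the raw dataset.

    A real crawler would download the data from its source; here the
    text is passed through unchanged so the sketch runs offline.
    """
    return source


def parse(raw):
    """Parse raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(raw)))


def model(rows):
    """Map each parsed row onto two nodes and one relationship.

    The labels used here are assumptions made for this example.
    """
    links = []
    for row in rows:
        asn = int(row['asn'])
        name = row['name'].strip()
        links.append(Link('AS', asn, 'NAME', 'Name', name))
    return links


def push(links, database):
    """Push the modeled links; a list stands in for the graph database."""
    database.extend(links)
    return len(links)


def run_crawler(source, database):
    """Run the four stages in order and return the number of links pushed."""
    return push(model(parse(fetch(source))), database)
```

Calling `run_crawler('asn,name\n2497,IIJ\n', [])` pushes one link into the list. Keeping the stages as separate functions makes it easy to test the parsing and modeling steps without network or database access.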
|
||
### README | ||
Each crawler should be accompanied by a README.md file. This is the main documentation | ||
for the crawler and should contain: | ||
- a short description of the dataset, | ||
- any specifics of how the data is imported (e.g. time span, data cleaning), | ||
- examples of how the data is modeled, | ||
- dependencies on other crawlers (e.g. if the crawler requires data from another one). | ||
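The README requirement can be checked mechanically before submitting a crawler. The sketch below is not part of the IYP repository; `crawlers_missing_readme` is a hypothetical helper, and it assumes that each crawler lives in its own subdirectory (as `iyp/crawlers/example/` does) and that a directory counts as a crawler if it contains at least one `.py` file.

```python
from pathlib import Path


def crawlers_missing_readme(crawlers_root):
    """Return names of crawler directories that have no README.md.

    Assumption: a subdirectory is a crawler directory if it directly
    contains at least one Python file.
    """
    missing = []
    for directory in sorted(Path(crawlers_root).iterdir()):
        if not directory.is_dir():
            continue
        has_code = any(directory.glob('*.py'))
        if has_code and not (directory / 'README.md').is_file():
            missing.append(directory.name)
    return missing
```

Run from the repository root, `crawlers_missing_readme('iyp/crawlers')` would list the directories that still need a README. The check only tests for the file's presence, so the content listed above still has to be reviewed by hand.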
|
||
### Adding a crawler to IYP main branch | ||
If you want your crawler to be part of the IYP weekly dumps, you can submit a [Pull Request](https://github.com/InternetHealthReport/internet-yellow-pages/pulls) | ||
to include the crawler in the main branch of IYP's GitHub repository. | ||
|
||
Along with the Python code and README, the addition of a new dataset should also | ||
be reflected in the following files: | ||
- The list of [imported datasets](./data-sources.md). | ||
- The [ACKNOWLEDGMENTS.md](../ACKNOWLEDGMENTS.md) file, which should list the licences of all imported datasets. | ||
|
||
Furthermore, **any change to the ontology should be reflected in the documentation** ([Node types](./node_types.md) and [Relationship types](./relationship_types.md)). | ||
Changes to the ontology should be discussed in advance, either on [GitHub discussions](https://github.com/InternetHealthReport/internet-yellow-pages/discussions) or by contacting the [IYP maintainers](mailto:[email protected]), | ||
so that a consensus is reached before the ontology is updated. | ||
|
||
You can also consider adding example queries to the [IYP gallery](./gallery.md), | ||
and adding organizations that provide data to the [IYP frontpage]().