First draft of documentation readme
romain-fontugne committed Aug 15, 2024
1 parent adfc33f commit 1298c5e
Showing 1 changed file with 48 additions and 0 deletions.
48 changes: 48 additions & 0 deletions documentation/README.md
@@ -0,0 +1,48 @@
# IYP Documentation

## IYP Ontology

The node and relationship types defined for IYP are listed in:
- [Node types](./node_types.md)
- [Relationship types](./relationship_types.md)

## IYP Data Sources

The list of all datasets imported in IYP is available [here](data-sources.md).
The dataset licences are listed in the [ACKNOWLEDGMENTS file](../ACKNOWLEDGMENTS.md).

## IYP Gallery

The [IYP gallery](./gallery.md) provides example queries to help users browse the database.

## Importing a new dataset
### Python crawler
To import a new dataset into IYP, you should write a crawler for that dataset.
The main tasks of a crawler are to fetch data, parse it, model it with the IYP
ontology, and push it to the IYP database. Most of these tasks are assisted by
the [IYP Python library](../iyp/__init__.py). See the [example crawler](../iyp/crawlers/example/crawler.py) or [existing crawlers](../iyp/crawlers/) to get started.
See also the [IHR contributing guidelines](../CONTRIBUTING.md) and [best practices for writing crawlers](https://github.com/InternetHealthReport/internet-yellow-pages/discussions/128).
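
The four crawler tasks (fetch, parse, model, push) can be pictured with the standalone sketch below. It deliberately does not use the IYP Python library: `fetch`, `parse`, `model`, `FakeDatabase`, and the sample data are hypothetical names invented for illustration, and the `AS`, `Name`, and `NAME` labels are placeholders to be checked against the node and relationship type documentation. A real crawler should follow the example crawler linked above.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical sample standing in for a downloaded dataset.
SAMPLE_CSV = "asn,name\n2497,IIJ\n15169,Google\n"


def fetch() -> str:
    """Fetch step: a real crawler would download the dataset here."""
    return SAMPLE_CSV


def parse(raw: str) -> list:
    """Parse step: turn the raw text into typed records."""
    reader = csv.DictReader(io.StringIO(raw))
    return [{'asn': int(row['asn']), 'name': row['name']} for row in reader]


def model(records: list) -> list:
    """Model step: describe each record as a relationship between two nodes.

    The labels used here are illustrative placeholders.
    """
    return [
        {
            'start': ('AS', {'asn': rec['asn']}),
            'type': 'NAME',
            'end': ('Name', {'name': rec['name']}),
        }
        for rec in records
    ]


@dataclass
class FakeDatabase:
    """In-memory stand-in for the IYP database (an assumption, not the real API)."""
    links: list = field(default_factory=list)

    def push(self, links: list) -> None:
        self.links.extend(links)


def run(db: FakeDatabase) -> None:
    """Chain the four tasks: fetch, parse, model, push."""
    db.push(model(parse(fetch())))


if __name__ == '__main__':
    db = FakeDatabase()
    run(db)
    for link in db.links:
        print(link)
```

Keeping the four steps as separate functions makes each one easy to test on its own, for example by feeding `parse` a small hand-written sample before running against the full dataset.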

### README
Each crawler should be accompanied by a README.md file. This is the main documentation
for the crawler and should contain:
- a short description of the dataset,
- any specifics of how the data is imported (e.g. time span, data cleaning),
- examples of how the data is modeled,
- dependencies on other crawlers (e.g. if the crawler requires data from another one).

### Adding a crawler to IYP main branch
If you want your crawler to be part of the IYP weekly dumps, you can submit a [Pull Request](https://github.com/InternetHealthReport/internet-yellow-pages/pulls)
to include the crawler in the main branch of IYP's GitHub repository.

Along with the Python code and README, the addition of a new dataset should also
be reflected in the following files:
- The list of [imported datasets](./data-sources.md).
- The [ACKNOWLEDGMENTS.md](../ACKNOWLEDGMENTS.md) file, which should list the licence of every imported dataset.

Furthermore, **any change to the ontology should be reflected in the documentation** ([Node types](./node_types.md) and [Relationship types](./relationship_types.md)).
Changes to the ontology should be discussed in advance, either on [GitHub discussions](https://github.com/InternetHealthReport/internet-yellow-pages/discussions) or by contacting the [IYP maintainers](mailto:[email protected]), so that a consensus is
reached before the ontology is updated.

You can also consider adding example queries to the [IYP gallery](./gallery.md),
and organizations providing data to the [IYP frontpage]().
