docs: use codeblocks tabs for the different languages

biopragmatics · Apr 30, 2024 · a8e9dd0
1 parent b663c97
commit a8e9dd0
Showing 11 changed files with 1,221 additions and 170 deletions.
diff --git a/lib/docs/docs/contributing.md b/lib/docs/docs/contributing.md
@@ -162,14 +162,16 @@ cargo outdated
 
 ## 🏷️ Publish a new release
 
-Building and publishing artifacts will be done by the [`build.yml`](https://github.com/biopragmatics/curies.rs/actions/workflows/build.yml) GitHub actions workflow, make sure you have set the following tokens as secrets on GitHub for this repository: `PYPI_TOKEN`, `NPM_TOKEN`, `CRATES_IO_TOKEN`, `CODECOV_TOKEN`
+!!! success "Automated release"
+
+    Building and publishing artifacts (binaries, pip wheels, npm package) will be done automatically by the [`.github/workflows/build.yml`](https://github.com/biopragmatics/curies.rs/actions/workflows/build.yml) GitHub action when you push a new tag.
+
+!!! warning "Set secrets for the GitHub repository"
+
+    Make sure you have set the following tokens as secrets on GitHub for this repository: `PYPI_TOKEN`, `NPM_TOKEN`, `CRATES_IO_TOKEN`, `CODECOV_TOKEN`
 
-To release a new version, run the release script providing the new version following [semantic versioning](https://semver.org), it will bump the version in the `Cargo.toml` files, generate the changelog from commit messages, create a new tag, and push to GitHub:
+To release a new version, run the release script with the new version number, which should follow [semantic versioning](https://semver.org). The script bumps the version in the `Cargo.toml` files, generates the changelog from commit messages, creates a new tag, and pushes to GitHub; the workflow does the rest:
 
 ```bash
 ./scripts/release.sh 0.1.2
 ```
-
-!!! success "Automated release"
-
-    The `build.yml` workflow will automatically build artifacts (binaries, pip wheels, npm package), create a new release on GitHub, and add the generated artifacts to the new release.
diff --git a/lib/docs/docs/devtools.md b/lib/docs/docs/devtools.md
@@ -0,0 +1,267 @@
+# 🧰 Tools for Developers and Semantic Engineers
+
+## 🪄 Working with strings that might be a URI or a CURIE
+
+Sometimes, it’s not clear if a string is a CURIE or a URI. While the [SafeCURIE syntax](https://www.w3.org/TR/2010/NOTE-curie-20101216/#P_safe_curie) is intended to address this, it’s often overlooked.
+
+### ☑️ CURIE and URI Checks
+
+The first way to handle this ambiguity is to be able to check if the string is a CURIE or a URI. Therefore, each `Converter` comes with functions for checking if a string is a CURIE (`converter.is_curie()`) or a URI (`converter.is_uri()`) under its definition.
+
+=== "Python"
+
+    ```python
+    from curies_rs import get_obo_converter
+
+    converter = get_obo_converter()
+
+    assert converter.is_curie("GO:1234567")
+    assert not converter.is_curie("http://purl.obolibrary.org/obo/GO_1234567")
+    # This is a valid CURIE, but not under this converter's definition
+    assert not converter.is_curie("pdb:2gc4")
+
+    assert converter.is_uri("http://purl.obolibrary.org/obo/GO_1234567")
+    assert not converter.is_uri("GO:1234567")
+    # This is a valid URI, but not under this converter's definition
+    assert not converter.is_uri("http://proteopedia.org/wiki/index.php/2gc4")
+    ```
+
+=== "JavaScript"
+
+    ```javascript
+    import {getOboConverter} from "@biopragmatics/curies";
+
+    async function main() {
+        const converter = await getOboConverter();
+
+        console.log(converter.isCurie("GO:1234567"))
+
+        console.log(converter.isUri("http://purl.obolibrary.org/obo/GO_1234567"))
+    }
+    main();
+    ```
+
+=== "Rust"
+
+    ```rust
+    use curies::sources::get_obo_converter;
+
+    #[tokio::main]
+    async fn main() -> Result<(), Box<dyn std::error::Error>> {
+        let converter = get_obo_converter().await?;
+
+        assert!(converter.is_curie("GO:1234567"));
+
+        assert!(converter.is_uri("http://purl.obolibrary.org/obo/GO_1234567"));
+        Ok(())
+    }
+    ```
+
+### 🗜️ Standardized Expansion and Compression
+
+The `converter.expand_or_standardize()` function extends the CURIE expansion function to handle the situation where you might get passed a CURIE or a URI. If it’s a CURIE, expansions happen with the normal rules. If it’s a URI, it tries to standardize it.
+
+=== "Python"
+
+    ```python
+    from curies_rs import Converter
+
+    converter = Converter.from_extended_prefix_map("""[{
+        "prefix": "CHEBI",
+        "prefix_synonyms": ["chebi"],
+        "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
+        "uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
+    }]""")
+
+    # Expand CURIEs
+    assert converter.expand_or_standardize("CHEBI:138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
+    assert converter.expand_or_standardize("chebi:138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
+
+    # standardize URIs
+    assert converter.expand_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
+    assert converter.expand_or_standardize("https://identifiers.org/chebi:138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
+
+    # Handle cases that aren't valid w.r.t. the converter
+    # Each call raises, so wrap them separately to see both errors
+    for value in ["missing:0000000", "https://example.com/missing:0000000"]:
+        try:
+            converter.expand_or_standardize(value)
+        except Exception as e:
+            print(e)
+    ```
+
+=== "JavaScript"
+
+    ```javascript
+    import {Converter} from "@biopragmatics/curies";
+
+    async function main() {
+        const converter = await Converter.fromExtendedPrefixMap(`[{
+            "prefix": "CHEBI",
+            "prefix_synonyms": ["chebi"],
+            "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
+            "uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
+        }]`)
+
+        console.log(converter.expandOrStandardize("chebi:138488"))
+        console.log(converter.expandOrStandardize("https://identifiers.org/chebi:138488"))
+        try {
+            console.log(converter.expandOrStandardize("http://purl.obolibrary.org/UNKNOWN_12345"))
+        } catch (e) {
+            console.log("Failed successfully")
+        }
+    }
+    main();
+    ```
+
+=== "Rust"
+
+    ```rust
+    use curies::Converter;
+
+    #[tokio::main]
+    async fn main() -> Result<(), Box<dyn std::error::Error>> {
+        let converter = Converter::from_extended_prefix_map(r#"[{
+            "prefix": "CHEBI",
+            "prefix_synonyms": ["chebi"],
+            "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
+            "uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
+        }]"#).await?;
+
+        assert_eq!(converter.expand_or_standardize("https://identifiers.org/chebi:138488").unwrap(), "http://purl.obolibrary.org/obo/CHEBI_138488".to_string());
+        assert_eq!(converter.expand_or_standardize("chebi:138488").unwrap(), "http://purl.obolibrary.org/obo/CHEBI_138488".to_string());
+        assert!(converter.expand_or_standardize("http://purl.obolibrary.org/UNKNOWN_12345").is_err());
+        Ok(())
+    }
+    ```
+
+A similar workflow is implemented in `converter.compress_or_standardize()` for compressing URIs where a CURIE might get passed.
+
+=== "Python"
+
+    ```python
+    from curies_rs import Converter
+
+    converter = Converter.from_extended_prefix_map("""[{
+        "prefix": "CHEBI",
+        "prefix_synonyms": ["chebi"],
+        "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
+        "uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
+    }]""")
+
+    # Compress URIs
+    assert converter.compress_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488") == 'CHEBI:138488'
+    assert converter.compress_or_standardize("https://identifiers.org/chebi:138488") == 'CHEBI:138488'
+
+    # standardize CURIEs
+    assert converter.compress_or_standardize("CHEBI:138488") == 'CHEBI:138488'
+    assert converter.compress_or_standardize("chebi:138488") == 'CHEBI:138488'
+
+    # Handle cases that aren't valid w.r.t. the converter
+    # Each call raises, so wrap them separately to see both errors
+    for value in ["missing:0000000", "https://example.com/missing:0000000"]:
+        try:
+            converter.compress_or_standardize(value)
+        except Exception as e:
+            print(e)
+            print(type(e))
+    ```
+
+=== "JavaScript"
+
+    ```javascript
+    import {Converter} from "@biopragmatics/curies";
+
+    async function main() {
+        const converter = await Converter.fromExtendedPrefixMap(`[{
+            "prefix": "CHEBI",
+            "prefix_synonyms": ["chebi"],
+            "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
+            "uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
+        }]`)
+
+        console.log(converter.compressOrStandardize("https://identifiers.org/chebi:138488"))
+        console.log(converter.compressOrStandardize("chebi:138488"))
+        try {
+            console.log(converter.compressOrStandardize("http://purl.obolibrary.org/UNKNOWN_12345"))
+        } catch (e) {
+            console.log("Failed successfully")
+        }
+    }
+    main();
+    ```
+
+=== "Rust"
+
+    ```rust
+    use curies::Converter;
+
+    #[tokio::main]
+    async fn main() -> Result<(), Box<dyn std::error::Error>> {
+        let converter = Converter::from_extended_prefix_map(r#"[{
+            "prefix": "CHEBI",
+            "prefix_synonyms": ["chebi"],
+            "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
+            "uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
+        }]"#).await?;
+
+        assert_eq!(converter.compress_or_standardize("https://identifiers.org/chebi:138488").unwrap(), "CHEBI:138488".to_string());
+        assert_eq!(converter.compress_or_standardize("chebi:138488").unwrap(), "CHEBI:138488".to_string());
+        assert!(converter.compress_or_standardize("http://purl.obolibrary.org/UNKNOWN_12345").is_err());
+        Ok(())
+    }
+    ```
+
+## 🚚 Bulk operations
+
+You can use the `expand_list()` and `compress_list()` functions to process many URIs or CURIEs at once.
+
+For example, to create a new `URI` column in a pandas DataFrame from a `CURIE` column:
+
+```python
+import pandas as pd
+from curies_rs import get_bioregistry_converter
+
+converter = get_bioregistry_converter()
+df = pd.DataFrame({'CURIE': ['doid:1234', 'doid:5678', 'doid:91011']})
+
+# Expand the list of CURIEs to URIs
+df['URI'] = converter.expand_list(df['CURIE'])
+print(df)
+```
+
+## 🧩 Integrating with [`rdflib`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#module-rdflib)
+
+RDFLib is a pure Python package for manipulating RDF data. The following example shows how to bind the extended prefix map from a `Converter` to a graph ([`rdflib.Graph`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.Graph)).
+
+```python
+import curies_rs, rdflib, rdflib.namespace, json
+
+converter = curies_rs.get_obo_converter()
+g = rdflib.Graph()
+
+for prefix, uri_prefix in json.loads(converter.write_prefix_map()).items():
+    g.bind(prefix, rdflib.Namespace(uri_prefix))
+```
+
+A more flexible approach is to instantiate a namespace manager ([`rdflib.namespace.NamespaceManager`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.namespace.html#rdflib.namespace.NamespaceManager)) and bind directly to that.
+
+```python
+import curies_rs, rdflib, json
+
+converter = curies_rs.get_obo_converter()
+namespace_manager = rdflib.namespace.NamespaceManager(rdflib.Graph())
+
+for prefix, uri_prefix in json.loads(converter.write_prefix_map()).items():
+    namespace_manager.bind(prefix, rdflib.Namespace(uri_prefix))
+```
+
+URI references for use in RDFLib’s graph class can be constructed from CURIEs using a combination of `converter.expand()` and [`rdflib.URIRef`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.URIRef).
+
+```python
+import curies_rs, rdflib
+
+converter = curies_rs.get_obo_converter()
+
+uri_ref = rdflib.URIRef(converter.expand("CHEBI:138488"))
+```
+
+<!-- TODO: Reusable data structures for references? -->
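
Reviewer note on `devtools.md`: the check and standardize behaviour documented above can be pictured with a small, dependency-free sketch. This is only an illustration of the idea under simplifying assumptions (exact prefix lookup, longest-match on URI prefixes). `TinyConverter`, `RECORD`, and the private helpers are hypothetical names, and none of this is taken from the `curies_rs` implementation.

```python
# Hypothetical sketch, NOT the curies_rs implementation.
# One record of an extended prefix map, copied from the docs examples.
RECORD = {
    "prefix": "CHEBI",
    "prefix_synonyms": ["chebi"],
    "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
    "uri_prefix_synonyms": ["https://identifiers.org/chebi:"],
}


class TinyConverter:
    """Toy converter built from a list of extended prefix map records."""

    def __init__(self, records):
        self.prefixes = {}       # any prefix or synonym -> preferred prefix
        self.uri_prefixes = {}   # any URI prefix or synonym -> preferred prefix
        self.preferred_uri = {}  # preferred prefix -> preferred URI prefix
        for record in records:
            preferred = record["prefix"]
            self.preferred_uri[preferred] = record["uri_prefix"]
            for prefix in [preferred, *record.get("prefix_synonyms", [])]:
                self.prefixes[prefix] = preferred
            for uri_prefix in [record["uri_prefix"], *record.get("uri_prefix_synonyms", [])]:
                self.uri_prefixes[uri_prefix] = preferred

    def _parse_curie(self, value):
        # A CURIE "under this converter's definition" needs a known prefix.
        prefix, sep, local_id = value.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix], local_id
        return None

    def _parse_uri(self, value):
        # Try the longest URI prefixes first so nested prefixes resolve correctly.
        for uri_prefix in sorted(self.uri_prefixes, key=len, reverse=True):
            if value.startswith(uri_prefix):
                return self.uri_prefixes[uri_prefix], value[len(uri_prefix):]
        return None

    def is_curie(self, value):
        return self._parse_curie(value) is not None

    def is_uri(self, value):
        return self._parse_uri(value) is not None

    def _resolve(self, value):
        # Accept either form; fail loudly when the converter knows neither.
        parsed = self._parse_curie(value) or self._parse_uri(value)
        if parsed is None:
            raise ValueError(f"{value!r} is neither a known CURIE nor a known URI")
        return parsed

    def expand_or_standardize(self, value):
        prefix, local_id = self._resolve(value)
        return self.preferred_uri[prefix] + local_id

    def compress_or_standardize(self, value):
        prefix, local_id = self._resolve(value)
        return f"{prefix}:{local_id}"
```

With `TinyConverter([RECORD])`, a synonym CURIE such as `chebi:138488` and a synonym URI such as `https://identifiers.org/chebi:138488` both resolve to the preferred prefix `CHEBI` before being re-serialized, which is why either input form yields the same standardized output. Unknown inputs raise a `ValueError` in this sketch; the error type in the real bindings may differ.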