diff --git a/.Rbuildignore b/.Rbuildignore index aa57a16..1d57955 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -1,13 +1,16 @@ # checklist -^_pkgdown.yml$ ^.*\.Rproj$ ^.zenodo\.json$ +^CITATION\.cff$ +^LICENSE.md$ +^Meta$ +^README\.Rmd$ +^\.Rproj\.user$ ^\.github$ ^\.httr-oauth$ -^\.Rproj\.user$ ^\.zenodo\.json$ +^_pkgdown.yml$ ^checklist.yml$ -^CITATION\.cff$ ^codecov.yml$ ^codecov\.yml$ ^codemeta\.json$ @@ -15,8 +18,6 @@ ^data-raw$ ^doc$ ^docs$ -^LICENSE.md$ ^man-roxygen$ -^Meta$ +^organisation.yml$ ^pkgdown$ -^README\.Rmd$ diff --git a/.github/CODE_OF_CONDUCT.md b/.github/CODE_OF_CONDUCT.md index 24aa0a3..3236635 100644 --- a/.github/CODE_OF_CONDUCT.md +++ b/.github/CODE_OF_CONDUCT.md @@ -8,7 +8,7 @@ We are committed to making participation in this project a harassment-free exper everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. -Examples of unacceptable behavior by participants include the use of sexual language or +Examples of unacceptable behaviour by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct. @@ -17,7 +17,7 @@ commits, code, wiki edits, issues, and other contributions that are not aligned Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team. -Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by +Instances of abusive, harassing, or otherwise unacceptable behaviour may be reported by opening an issue or contacting one or more of the project maintainers. 
This Code of Conduct is adapted from the Contributor Covenant diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index bca3a7d..0eeb16c 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -2,55 +2,38 @@ ### Fixing typos -Small typos or grammatical errors in documentation may be edited directly using -the GitHub web interface, so long as the changes are made in the _source_ file. - -* YES: you edit a roxygen comment in a `.R` file below `R/`. -* NO: you edit an `.Rd` file below `man/`. +Small typos or grammatical errors in documentation may be edited directly using the GitHub web interface, so long as the changes are made in the _source_ file. +E.g. edit a `roxygen2` comment in a `.R` file below `R/`, not in an `.Rd` file below `man/`. ### Prerequisites -Before you make a substantial pull request, you should always file an issue and -make sure someone from the team agrees that it’s a problem. If you’ve found a -bug, create an associated issue and illustrate the bug with a minimal -[reprex](https://www.tidyverse.org/help/#reprex). +Before you make a substantial pull request, you should always file an issue and make sure someone from the team agrees that it’s a problem. +If you’ve found a bug, create an associated issue and illustrate the bug with a minimal [reproducible example](https://www.tidyverse.org/help/#reprex). ### Pull request process * We recommend that you create a Git branch for each pull request (PR). -* Look at the Travis and AppVeyor build status before and after making changes. -The `README` should contain badges for any continuous integration services used -by the package. -* We recommend the tidyverse [style guide](http://style.tidyverse.org). -You can use the [styler](https://CRAN.R-project.org/package=styler) package to -apply these styles, but please don't restyle code that has nothing to do with -your PR. -* We use [roxygen2](https://cran.r-project.org/package=roxygen2). 
-* We use [testthat](https://cran.r-project.org/package=testthat). Contributions -with test cases included are easier to accept. -* For user-facing changes, add a bullet to the top of `NEWS.md` below the -current development version header describing the changes made followed by your -GitHub username, and links to relevant issue(s)/PR(s). +* Look at the GitHub Actions build status before and after making changes. +The `README` should contain badges for any continuous integration services used by the package. +* We require the `tidyverse` [style guide](http://style.tidyverse.org). +You can use the [`styler`](https://CRAN.R-project.org/package=styler) package to apply these styles, but please don't restyle code that has nothing to do with your PR. +* We use [`roxygen2`](https://cran.r-project.org/package=roxygen2). +* We use [`testthat`](https://cran.r-project.org/package=testthat). +Contributions with test cases included are easier to accept. +* For user-facing changes, add a bullet to the top of `NEWS.md` below the current development version header describing the changes made followed by your GitHub username, and links to relevant issue(s)/PR(s). ### Code of Conduct -Please note that the git2rdata project is released with a -[Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this -project you agree to abide by its terms. - -### See rOpenSci [contributing guide](https://ropensci.github.io/dev_guide/contributingguide.html) -for further details. - -### Discussion forum - -Check out our [discussion forum](https://discuss.ropensci.org) if you think your issue requires a longer form discussion. +Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). +By contributing to this project you agree to abide by its terms. ### Prefer to Email? Email the person listed as maintainer in the `DESCRIPTION` file of this repo. 
-Though note that private discussions over email don't help others - of course email is totally warranted if it's a sensitive problem of any kind. +Though note that private discussions over email don't help others - of course +email is totally warranted if it's a sensitive problem of any kind. ### Thanks for contributing! -This contributing guide is adapted from the tidyverse contributing guide available at https://raw.githubusercontent.com/r-lib/usethis/master/inst/templates/tidy-contributing.md +This contributing guide is adapted from the `tidyverse` contributing guide available at https://raw.githubusercontent.com/r-lib/usethis/master/inst/templates/tidy-contributing.md diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index e1beac8..090375b 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -13,7 +13,7 @@ or similar - or if just relates to an issue make sure to mention it like "#4" --> ## Example - +in Toulouse, France - + ## Installation diff --git a/_pkgdown.yml b/_pkgdown.yml index 5f96500..ac7dff8 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -1,33 +1,39 @@ +url: https://ropensci.github.io/git2rdata +template: + bootstrap: 5 + light-switch: false navbar: title: ~ type: default - left: - - text: NEWS - href: news/index.html - - text: Tutorials - href: articles/index.html - menu: - - text: Getting started storing dataframes as plain text - href: articles/plain_text.html - - text: Storing dataframes under version control - href: articles/version_control.html - - text: Potential workflow - href: articles/workflow.html - - text: Efficiency - href: articles/efficiency.html - - text: Large dataframes - href: articles/split_by.html - - text: Functions - href: reference/index.html - - text: Contributing - href: CONTRIBUTING.html - right: - - icon: "fa fa-github" - href: https://github.com/ropensci/git2rdata - - icon: "fa fa-twitter" - href: https://twitter.com/INBOVlaanderen - - icon: "fa 
fa-facebook" - href: https://www.facebook.com/pg/INBOVlaanderen + structure: + left: [intro, reference, news, tutorials, contributing] + right: [search, github] + components: + tutorials: + text: Tutorials + href: articles/index.html + menu: + - text: Getting started storing dataframes as plain text + href: articles/plain_text.html + - text: Storing dataframes under version control + href: articles/version_control.html + - text: Metadata + href: articles/metadata.html + - text: Potential workflow + href: articles/workflow.html + - text: Efficiency + href: articles/efficiency.html + - text: Large dataframes + href: articles/split_by.html + contributing: + text: Contributing + href: CONTRIBUTING.html + twitter: + icon: "fa fa-twitter" + href: https://twitter.com/INBOVlaanderen + facebook: + icon: "fa fa-facebook" + href: https://www.facebook.com/pg/INBOVlaanderen reference: - title: Storage @@ -43,6 +49,6 @@ reference: authors: Thierry Onkelinx: href: "https://www.muscardinus.be" - Research Institute for Nature and Forest: + Research Institute for Nature and Forest (INBO): href: "https://www.vlaanderen.be/inbo/en-gb" - html: "" + html: "logo of the Research Institute for Nature and Forest (INBO)" diff --git a/checklist.yml b/checklist.yml index ba0a85a..8873b7b 100644 --- a/checklist.yml +++ b/checklist.yml @@ -3,10 +3,22 @@ package: yes allowed: warnings: [] notes: [] -citation_roles: -- aut -- cre -keywords: -- R package -- reproducible research -- version control +required: +- CITATION +- DESCRIPTION +- R CMD check +- checklist +- codemeta +- documentation +- filename conventions +- folder conventions +- license +- lintr +- repository secret +- spelling +spelling: + default: en-GB + ignore: + - .github/ISSUE_TEMPLATE/feature_request.md + - LICENSE.md + - cran-comments.md diff --git a/codecov.yml b/codecov.yml index 2938b0e..6c208bb 100644 --- a/codecov.yml +++ b/codecov.yml @@ -1,17 +1,17 @@ comment: true coverage: - precision: 2 + precision: 1 round: down range: 
"70...100" status: - project: + patch: default: target: auto - threshold: 1% + threshold: 10% informational: true - patch: + project: default: target: auto threshold: 1% - informational: true + informational: false diff --git a/codemeta.json b/codemeta.json index e5d42f6..f611617 100644 --- a/codemeta.json +++ b/codemeta.json @@ -1,26 +1,20 @@ { - "@context": [ - "https://doi.org/10.5063/schema/codemeta-2.0", - "http://schema.org" - ], + "@context": "https://doi.org/10.5063/schema/codemeta-2.0", "@type": "SoftwareSourceCode", "identifier": "git2rdata", - "description": "Make versioning of data.frame easy and efficient using git\n repositories.", + "description": "The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information. 1) Storing metadata allows to maintain the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human readable. The user can turn this off when they prefer a human readable format over smaller files. Details on the implementation are available in vignette(\"plain_text\", package = \"git2rdata\"). 2) Storing metadata also allows smaller row based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette(\"version_control\", package = \"git2rdata\"). Although we envisioned git2rdata with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette(\"workflow\", package = \"git2rdata\") gives a toy example. 
4) vignette(\"efficiency\", package = \"git2rdata\") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.", "name": "git2rdata: Store and Retrieve Data.frames in a Git Repository", - "codeRepository": "https://github.com/ropensci/git2rdata", - "relatedLink": [ - "https://doi.org/10.5281/zenodo.1485309", - "https://CRAN.R-project.org/package=git2rdata" - ], + "relatedLink": ["https://ropensci.github.io/git2rdata/", "https://doi.org/10.5281/zenodo.1485309", "https://CRAN.R-project.org/package=git2rdata"], + "codeRepository": "https://github.com/ropensci/git2rdata/", "issueTracker": "https://github.com/ropensci/git2rdata/issues", "license": "https://spdx.org/licenses/GPL-3.0", - "version": "0.3.0", + "version": "0.4.1", "programmingLanguage": { "@type": "ComputerLanguage", "name": "R", "url": "https://r-project.org" }, - "runtimePlatform": "R version 4.0.2 (2020-06-22)", + "runtimePlatform": "R version 4.4.1 (2024-06-14)", "provider": { "@id": "https://cran.r-project.org", "@type": "Organization", @@ -62,14 +56,14 @@ "copyrightHolder": [ { "@type": "Organization", - "name": "Research Institute for Nature and Forest", + "name": "Research Institute for Nature and Forest (INBO)", "email": "info@inbo.be" } ], "funder": [ { "@type": "Organization", - "name": "Research Institute for Nature and Forest", + "name": "Research Institute for Nature and Forest (INBO)", "email": "info@inbo.be" } ], @@ -131,18 +125,6 @@ }, "sameAs": "https://CRAN.R-project.org/package=rmarkdown" }, - { - "@type": "SoftwareApplication", - "identifier": "spelling", - "name": "spelling", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Comprehensive R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - }, - "sameAs": "https://CRAN.R-project.org/package=spelling" - }, { "@type": "SoftwareApplication", "identifier": "testthat", @@ -156,14 +138,14 @@ "sameAs": 
"https://CRAN.R-project.org/package=testthat" } ], - "softwareRequirements": [ - { + "softwareRequirements": { + "1": { "@type": "SoftwareApplication", "identifier": "R", "name": "R", "version": ">= 3.5.0" }, - { + "2": { "@type": "SoftwareApplication", "identifier": "assertthat", "name": "assertthat", @@ -175,7 +157,7 @@ }, "sameAs": "https://CRAN.R-project.org/package=assertthat" }, - { + "3": { "@type": "SoftwareApplication", "identifier": "git2r", "name": "git2r", @@ -188,12 +170,12 @@ }, "sameAs": "https://CRAN.R-project.org/package=git2r" }, - { + "4": { "@type": "SoftwareApplication", "identifier": "methods", "name": "methods" }, - { + "5": { "@type": "SoftwareApplication", "identifier": "yaml", "name": "yaml", @@ -204,23 +186,36 @@ "url": "https://cran.r-project.org" }, "sameAs": "https://CRAN.R-project.org/package=yaml" + }, + "SystemRequirements": null + }, + "fileSize": "757.058KB", + "citation": [ + { + "@type": "SoftwareSourceCode", + "datePublished": "2024", + "author": { + "author": { + "@type": "Person", + "givenName": "Thierry", + "familyName": "Onkelinx" + } + }, + "name": "git2rdata: Store and Retrieve Data.frames in a Git Repository. 
Version 0.4.1", + "identifier": "10.5281/zenodo.1485309", + "url": "https://ropensci.github.io/git2rdata/", + "@id": "https://doi.org/10.5281/zenodo.1485309", + "sameAs": "https://doi.org/10.5281/zenodo.1485309" } ], - "fileSize": "762.31KB", "releaseNotes": "https://github.com/ropensci/git2rdata/blob/master/NEWS.md", - "readme": "https://github.com/ropensci/git2rdata/blob/master/README.md", - "contIntegration": "https://codecov.io/gh/ropensci/git2rdata", - "developmentStatus": ["https://www.repostatus.org/#active", "https://www.tidyverse.org/lifecycle/#maturing"], + "readme": "https://github.com/ropensci/git2rdata/blob/main/README.md", + "contIntegration": "https://app.codecov.io/gh/ropensci/git2rdata", + "developmentStatus": ["https://www.repostatus.org/#active", "https://lifecycle.r-lib.org/articles/stages.html#stable"], "review": { "@type": "Review", "url": "https://github.com/ropensci/software-review/issues/263", "provider": "https://ropensci.org" }, - "keywords": [ - "r", - "rstats", - "r-package", - "version-control", - "reproducible-research" - ] + "keywords": ["r", "rstats", "r-package", "version-control", "reproducible-research"] } diff --git a/cran-comments.md b/cran-comments.md index b0f7a17..6920180 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,28 +1,15 @@ ## Test environments * local - * ubuntu 20.04.4 LTS, R 4.1.3 + * ubuntu 22.04.4 LTS, R 4.4.1 * github actions * macOS-latest, release * windows-latest, release - * ubuntu 20.04, devel - * ubuntu 20.04, oldrel - * checklist package: ubuntu 20.04.4 LTS, R 4.1.3 -* r-hub - * debian: clang-devel, gcc-devel, gcc-patched, gcc-release - * fedora: clang-devel, gcc-devel - * macos: highsierra-release-cran - * windows_x86_64: devel, oldrel, release + * ubuntu 22.04, devel + * ubuntu 22.04, oldrel + * checklist 0.4.1 on ubuntu 22.04.4 LTS, R 4.4.1 + * https://inbo.r-universe.dev/checklist ## R CMD check results 0 errors | 0 warnings | 0 note - -r-hub gave a false positive note - -Windows Server 
2022, R-devel, 64 bit - -checking for detritus in the temp directory ... NOTE -Found the following files/directories: - 'lastMiKTeXException' - diff --git a/inst/CITATION b/inst/CITATION index ba849f4..03d5c32 100644 --- a/inst/CITATION +++ b/inst/CITATION @@ -1,13 +1,14 @@ citHeader("To cite `git2rdata` in publications please use:") # begin checklist entry -citEntry( - entry = "Manual", - title = "git2rdata: Store and Retrieve Data.frames in a Git Repository. Version 0.4.0", - author = c(person(given = "Thierry", family = "Onkelinx")), - year = 2022, +bibentry( + bibtype = "Manual", + title = "git2rdata: Store and Retrieve Data.frames in a Git Repository. Version 0.4.1", + author = c( author = c(person(given = "Thierry", family = "Onkelinx"))), + year = 2024, url = "https://ropensci.github.io/git2rdata/", abstract = "The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information. 1) Storing metadata allows to maintain the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human readable. The user can turn this off when they prefer a human readable format over smaller files. Details on the implementation are available in vignette(\"plain_text\", package = \"git2rdata\"). 2) Storing metadata also allows smaller row based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette(\"version_control\", package = \"git2rdata\"). Although we envisioned git2rdata with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette(\"workflow\", package = \"git2rdata\") gives a toy example. 
4) vignette(\"efficiency\", package = \"git2rdata\") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.", - textVersion = "Onkelinx, Thierry (2022) git2rdata: Store and Retrieve Data.frames in a Git Repository. Version 0.4.0. https://ropensci.github.io/git2rdata/, https://github.com/ropensci/git2rdata/", - keywords = "R package, reproducible research, version control", + textVersion = "Onkelinx, Thierry (2024) git2rdata: Store and Retrieve Data.frames in a Git Repository. Version 0.4.1. https://ropensci.github.io/git2rdata/", + keywords = "git; version control; plain text data", + doi = "10.5281/zenodo.1485309", ) # end checklist entry diff --git a/inst/en_gb.dic b/inst/en_gb.dic new file mode 100644 index 0000000..88339bd --- /dev/null +++ b/inst/en_gb.dic @@ -0,0 +1,8 @@ +Bitbucket +Gitlab +ROpenSci +codecov +kiB +rOpenSci +rdata +regex diff --git a/man/display_metadata.Rd b/man/display_metadata.Rd new file mode 100644 index 0000000..732ba4f --- /dev/null +++ b/man/display_metadata.Rd @@ -0,0 +1,29 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/display_metadata.R +\name{display_metadata} +\alias{display_metadata} +\title{Display metadata for a \code{git2rdata} object} +\usage{ +display_metadata(x, minimal = FALSE) +} +\arguments{ +\item{x}{a \code{git2rdata} object} + +\item{minimal}{logical, if \code{TRUE} only a message is displayed} +} +\description{ +Display metadata for a \code{git2rdata} object +} +\seealso{ +Other storage: +\code{\link{list_data}()}, +\code{\link{prune_meta}()}, +\code{\link{read_vc}()}, +\code{\link{relabel}()}, +\code{\link{rename_variable}()}, +\code{\link{rm_data}()}, +\code{\link{update_metadata}()}, +\code{\link{verify_vc}()}, +\code{\link{write_vc}()} +} +\concept{storage} diff --git a/man/git2rdata-package.Rd b/man/git2rdata-package.Rd index 93e7ab2..f92acc6 100644 --- a/man/git2rdata-package.Rd +++ b/man/git2rdata-package.Rd @@ 
-13,19 +13,20 @@ Useful links: \itemize{ \item \url{https://ropensci.github.io/git2rdata/} \item \url{https://github.com/ropensci/git2rdata/} + \item \doi{10.5281/zenodo.1485309} \item Report bugs at \url{https://github.com/ropensci/git2rdata/issues} } } \author{ -\strong{Maintainer}: Thierry Onkelinx \email{thierry.onkelinx@inbo.be} (\href{https://orcid.org/0000-0001-8804-4216}{ORCID}) +\strong{Maintainer}: Thierry Onkelinx \email{thierry.onkelinx@inbo.be} (\href{https://orcid.org/0000-0001-8804-4216}{ORCID}) (Research Institute for Nature and Forest (INBO)) Other contributors: \itemize{ - \item Floris Vanderhaeghe \email{floris.vanderhaeghe@inbo.be} (\href{https://orcid.org/0000-0002-6378-6229}{ORCID}) [contributor] - \item Peter Desmet \email{peter.desmet@inbo.be} (\href{https://orcid.org/0000-0002-8442-8025}{ORCID}) [contributor] - \item Els Lommelen \email{els.lommelen@inbo.be} (\href{https://orcid.org/0000-0002-3481-5684}{ORCID}) [contributor] - \item Research Institute for Nature and Forest \email{info@inbo.be} [copyright holder, funder] + \item Floris Vanderhaeghe \email{floris.vanderhaeghe@inbo.be} (\href{https://orcid.org/0000-0002-6378-6229}{ORCID}) (Research Institute for Nature and Forest (INBO)) [contributor] + \item Peter Desmet \email{peter.desmet@inbo.be} (\href{https://orcid.org/0000-0002-8442-8025}{ORCID}) (Research Institute for Nature and Forest (INBO)) [contributor] + \item Els Lommelen \email{els.lommelen@inbo.be} (\href{https://orcid.org/0000-0002-3481-5684}{ORCID}) (Research Institute for Nature and Forest (INBO)) [contributor] + \item Research Institute for Nature and Forest (INBO) \email{info@inbo.be} [copyright holder, funder] } } diff --git a/man/is_git2rdata.Rd b/man/is_git2rdata.Rd index d0c18c3..ff111a7 100644 --- a/man/is_git2rdata.Rd +++ b/man/is_git2rdata.Rd @@ -55,6 +55,8 @@ is_git2rdata("iris", root) Other internal: \code{\link{is_git2rmeta}()}, \code{\link{meta}()}, +\code{\link{print.git2rdata}()}, 
+\code{\link{summary.git2rdata}()}, \code{\link{upgrade_data}()} } \concept{internal} diff --git a/man/is_git2rmeta.Rd b/man/is_git2rmeta.Rd index dfdd7ea..f4143e3 100644 --- a/man/is_git2rmeta.Rd +++ b/man/is_git2rmeta.Rd @@ -58,6 +58,8 @@ is_git2rdata("iris", root) Other internal: \code{\link{is_git2rdata}()}, \code{\link{meta}()}, +\code{\link{print.git2rdata}()}, +\code{\link{summary.git2rdata}()}, \code{\link{upgrade_data}()} } \concept{internal} diff --git a/man/list_data.Rd b/man/list_data.Rd index a84b9d7..22dd8ed 100644 --- a/man/list_data.Rd +++ b/man/list_data.Rd @@ -92,11 +92,13 @@ status(repo) } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{prune_meta}()}, \code{\link{read_vc}()}, \code{\link{relabel}()}, \code{\link{rename_variable}()}, \code{\link{rm_data}()}, +\code{\link{update_metadata}()}, \code{\link{verify_vc}()}, \code{\link{write_vc}()} } diff --git a/man/meta.Rd b/man/meta.Rd index db79721..58bf948 100644 --- a/man/meta.Rd +++ b/man/meta.Rd @@ -79,7 +79,7 @@ In case of a data.frame, \code{meta()} applies itself to each of the columns. Th plus an additional \code{..generic} element. \code{..generic} is a reserved name for the metadata and not allowed as column name in a \code{data.frame}. -\code{\link{write_vc}} uses this function to prepare a dataframe for storage. +\code{write_vc()} uses this function to prepare a dataframe for storage. Existing metadata is passed through the optional \code{old} argument. This argument intended for internal use. 
} @@ -113,6 +113,8 @@ meta(as.Date("2019-02-01"), optimize = FALSE) Other internal: \code{\link{is_git2rdata}()}, \code{\link{is_git2rmeta}()}, +\code{\link{print.git2rdata}()}, +\code{\link{summary.git2rdata}()}, \code{\link{upgrade_data}()} } \concept{internal} diff --git a/man/print.git2rdata.Rd b/man/print.git2rdata.Rd new file mode 100644 index 0000000..4436c28 --- /dev/null +++ b/man/print.git2rdata.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/print.R +\name{print.git2rdata} +\alias{print.git2rdata} +\title{Print method for \code{git2rdata} objects.} +\usage{ +\method{print}{git2rdata}(x, ...) +} +\arguments{ +\item{x}{a \code{git2rdata} object} + +\item{...}{additional arguments passed to \code{print}} +} +\description{ +Prints the data and the description of the columns when available. +} +\seealso{ +Other internal: +\code{\link{is_git2rdata}()}, +\code{\link{is_git2rmeta}()}, +\code{\link{meta}()}, +\code{\link{summary.git2rdata}()}, +\code{\link{upgrade_data}()} +} +\concept{internal} diff --git a/man/prune_meta.Rd b/man/prune_meta.Rd index 2026ec8..754f961 100644 --- a/man/prune_meta.Rd +++ b/man/prune_meta.Rd @@ -106,11 +106,13 @@ status(repo) } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{list_data}()}, \code{\link{read_vc}()}, \code{\link{relabel}()}, \code{\link{rename_variable}()}, \code{\link{rm_data}()}, +\code{\link{update_metadata}()}, \code{\link{verify_vc}()}, \code{\link{write_vc}()} } diff --git a/man/read_vc.Rd b/man/read_vc.Rd index 03452bd..909bf7c 100644 --- a/man/read_vc.Rd +++ b/man/read_vc.Rd @@ -17,6 +17,8 @@ Defaults to the current working directory (\code{"."}).} } \value{ The \code{data.frame} with the file names and hashes as attributes. +It has the additional class \code{"git2rdata"} to support extra methods to +display the descriptions. } \description{ \code{read_vc()} handles git2rdata objects stored by \code{write_vc()}. 
It reads and @@ -89,11 +91,13 @@ status(repo) } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{list_data}()}, \code{\link{prune_meta}()}, \code{\link{relabel}()}, \code{\link{rename_variable}()}, \code{\link{rm_data}()}, +\code{\link{update_metadata}()}, \code{\link{verify_vc}()}, \code{\link{write_vc}()} } diff --git a/man/relabel.Rd b/man/relabel.Rd index f37e063..9d6486f 100644 --- a/man/relabel.Rd +++ b/man/relabel.Rd @@ -82,11 +82,13 @@ status(repo) } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{list_data}()}, \code{\link{prune_meta}()}, \code{\link{read_vc}()}, \code{\link{rename_variable}()}, \code{\link{rm_data}()}, +\code{\link{update_metadata}()}, \code{\link{verify_vc}()}, \code{\link{write_vc}()} } diff --git a/man/rename_variable.Rd b/man/rename_variable.Rd index 0c78e42..3fd8f25 100644 --- a/man/rename_variable.Rd +++ b/man/rename_variable.Rd @@ -80,11 +80,13 @@ status(repo) } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{list_data}()}, \code{\link{prune_meta}()}, \code{\link{read_vc}()}, \code{\link{relabel}()}, \code{\link{rm_data}()}, +\code{\link{update_metadata}()}, \code{\link{verify_vc}()}, \code{\link{write_vc}()} } diff --git a/man/rm_data.Rd b/man/rm_data.Rd index 863db48..21b4d77 100644 --- a/man/rm_data.Rd +++ b/man/rm_data.Rd @@ -122,11 +122,13 @@ status(repo) } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{list_data}()}, \code{\link{prune_meta}()}, \code{\link{read_vc}()}, \code{\link{relabel}()}, \code{\link{rename_variable}()}, +\code{\link{update_metadata}()}, \code{\link{verify_vc}()}, \code{\link{write_vc}()} } diff --git a/man/summary.git2rdata.Rd b/man/summary.git2rdata.Rd new file mode 100644 index 0000000..e135857 --- /dev/null +++ b/man/summary.git2rdata.Rd @@ -0,0 +1,26 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/print.R +\name{summary.git2rdata} +\alias{summary.git2rdata} 
+\title{Summary method for \code{git2rdata} objects.} +\usage{ +\method{summary}{git2rdata}(object, ...) +} +\arguments{ +\item{object}{a \code{git2rdata} object} + +\item{...}{additional arguments passed to \code{summary}} +} +\description{ +Prints the summary of the data and the description of the columns when +available. +} +\seealso{ +Other internal: +\code{\link{is_git2rdata}()}, +\code{\link{is_git2rmeta}()}, +\code{\link{meta}()}, +\code{\link{print.git2rdata}()}, +\code{\link{upgrade_data}()} +} +\concept{internal} diff --git a/man/update_metadata.Rd b/man/update_metadata.Rd new file mode 100644 index 0000000..968e138 --- /dev/null +++ b/man/update_metadata.Rd @@ -0,0 +1,47 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/update_metadata.R +\name{update_metadata} +\alias{update_metadata} +\title{Update the description of a \code{git2rdata} object} +\usage{ +update_metadata(file, root = ".", field_description, name, title, description) +} +\arguments{ +\item{file}{the name of the git2rdata object. Git2rdata objects cannot +have dots in their name. The name may include a relative path. \code{file} is a +path relative to the \code{root}. +Note that \code{file} must point to a location within \code{root}.} + +\item{root}{The root of a project. Can be a file path or a \code{git-repository}. +Defaults to the current working directory (\code{"."}).} + +\item{field_description}{a named character vector with the new descriptions +for the fields. +The names of the vector must match the variable names.} + +\item{name}{a character string with the new table name of the object.} + +\item{title}{a character string with the new title of the object.} + +\item{description}{a character string with the new description of the object.} +} +\description{ +Allows to update the description of the fields, the table name, the title, +and the description of a \code{git2rdata} object. +All arguments are optional. 
+Setting an argument to \code{NA} or an empty string will remove the corresponding +field from the metadata. +} +\seealso{ +Other storage: +\code{\link{display_metadata}()}, +\code{\link{list_data}()}, +\code{\link{prune_meta}()}, +\code{\link{read_vc}()}, +\code{\link{relabel}()}, +\code{\link{rename_variable}()}, +\code{\link{rm_data}()}, +\code{\link{verify_vc}()}, +\code{\link{write_vc}()} +} +\concept{storage} diff --git a/man/upgrade_data.Rd b/man/upgrade_data.Rd index 8f90cb7..b103ead 100644 --- a/man/upgrade_data.Rd +++ b/man/upgrade_data.Rd @@ -66,6 +66,8 @@ upgrade_data(path = ".", root = root) Other internal: \code{\link{is_git2rdata}()}, \code{\link{is_git2rmeta}()}, -\code{\link{meta}()} +\code{\link{meta}()}, +\code{\link{print.git2rdata}()}, +\code{\link{summary.git2rdata}()} } \concept{internal} diff --git a/man/verify_vc.Rd b/man/verify_vc.Rd index 022af43..0867f7d 100644 --- a/man/verify_vc.Rd +++ b/man/verify_vc.Rd @@ -24,12 +24,14 @@ data.frame. } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{list_data}()}, \code{\link{prune_meta}()}, \code{\link{read_vc}()}, \code{\link{relabel}()}, \code{\link{rename_variable}()}, \code{\link{rm_data}()}, +\code{\link{update_metadata}()}, \code{\link{write_vc}()} } \concept{storage} diff --git a/man/write_vc.Rd b/man/write_vc.Rd index 095a738..8a4e7d9 100644 --- a/man/write_vc.Rd +++ b/man/write_vc.Rd @@ -160,12 +160,14 @@ status(repo) } \seealso{ Other storage: +\code{\link{display_metadata}()}, \code{\link{list_data}()}, \code{\link{prune_meta}()}, \code{\link{read_vc}()}, \code{\link{relabel}()}, \code{\link{rename_variable}()}, \code{\link{rm_data}()}, +\code{\link{update_metadata}()}, \code{\link{verify_vc}()} } \concept{storage} diff --git a/pkgdown/extra.css b/pkgdown/extra.css index 00938dd..dfdfc0b 100644 --- a/pkgdown/extra.css +++ b/pkgdown/extra.css @@ -18,13 +18,20 @@ a:hover { .navbar, .label-default, -.navbar-default .navbar-nav>.active>a, .navbar-default 
.navbar-nav>.active>a:hover, .navbar-default .navbar-nav>.active>a:focus { - background-color: #356196; +.navbar-default .navbar-nav>.active>a, .navbar-default .navbar-nav>.active>a:hover, .navbar-default .navbar-nav>.active>a:focus, +#toc>.nav a.nav-link.active { + background-color: #356196 !important; } -.navbar-default .navbar-link, -.navbar-default .navbar-nav>li>a { - color: #ffffff; +#toc>.nav a.nav-link { + background-color: #729BB7 !important; +} + +.navbar, +.navbar-brand, +.nav-link, +.nav-text.text-muted { + color: #ffffff !important; } .nav-pills li.active>a, .nav-pills li>a:hover { diff --git a/tests/testthat/test_d_description.R b/tests/testthat/test_d_description.R new file mode 100644 index 0000000..d6a3140 --- /dev/null +++ b/tests/testthat/test_d_description.R @@ -0,0 +1,71 @@ +test_that("description", { + expect_error( + update_metadata( + file = "test", root = data.frame() + ), + "a 'root' of class data.frame is not supported" + ) + + root <- tempfile(pattern = "git2rdata-description") + dir.create(root) + + expect_is( + write_vc( + x = test_data, file = "test.txt", root = root, sorting = "test_Date" + ), + "character" + ) + + expect_null( + update_metadata( + file = "test", root = root, field_description = c( + test_character = "Some information", test_factor = "Some information", + test_integer = "Some information" + ) + ) + ) + + expect_is({ + output <- read_vc("test", root = root) + }, "git2rdata" + ) + expect_true(assertthat::has_attr(output$test_character, "description")) + expect_true(assertthat::has_attr(output$test_factor, "description")) + expect_true(assertthat::has_attr(output$test_integer, "description")) + expect_false(assertthat::has_attr(output$test_ordered, "description")) + expect_false(assertthat::has_attr(output, "table name")) + expect_false(assertthat::has_attr(output, "title")) + expect_false(assertthat::has_attr(output, "description")) + expect_output(print(output), "display_metadata") + expect_output(summary(output), 
"display_metadata") + expect_output(display_metadata(output, minimal = TRUE), "display_metadata") + expect_output(display_metadata(output, minimal = FALSE), "Table name: NA") + expect_output(display_metadata(output), "Table name: NA") + + root <- git2r::init(root) + git2r::config(root, user.name = "Alice", user.email = "alice@example.org") + writeLines("ignore.*\nforce.*", file.path(git2r::workdir(root), ".gitignore")) + git2r::add(root, ".gitignore") + commit(root, "initial commit") + + expect_null( + update_metadata( + file = "test", root = root, name = "my_table", title = "My Table", + description = "This is description for the unit tests", + field_description = c(test_character = NA, test_factor = "") + ) + ) + expect_is({ + output <- read_vc("test", root = root) + }, "git2rdata" + ) + expect_false(assertthat::has_attr(output$test_character, "description")) + expect_false(assertthat::has_attr(output$test_factor, "description")) + expect_true(assertthat::has_attr(output$test_integer, "description")) + expect_true(assertthat::has_attr(output, "table name")) + expect_true(assertthat::has_attr(output, "title")) + expect_true(assertthat::has_attr(output, "description")) + expect_output(print(output), "display_metadata") + expect_output(summary(output), "display_metadata") + expect_output(display_metadata(output), "Table name: my_table") +}) diff --git a/vignettes/efficiency.Rmd b/vignettes/efficiency.Rmd index 50a4cef..4c757fb 100644 --- a/vignettes/efficiency.Rmd +++ b/vignettes/efficiency.Rmd @@ -142,7 +142,7 @@ This vignette compares storage and retrieval of data by `git2rdata` with other s We consider `write.table()` and `read.table()` for data stored in a plain text format. `saveRDS()` and `readRDS()` use a compressed binary format. -To get some meaningful results, we will use the `nassCDS` dataset from the [DAAG](https://www.rdocumentation.org/packages/DAAG/versions/1.22/topics/nassCDS) package. 
+To get some meaningful results, we will use the `nassCDS` dataset from the [DAAG](https://www.rdocumentation.org/packages/DAAG/versions/1.22/topics/nassCDS) package. We'll avoid the dependency on the package by directly downloading the data. ```{r download_data, eval = system.file("efficiency", "airbag.rds", package = "git2rdata") == ""} diff --git a/vignettes/metadata.Rmd b/vignettes/metadata.Rmd new file mode 100644 index 0000000..6e3239b --- /dev/null +++ b/vignettes/metadata.Rmd @@ -0,0 +1,105 @@ +--- +title: "Adding metadata" +author: "Thierry Onkelinx" +output: + rmarkdown::html_vignette: + fig_caption: yes +vignette: > + %\VignetteIndexEntry{Adding metadata} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +## Introduction + +`git2rdata` supports extra metadata since version 0.4.1. +Metadata is stored in a separate file with the same name as the data file, but with the extension `.yml`. +The metadata file is a YAML file with a specific structure. +The metadata file contains a generic section and a section for each field in the data file. +The generic section contains information about the data file as a whole. +The fields sections contain information about the fields in the data file. +The metadata file is stored in the same directory as the data file. + +The generic section contains the following mandatory properties, automatically created by `git2rdata`: + +- `git2rdata`: the version of `git2rdata` used to create the metadata. +- `datahash`: the hash of the data file. +- `hash`: the hash of the metadata file. +- `optimize`: a logical indicating whether the data file is optimized for `git2rdata`. +- `sorting`: a character vector with the names of the fields used to sort the data file. +- `split_by`: a character vector with the names of the fields used to split the data file. +- `NA string`: the string used to represent missing values in the data file. 
+ +The generic section can contain the following optional properties: + +- `table name`: the name of the dataset. +- `title`: the title of the dataset. +- `description`: a description of the dataset. + +The fields sections contain the following mandatory properties, automatically created by `git2rdata`: + +- `type`: the type of the field. +- `class`: the class of the field. +- `levels`: the levels of the field (for factors). +- `index`: the index of the field (for factors). +- `NA string`: the string used to represent missing values in the field. + +The fields sections can contain the following optional properties: + +- `description`: a description of the field. + +## Adding metadata + +`write_vc()` only stores the mandatory properties in the metadata file. + +```{r store-metadata} +library(git2rdata) +root <- tempfile("git2rdata-metadata") +dir.create(root) +write_vc(iris, file = "iris", root = root, sorting = "Sepal.Length") +``` + +## Reading metadata + +`read_vc()` reads the metadata file and adds it as attributes to the `data.frame`. +`print()` and `summary()` alert the user to the `display_metadata()` function. +This function displays the metadata of a `git2rdata` object. +Missing optional metadata results in an `NA` value in the output of `display_metadata()`. + +```{r read-metadata} +my_iris <- read_vc("iris", root = root) +str(my_iris) +print(head(my_iris)) +summary(my_iris) +display_metadata(my_iris) +``` + +## Updating the optional metadata + +To add metadata to a `git2rdata` object, use the `update_metadata()` function. +This function allows you to add or update the optional metadata of a `git2rdata` object. +Setting an argument to `NA` or an empty string will remove the corresponding property from the metadata. +The function only updates the metadata file, not the data file. +To see the changes, read the object again before using `display_metadata()`. +Note that all the metadata is available in the `data.frame` as attributes. 
+ +```{r update-metadata} +update_metadata( + file = "iris", root = root, name = "iris", title = "Iris dataset", + description = +"The Iris dataset is a multivariate dataset introduced by the British +statistician and biologist Ronald Fisher in his 1936 paper The use of multiple +measurements in taxonomic problems.", + field_description = c( + Sepal.Length = "The length of the sepal in cm", + Sepal.Width = "The width of the sepal in cm", + Petal.Length = "The length of the petal in cm", + Petal.Width = "The width of the petal in cm", + Species = "The species of the iris" + ) +) +my_iris <- read_vc("iris", root = root) +display_metadata(my_iris) +str(my_iris) +``` + diff --git a/vignettes/plain_text.Rmd b/vignettes/plain_text.Rmd index 379efea..c7bc349 100644 --- a/vignettes/plain_text.Rmd +++ b/vignettes/plain_text.Rmd @@ -36,7 +36,10 @@ These functions determine factor levels based on the observed levels in the plai Hence factor levels without observations will disappear. The order of the factor levels is also determined by the available levels in the plain text file, which can be different from the original order. -The `write_vc()` and `read_vc()` functions from `git2rdata` keep track of the class of each variable and, in case of a factor, also of the factor levels and their order. Hence this function pair preserves the information content of the dataframe. The `vc` suffix stands for **v**ersion **c**ontrol as these functions use their full capacity in combination with a version control system. +The `write_vc()` and `read_vc()` functions from `git2rdata` keep track of the class of each variable and, in case of a factor, also of the factor levels and their order. +Hence this function pair preserves the information content of the dataframe. The `vc` suffix stands for +**v**ersion **c**ontrol +as these functions use their full capacity in combination with a version control system. 
## Efficiency Relative to Storage and Time @@ -61,7 +64,9 @@ Store and return timestamps as UTC. - Store a `Date` as an integer to the data. Store the class and the origin in the metadata. -Storing the factors, POSIXct and Date as their index, makes them less user readable. The user can turn off this optimization when user readability is more important than file size. +Storing the factors, +POSIXct +and Date as their index, makes them less user readable. The user can turn off this optimization when user readability is more important than file size. ### Optimized for Version Control @@ -135,11 +140,13 @@ print_file("first_test.yml", path) Adding `optimize = FALSE` to `write_vc()` will keep the raw data in a human readable format. The metadata file is slightly different. The most obvious is the `optimize: no` tag and the different hash. -Another difference is the metadata for POSIXct and Date classes. +Another difference is the metadata for +POSIXct +and Date classes. They will no longer have an origin tag but a format tag. Another important difference is that we store the data file as comma separated values instead of tab separated values. -We noticed that the csv file format is more easily recognised by a larger audience as a data file. +We noticed that the `csv` file format is more easily recognised by a larger audience as a data file. ```{r write_verbose} diff --git a/vignettes/version_control.Rmd b/vignettes/version_control.Rmd index 1515546..0dbf7f7 100644 --- a/vignettes/version_control.Rmd +++ b/vignettes/version_control.Rmd @@ -62,7 +62,9 @@ This implies that two observations switching place does not alter the informatio Nor does switching two variables. Version control systems like [git](https://git-scm.com/), [subversion](https://subversion.apache.org/) or [mercurial](https://www.mercurial-scm.org/) focus on accurately keeping track of _any_ change in the files. 
-Two observations switching place in a plain text file _is_ a change, although the information content^[_sensu_ `git2rdata`] doesn't change. +Two observations switching place in a plain text file _is_ a change, although the information content^[ +_sensu_ +`git2rdata`] doesn't change. `git2rdata` helps the user to prepare the plain text files in such a way that any change in the version history is an actual change in the information content. ## Sorting Observations