diff --git a/docs/paper.html b/docs/paper.html index d6d10f7..b531d08 100644 --- a/docs/paper.html +++ b/docs/paper.html @@ -52,11 +52,13 @@

Summary

-

NPSdataverse is a suite of R packages modeled on the tidyverse concept of several packages built with a common goal [@Wickham2019]. The overarching theme of the NPSdataverse packages is creating, publishing, and accessing Open, machine-readable data and metadata. NPSdataverse supports Ecological Metadata Language (EML) metadata and .csv data files. Some of the constituent packages (R/EML and R/EMLassemblyline) are general-use packages aimed at authoring EML documents. Additional packages (R/QCkit, R/EMLeditor, R/DPchecker and R/NPSutils) are designed and maintained by the National Park Service. Although many functions within the NPSdataverse packages are NPS-specific (particularly API calls), or have default parameters with NPS staff in mind, all of the functions are written so that they can be used by the general public. Anyone interested in applying for research permits or conducting research on National Park Units can reference and utilize the NPSdataverse packages. Additionally, the packages will be useful for data management plans in a wide variety of grant proposals and for anyone who needs to create Open data and machine-readable metadata to comply with the Open Data Act of 2018. Finally, the ability to author, edit, and check EML metadata will be useful for data publication at any number of repositories or data journals.

+

NPSdataverse is a suite of R packages modeled on the tidyverse concept of several packages built with a common goal [@Wickham2019]. The overarching theme of the NPSdataverse packages is creating, publishing, and accessing Open, machine-readable data and metadata. NPSdataverse supports Ecological Metadata Language (EML) metadata and .csv data files. Some of the constituent packages (R/EML and R/EMLassemblyline) are general-use packages aimed at authoring EML documents. Additional packages (R/QCkit, R/EMLeditor, R/DPchecker and R/NPSutils) are designed and maintained by the National Park Service. Although many functions within the NPSdataverse packages are NPS-specific (particularly API calls), or have default parameters with NPS staff in mind, all of the functions are written so that they can also be used by the general public. Anyone interested in applying for research permits or conducting research on National Park Units can reference and utilize the NPSdataverse packages. Additionally, the packages will be useful for data management plans in a wide variety of grant proposals and for anyone who needs to create Open data and machine-readable metadata to comply with the Open Data Act of 2018. Finally, the ability to author, edit, and check EML metadata will be useful for data publication at any number of repositories or data journals.

Statement of need

-

Some text with maybe some background, history, citations, etc pointing out the need for the software

+

Following a long-term movement for transparency and data accessibility, the U.S. implemented an Open Data Memorandum in 2013 (OMB M-13-13) and the federal Open Data Act of 2019 [@OpenData2019]. The Open Data Act mandated that federal agencies provide data in open formats with metadata. Subsequently, many funding agencies such as the National Science Foundation have required grant awardees to make data public, often including metadata ([@nsf2015]). Multiple academic publishers have followed suit ([@Wiley2022], [@Springer2023]), requiring data availability statements upon publication.

+

One goal of open science, and a requirement of the Open Government Data Act, is to include metadata along with data. Ecological Metadata Language (EML) is one metadata standard that is particularly amenable to studies with rich taxonomy. It has been adopted by multiple research organizations including the Environmental Data Initiative (EDI), the National Ecological Observatory Network (NEON), the Global Biodiversity Information Facility (GBIF), the Swedish Biodiversity Data Infrastructure (SBDI), the French Biodiversity Hub (“Pôle National de Données de Biodiversité”), the U.S. National Park Service, and others.

+

Nevertheless, actual availability of data varies ([@Federer2018; @Tedersoo2021]), perhaps because there is a need for more infrastructure and tools to meet the goals of open data and open science ([@Huston2019]). Multiple solutions have been presented, including ezEML, a workflow for authoring metadata in Ecological Metadata Language and publishing data and metadata to a repository ([@Vanderbilt2022]). Although ezEML has an intuitive graphical user interface with a relatively low learning curve, it does have some drawbacks. For instance, ezEML is not scriptable, which makes repeated deployments of the same or similar workflows challenging. In addition, ezEML requires the user to upload their data to an external site for processing, which may not be suitable for sensitive data. Here we introduce the NPSdataverse, a series of R-based packages for authoring, editing, and checking EML metadata locally in a scriptable fashion. Packages within the NPSdataverse also include data munging and data access/download functions.

NPSdataverse package

diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index b265579..6b7bb47 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -3,4 +3,4 @@ pkgdown: 2.1.0 pkgdown_sha: ~ articles: NPSdataverse: NPSdataverse.html -last_built: 2024-08-27T21:12Z +last_built: 2024-09-03T23:57Z diff --git a/paper.bib b/paper.bib index ee73dca..8bbc93a 100644 --- a/paper.bib +++ b/paper.bib @@ -1 +1,77 @@ @article{Wickham2019, doi = {10.21105/joss.01686}, url = {https://doi.org/10.21105/joss.01686}, year = {2019}, publisher = {The Open Journal}, volume = {4}, number = {43}, pages = {1686}, author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani}, title = {Welcome to the Tidyverse}, journal = {Journal of Open Source Software} } + +@misc{OpenData2019, + url = {https://www.congress.gov/bill/115th-congress/house-bill/4174}, + howpublished = {H.R.4174 - 115th Congress}, + journal = {law}, + title = {"H.R. 
4174 - OPEN Government Data Act"}, + year = {2019}} + +@manual{nsf2015, + url = {https://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf15094}, + title = {The National Science Foundation Open Government Plan 3.5}, + howpublished = {NSF Document Number nsf15093}, + organization = {National Science Foundation}, + address = {Alexandria, VA, USA}, + year = {2015} + } + +@misc{Wiley2022, + url = {https://authorservices.wiley.com/author-resources/Journal-Authors/open-access/data-sharing-citation/data-sharing-policy.html}, + title = {Wiley's data sharing policies}, + year = {2022} + } + +@misc{Springer2023, + url = {https://www.springer.com/gp/editorial-policies/data-availability-statement?srsltid=AfmBOoq9OGxFR-H9UXUfYx_Nl1fRgfnBfCIFl3nbUqkNcRey1oaTBNqn}, + title = {Data Availability Statement}, + year = {} + } + +@article{Federer2018, + doi = {10.1371/journal.pone.0194768}, + author = {Federer, Lisa M. AND Belter, Christopher W. AND Joubert, Douglas J. AND Livinski, Alicia AND Lu, Ya-Ling AND Snyders, Lissa N. AND Thompson, Holly}, + journal = {PLOS ONE}, + publisher = {Public Library of Science}, + title = {Data sharing in PLOS ONE: An analysis of Data Availability Statements}, + year = {2018}, + month = {05}, + volume = {13}, + url = {https://doi.org/10.1371/journal.pone.0194768}, + pages = {1-12}, + abstract = {A number of publishers and funders, including PLOS, have recently adopted policies requiring researchers to share the data underlying their results and publications. Such policies help increase the reproducibility of the published literature, as well as make a larger body of data available for reuse and re-analysis. In this study, we evaluate the extent to which authors have complied with this policy by analyzing Data Availability Statements from 47,593 papers published in PLOS ONE between March 2014 (when the policy went into effect) and May 2016. 
Our analysis shows that compliance with the policy has increased, with a significant decline over time in papers that did not include a Data Availability Statement. However, only about 20% of statements indicate that data are deposited in a repository, which the PLOS policy states is the preferred method. More commonly, authors state that their data are in the paper itself or in the supplemental information, though it is unclear whether these data meet the level of sharing required in the PLOS policy. These findings suggest that additional review of Data Availability Statements or more stringent policies may be needed to increase data sharing.}, + number = {5}, +} + +@article{Tedersoo2021, + title={Data sharing practices and data availability upon request differ across scientific disciplines}, + author={Tedersoo, Leho and K{\"u}ngas, Rainer and Oras, Ester and K{\"o}ster, Kajar and Eenmaa, Helen and Leijen, {\"A}li and Pedaste, Margus and Raju, Marju and Astapova, Anastasiya and Lukner, Heli and others}, + journal={Scientific data}, + volume={8}, + number={1}, + pages={192}, + year={2021}, + publisher={Nature Publishing Group UK London} +} + +@article{Huston2019, + title={Open science/open data: Reaping the benefits of open data in public health}, + author={Huston, P and Edge, VL and Bernier, E}, + journal={Canada Communicable Disease Report}, + volume={45}, + number={11}, + pages={252}, + year={2019}, + publisher={Public Health Agency of Canada} +} + +@article{Vanderbilt2022, + title={Publishing ecological data in a repository: An easy workflow for everyone}, + author={Vanderbilt, Kristin and Ide, Jon and Gries, Corinna and Grossman-Clarke, Susanne and Hanson, Paul and O'Brien, Margaret and Servilla, Mark and Smith, Colin and Waide, Robert and Zollo-Venecek, Kyle}, + journal={The Bulletin of the Ecological Society of America}, + volume={103}, + number={4}, + pages={e2018}, + year={2022}, + publisher={Wiley Online Library} +} diff --git a/paper.md b/paper.md index 
e5626bf..71dd388 100644 --- a/paper.md +++ b/paper.md @@ -77,7 +77,7 @@ packages (R/QCkit, R/EMLeditor, R/DPchecker and R/NPSutils) are designed and maintained by the National Park Service. Although many functions within the NPSdataverse packages are NPS-specific (particularly API calls), or have default parameters with NPS staff in mind, all of the -functions are written so that they can be used by the general public. +functions are written so that they can also be used by the general public. Anyone interested in applying for research permits or conducting research on National Park Units can reference and utilize the NPSdataverse packages. Additionally, the packages will be useful for data management @@ -89,8 +89,12 @@ repositories or data journals. # Statement of need -Some text with maybe some background, history, citations, etc pointing -out the need for the software +Following a long-term movement for transparency and data accessibility, the U.S. implemented an Open Data Memorandum in 2013 (OMB M-13-13) and the federal Open Data Act of 2019 [@OpenData2019]. The Open Data Act mandated that federal agencies provide data in open formats with metadata. Subsequently, many funding agencies such as the National Science Foundation have required grant awardees to make data public, often including metadata ([@nsf2015]). Multiple academic publishers have followed suit ([@Wiley2022], [@Springer2023]), requiring data availability statements upon publication. + +One goal of open science, and a requirement of the Open Government Data Act, is to include metadata along with data. Ecological Metadata Language (EML) is one metadata standard that is particularly amenable to studies with rich taxonomy. 
It has been adopted by multiple research organizations including the Environmental Data Initiative (EDI), the National Ecological Observatory Network (NEON), the Global Biodiversity Information Facility (GBIF), the Swedish Biodiversity Data Infrastructure (SBDI), the French Biodiversity Hub ("Pôle National de Données de Biodiversité"), the U.S. National Park Service, and others. + +Nevertheless, actual availability of data varies ([@Federer2018; @Tedersoo2021]), perhaps because there is a need for more infrastructure and tools to meet the goals of open data and open science ([@Huston2019]). Multiple solutions have been presented, including ezEML, a workflow for authoring metadata in Ecological Metadata Language and publishing data and metadata to a repository ([@Vanderbilt2022]). Although ezEML has an intuitive graphical user interface with a relatively low learning curve, it does have some drawbacks. For instance, ezEML is not scriptable, which makes repeated deployments of the same or similar workflows challenging. In addition, ezEML requires the user to upload their data to an external site for processing, which may not be suitable for sensitive data. Here we introduce the NPSdataverse, a series of R-based packages for authoring, editing, and checking EML metadata locally in a scriptable fashion. Packages within the NPSdataverse also include data munging and data access/download functions. + # NPSdataverse package