Merge pull request #43 from aim-rsf/description
Change installation instruction of the pkg
RayStick authored Jan 5, 2024
2 parents 2aa6c6d + e5e9bb5 commit 42dcd30
Showing 7 changed files with 136 additions and 60 deletions.
19 changes: 14 additions & 5 deletions DESCRIPTION
@@ -1,17 +1,26 @@
Package: browseMetadata
Type: Package
Title: Maps domains to variables within a dataset (SAIL databank)
Title: Browses available metadata, to categorise or label each variable in a dataset
Version: 0.1.0
Authors@R:
person("Rachael", "Stickland", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-3398-4272"))
Maintainer: Rachael Stickland <[email protected]>
Description: This function will read in the meta-data of a dataset (DataClass) from the SAIL databank, obtained from https://modelcatalogue.cs.ox.ac.uk/hdruk_live/.
It will loop through all the variable names, and ask you to categorise each variable into one of your chosen domains.
The domains (or 'latent concepts') will appear up in the Plots tab for your reference, as well as information about the dataset.
A log file will be saved with the categorizations you made.
Description: This package currently contains one function, domain_mapping.
This function takes two inputs. One input is a metadata file that has been downloaded in json format from modelcatalogue.cs.ox.ac.uk/hdruk_live.
The second input is a csv file created by the user, that lists research domains of interest.
The function will read in the metadata file for a chosen dataset, loop through all the variables, and ask the user to categorise/label each variable as belonging to one or more domains.
The domains will appear in the Plots tab and information about the dataset will be printed to the R console, for the user's reference in making these categorisations.
A log file will be saved with the categorisations made. To speed up this process, some auto-categorisations will be made by the function for commonly occurring variables;
these should be verified by the user by checking the csv log file. Example inputs are provided within the package data, for the user to run this function in a demo mode.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
Depends:
R (>= 2.10)
Imports:
cli,
devtools,
grid,
gridExtra,
rjson
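
The DESCRIPTION text says the domains are shown in the Plots tab for reference while labelling. The sketch below illustrates how the numeric domain codes could line up with that table, following the `rbind` and `rows = 0:(nrow(domains_extend)-1)` pattern visible in the `R/domain_mapping.R` changes of this commit. The two study domains and the `code_table` name are assumptions made for illustration only.

```r
# Hypothetical domain list; in practice this comes from the user's csv file.
domains <- data.frame(V1 = c("Socioeconomic factors", "Health outcomes"))

# Fixed categories that domain_mapping() places ahead of the user's domains.
fixed <- data.frame(V1 = c("*NO MATCH / UNSURE*", "*METADATA*", "*ALF ID*",
                           "*OTHER ID*", "*DEMOGRAPHICS*"))

domains_extend <- rbind(fixed, domains)

# Codes start at 0, mirroring rows = 0:(nrow(domains_extend) - 1) in the plot.
code_table <- data.frame(Code = 0:(nrow(domains_extend) - 1),
                         Domain = as.character(domains_extend$V1))
print(code_table)
```

Under these assumptions the first user-supplied domain receives code 5, because the five fixed categories occupy codes 0 to 4.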
8 changes: 8 additions & 0 deletions NAMESPACE
@@ -1,3 +1,11 @@
# Generated by roxygen2: do not edit by hand

export(domain_mapping)
import(cli)
import(devtools)
import(grid)
import(gridExtra)
import(rjson)
importFrom(graphics,plot.new)
importFrom(utils,read.csv)
importFrom(utils,write.csv)
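
The NAMESPACE now imports `read.csv` and `write.csv` from `utils`, which the function uses for the domain list and the log file. As a rough illustration of the domain list format described in the documentation (one quoted domain per line, no header), the following sketch writes and reads back such a file. The file name `my_domains_v1.csv` and the domain names are made up for the example.

```r
# Write a hypothetical two-domain list to a temporary csv, one domain per line.
domain_file <- file.path(tempdir(), "my_domains_v1.csv")
writeLines(c('"Socioeconomic factors"', '"Health outcomes"'), domain_file)

# header = FALSE because the file has no column names; the column is named V1.
domains <- utils::read.csv(domain_file, header = FALSE)

# Label for the log file, derived from the file name as in the updated function.
DomainListDesc <- tools::file_path_sans_ext(basename(domain_file))
print(domains)
print(DomainListDesc)
```

Calling the functions as `utils::read.csv` keeps the example independent of which packages happen to be attached in the session.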
11 changes: 11 additions & 0 deletions R/browseMetadata-package.R
@@ -0,0 +1,11 @@
#' browseMetadata
#'
#' Browses available metadata, to categorise/label each variable in a dataset.
#'
#' @import devtools grid gridExtra rjson cli
#' @importFrom graphics plot.new
#' @importFrom utils read.csv write.csv
#' @keywords internal
"_PACKAGE"

NULL
96 changes: 57 additions & 39 deletions R/domain_mapping.R
@@ -1,12 +1,14 @@
#' domain_mapping
#'
#'This function will read in the meta-data of a Data Asset obtained from metadata catalogue: https://modelcatalogue.cs.ox.ac.uk/hdruk_live.
#'It will loop through all the Data Elements (variable names) in each Data Class (table), and ask you to categorise each variable into one of your chosen domains.
#'Information about the Data Asset and Data Class can be displayed to the command window for reference.
#'The domains will appear in the Plots tab, with the labels you should use for the categorisation.
#'This function will read in the metadata file for a chosen dataset, loop through all the variables, and ask the user to categorise/label each variable as belonging to one or more domains.
#'The domains will appear in the Plots tab and dataset information will be printed to the R console, for the user's reference in making these categorisations.
#'A log file will be saved with the categorisations made.
#'To speed up this process, some auto-categorisations will be made by the function for commonly occurring variables;
#'these auto-categorisations should be verified by the user by checking the csv log file.
#'Example inputs are provided within the package data, for the user to run this function in a demo mode.
#'@param json_file The metadata file. This should be downloaded from the metadata catalogue as a json file.
#'@param domain_file The file that lists the domains of interest to be used within the research study, provided as a csv with each domain on a separate line, within quotations.
#'@return The function will return a log file with your mapping between variables and domains, alongside details about the Data Asset.
#'@param domain_file The domain list file. This should be a csv file created by the user, with each domain listed on a separate line within quotation marks.
#'@return The function will return a log file with the mapping between dataset variables and domains, alongside details about the dataset.
#'@examples
#'# Run in demo mode by providing no inputs: domain_mapping()
#'# Demo mode will use the /data files provided in this package
@@ -17,34 +19,37 @@
#'# Reference the plot tab and categorise each variable into a single ('1') or multiple ('1,2') domain.
#'# Write a note explaining your category choice (optional).
#'@export
#'@importFrom graphics plot.new
#'@importFrom utils read.csv write.csv

domain_mapping <- function(json_file= NULL,domain_file= NULL) {

library(rjson)
library(gridExtra)
library(grid)
library(insight)

# Load data: Check if demo data should be used
# Load data: Check if demo data should be used
if (is.null(json_file) && is.null(domain_file)) {
# If both json_file and domain_file are NULL, use demo data
meta_json <- get('json_metdata')
domains <- get('domains_list')
cat('\nRunning domain_mapping in demo mode using package data files')

DomainListDesc <- 'DemoList'
cat('\n')
cli_alert_info('Running domain_mapping in demo mode using package data files')
} else if (is.null(json_file) || is.null(domain_file)) {
# If only one of json_file and domain_file is NULL, throw error
stop("Please provide both json_file and domain_file (or neither file, to run in demo mode)")
cat('\n')
cli_alert_danger('Please provide both json_file and domain_file (or neither file, to run in demo mode)')
stop()
} else {
# Read in the json file containing the meta data
meta_json <- fromJSON(file = json_file)
meta_json <- rjson::fromJSON(file = json_file)
# Read in the domain file containing the meta data
domains <- read.csv(domain_file,header = FALSE)
DomainListDesc <- tools::file_path_sans_ext(basename(domain_file))
}

# Present domains plots panel for user's reference ----
plot.new()
graphics::plot.new()
domains_extend <- rbind(c('*NO MATCH / UNSURE*'),c('*METADATA*'), c('*ALF ID*'),c('*OTHER ID*'),c('*DEMOGRAPHICS*'),domains)
grid.table(domains_extend[1],cols='Domain',rows=0:(nrow(domains_extend)-1))
gridExtra::grid.table(domains_extend[1],cols='Domain',rows=0:(nrow(domains_extend)-1))

# Get user and demo list info for log file ----
User_Initials <- ""
@@ -53,40 +58,47 @@ domain_mapping <- function(json_file= NULL,domain_file= NULL) {
User_Initials <- readline(prompt="ENTER INITIALS: ")
}

DomainListDesc <- ""
while (DomainListDesc == "") {
cat("\n \n")
DomainListDesc <- readline(prompt="PROVIDE SOME DESCRIPTION OF DOMAIN LIST USED (version number, created by): ")
}

# Print information about Data Asset ----
print_colour("\nData Asset Name \n",'br_violet')
cli_h1("Data Asset Name")
cat(meta_json$dataModel$label,fill=TRUE)
print_colour("Data Asset Last Updated \n",'br_violet')
cli_h1("Data Asset Last Updated")
cat(meta_json$dataModel$lastUpdated,fill=TRUE)
print_colour("Data Asset Exported \n",'br_violet')
cat("By", meta_json$exportMetadata$exportedBy, "at", meta_json$exportMetadata$exportedOn,fill=TRUE)
cli_h1("Data Asset File Exported By")
cat(meta_json$exportMetadata$exportedBy, "at", meta_json$exportMetadata$exportedOn,fill=TRUE)
nDataClasses <- length(meta_json$dataModel$childDataClasses)
print_colour(sprintf("There are %s Data Classes (tables) in this Data Asset\n\n",nDataClasses),'br_violet')
cat('\n')
cli_alert_info("Found {nDataClasses} Data Class{?es} ({nDataClasses} table{?s}) in this Data Asset")
cat('\n')

dataasset_desc <- ""
while (dataasset_desc != "Y" & dataasset_desc != "N") {
cat("\n \n")
dataasset_desc <- readline(prompt="Would you like to read a description of the Data Asset? (Y/N) ")
}

dataasset_desc <- readline(prompt="Would you like to read a description of the Data Asset? (Y/N) ")
if (dataasset_desc == "Y") {
print_colour("Data Asset Description \n",'br_violet')
cli_h1("Data Asset Description")
cat(meta_json$dataModel$description,fill=TRUE)
readline(prompt="Press [enter] to proceed")
}

# Extract each DataClass (Table)
for (dc in 1:nDataClasses) {
print_colour(sprintf("\n\nProcessing Data Class (Table) %s of %s \n",dc,nDataClasses),'br_violet')
print_colour("\nData Class Name \n",'br_violet')
cat('\n')
cli_alert_info("Processing Data Class (Table) {dc} of {nDataClasses}")
cli_h1("Data Class Name")
cat(meta_json$dataModel$childDataClasses[[dc]]$label,fill=TRUE)
print_colour("Data Class Last Updated\n",'br_violet')
cli_h1("Data Class Last Updated")
cat(meta_json$dataModel$childDataClasses[[dc]]$lastUpdated,'\n',fill=TRUE)

dataclass_desc <- readline(prompt="Would you like to read a description of the Data Class (Table)? (Y/N) ")
dataclass_desc <- ""
while (dataclass_desc != "Y" & dataclass_desc != "N") {
cat("\n \n")
dataclass_desc <- readline(prompt="Would you like to read a description of the Data Class (Table)? (Y/N) ")
}

if (dataclass_desc == "Y") {
print_colour("Data Class Description \n",'br_violet')
cli_h1("Data Class Description")
cat(meta_json$dataModel$childDataClasses[[dc]]$description,fill=TRUE)
readline(prompt="Press [enter] to proceed")
}
@@ -200,7 +212,9 @@ domain_mapping <- function(json_file= NULL,domain_file= NULL) {
} else {

# user response
cat(paste("\nDATA ELEMENT -----> ",selectDataClass_df$Label[datavar],"\n\nDESCRIPTION -----> ",selectDataClass_df$Description[datavar],"\n\nDATA TYPE -----> ",selectDataClass_df$Type[datavar],"\n"))
cat(paste("\nDATA ELEMENT -----> ",selectDataClass_df$Label[datavar],
"\n\nDESCRIPTION -----> ",selectDataClass_df$Description[datavar],
"\n\nDATA TYPE -----> ",selectDataClass_df$Type[datavar],"\n"))

decision <- ""
while (decision == "") {
@@ -233,11 +247,15 @@ domain_mapping <- function(json_file= NULL,domain_file= NULL) {

# Save file & print the responses to be saved
Output[Output == ''] <- NA
write.csv(Output, output_fname, row.names=FALSE) #save as we go in case session terminates prematurely
cat("\n \n The below responses will be saved to", output_fname,"\n \n")
utils::write.csv(Output, output_fname, row.names=FALSE) #save as we go in case session terminates prematurely
cat("\n")
cli_alert_info("The below responses will be saved to {output_fname}")
cat("\n")
print(Output[,c("DataClass","DataElement","Domain_code","Note")])
}

print_colour("\n\nPlease check the auto categorised data elements are accurate!\nManually edit csv file to correct errors, if needed.\n",'bg_yellow')
cat("\n \n")
cli_alert_warning("Please check the auto categorised data elements are accurate!")
cli_alert_warning("Manually edit csv file to correct errors, if needed.")
}
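
This commit wraps the Y/N prompts in `while` loops so that any reply other than `Y` or `N` triggers the question again. A minimal sketch of that pattern is given below. `ask_yes_no()` and `fake_read()` are hypothetical helpers, not part of browseMetadata, and the reader function is made swappable only so the loop can be exercised without an interactive session.

```r
# ask_yes_no() is a hypothetical helper, not part of browseMetadata.
# `read` defaults to readline() but can be replaced by a stub for testing.
ask_yes_no <- function(prompt, read = readline) {
  answer <- ""
  # Keep asking until the reply is exactly "Y" or "N" (case sensitive).
  while (!(answer %in% c("Y", "N"))) {
    answer <- read(prompt = prompt)
  }
  answer
}

# Non-interactive demonstration with canned replies.
replies <- c("maybe", "y", "Y")
i <- 0
fake_read <- function(prompt) {
  i <<- i + 1
  replies[i]
}
result <- ask_yes_no("Would you like to read a description? (Y/N) ", read = fake_read)
print(result)
```

As in the committed loops, the comparison is case sensitive, so a lowercase `y` is treated as an invalid reply and the prompt repeats.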

33 changes: 23 additions & 10 deletions README.md
@@ -27,17 +27,12 @@ There are many existing tools that allow you to browse metadata for health datas

### Install

Download/clone this repository to your computer.
Run in the R console:

Install required packages: `devtools`, `gridExtra`, `grid`, `insight`, `rjson`

Then in the R console run:

`library(devtools)`

`load_all("/path-to-repo/browseMetadata")`

`library('browseMetadata')`
```r
install.packages("devtools")
devtools::install_github("aim-rsf/browseMetadata")
```

### Example run through
Execute `?domain_mapping` in the R console to read the documentation.
@@ -88,6 +83,24 @@ To build the documentation files:
`library(roxygen2)`
`roxygenise()`

## Citation

To cite package ‘browseMetadata’ in publications use:

> Stickland R (2024). browseMetadata: Browses available metadata, to categorise/label each variable in a dataset. R package version 0.1.0.
A BibTeX entry for LaTeX users is

```
@Manual{,
title = {browseMetadata: Browses available metadata, to categorise/label each variable in a dataset},
author = {Rachael Stickland},
year = {2024},
note = {R package version 0.1.0},
}
```


### Contributors ✨
This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification, using the ([emoji key](https://allcontributors.org/docs/en/emoji-key)). Contributions of any kind welcome!

15 changes: 15 additions & 0 deletions man/browseMetadata-package.Rd


14 changes: 8 additions & 6 deletions man/domain_mapping.Rd

