-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combined table output #144
Conversation
rename context-related columns to `treatment_context_properties`, not `treatment_contexts` to align with how information is coded into the ontology
Realised we need to maintain identifiers for output to still match our OBOE-adherent data model. They can vanish for individual join functions, but not for one of the official traits.build outputs
- add in contributors - ensure the symbols we are using for collapsing data aren't in the data itself
Not sure if this is the correct place
…o/traits.build into combined-table-output
…o/traits.build into combined-table-output
…o/traits.build into combined-table-output
Now I see where `austraits` is being used - presumably this means the function `join_contexts` also needs to be moved to `traits.build`? @dfalster
This is failing because I've using an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still having a look at the final output, but submitting some comments for now.
R/output_combined_table.R
Outdated
#' | ||
#' @param database A traits.build database | ||
#' | ||
#' @return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs documentation
dplyr::mutate(location_properties = paste0(location_property, "=", value)) %>% | ||
dplyr::select(dplyr::all_of(c("dataset_id", "location_id", "location_name", "location_properties"))) %>% | ||
dplyr::group_by(dataset_id, location_id, location_name) %>% | ||
dplyr::mutate(location_properties = paste0(location_properties, collapse = "; ")) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would dplyr::summarise
here work the same and replace the need for distinct()
later? Same with later tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly but, I think distinct()
is actually clearer in this case - otherwise you'd just have to list all the column individually and decide how to treat each one (lots of "take first value")
@@ -1176,6 +1176,20 @@ dataset_test_worker <- | |||
info = paste0(red(dataset_id), "\t`db_traits_pivot_longer` threw a warning") | |||
) | |||
} | |||
|
|||
expect_no_error( | |||
combined_table <- database_create_combined_table(dataset), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests creating a combined table for a single dataset, which may very well be intended.
Will we also add database_create_combined_table
to build.R or to the GitHub actions list for AusTraits?
R/output_combined_table.R
Outdated
database$traits %>% | ||
dplyr::left_join(location_latlon, by = c("dataset_id", "location_id")) %>% | ||
dplyr::left_join(location_properties, by = c("dataset_id", "location_id", "location_name")) %>% | ||
austraits::join_contexts(contexts_tmp) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you define a function called join_contexts
above? Why do we use the austraits version here?
R/output_combined_table.R
Outdated
data_collectors = ifelse( | ||
is.na(ORCID), | ||
paste0(data_collectors, " <affiliation:", affiliation), | ||
paste0(data_collectors, ";affiliation:", affiliation)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add spaces after the semicolons and use equal (=) signs instead of colons, consistent with location_properties
? It might be more readable.
description = stringr::str_replace_all(description, "=", "-"), | ||
value = ifelse( | ||
is.na(description), | ||
paste0(context_property, ":", value), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here I wonder if "=" would be more readable than ":"
out <- contexts_table %>% | ||
dplyr::filter(link_id == context_id) %>% | ||
dplyr::select(-link_id) %>% | ||
dplyr::distinct(dataset_id, link_vals, .keep_all = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line necessary? Shouldn't it be distinct anyway, otherwise there's something wrong with the contexts table that we want to be picked up by dataset_test
?
) | ||
} | ||
|
||
combined_table <- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the ID columns that help join to other relational tables could be removed from the combined table?
context_property = stringr::str_replace_all(context_property, "=", "-"), | ||
value = stringr::str_replace_all(value, "=", "-"), | ||
description = stringr::str_replace_all(description, "=", "-"), | ||
context_property = stringr::str_replace_all(context_property, ";", ","), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the characters "<" and ">" also need to be replaced in case they ever make it into the context_property
, value
and description
fields?
I've had a look at the final table. I think it looks good overall! |
@ehwenk Also need to update documentation (export the new function) with |
revert recent updates to see if other reasons why runs failing
remove example
This is now considered a stale branch and will shortly be deleted. The work on this function is now occurring at https://github.com/traitecoevo/austraits/tree/combined_table-functions. The core function, |
A function to generate a combined table output - trait data are still in long format, with all information from relational tables merged as additional, condensed columns.
Requires review and plan for how this will be integrated into workflow. For now I'm assuming it will be a function that can be sourced and run to generate outputs for releases.
@yangsophieee @dfalster It would be great if both of you could build the combined table and look at syntax, packing "rules".