Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply_row_masks() doesn't respect descending sorting of row break variables #50

Open
mstackhouse opened this issue May 16, 2022 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@mstackhouse
Copy link
Contributor

mstackhouse commented May 16, 2022

Prerequisites

For more information, see the CONTRIBUTING guide.

Description

apply_row_masks() is intended to be used post-sorting, as it blanks row labels out assuming that the repetitive values are already next to each other. Currently, if using additional break variables to insert row breaks, this fails to consider if one of the break variables has been sorted in descending order, leading to an unexpected result.

Steps to Reproduce (Bug Report Only)

adae <- haven::read_xpt(url("https://github.com/phuse-org/TestDataFactory/raw/main/Updated/TDF_ADaM/adae.xpt")) %>% 
  filter(AEBODSYS %in% c("IMMUNE SYSTEM DISORDERS", "CONGENITAL, FAMILIAL AND GENETIC DISORDERS"))

# Create the Tplyr table object - this is like a specification for the table
t <- tplyr_table(adae, TRTA) %>% 
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>% 
      set_distinct_by(USUBJID) %>% 
      set_order_count_method("bycount", break_ties='desc') %>%
      set_ordering_cols("Xanomeline High Dose") %>%
      set_result_order_var(distinct_n)
  )

# Now build the table - this is where the number crunching happens
ae1 <- t %>% 
  build() %>% 
  arrange(desc(ord_layer_1), desc(ord_layer_2)) %>% 
  apply_row_masks(row_breaks=TRUE, ord_layer_1)

Expected behavior: [What you expected to happen]

CONGENITAL, FAMILIAL AND GENETIC DISORDERS should sort first, and IMMUNE SYSTEM DISORDERS should sort second.

Actual behavior: [What actually happened]

IMMUNE SYSTEM DISORDERS sorts first and CONGENITAL, FAMILIAL AND GENETIC DISORDERS sorts second. Ascending sorting is reimplemented for ord_layer_1.

** Workaround **

The desired sort order can be achieved by re-sorting the data with the newly added ord_break variable.

ae1 %>% 
  arrange(desc(ord_layer_1), desc(ord_layer_2)) %>% 
  apply_row_masks(row_breaks = TRUE, ord_layer_index, ord_layer_1) %>% 
  arrange(desc(ord_layer_1), ord_break, desc(ord_layer_2))

This is far from desirable, but it works.

@mstackhouse mstackhouse added the bug Something isn't working label May 16, 2022
@mstackhouse mstackhouse self-assigned this May 16, 2022
@mstackhouse
Copy link
Contributor Author

One way to resolve this would be to capture desc(ord_layer_1) in the call to apply_row_masks() and use that when the row breaks are inserted:

apply_row_masks(row_breaks = TRUE, ord_layer_index, desc(ord_layer_1))

apply_row_masks() has to re-sort the data after the blanked out rows are inserted, as there are new rows injected into the data that need to be ordered properly. So if the ordering is provided, this could be respected.

If there's a stacking method that could respect variable grouping, this could be avoided - but I'm not seeing that in the tidyverse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant