Hello, I am experiencing an unexpected behaviour from the remove_empty() function, more specifically with the cutoff argument.
The cutoff argument is documented as "What fraction (>0 to <=1) of rows or columns must be empty to be removed?". Therefore, when I set cutoff = 0.8, I would expect rows in which 80% or more of the values are missing to be removed. However, I think the function is keeping the rows with 80% or more non-empty data, instead of removing the rows with 80% or more empty data. Take the following example:
## Create data frame
data <- data.frame(
  x = 1:5,                  # missing 0% of rows
  y = c(2, NA, NA, NA, 3),  # missing 60% of rows
  z = rep(NA, 5)            # missing 100% of rows
)

## Remove columns that are completely empty (cutoff = 1)
## - Removes `z`, so it works as expected
janitor::remove_empty(
  dat = data,
  which = "cols"
)

## Remove columns which are missing 80% or more rows
## - Removes `y` and `z`. However, `y` shouldn't be removed
janitor::remove_empty(
  dat = data,
  which = "cols",
  cutoff = .8
)

## Remove columns which are missing 20% or more rows
## - Removes only `z`. However, `y` should be removed
janitor::remove_empty(
  dat = data,
  which = "cols",
  cutoff = .2
)
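To make the two readings of cutoff concrete, here is a minimal base R sketch that computes the missing fraction of each column and lists which columns each reading would drop. It does not call janitor and is not its implementation; the second rule (keep a column only when its non-empty fraction exceeds cutoff) is an assumption chosen because it reproduces the behaviour reported here. The variable names are illustrative.

```r
# Same example data as in the report
data <- data.frame(
  x = 1:5,
  y = c(2, NA, NA, NA, 3),
  z = rep(NA, 5)
)

# Fraction of missing values per column: x = 0, y = 0.6, z = 1
frac_empty <- colMeans(is.na(data))

cutoff <- 0.8

# Reading 1 (what the documentation wording suggests):
# drop a column when its empty fraction is at least `cutoff`
dropped_by_empty <- names(data)[frac_empty >= cutoff]

# Reading 2 (assumption that matches the reported behaviour):
# keep a column only when its non-empty fraction exceeds `cutoff`
dropped_by_nonempty <- names(data)[!((1 - frac_empty) > cutoff)]

dropped_by_empty     # "z"
dropped_by_nonempty  # "y" "z"
```

Under the first reading only z is dropped at cutoff = 0.8, while the second reading also drops y, which is the discrepancy described in this report.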
Thanks for bringing this up. We will be including this as a documentation fix in the next version of janitor. This concern is a duplicate of #568. Please also see the proposed documentation in #569; if that is unclear, please comment in either the issue or the PR.
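For anyone who needs the behaviour the report expected (drop a column when at least a given fraction of its values is missing), one option is to filter the columns directly in base R. The helper below is hypothetical and not part of janitor; it is a sketch that assumes only NA counts as empty and that the data frame has at least one row.

```r
# Hypothetical helper, not part of janitor: drop columns whose fraction of
# NA values is greater than or equal to `cutoff`.
drop_mostly_empty_cols <- function(dat, cutoff = 1) {
  stopifnot(is.data.frame(dat), nrow(dat) > 0, cutoff > 0, cutoff <= 1)
  frac_empty <- colMeans(is.na(dat))
  # drop = FALSE keeps the result a data frame even if one column remains
  dat[, frac_empty < cutoff, drop = FALSE]
}

data <- data.frame(
  x = 1:5,
  y = c(2, NA, NA, NA, 3),
  z = rep(NA, 5)
)

names(drop_mostly_empty_cols(data, cutoff = 0.8))  # "x" "y"
names(drop_mostly_empty_cols(data, cutoff = 0.2))  # "x"
```

With the default cutoff = 1 the helper removes only columns that are entirely NA, so it agrees with the default case in the report, which already works as expected.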