add sort_by.data.table #6679
Generated via commit fe8f6f1 Download link for the artifact containing the test results: ↓ atime-results.zip
if (inherits(y, "formula"))
  y <- .formula2varlist(y, x)
if (!is.list(y))
  y <- list(y)
Needs a few more tests -- currently all cases use the formula interface.
Added some tests for other interfaces
https://github.com/Rdatatable/data.table/blob/bc3a413d1912340f93b1b50004a04a3b1ca2b485/inst/tests/tests.Rraw#L20711-L20725
A couple of questions came up:
- The base sort_by.data.frame method uses .formula2varlist(), which is a public but "internal" function in the base package. As the man page says of such functions, "most of which are only user-visible because of the special nature of the base namespace." We can use that in sort_by.data.table, right?
- List columns. I included a test in which the sort column is a list column. Because forder treats a list like a data.table itself, the result may have as many rows as there are elements in each vector of the sorting list. Nevertheless, this is the same behaviour as dt[order(x)] where x is a list column.
> we can use that in sort_by.data.table right?
Hmm, thanks for flagging. Actually, I'm not so concerned about that, but you did remind me that it's a pretty new {base} function. It won't be available in all the R versions we support, so it will cause an R CMD check NOTE there.
And actually, we've crossed this bridge before in #5393:
Lines 2460 to 2465 in d263924
# same as split.data.frame - handling all exceptions, factor orders etc, in a single stream of processing was a nightmare in factor and drop consistency
# evaluate formula mirroring split.data.frame #5392. Mimics base::.formula2varlist.
if (inherits(f, "formula"))
  f <- eval(attr(terms(f), "variables"), x, environment(f))
# be sure to use x[ind, , drop = FALSE], not x[ind], in case downstream methods don't follow the same subsetting semantics (#5365)
return(lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), function(ind) x[ind, , drop = FALSE]))
So actually, we should just make a local copy of .formula2varlist() to re-use between split.data.table and sort_by.data.table.
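A minimal sketch of what such a shared local helper could look like, using only base R. The name formula_to_varlist is hypothetical (it is not taken from the PR), and the body simply reuses the eval(attr(terms(f), "variables"), ...) idiom from the split.data.table snippet above; it does not try to reproduce everything base::.formula2varlist does.

```r
# Hypothetical helper (the name is an assumption for illustration only).
# Evaluates the variables named in a one-sided formula inside `x`,
# reusing the idiom from the split.data.table code quoted above.
formula_to_varlist <- function(f, x) {
  eval(attr(stats::terms(f), "variables"), x, environment(f))
}

# Plain data.frame so the sketch runs without data.table installed.
df <- data.frame(g = c("b", "a", "b", "a"), v = c(2, 4, 1, 3))

# Normalise the sort keys the same way the reviewed diff does:
# a formula becomes a list of evaluated columns, anything else is wrapped.
y <- ~ g + v
if (inherits(y, "formula"))
  y <- formula_to_varlist(y, df)
if (!is.list(y))
  y <- list(y)

# Order by g, then by v within g.
idx <- do.call(order, unname(y))
sorted <- df[idx, , drop = FALSE]
```

Because the helper only depends on stats::terms and eval, it would not trigger a check NOTE on R versions that predate .formula2varlist().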
> the list columns. I included a test in which the sort column is a list column. As forder treats a list like a dt itself, the result may have the same number of rows as elements each vector of the sorting list. Nevertheless this is the same behaviour as dt[order(x)] where x is a list column.
As long as the behavior is consistent with [order(...)], I think we can ignore it in this PR.
Looks good! I think it needs an entry in man/. Probably ?forder would be the most logical place?
Added a reference in setorder.Rd (?forder). PTAL.
Proposal for #6662
sort_by.data.table() will now sort using the C-locale; this is incompatible with base::sort_by.data.frame() used on data.tables.
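To illustrate what C-locale ordering means in practice, here is a small base R sketch. It uses order(..., method = "radix"), which is documented to collate character vectors in the C-locale, as a stand-in for the ordering described above; it does not call data.table itself, and the sample vector is made up for illustration.

```r
# Made-up sample values mixing upper- and lowercase.
x <- c("b", "A", "a", "B")

# method = "radix" collates character vectors in the C-locale,
# where all uppercase ASCII letters sort before lowercase ones.
c_order <- x[order(x, method = "radix")]

# The default method collates using the session locale, so the result
# can differ between machines; no particular output is assumed here.
locale_order <- x[order(x)]

print(c_order)
print(locale_order)
```

In many non-C locales the second result interleaves upper- and lowercase letters, which is the kind of difference users may notice when the data.table method replaces the data.frame one.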