Problem in extracting a subset of a ddf #97

Open
talegari opened this issue Jan 29, 2017 · 2 comments

@talegari

Hi,

I am trying to extract a subset of a local-disk-based ddf.

library("datadr")
write.table(mtcars, "mtcars.csv", sep = ",", row.names = FALSE)
lc <- localDiskConn(file.path(getwd(), "test"), autoYes = TRUE)
drRead.table("mtcars.csv"
             , header = TRUE
             , sep = ","
             , output = lc
             , rowsPerBlock = 7)
mtcars_ddf <- ddf(lc, update = TRUE)
drSubset(mtcars_ddf, gear == 2)

The last line throws an error:

* Testing 'subset' on a subset
Error in names(tmp) <- c("key", "value") : 
  'names' attribute [2] must be the same length as the vector [1]

The same error occurs if the last line is replaced by: drSubset(mtcars_ddf, c(rep(TRUE, 10), rep(FALSE, 22)))
Is this the intended usage of drSubset, or am I missing something?

@hafen
Contributor

hafen commented Feb 17, 2017

Sorry for the delay. This is interesting. Which versions of datadr and R are you using? I don't get an error; instead I get a single row of all NA, which isn't right either: the result should be an empty data frame.
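For reference, here is a minimal base R sketch of the result one would expect from the same condition on the in-memory mtcars data. It does not use datadr at all; the name `expected` is only illustrative. No row of mtcars has gear equal to 2, so the condition selects nothing and the natural result is a data frame with zero rows and all of the original columns.

```r
# Base R only: the same condition applied to the in-memory data frame.
# `expected` is an illustrative name, not part of datadr.
gear_values <- sort(unique(mtcars$gear))  # the gear values present in mtcars

expected <- subset(mtcars, gear == 2)

nrow(expected)   # number of matching rows
names(expected)  # the columns are kept even when no rows match
```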

The other test case, where you pass a vector of logicals, doesn't work in this setting because a ddf doesn't keep track of the row ordering of the original data frame.
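The point about row ordering can be illustrated with a rough base R sketch. It assumes, purely for illustration, that the subsets of a ddf behave like independent blocks of at most 7 rows; the real storage layout and block order in datadr may differ, and `blocks`, `mask`, and the other names are hypothetical. A logical vector of length 32 does not line up with any single block, whereas a condition on a column can be evaluated inside each block on its own.

```r
# Base R stand-in for the subsets of a ddf: blocks of at most 7 rows.
# This mimics rowsPerBlock = 7 only loosely; it is not datadr's actual layout.
block_id <- ceiling(seq_len(nrow(mtcars)) / 7)
blocks <- split(mtcars, block_id)

# A mask defined against the row order of the full 32-row data frame.
mask <- c(rep(TRUE, 10), rep(FALSE, 22))

# No block has as many rows as the mask, so the mask cannot be applied per block.
mask_fits <- vapply(blocks, function(b) nrow(b) == length(mask), logical(1))

# A column condition needs no global row index: evaluate it within each block.
per_block <- lapply(blocks, function(b) b[b$gear == 4, , drop = FALSE])
filtered <- do.call(rbind, per_block)
```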

drSubset() is intentionally limited: it is meant for pulling a subset of a ddf, which could potentially be very large, into memory as an ordinary data frame.

If you want to filter out records but still have a ddf, you can simply add a transformation function to a ddf that does the filtering (using whatever mechanism you'd like - subset, dplyr, etc.).
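A minimal sketch of the transformation approach, assuming datadr's addTransform() and recombine() with combRbind. For brevity it builds the ddf in memory with ddf(mtcars) rather than from the local disk connection above, and the function name `keep_gear4` is hypothetical. The filter is attached to the ddf and applied to each subset when the data is accessed.

```r
library(datadr)

# In-memory stand-in for the local-disk ddf from the original example.
mtcars_ddf <- ddf(mtcars)

# Hypothetical filter: receives the data frame of one subset and
# returns only the rows to keep.
keep_gear4 <- function(x) subset(x, gear == 4)

# Attach the filter as a transformation on the ddf.
mtcars_gear4 <- addTransform(mtcars_ddf, keep_gear4)

# When an in-memory data frame is wanted, bind the filtered subsets together.
gear4_df <- recombine(mtcars_gear4, combine = combRbind)
```

Any filtering mechanism can go inside the function, so a dplyr::filter() call would work in place of subset() provided the package is available wherever the transformation runs.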

@talegari
Author

I am now getting a row of NAs as well; an empty data frame would be preferable, though. This issue may be closed if required.
