conditioning vars are not found #90

Open
schloerke opened this issue Mar 31, 2016 · 4 comments

@schloerke
Contributor

If I divide by A and B and then want to re-divide by just A, divide() can't find the column, because A and B are now conditioning vars rather than columns in the data. This is very confusing to me.

example:

mt <- mtcars
mt$gear <- paste("Gear_", mt$gear, sep = "")
mt$carb <- paste("Carb_", mt$carb, sep = "")

byGearCarb <- divide(
  mt,
  by = c("gear", "carb")
)

byGear <- divide(
  byGearCarb,
  by = c("gear")
)
# * Verifying parameters...
# Error in validateDivSpec.condDiv(by, data, ex) : 
#   'by' variables for conditioning division are not matched in data.  Look at a subset of the data, e.g. 'data[[1]]' to see what to expect.

byGearCarb
# 
# Distributed data frame backed by 'kvMemory' connection
# 
#  attribute      | value
# ----------------+---------------------------------------------------------------------------------------------------------------
#  names          | mpg(num), cyl(num), disp(num), hp(num), drat(num), wt(num), qsec(num), vs(num), am(num)
#  nrow           | 32
#  size (stored)  | 35.45 KB
#  size (object)  | 35.45 KB
#  # subsets      | 11
# 
# * Other attributes: getKeys()
# * Missing attributes: splitSizeDistn, splitRowDistn, summary
# * Conditioning variables: gear, carb

byGearCarb[[1]]
# $key
# [1] "gear=Gear_3|carb=Carb_1"
# 
# $value
#    mpg cyl  disp  hp drat    wt  qsec vs am
# 1 21.4   6 258.0 110 3.08 3.215 19.44  1  0
# 2 18.1   6 225.0 105 2.76 3.460 20.22  1  0
# 3 21.5   4 120.1  97 3.70 2.465 20.01  1  0

Can this be fixed with a preTransFn? Should divide() add the conditioning vars back as columns by default?

@schloerke
Contributor Author

In other words: how do I get my keys back, or never lose them in the first place?

@hafen
Contributor

hafen commented Mar 31, 2016

See flatten() and getSplitVars().

mt <- mtcars
mt$gear <- paste("Gear_", mt$gear, sep = "")
mt$carb <- paste("Carb_", mt$carb, sep = "")

byGearCarb <- divide(
  mt,
  by = c("gear", "carb")
)

byGear <- byGearCarb %>%
  addTransform(flatten) %>%
  divide(by = "gear")

@hafen
Contributor

hafen commented Mar 31, 2016

I'm going to leave this open, though, because the conditioning variables should be available to any operation performed on the data. I don't know why they aren't in this case; I'll look into it.

Conditioning variables are omitted as columns and stored as attributes to save space.
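
A minimal sketch of what that looks like on the data from this issue. It assumes datadr is installed and that getSplitVars() and flatten() behave as described in the package documentation; the variable names (first, sv, flat, keys) are only for illustration, and no output is shown because it was not captured here.

```r
library(datadr)

# Same setup as in the original report
mt <- mtcars
mt$gear <- paste("Gear_", mt$gear, sep = "")
mt$carb <- paste("Carb_", mt$carb, sep = "")

byGearCarb <- divide(mt, by = c("gear", "carb"))

# One key-value pair; its value has no 'gear' or 'carb' columns
first <- byGearCarb[[1]]
names(first$value)

# The conditioning variables are still attached to the subset
sv <- getSplitVars(first)
names(sv)

# flatten() puts them back as ordinary columns of the value
flat <- flatten(first$value)
names(flat)

# Flattening every subset first lets divide() find 'gear' again
byGear <- divide(addTransform(byGearCarb, flatten), by = "gear")
keys <- getKeys(byGear)
```

If the assumptions hold, names(flat) should include gear and carb, and byGear should have one subset per distinct value of gear.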

@schloerke
Contributor Author

Setting preTransFn = flatten seems to resolve my error:

byGear <- divide(
  byGearCarb,
  by = c("gear"),
  preTransFn = flatten
)
# * Verifying parameters...
# ** note **: preTransFn is deprecated - please apply this transformation using 'addTransform()' to your input data prior to calling 'divide()'
# *** finding global variables used in 'fn'... [none]
#   package dependencies: datadr
# *** testing 'fn' on a subset... ok
# * Applying division...


2 participants