Add step_glove() #102
Comments
@EmilHvitfeldt The slightly hairy part of this is that |
Yes, I kinda took a break from these issues because the total number of parameters is going to be huge. I'm thinking the best way forward might be to do something like:

step_glove <- function(recipe,
                       ...,
                       role = "predictor",
                       trained = FALSE,
                       columns = NULL,
                       tokenizer = text2vec::space_tokenizer,
                       dim = 10L,
                       window = 5L,
                       min_count = 5L,
                       n_iter = 10L,
                       x_max = 10L,
                       stopwords = character(),
                       aggregation = c("sum", "mean", "min", "max"),
                       aggregation_default = 0,
                       prefix = "glove",
                       skip = FALSE,
                       id = rand_id("glove")) {

But on the other hand, it would be nice if we could eliminate the

It appears that spacing isn't going to be an issue when stopwords are removed.

library(wordsalad)
set.seed(1)
x <- glove(fairy_tales, x_max = 5, stopwords = "hello")
set.seed(1)
hello_fairy_tales <- stringr::str_replace_all(fairy_tales, " ", " hello ")
x_hello <- glove(hello_fairy_tales, x_max = 5, stopwords = "hello")
identical(x, x_hello)
#> [1] TRUE

Created on 2021-02-05 by the reprex package (v0.3.0)

If you feel up for it, I think having |
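For readers following along, here is a minimal base R sketch of what the `aggregation` and `aggregation_default` arguments in the proposed signature could mean: collapsing per-token vectors into one vector per document. The helper `aggregate_embedding()` and the toy matrix `emb` are hypothetical and written only for illustration; they are not part of wordsalad or of any existing step.

```r
# Hypothetical helper for illustration only; not an existing wordsalad
# or recipes function.
aggregate_embedding <- function(tokens, embeddings,
                                aggregation = c("sum", "mean", "min", "max"),
                                aggregation_default = 0) {
  aggregation <- match.arg(aggregation)

  # Keep only tokens that have a row in the embedding matrix.
  known <- tokens[tokens %in% rownames(embeddings)]

  # No known tokens: fall back to the default value in every dimension.
  if (length(known) == 0) {
    return(rep(aggregation_default, ncol(embeddings)))
  }

  vectors <- embeddings[known, , drop = FALSE]

  # Pick the column-wise summary that matches the requested aggregation.
  agg_fun <- switch(aggregation,
    sum  = colSums,
    mean = colMeans,
    min  = function(m) apply(m, 2, min),
    max  = function(m) apply(m, 2, max)
  )
  unname(agg_fun(vectors))
}

# Toy 2-dimensional embeddings with made-up numbers.
emb <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE,
              dimnames = list(c("once", "upon", "time"), NULL))

doc_mean  <- aggregate_embedding(c("once", "upon", "unknown"), emb, "mean")
doc_empty <- aggregate_embedding("unknown", emb, "sum")
```

Under these assumptions, a document with no known tokens gets `aggregation_default` in every dimension, which is presumably why the default needs to be a separate argument.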
That makes sense to me. This is what I'm looking at for the arguments, then:
I went with "glove_embed" for the prefix so we could (for example) easily select all "_embed" predictors. I don't have a specific case where that would be necessary, but I can imagine maybe it being a thing.

Now I need to walk through the logic and figure out which functions/methods we need to write especially for this. I think we can just have the main |
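As a rough illustration of the prefix idea, the base R sketch below picks out every column whose name contains "_embed". The data frame `baked` and its column names are made up for the example; the actual names the step would generate are not settled in this thread.

```r
# Hypothetical baked output; the column names are illustrative only.
baked <- data.frame(
  outcome            = c(0, 1),
  glove_embed_text_1 = c(0.1, 0.3),
  glove_embed_text_2 = c(-0.2, 0.4),
  word_count         = c(12, 7)
)

# Select every column whose name contains the literal string "_embed".
embed_cols <- grep("_embed", names(baked), fixed = TRUE, value = TRUE)
embed_only <- baked[embed_cols]
```

Inside a recipe, the same selection could presumably be written with a tidyselect helper such as `contains("_embed")`.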
Er wait, nope, because the selector hasn't done its job until deeper in, so we won't know what to call |
Use glove() from https://github.com/EmilHvitfeldt/wordsalad