When using the transformers-based model de_dep_news_trf I get a huggingface/tokenizers warning message in the console:
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
It appears that the underlying parallelism issue in huggingface/transformers might be getting fixed in the foreseeable future, as a commit that seems to address this issue was made just this week.
However, until this is released in transformers and spaCy, is there a way to set the mentioned environment variable when using spaCy through spacyr? The warning is printed in the console repeatedly until the R session is restarted, which is a nuisance. Calling spacy_tokenize(x, multithread = FALSE) has no effect on the warning.
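For illustration, here is a minimal sketch of what setting the variable from R could look like. It relies on an untested assumption: that an environment variable set with Sys.setenv() before spacy_initialize() is inherited by the Python process that spacyr starts. Whether this actually silences the warning with spacyr is not confirmed here.

```r
# Assumption: the variable has to be set before spacy_initialize() starts
# Python, so that the Python side sees it in the process environment.
Sys.setenv(TOKENIZERS_PARALLELISM = "false")

# Check that the value is visible in the R session's environment.
Sys.getenv("TOKENIZERS_PARALLELISM")

# Afterwards, spaCy would be started as usual, for example:
# library(spacyr)
# spacy_initialize(model = "de_dep_news_trf")
```

Putting the line TOKENIZERS_PARALLELISM=false into ~/.Renviron should have the same effect for every new R session, under the same assumption.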
Details and instructions on how to reproduce the warning:
The warning message appears when using spacy_tokenize(x, what = "sentence"), but does not show up when using what = "words". The message is printed as black text like console output, not as blue text like normal R warnings.
The message seems to be printed repeatedly, but not very frequently, maybe once a minute. It keeps appearing after I've called spacy_finalize(). Only restarting the R session stops the warning. Setting the multithread argument in spacy_tokenize does not influence whether the warning appears.
I can consistently reproduce the warning by executing the following code and then saving the code file in RStudio (the warning only appears on saving).
library(spacyr)
text_taxi <- "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern. Franz jagt im komplett verwahrlosten Taxi."
spacy_initialize(model = "de_dep_news_trf")
spacy_tokenize(text_taxi,
               what = "sentence",
               multithread = FALSE,
               output = "data.frame")[, 2]
spacy_finalize()
# Now save the code file