This repository has been archived by the owner on May 9, 2024. It is now read-only.
Apparently plasticc no longer completes successfully when it is run on the original plasticc dataset instead of the synthetic one. This is true for both HDK versions 0.6 and 0.7. On some systems execution ends silently; on others there is an error message:
[2023-07-13 17:34:49.912154] [0x00005874] [info] 0 71 BufferMgr.cpp:720 Check failed: buffer_it->second->buffer
Debugging shows that the failure happens on the line that triggers HDK execution:
df_meta.shape # to trigger real execution
To execute the benchmark on the original dataset, download test_set.csv, training_set.csv, test_set_metadata.csv and training_set_metadata.csv from the modin datasets S3 bucket s3://modin-datasets/plasticc. You can execute benchmarks/plasticc.py directly like this:
set MODIN_STORAGE_FORMAT=hdk
set MODIN_ENGINE=native
set MODIN_EXPERIMENTAL=true
python benchmarks/plasticc.py training_set.csv test_set.csv training_set_metadata.csv test_set_metadata.csv
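The `set` lines above use Windows cmd syntax. For other shells, a minimal Python sketch of the same invocation might look like the following. The environment variable names, their values, the script path and the argument order are taken from the commands above; the helper names `build_command`, `build_env` and `run_benchmark` are hypothetical and exist only for this illustration. The sketch assumes it is started from the repository root with the four CSV files in the current directory.

```python
import os
import subprocess
import sys

# Environment variables and values copied from the report.
MODIN_ENV = {
    "MODIN_STORAGE_FORMAT": "hdk",
    "MODIN_ENGINE": "native",
    "MODIN_EXPERIMENTAL": "true",
}

# Argument order follows the command line in the report.
DATASET_FILES = [
    "training_set.csv",
    "test_set.csv",
    "training_set_metadata.csv",
    "test_set_metadata.csv",
]


def build_command(script="benchmarks/plasticc.py", files=DATASET_FILES):
    """Return the argv list for running the benchmark with the current interpreter."""
    return [sys.executable, script, *files]


def build_env(base=None):
    """Return a copy of the base environment with the Modin settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(MODIN_ENV)
    return env


def run_benchmark():
    """Hypothetical helper: run the benchmark and return its exit code."""
    completed = subprocess.run(build_command(), env=build_env())
    return completed.returncode
```

Calling `run_benchmark()` returns the exit code of the child process, which may help on systems where execution ends silently without printing the `Check failed` message.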
Alternatively, you can rename these files to plasticc_training_set.csv, plasticc_test_set.csv, plasticc_training_set_metadata.csv and plasticc_test_set_metadata.csv respectively and run launcher.py with the -ru (reuse) option:
python launcher.py -m plasticc -ru --hdk
With -ru, the launcher skips the generation stage and reuses dataset files already present in the current directory.
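The renaming step for the launcher can be scripted. The sketch below is one possible way to do it, not part of the benchmarks repo: the file names come from the text above, while `RENAMES` and `stage_dataset` are hypothetical names introduced here. It copies the files instead of renaming them, so the originals stay usable for the direct benchmarks/plasticc.py invocation.

```python
import shutil
from pathlib import Path

# Original dataset file names mapped to the names the launcher reuses with -ru.
RENAMES = {
    "training_set.csv": "plasticc_training_set.csv",
    "test_set.csv": "plasticc_test_set.csv",
    "training_set_metadata.csv": "plasticc_training_set_metadata.csv",
    "test_set_metadata.csv": "plasticc_test_set_metadata.csv",
}


def stage_dataset(src_dir=".", dst_dir="."):
    """Copy the downloaded files into dst_dir under their plasticc_-prefixed names."""
    src = Path(src_dir)
    dst = Path(dst_dir)

    # Fail early, before copying anything, if the download is incomplete.
    missing = [name for name in RENAMES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing dataset files: {', '.join(missing)}")

    dst.mkdir(parents=True, exist_ok=True)
    staged = []
    for old_name, new_name in RENAMES.items():
        target = dst / new_name
        shutil.copyfile(src / old_name, target)
        staged.append(target)
    return staged
```

After `stage_dataset()` has run in the directory that holds the downloaded files, the `python launcher.py -m plasticc -ru --hdk` command above should find them in the current directory.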
You can reproduce the problem by checking out the benchmarks repo https://github.com/gshimansky/data-science-processing-workload.