[WIN] Plasticc benchmark crashes on the original plasticc dataset on HDK tasks execution #581

gshimansky · 2023-07-13T22:48:34Z

Apparently plasticc no longer completes successfully when it is run on the original plasticc dataset instead of the synthetic one. This is true for both HDK version 0.6 and 0.7. On some systems execution ends silently; on others there is an error message:
[2023-07-13 17:34:49.912154] [0x00005874] [info] 0 71 BufferMgr.cpp:720 Check failed: buffer_it->second->buffer
Debugging shows that the failure happens on the line that triggers HDK execution:
df_meta.shape # to trigger real execution

You can reproduce the problem by checking out the benchmarks repo https://github.com/gshimansky/data-science-processing-workload.

To execute the benchmark on the original dataset, download test_set.csv, training_set.csv, test_set_metadata.csv and training_set_metadata.csv from the modin datasets S3 bucket s3://modin-datasets/plasticc. You can execute benchmarks/plasticc.py directly like this:

set MODIN_STORAGE_FORMAT=hdk
set MODIN_ENGINE=native
set MODIN_EXPERIMENTAL=true
python benchmarks/plasticc.py training_set.csv test_set.csv training_set_metadata.csv test_set_metadata.csv

Or you can rename these files to plasticc_training_set.csv, plasticc_test_set.csv, plasticc_training_set_metadata.csv and plasticc_test_set_metadata.csv respectively, and run launcher.py with the -ru (reuse) option:

python launcher.py -m plasticc -ru --hdk

With -ru, the launcher skips the generation stage and reuses the dataset files already present in the current directory.
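
The renaming step for the launcher can be scripted. Below is a minimal sketch, assuming the four downloaded files sit in one directory and that the launcher only needs the plasticc_ prefix added to each name. The helper name rename_plasticc_files and the RENAMES table are made up for illustration and are not part of the benchmarks repo.

```python
from pathlib import Path

# Downloaded name -> name expected when running the launcher with -ru.
# The mapping simply adds the "plasticc_" prefix, as described in this issue.
RENAMES = {
    "training_set.csv": "plasticc_training_set.csv",
    "test_set.csv": "plasticc_test_set.csv",
    "training_set_metadata.csv": "plasticc_training_set_metadata.csv",
    "test_set_metadata.csv": "plasticc_test_set_metadata.csv",
}


def rename_plasticc_files(directory="."):
    """Rename the downloaded files in `directory`; return the new paths."""
    base = Path(directory)
    renamed = []
    for old_name, new_name in RENAMES.items():
        src = base / old_name
        dst = base / new_name
        if not src.exists():
            # Not downloaded, or already renamed by an earlier run.
            continue
        if dst.exists():
            # Refuse to overwrite an existing dataset file.
            raise FileExistsError(f"{dst} already exists")
        src.rename(dst)
        renamed.append(dst)
    return renamed


if __name__ == "__main__":
    for path in rename_plasticc_files():
        print(f"renamed -> {path}")
```

Run it from the directory that holds the downloaded CSV files, then start launcher.py with -ru as shown above. Running it a second time is a no-op, because files that were already renamed are skipped.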
