This repository contains the code to reproduce the results of the paper "FIPE: Functionally Identical Pruning Ensemble".
The code is written in Python 3.10. To install the required packages, run the following command:
pip install -r requirements.txt
To run the experiments, make sure you have a valid Gurobi license and the required libraries installed. Then, from the current folder, execute the following command:
python run.py </path/to/dataset1> </path/to/dataset2> ... </path/to/datasetn> </path/to/output> --ensemble <ensemble> --n-estimators <n1> <n2> ... <nk> --seeds <seed1> <seed2> ... <seedn> --norm <norm>
where:
</path/to/dataset1>, </path/to/dataset2>, ..., </path/to/datasetn> are the paths to the datasets to use.
</path/to/output> is the path to the output folder.
<ensemble> is the ensemble method to use. It can be one of the following:
    ab for AdaBoostClassifier
    rf for RandomForestClassifier
    gb for GradientBoostingClassifier
    lgbm for LGBMClassifier
    xgb for XGBClassifier
<n1>, <n2>, ..., <nk> are the numbers of estimators to use for the ensemble method.
<seed1> <seed2> ... <seedn> are the seeds to use for the random number generator.
<norm> is the norm to use for the FIPE algorithm. It can be one of the following:
    0 for the L0 norm
    1 for the L1 norm
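To make the shape of the command line concrete, here is a minimal sketch of an argparse parser that mirrors the interface documented above. It is an illustration only: the actual run.py may be implemented differently, and the function name build_parser and the dataset file names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented run.py interface."""
    parser = argparse.ArgumentParser(prog="run.py")
    # One or more dataset paths, followed by a single output folder.
    parser.add_argument("datasets", nargs="+", help="paths to the datasets")
    parser.add_argument("output", help="path to the output folder")
    # Ensemble method, restricted to the documented abbreviations.
    parser.add_argument("--ensemble", choices=["ab", "rf", "gb", "lgbm", "xgb"])
    # One or more ensemble sizes and one or more random seeds.
    parser.add_argument("--n-estimators", nargs="+", type=int)
    parser.add_argument("--seeds", nargs="+", type=int)
    # 0 selects the L0 norm, 1 selects the L1 norm.
    parser.add_argument("--norm", type=int, choices=[0, 1])
    return parser


# Hypothetical dataset names, used only to show how the arguments are split.
args = build_parser().parse_args(
    ["datasets/a.csv", "datasets/b.csv", "outputs/",
     "--ensemble", "ab", "--n-estimators", "50", "100",
     "--seeds", "34", "42", "--norm", "1"]
)
print(args.datasets, args.output, args.ensemble,
      args.n_estimators, args.seeds, args.norm)
```

With this layout, the last positional argument is taken as the output folder and every positional argument before it as a dataset path.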
The output folder will contain the experiment results in CSV format inside a subfolder named csvs. The results can be merged into a single CSV file by running the following command:
python agg.py </path/to/csv1> </path/to/csv2> ... </path/to/csvn> </path/to/output>
where </path/to/csv1> </path/to/csv2> ... </path/to/csvn> are the paths to the CSV files to merge and </path/to/output> is the path to the output folder.
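For example, the following command runs the experiments on every dataset in the datasets folder with AdaBoost, using 50 and 100 estimators, seeds 34 and 42, and the L1 norm, and writes the results to outputs/:
python run.py datasets/* outputs/ --ensemble ab --n-estimators 50 100 --seeds 34 42 --norm 1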
To merge the results of the experiments, use the following command:
python agg.py outputs/csvs/* outputs/results.csv
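For readers who want to see what such a merge amounts to, here is a minimal sketch using only the standard library. It assumes that all CSV files share the same header row. The function name merge_csvs is hypothetical, and agg.py itself may work differently.

```python
import csv
from pathlib import Path


def merge_csvs(csv_paths, output_path):
    """Concatenate CSV files that share a header; return the number of data rows."""
    header = None
    n_rows = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        # Sort the paths so the merged file does not depend on shell ordering.
        for path in sorted(Path(p) for p in csv_paths):
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # The first non-empty file defines the header.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

A call such as merge_csvs(Path("outputs/csvs").glob("*.csv"), "outputs/results.csv") would mirror the command above, under the same shared-header assumption.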
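After a run, the output folder contains multiple files. To clean it, you can use the following command, which deletes everything under outputs (including the files inside csvs) and keeps only the csvs folder itself: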
find outputs -mindepth 1 ! -regex '^outputs/csvs$' -delete
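If you would rather clean up from Python, the sketch below removes every entry directly under the output folder except the csvs subfolder. Unlike the find command, it leaves the files inside csvs in place, which is an assumption about what is usually wanted. The function name clean_output and its keep parameter are hypothetical.

```python
import shutil
from pathlib import Path


def clean_output(output_dir, keep=("csvs",)):
    """Delete every entry directly under output_dir except those named in keep."""
    removed = []
    for entry in Path(output_dir).iterdir():
        if entry.name in keep:
            continue  # kept entries are left untouched, contents included
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a whole subfolder
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed.append(entry.name)
    return sorted(removed)
```

For example, clean_output("outputs") returns the sorted names of the entries it removed.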