For details, see the article 《分分钟带你杀入Kaggle Top 1%》.
Every directory whose name starts with "Stage" has three subdirectories: Code, Input, and Output.
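The layout can be sketched in a few lines of Python. This is only an illustration: the helper names `create_layout` and `check_layout` are hypothetical and not part of the repository, and the default of four stages is an assumption taken from the Stage0 to Stage3 list in this README.

```python
import tempfile
from pathlib import Path

# Subdirectories every Stage* directory is expected to contain.
SUBDIRS = ("Code", "Input", "Output")


def create_layout(root, n_stages=4):
    """Create Stage0 .. Stage{n_stages-1}, each with Code, Input and Output."""
    root = Path(root)
    created = []
    for i in range(n_stages):
        for sub in SUBDIRS:
            path = root / f"Stage{i}" / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


def check_layout(root):
    """Map each Stage* directory name to the list of subdirectories it lacks."""
    root = Path(root)
    stages = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith("Stage")
    )
    return {
        stage.name: [s for s in SUBDIRS if not (stage / s).is_dir()]
        for stage in stages
    }


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        create_layout(tmp)
        print(check_layout(tmp))
```

An empty list for a stage means all three subdirectories are present.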
Stage0: data preprocessing
- Input: contains two files: "train.csv" (the raw train file, renamed) and "test.csv" (the raw test file, renamed).
- Code: data_process.py, which implements the different preprocessing methods.
- Output:
Stage1:
- Input: Output from Stage0
- Code: handcrafted features and deep learning feature extraction
- Output:
Stage2:
- Input: Output from Stage1
- Code: nonlinear ensemble models, such as LightGBM and RandomForest.
- Output:
Stage3:
- Input: Output from Stage2
- Code: Ensemble Selection
- Output: final result.
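For readers unfamiliar with the Stage3 step, here is a minimal pure-Python sketch of ensemble selection. It assumes the term refers to greedy forward selection with replacement over held-out predictions, with RMSE as the metric; the repository's own code may differ. The names `rmse`, `ensemble_selection`, and `n_rounds` are illustrative only.

```python
def rmse(pred, target):
    """Root mean squared error between two equal-length sequences."""
    return (sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)) ** 0.5


def ensemble_selection(model_preds, target, n_rounds=10):
    """Greedy forward selection with replacement.

    model_preds: dict mapping model name -> list of validation predictions.
    target:      list of true validation values.
    Returns (weights, score), where weights maps each picked model to its
    share of the picks and score is the RMSE of the averaged ensemble.
    """
    total = [0.0] * len(target)  # running sum of the picked models' predictions
    picks = []
    best_score = float("inf")

    for _ in range(n_rounds):
        k = len(picks) + 1
        round_best = None
        # Try adding each model (possibly again) and keep the best candidate.
        for name, pred in model_preds.items():
            candidate = [(s + p) / k for s, p in zip(total, pred)]
            score = rmse(candidate, target)
            if round_best is None or score < round_best[0]:
                round_best = (score, name)

        score, name = round_best
        if score >= best_score:
            break  # no candidate improves the ensemble, so stop early
        best_score = score
        picks.append(name)
        total = [s + p for s, p in zip(total, model_preds[name])]

    counts = {}
    for name in picks:
        counts[name] = counts.get(name, 0) + 1
    weights = {name: c / len(picks) for name, c in counts.items()}
    return weights, best_score


if __name__ == "__main__":
    # Toy data: "a" overshoots, "b" undershoots, "c" is simply bad.
    target = [1.0, 2.0, 3.0, 4.0]
    preds = {
        "a": [1.5, 2.5, 3.5, 4.5],
        "b": [0.5, 1.5, 2.5, 3.5],
        "c": [4.0, 3.0, 2.0, 1.0],
    }
    print(ensemble_selection(preds, target))
```

Because models can be picked more than once, the pick counts act as blending weights. Selecting on the same data used to train the Stage2 models would overfit, so the predictions passed in should come from held-out folds.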
The code is messy and leaky. It is meant for reading only.