<map version="1.0.0"><node Text="Mind map"><node ID="C5CDABE6-D9BD-4641-9204-9DC1672E26B7" BACKGROUND_COLOR="#FFFFFF" TEXT="Data" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica" SIZE="24" /><node ID="DC3DE5D8-EFEC-4E1B-9A9E-44D2FD955613" BACKGROUND_COLOR="#FFFFFF" TEXT="Having redundant (or highly correlated) columns can be a problem for machine learning algorithms
pd.crosstab(index=series1, columns=series2)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
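<node ID="00000001-0000-4000-8000-000000000001" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes pandas): inspecting the relationship between two categorical columns with a contingency table.
import pandas as pd
s1 = pd.Series(['a', 'a', 'b', 'b'], name='col1')
s2 = pd.Series(['x', 'x', 'y', 'y'], name='col2')
# a table concentrated on one cell per row hints that the columns are redundant
print(pd.crosstab(index=s1, columns=s2))" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>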
<node ID="8DE5E64E-8A78-48F3-B342-B5683EAA681A" BACKGROUND_COLOR="#FFFFFF" TEXT="How to handle non-linear data" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="3EE3269E-467D-4830-91BE-826990C2159B" BACKGROUND_COLOR="#FFFFFF" TEXT="choose a model that can natively deal with non-linearity" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="5" /><font NAME="Helvetica" SIZE="20" /></node>
<node ID="1A7068E9-A21B-4A0C-8D7F-808B55834610" BACKGROUND_COLOR="#FFFFFF" TEXT="Engineer a richer set of features by including expert knowledge
See: sklearn.preprocessing.PolynomialFeatures" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="5" /><font NAME="Helvetica" SIZE="20" /></node>
<node ID="B9124264-0802-42C9-98DF-B26CDF2AA895" BACKGROUND_COLOR="#FFFFFF" TEXT="Use a "kernel" to have a locally-based decision function instead of a global linear decision function" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="5" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
<node ID="6748EBBC-BE02-432A-9FD1-D7F6A1898663" BACKGROUND_COLOR="#FFFFFF" TEXT="Class imbalance problem" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="1A54A115-9DAB-47ED-B778-06DB27DDA981" BACKGROUND_COLOR="#FFFFFF" TEXT="- Accuracy should not be used
- Might be best to use the precision and recall
- Or the balanced accuracy score" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="5" /><font NAME="Helvetica" SIZE="20" /></node>
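<node ID="00000001-0000-4000-8000-000000000003" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn; toy labels):
from sklearn.metrics import balanced_accuracy_score
y_true = [0, 0, 0, 0, 1]   # imbalanced toy labels
y_pred = [0, 0, 0, 0, 0]   # always predicts the majority class
# accuracy would be 0.8; balanced accuracy reveals the problem (0.5)
print(balanced_accuracy_score(y_true, y_pred))" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="5" /><font NAME="Helvetica" SIZE="20" /></node>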
</node>
</node>
<node ID="79E5D296-63A4-4E5C-AD01-C67A22B6153E" BACKGROUND_COLOR="#FFFFFF" TEXT="Models" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica" SIZE="24" /><node ID="41B57983-7FC8-403D-BA3A-DDE9F35AE7DA" BACKGROUND_COLOR="#FFFFFF" TEXT="Dummy :
Makes predictions using simple rules (set with strategy). Do not use it for real problems." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="07D346C1-C8EA-424E-A7D2-CE3A60DA5945" BACKGROUND_COLOR="#FFFFFF" TEXT="K-nearest neighbors :
It takes into account its k closest samples in the training set and predicts the majority target of these samples." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="5" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="6DE4F2E8-7DA0-4D3D-96E0-7221916EA185" BACKGROUND_COLOR="#FFFFFF" TEXT="SVM :
It can make a linear model more expressive by the use of a "kernel". Instead of learning a weight per feature, a weight will be assign by sample instead. However, not all samples will be used. This is the base of the support vector machine algorithm.
Kernel methods such as SVR are very efficient for small to medium datasets" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="301540FC-9581-4ABF-A5C9-AA3DF43937E1" BACKGROUND_COLOR="#FFFFFF" TEXT="Decision Tree :
Hierachical construction based on one variable at a time. Each time, for one given data point, we have two possible outcomes.
Leaf Node = prediction for a specific datapoint" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="3" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /></node>
</node>
</node>
</node>
</node>
<node ID="107FB7A2-CC68-4C72-963D-58AC59BD3FA4" BACKGROUND_COLOR="#FFFFFF" TEXT="Curves" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica" SIZE="24" /><node ID="9527DB80-0CEF-481E-88E2-5B812E65AF60" BACKGROUND_COLOR="#FFFFFF" TEXT="Validation Curve" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="F0B4B208-75CE-4B00-8399-306A4E50741F" BACKGROUND_COLOR="#FFFFFF" TEXT="Evolution of the training and testing error VS hyperparameter tuning" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="5" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
<node ID="26B3F594-80F2-4DB8-AEBB-45B1D18DCD98" BACKGROUND_COLOR="#FFFFFF" TEXT="Learning Curve" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="A7B72244-C5C8-477F-A23A-CBD66A58349E" BACKGROUND_COLOR="#FFFFFF" TEXT="Evolution of the training and testing error VS the number of training samples" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="5" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
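<node ID="00000001-0000-4000-8000-000000000004" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch of both curves (added for illustration; assumes scikit-learn; synthetic data):
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve, validation_curve
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(n_samples=300, random_state=0)
# validation curve: scores as a hyperparameter (max_depth) varies
train_scores, test_scores = validation_curve(DecisionTreeClassifier(), X, y, param_name='max_depth', param_range=[1, 2, 4, 8, 16], cv=5)
# learning curve: scores as the training set grows
sizes, train_scores, test_scores = learning_curve(DecisionTreeClassifier(), X, y, train_sizes=[0.25, 0.5, 1.0], cv=5)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>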
</node>
<node ID="20AA4F29-C655-4CC7-A538-D3B291ED5EF9" BACKGROUND_COLOR="#FFFFFF" TEXT="FINE TUNING" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="36A5A938-2CB8-41DD-B050-A4EC19432930" BACKGROUND_COLOR="#FFFFFF" TEXT="GridSearchCV" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="6E41E6C9-71C5-4334-B715-23AC8BA4FC87" BACKGROUND_COLOR="#FFFFFF" TEXT="The generalization performance of each parameter set is evaluated by using an internal cross-validation procedure.
When calling fit(X, y) on a grid-search, X and y will be split by a cross-validation strategy.
In a K-fold cross-validation, X and y are divided into K folds and K models are trained, each on K-1 folds and tested on the remaining fold, so that each model uses a different test fold. The test scores are then averaged over the K models.
This operation is repeated for all combinations of hyperparameters. The combination of hyperparameter values with the best average cross-validation score is selected." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="180DC8B4-7E0C-450A-A828-508E56708119" BACKGROUND_COLOR="#FFFFFF" TEXT="The parameters need to be specified explicitly." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
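<node ID="00000001-0000-4000-8000-000000000005" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
# every combination explicitly listed in the grid is evaluated by inner CV
param_grid = {'n_neighbors': [3, 5, 7], 'weights': ['uniform', 'distance']}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>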
</node>
<node ID="C010D840-59F5-42E3-BA9F-E1B4EE2C2418" BACKGROUND_COLOR="#FFFFFF" TEXT="RandomizedSearchCV" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#0076A8" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="A60CCE94-3D99-402F-9B37-5EAEED2CEE99" BACKGROUND_COLOR="#FFFFFF" TEXT="With a grid-search, the danger is that the region of good hyperparameters fall between the line of the grid." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#0076A8" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="BA3BCB30-7BDA-41EB-9E55-031BA93371B6" BACKGROUND_COLOR="#FFFFFF" TEXT="RandomizedSearchCV is a stochastic search who will sample hyperparameter 1 independently from hyperparameter 2 (etc..) and find the optimal region.
Randomly generates the parameter candidates which avoids the regularity of the grid and increases the resolution in each direction (frequent situation where the choice of some hyperparameters is not very important).
It also alleviates the regularity imposed by the grid that might be problematic sometimes." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#0076A8" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="BBF5AD08-991D-41E5-96D1-A2344807CFDA" BACKGROUND_COLOR="#FFFFFF" TEXT="Typically used to optimize 3 or more hyperparameters" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#0076A8" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
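<node ID="00000001-0000-4000-8000-000000000006" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn and scipy):
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
# distributions are sampled independently; n_iter sets the search budget
param_distributions = {'C': loguniform(1e-3, 1e3), 'gamma': loguniform(1e-4, 1e1)}
search = RandomizedSearchCV(SVC(), param_distributions=param_distributions, n_iter=20, cv=5, random_state=0)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#0076A8" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>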
</node>
<node ID="C864E410-42C1-474E-B2A1-A6F6C0E0A64D" BACKGROUND_COLOR="#EAEAEA" TEXT="When we evaluate a family of models on test data and pick the best performer, we can not trust the corresponding prediction accuracy, and we need to apply the selected model to new data. Indeed, the test data has been used to select the model, and it is thus no longer independent from this model." COLOR="#595959" POSITION="left" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
<node ID="CBDADC67-8F6C-45EA-A448-9B414B425378" BACKGROUND_COLOR="#FFFFFF" TEXT="DATA PREPROCESSING" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="38244575-614B-46B7-B881-811EBC61D322" BACKGROUND_COLOR="#000000" TEXT="Scalling features" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="77D12A4D-6A7D-4642-86A8-0C050BC60C0C" BACKGROUND_COLOR="#FFFFFF" TEXT=" When / Why ?
Linear models such as logistic regression generally benefit from scaling the features while other models such as decision trees do not need such preprocessing (but will not suffer from it).
Models that rely on the distance between a pair of samples, for instance KNN, should be trained on normalized features to make each feature contribute approximately equally to the distance computations.
Many models such as logistic regression use a numerical solver (based on gradient descent) to find their optimal parameters. This solver converges faster when the features are scaled." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="067D9577-FD3E-40A7-87F0-2C9CF93F10A2" BACKGROUND_COLOR="#FFFFFF" TEXT="Working with non-scaled data will potentially force the algorithm to iterate more." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
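<node ID="00000001-0000-4000-8000-000000000007" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn): scaling inside a pipeline, so the scaler is fit on training data only.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
model = make_pipeline(StandardScaler(), LogisticRegression())" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>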
</node>
<node ID="646B844F-2EE1-48B4-B468-C4F97309A6C4" BACKGROUND_COLOR="#000000" TEXT="Encoding data" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="F4F9B56F-47EC-49E6-A883-D45BEAE6CD57" BACKGROUND_COLOR="#FFFFFF" TEXT="Ordinal Data
OrdinalEncoder" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="EBA3AFD4-99E1-4BF7-A999-AE73BEA762E4" BACKGROUND_COLOR="#FFFFFF" TEXT="For encoding ordinal categories
Encode each category with a different number.
Using this integer representation leads downstream predictive models to assume that the values are ordered (0 < 1 < 2 < 3...)." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="C9B5904E-01FC-4EF9-A901-044799B15D37" BACKGROUND_COLOR="#FFFFFF" TEXT="By default, OrdinalEncoder uses a lexicographical strategy to map string category labels to integers. For instance, suppose the dataset has a categorical variable named &quot;size&quot; with categories such as &quot;S&quot;, &quot;M&quot;, &quot;L&quot;, &quot;XL&quot;. We would like the integer representation to respect the meaning of the sizes by mapping them to increasing integers such as 0, 1, 2, 3. However, the lexicographical strategy used by default would map the labels &quot;S&quot;, &quot;M&quot;, &quot;L&quot;, &quot;XL&quot; to 2, 1, 0, 3, by following the alphabetical order." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /><node ID="DAC5EA15-CA61-439D-9938-D64C593CA231" BACKGROUND_COLOR="#FFFFFF" TEXT="Linear models will be impacted by misordered categories while tree-based models will not be." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /></node>
</node>
</node>
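<node ID="00000001-0000-4000-8000-000000000008" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn): pass the categories explicitly to get the meaningful order.
from sklearn.preprocessing import OrdinalEncoder
sizes = [['S'], ['M'], ['L'], ['XL']]
encoder = OrdinalEncoder(categories=[['S', 'M', 'L', 'XL']])
print(encoder.fit_transform(sizes))   # [[0.], [1.], [2.], [3.]]" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>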
</node>
<node ID="DD61F1B5-3A96-42C9-9D8C-A4E32A2F90F6" BACKGROUND_COLOR="#FFFFFF" TEXT="Ordinal Data
OneHotEncoder" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="64958778-62D9-49DA-8F6E-F50A7B93C079" BACKGROUND_COLOR="#FFFFFF" TEXT="If a categorical variable does not carry any meaningful order information then ‘Ordinal Encoding’ might be misleading.
>>> OneHotEncoder" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="DCCD6CA9-2352-4574-B8BC-008A98865BB8" BACKGROUND_COLOR="#FFFFFF" TEXT="For a given feature, it will create as many new columns as there are possible categories. For a given sample, the value of the column corresponding to the category will be set to 1 while all the columns of the other categories will be set to 0." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="1371C597-959A-4938-9755-D17EA794AA6F" BACKGROUND_COLOR="#FFFFFF" TEXT=" In general :
OneHotEncoder >>> encoding strategy used when the downstream models are linear models
OrdinalEncoder >>> used with tree-based models" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
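<node ID="00000001-0000-4000-8000-000000000009" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.preprocessing import OneHotEncoder
colors = [['red'], ['green'], ['blue'], ['green']]
# one new column per category; handle_unknown='ignore' maps unseen categories to all zeros
encoder = OneHotEncoder(handle_unknown='ignore')
print(encoder.fit_transform(colors).toarray())" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>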
</node>
</node>
</node>
<node ID="BF5CF5FE-2921-4DFB-9626-48C538222D30" BACKGROUND_COLOR="#FFFFFF" TEXT="MODEL VALIDATION" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="0201239D-9130-417C-BE74-36FD90CC62BC" BACKGROUND_COLOR="#FFFFFF" TEXT="Dummy model" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="F95FF4DA-717D-4958-A04E-96070A4025C0" BACKGROUND_COLOR="#FFFFFF" TEXT="A good practice is to compare the testing error with a dummy baseline. It should be also compared to the chance level of the model." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF89D8" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="4A23182F-22A1-44FB-B601-69FBDC82FDAD" BACKGROUND_COLOR="#FFFFFF" TEXT="It will never use any information regarding the data" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF89D8" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="D8523218-7E0C-4BCD-A818-E44277B0EB70" BACKGROUND_COLOR="#FFFFFF" TEXT="Ex :
- Dummy Model
- Target : revenue class
- Strategy : most_frequent
Accuracy score = 0.77 ‘<=50K’
—> due to the fact that we have 3/4 of the target belonging to ‘<=50K’. Therefore, any predictive model giving results below this dummy classifier will not be helpful." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF89D8" WIDTH="6" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /></node>
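<node ID="00000001-0000-4000-8000-000000000010" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn; toy data):
from sklearn.dummy import DummyClassifier
X = [[0], [1], [2], [3]]
y = ['<=50K', '<=50K', '<=50K', '>50K']   # imbalanced toy target
dummy = DummyClassifier(strategy='most_frequent').fit(X, y)
print(dummy.score(X, y))   # 0.75: the baseline a real model must beat" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF89D8" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>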
</node>
</node>
</node>
<node ID="AE05D98E-B21D-4297-93AC-B489D3E85F3F" BACKGROUND_COLOR="#FFFFFF" TEXT="CrossValidation" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="B375D51C-76ED-43C6-9535-776B5E10C6BF" BACKGROUND_COLOR="#FFFFFF" TEXT="Issues with splitting the original data
into a training and testing set :
In a setting where the amount of data is small, the subset used to train or test will be small. Besides, a single split does not give information regarding the confidence of the results obtained.
Solution : cross-validation :
It consists of repeating the procedure such that the training and testing sets are different each time. Statistical performance metrics are collected for each repetition and then aggregated. As a result we can get an estimate of the variability of the model's statistical performance." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="2F9F05A1-0D7C-4875-A79E-4719EF37468B" BACKGROUND_COLOR="#FFFFFF" TEXT="The generalization performance measured by CV is on average similar by the measure one would have obtained with a single train-test split.
However the measure obtained by a single train-test split can vary significantly because of the randomness of the choice of a particular train-test split." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="4C39B18C-56DD-4AD6-BE28-52BC46354605" BACKGROUND_COLOR="#FFFFFF" TEXT="The goal of cross-validation is not to train a model, but rather to estimate approximately the generalization performance of a model that would have been trained to the full training set, along with an estimate of the variability (uncertainty on the generalization accuracy)." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /></node>
</node>
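<node ID="00000001-0000-4000-8000-000000000011" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
X, y = load_iris(return_X_y=True)
cv_results = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5)
# mean +/- std of the test scores = generalization estimate and its variability
print(cv_results['test_score'].mean(), cv_results['test_score'].std())" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>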
</node>
</node>
<node ID="6DD368FE-8301-4D4F-AC46-F2DB6E1621D8" BACKGROUND_COLOR="#FFFFFF" TEXT="Cross-Validation strategies" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="8595F580-FBE4-42C0-8FBA-D4C590EE46CD" BACKGROUND_COLOR="#FFFFFF" TEXT="Recall : Cross-validation allows estimating the robustness of a predictive model by repeating the splitting procedure." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="A380C19E-3FC5-4FC3-9080-3428855DFFEB" BACKGROUND_COLOR="#FFFFFF" TEXT="K-Fold Strategy" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="A33D10B8-E144-4AEB-9C73-8262BA49FAC7" BACKGROUND_COLOR="#FFFFFF" TEXT="The entire dataset is split into K partitions. The fit/score procedure is repeated K times where at each iteration K - 1 partitions are used to fit the model and 1 partition is used to score. " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
<node ID="7D87716A-DE31-40A1-AC18-2BADCE5CA557" BACKGROUND_COLOR="#FFFFFF" TEXT="Shuffle-split" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="308EEB00-7115-4C03-A540-7BAD90DDFD07" BACKGROUND_COLOR="#FFFFFF" TEXT="At each iteration it :
- Randomly shuffles the order of the samples of a copy of the full dataset
- Split the shuffled dataset into a train and a test set
- Train a new model on the train set;
- Evaluate the testing error on the test set." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="B08DC9D3-89D7-4AE2-A29C-1775334A6E71" BACKGROUND_COLOR="#FFFFFF" TEXT="’n_splits=40’ >>> it trains 40 models in total and all of them will be discarded: it just records their statistical performance on each variant of the test set." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
<node ID="71E7216B-FFDF-4358-A50B-725B750836CD" BACKGROUND_COLOR="#FFFFFF" TEXT="Stratified K-Fold" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="4D96A97F-8C0D-4FA4-BE84-1A15D2DB39D4" BACKGROUND_COLOR="#FFFFFF" TEXT="Split data by preserving the original class frequencies: it stratifies the data by class. " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="2876E734-990A-4DBA-B7B4-70F9F109B8D4" BACKGROUND_COLOR="#FFFFFF" TEXT="This is a good practice to use stratification within the cross-validation framework when dealing with a classification problem." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
<node ID="D59D0A1F-DFA5-4680-9FF1-B9EAF01D5AD3" BACKGROUND_COLOR="#FFFFFF" TEXT="Sample Grouping" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="9BB76A5E-F48D-4110-8F18-671FE0082159" BACKGROUND_COLOR="#FFFFFF" TEXT="It is really important to take any sample grouping pattern into account when evaluating a model. Otherwise, the results obtained will be over-optimistic in regards with reality." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="A920CFC8-0073-42B1-A63E-2DFEE8A530F8" BACKGROUND_COLOR="#FFFFFF" TEXT="Ensure that the data associated to a group should either belong to the training or the testing set. Thus, it groups samples together" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
<node ID="D57FB6FD-41DB-4A78-AED7-C227346F3A34" BACKGROUND_COLOR="#FFFFFF" TEXT="Non i.i.d data" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="288CF000-49CA-4079-9C48-312581AC17D0" BACKGROUND_COLOR="#FFFFFF" TEXT="Non Independent and Identically Distributed" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="998A8513-CFF3-42AC-B52D-DFEB2F061EAB" BACKGROUND_COLOR="#FFFFFF" TEXT="LeaveOneGroupOut" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="8BD6B28A-5A14-4B81-97CF-B208E4337778" BACKGROUND_COLOR="#FFFFFF" TEXT="When a sample depends on past information (usually, time series data).
There is a relationship between a sample at time t and a sample at t+1." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="5BA50E62-5581-4C9B-A783-9FD6D620126C" BACKGROUND_COLOR="#FFFFFF" TEXT="LeaveOneGroupOut: group the samples into time blocks, and predict each group's information by using information from the other groups. " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
</node>
</node>
<node ID="A19340FE-EDD6-4A5C-A815-A2B34B2A01FA" BACKGROUND_COLOR="#FFFFFF" TEXT="TimeSeriesSplit" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="FF52323B-5DC0-48DF-9D7E-35D4899ACDC4" BACKGROUND_COLOR="#FFFFFF" TEXT="When a model is aimed at forecasting (i.e., predicting future data from past data), training data that are ulterior to the testing data should not be used. " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
</node>
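<node ID="00000001-0000-4000-8000-000000000012" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch of the strategies above (added for illustration; assumes scikit-learn); any of these objects can be passed as cv= to cross_validate or GridSearchCV:
from sklearn.model_selection import (KFold, LeaveOneGroupOut, ShuffleSplit,
                                     StratifiedKFold, TimeSeriesSplit)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv = ShuffleSplit(n_splits=40, test_size=0.2, random_state=0)
cv = StratifiedKFold(n_splits=5)    # preserves class frequencies
cv = LeaveOneGroupOut()             # needs groups= when splitting
cv = TimeSeriesSplit(n_splits=5)    # training folds always precede the test fold" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>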
</node>
</node>
<node ID="5CFFCFB5-8537-4B61-AF9A-6F3C2F952B91" BACKGROUND_COLOR="#FFFFFF" TEXT="Chance level" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="9D2C2944-4306-47A7-A122-7B194203AB3E" BACKGROUND_COLOR="#FFFFFF" TEXT="The chance level can be determined by permuting the labels and check the difference of result. It evaluates the significance of a cross-validated score with permutations." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="B1C56A95-DE44-4BEC-B06F-C8FC6DF8C2C1" BACKGROUND_COLOR="#FFFFFF" TEXT="Fit a model on some training data and evaluate the same model on data where the target vector has been randomized. It will provide the statistical performance of the chance level." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="03670974-3C71-46DE-BDCC-E681CE07F663" BACKGROUND_COLOR="#FFFFFF" TEXT="p-value: fraction of randomized data sets where the estimator performed as well or better than in the original data. " COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="981B2A95-EA8D-4EBD-AC77-760B2F364EB5" BACKGROUND_COLOR="#FFFFFF" TEXT="Small p-value: there is a real dependency between features and targets which has been used by the estimator to give good predictions. " COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
<node ID="72B9081D-7636-46EE-BE6E-DDE8DD8230C4" BACKGROUND_COLOR="#FFFFFF" TEXT="Large p-value: may be due to lack of real dependency between features and targets or the estimator was not able to use the dependency to give good predictions." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
</node>
</node>
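<node ID="00000001-0000-4000-8000-000000000013" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.datasets import load_iris
from sklearn.model_selection import permutation_test_score
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
# perm_scores approximate the chance level; pvalue tests the dependency
score, perm_scores, pvalue = permutation_test_score(DecisionTreeClassifier(), X, y, cv=5, n_permutations=30)
print(score, perm_scores.mean(), pvalue)" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>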
</node>
</node>
<node ID="CB5A29C5-140C-412D-8B58-E199B5DBCDFD" BACKGROUND_COLOR="#FFFFFF" TEXT="Nested cross-validation" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="FE1FCEBD-9C70-4ACA-9640-B4613D2A58AD" BACKGROUND_COLOR="#FFFFFF" TEXT="It uses an inner cross-validation to tune the parameters' model and an outer cross-validation to evaluate the model's performance " COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FEFC78" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="0F4EDA4D-F6C6-405D-8B0C-65A555B8BDB0" BACKGROUND_COLOR="#FFFFFF" TEXT="When optimizing parts of the machine learning pipeline, one needs to use nested cross-validation to evaluate the statistical performance of the predictive model. Otherwise, the results obtained without nested cross-validation are over-optimistic." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FEFC78" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="1CCEE336-6DDB-42C2-A4BD-B85E2114F119" BACKGROUND_COLOR="#FFFFFF" TEXT="Recall: the data used for the evaluation should never be used at any point to make a decision on tuning our model." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FEFC78" WIDTH="6" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /></node>
</node>
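<node ID="00000001-0000-4000-8000-000000000014" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)
# inner CV tunes the hyperparameters, outer CV evaluates the tuned model
search = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]}, cv=KFold(n_splits=4))
scores = cross_val_score(search, X, y, cv=KFold(n_splits=4))" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#FEFC78" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>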
</node>
</node>
</node>
<node ID="B3A26EF6-4634-40B5-A0C1-EE2C66644178" BACKGROUND_COLOR="#FFFFFF" TEXT="OVERFITTING / UNDERFITTING" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="9047A4C0-5EDB-4075-A4F4-142881FB30BC" BACKGROUND_COLOR="#FFFFFF" TEXT="Overfitting / Underfitting" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="AD2FF0E0-0394-4258-916A-032B76AC66C4" BACKGROUND_COLOR="#FFFFFF" TEXT="Underfitting" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="7C6435C0-C3ED-4544-807D-AE83B8684C76" BACKGROUND_COLOR="#FFFFFF" TEXT="- The model is too constrained and thus limited by its expressivity
- The model often makes prediction errors, even on training samples
>>> The training error is large (and so is the testing error)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
<node ID="D40EB0B4-20B3-4366-9DBA-BC7D5376A6E9" BACKGROUND_COLOR="#FFFFFF" TEXT="Overfitting" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="551C4231-E3ED-4CBF-8E6C-8543BCA0DFF7" BACKGROUND_COLOR="#FFFFFF" TEXT="- The model is too complex and thus highly flexible
- The model focuses too much on noisy details of the training set
- The prediction function does not reflect the true generative process and induces errors on the test set
>>> The testing error is much larger than the training error" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
</node>
</node>
<node ID="5DA57D24-B162-4A8B-A687-96D045F80220" BACKGROUND_COLOR="#FFFFFF" TEXT="Controlling Over/Underfitting (1/2)" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="FA5D31B6-C82B-4239-AC13-DC00816C2D88" BACKGROUND_COLOR="#FFFFFF" TEXT="- Validation curve : check influence of a hyperparameter
- Learning curve : check influence of the size of the training set" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="540F7917-52ED-4116-BDBC-2FFD4FE35650" BACKGROUND_COLOR="#FFFFFF" TEXT=" Learning curve
If we achieve a plateau and adding new samples in the training set does not reduce the testing error, we might have reach the Bayes error rate using the available model. Using a more complex model might be the only possibility to reduce the testing error further." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
<node ID="0487DF39-A90A-4500-AA7B-BEE14C5DB17F" BACKGROUND_COLOR="#FFFFFF" TEXT=" Validation curve
We have to look at the sweet spot of the curve i.e. the spot where the training error is low enough and where the testing error is close to the training error." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
</node>
</node>
<node ID="EC45D5A4-AB6A-49A2-82E9-B8C648892455" BACKGROUND_COLOR="#FFFFFF" TEXT="Controlling Overfitting (2/2)" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="A92310AF-7DAA-4A10-A2D1-6EFDEDA4EF4B" BACKGROUND_COLOR="#FFFFFF" TEXT="Regularization (penelized weights)" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="7B8F2E49-1308-48AC-8B25-DC7124F8C44A" BACKGROUND_COLOR="#FFFFFF" TEXT="Ridge regression (L2-regularisation)
It pulls the coefficients towards zero : if the coefficient does not reducing significantly enough the training error, then it will pull it towards zero (more than the others).
It lowers the variance but increase the bias of the model." COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="9C628198-75C5-4D33-8FB1-89482F61E586" BACKGROUND_COLOR="#FFFFFF" TEXT="Always use at leat a Ridge regression instead of a linear regression" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="E093E133-CAEF-4811-B90D-A942D788B9E5" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
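<node ID="00000001-0000-4000-8000-000000000015" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.linear_model import Ridge, RidgeCV
# alpha sets the strength of the L2 penalty (higher = more regularization)
model = Ridge(alpha=1.0)
# RidgeCV picks alpha by internal cross-validation
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>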
</node>
<node ID="F7057914-385F-4642-B4A6-EC4505666DFC" BACKGROUND_COLOR="#FFFFFF" TEXT="Logistic regression (L2-penality)
For classification problems. It’s already a regularized model with high C value (C=1) leading to weaker regularisation" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="626E09F4-4B44-49D2-B5B3-BE25663BA02A" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="108A7ED9-C9B7-41BC-9CC2-F9C20E205ADE" BACKGROUND_COLOR="#FFFFFF" TEXT="With a strong regularization, the region of confidence it much larger and many more points are impacting the orientation of the straight lines. It takes a lot of points to change the orientation of the lines contrary to a weaker regularization. " COLOR="#595959" POSITION="right" STYLE="bubble"><edge COLOR="#00F900" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
</node>
</node>
<node ID="4900A66A-B5F4-4B24-A22B-20033AAD0BE9" BACKGROUND_COLOR="#FFFFFF" TEXT="Variance / Bias" COLOR="#AF50C8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="E84869E7-9338-47F1-A5DF-FA94432729EC" BACKGROUND_COLOR="#FFFFFF" TEXT="Fitting with a high variance model
-Overfitting-" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="724C38ED-9545-46E1-9A1A-22971E41EF81" BACKGROUND_COLOR="#FFFFFF" TEXT="High sensitivity to details of the training set that are not necessarily present in the test data: the model makes prediction errors without obvious structure.
>>> unstable models" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="E790FDE5-67CE-43AB-82F7-CE6DDCC84FA2" BACKGROUND_COLOR="#FFFFFF" TEXT="On average the predictions are good, but prediction errors on the test set are very random (not systematic).
>>> bad individual test error" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
<node ID="9CB7F538-40FC-45E4-A0DF-43D3B15E17E2" BACKGROUND_COLOR="#FFFFFF" TEXT="Fitting with a high biais model
-Underfitting-" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="BC99E3CD-9945-4BCF-97C2-28CB6B700E9B" BACKGROUND_COLOR="#FFFFFF" TEXT="A high biais will lead the learned prediction function to ignore some of the interesting structure of the data, at least in some regions of the feature space. This will cause some level of systematic prediction errors, even on the training set. As a result such a model is underfitting." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="CF3CF0AA-00AB-42B4-95EA-1878805A02C2" BACKGROUND_COLOR="#FFFFFF" TEXT="On average they are not good and they make a systematic kind of error.
The bias can come from the choice of the model family" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /><node ID="6125C6DA-14C3-4227-A876-1E49E801B1FF" BACKGROUND_COLOR="#FFFFFF" TEXT="The model makes errors in a consistent way" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
</node>
</node>
</node>
</node>
</node>
<node ID="A860B163-D175-45F6-9EFA-50100ED5FC1E" BACKGROUND_COLOR="#FFFFFF" TEXT="MORE ADVENCED MODELS" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="609B9FE2-7059-4628-83DB-51279CC644DE" BACKGROUND_COLOR="#FFFFFF" TEXT="Tree model" COLOR="#AF4FC8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="4478CD53-B6C4-4340-806B-F7B6F5E40FF2" BACKGROUND_COLOR="#FFFFFF" TEXT="A tree will split the data and send them into different branch based on a rule defined at a node. Once that the rules are found, they are never changed." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="6BF18586-1DB8-45E8-A1E6-5C38E1844426" BACKGROUND_COLOR="#FFFFFF" TEXT="The predicted values at a leaf corresponds to the mean of the training samples at this node (for tree regressor) and the best threshold between two classes (for tree classification)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="6E0F61AF-8DD4-4184-87C8-D4FA4D8FC963" BACKGROUND_COLOR="#FFFFFF" TEXT=" Entropy / Gini score :
Quantify how good a split is and automatically the algorithm will find the best threshold value to maximized the improvement in the entropy after the split.
>>> Computes the evolution of the entropy before and after the split to find the best split." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="D10D4ECF-D39F-4C9D-A621-3EEC310B7BCC" BACKGROUND_COLOR="#FFFFFF" TEXT="Overfitting : deeper is a tree, more it will focus on separating noisy samples. Thus, in case of overfitting, one needs to reduce the depth of the tree or stop the tree to grow." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="66422431-9ADD-4C12-B399-2AE6DD63D053" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="B932F979-D197-4578-A064-26AFB16E1CF1" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
</node>
</node>
</node>
<node ID="B4F3D781-C8AF-43C0-A3DE-19170084DDC5" BACKGROUND_COLOR="#FFFFFF" TEXT="Increasing the depth of the tree will increase the number of partition and thus the number of constant values that the tree is capable of predicting" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="93928467-1984-4D64-90D2-2E36F93E0956" BACKGROUND_COLOR="#FFFFFF" TEXT="Advantage to use decision tree as a based model :
- non-linear
- fast to train" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /></node>
</node>
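<node ID="00000001-0000-4000-8000-000000000016" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.tree import DecisionTreeClassifier
# limiting max_depth is the main way to keep the tree from overfitting
tree = DecisionTreeClassifier(max_depth=3, criterion='entropy')" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>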
</node>
<node ID="770AF920-B88E-4E00-8854-DD724B1D4ED6" BACKGROUND_COLOR="#FFFFFF" TEXT="Assembles of model" COLOR="#AF4FC8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="BFF66F13-9F98-48F3-BC98-803ECC8F69F7" BACKGROUND_COLOR="#FFFFFF" TEXT="More robust and lead to better statistical performance than single tree or linear models. However, they are more complex to interpret since the final decision is taken as a combination of predictors" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="9C7699DA-0146-4333-96A0-A10B2BB5B999" BACKGROUND_COLOR="#FFFFFF" TEXT="Bootstrap Aggregating (Bagging)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#929000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="D9491B90-38B4-4FF7-B05B-640E253B5391" BACKGROUND_COLOR="#FFFFFF" TEXT="Can be used with any kind of model" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#929000" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="D077F311-ADFF-447E-B2F2-546B0CCB5D4D" BACKGROUND_COLOR="#FFFFFF" TEXT="Resampling the dataset several times (random subset of the data with possible duplications) and fitting each subset with the model.
Then, one takes a new data point and make each model voting to determine the class (the average is used in the case of regression problem)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#929000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="C4CD60FA-E58C-4F94-BCA9-E236A79AE88C" BACKGROUND_COLOR="#FFFFFF" TEXT="With bagging, it’s possible to use deeper trees thanks to the voting system (overfitting is cancels out)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#929000" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
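<node ID="00000001-0000-4000-8000-000000000017" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
# 50 trees, each fit on a bootstrap resample; predictions are averaged
model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#929000" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>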
</node>
<node ID="6AD35816-35FA-4ABD-8192-7EFB6B788BA4" BACKGROUND_COLOR="#FFFFFF" TEXT="Random Forests" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="E2C9F709-0A31-4EE8-9021-63B78902727B" BACKGROUND_COLOR="#FFFFFF" TEXT="Only with decision tree model" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="F79831C0-D3A6-49C3-B728-C528232A7D68" BACKGROUND_COLOR="#FFFFFF" TEXT="It uses the same principle than Bagging but on top of this, there is a randomized choice of the decision tree : at each split, a random subset of features is selected and the best split is considered. The performance of the decision tree is intentionally reduced to decorrelate the prediction errors." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
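<node ID="00000001-0000-4000-8000-000000000018" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn):
from sklearn.ensemble import RandomForestClassifier
# max_features controls the random subset of features tried at each split
model = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=0)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>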
</node>
<node ID="8F2308FF-5042-4877-8CF3-5D0EEC8B55EB" BACKGROUND_COLOR="#FFFFFF" TEXT="Boosting (like AdaBoosting)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="2A298811-1204-48DF-8E9C-1BB14BF5A943" BACKGROUND_COLOR="#FFFFFF" TEXT="Can be used with any kind of model" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="63645B56-0B2C-4938-98AD-7271B6AFD7F0" BACKGROUND_COLOR="#FFFFFF" TEXT="Fits on the entire dataset and identify the prediction errors and gives to that points large weights. Then fits a second model on this new dataset and so on… Finally the models are aggregating.
>>> mispredicted samples are re-weighted at each step." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="E4A7328C-4BC5-4847-B607-E5A6C8CEC243" BACKGROUND_COLOR="#FFFFFF" TEXT="Uses underfitting models which trying to reduce the previous model error, then aggregate them, leads to a result that is less underfits the data." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="A736EF84-06AE-43F3-82D1-6EC104C8B85E" BACKGROUND_COLOR="#FFFFFF" TEXT="Boosting like AdaBoosting is not the best model and tends to overfit when increasing the number of predictors >>> it’s best to use GradientBoosting model" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="65DAAB42-0BC1-4B7B-A20B-D5961B622BB2" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
<node ID="D69B934D-BB27-4FDB-8D6F-50340F8C0D98" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF5E69" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
</node>
</node>
</node>
<node ID="6470D507-F1BC-4206-B372-4C129B2F644D" BACKGROUND_COLOR="#FFFFFF" TEXT="Hist/GradientBoosting" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="6B0ACF09-D599-4780-A9F5-A1F039607A2F" BACKGROUND_COLOR="#FFFFFF" TEXT="Based on tree model but not only" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="CA25D342-D6F4-4DCB-913C-6F21636B7381" BACKGROUND_COLOR="#FFFFFF" TEXT="Using a gradient boosting procedure and each based model predicts the negative error of previous models.
It’s more flexible while we can chose the loss function to quantify the errors." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="52C36707-1B5A-4BF6-A5E3-0D8E262321F4" BACKGROUND_COLOR="#FFFFFF" TEXT="Fine for small dataset but too slow for n_samples > 10 000. Instead one has to use HistGradientBoosting which discretized numerical features and is much faster to deals with large samples" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
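<node ID="00000001-0000-4000-8000-000000000019" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn >= 1.0):
from sklearn.ensemble import HistGradientBoostingClassifier
# bins (discretizes) numerical features, so it scales to large n_samples
model = HistGradientBoostingClassifier(max_iter=100)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#72C8FF" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>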
</node>
</node>
</node>
</node>
</node>
<node ID="F52B62CA-C54D-42F4-938C-B660585E19A1" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#4A4A4A" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica" SIZE="24" /></node>
<node ID="4AC4FFB8-FDCF-43E8-8FF2-A780CCDFFA19" BACKGROUND_COLOR="#FFFFFF" TEXT="Metrics" COLOR="#4B4B4B" POSITION="right" STYLE="bubble"><edge COLOR="#4B4B4B" WIDTH="4" /><font NAME="Helvetica-Bold" SIZE="24" BOLD="true" /><node ID="CEFB9887-1DFB-4438-A05F-43D411AE98B3" BACKGROUND_COLOR="#FFFFFF" TEXT="Accuracy" COLOR="#AF4FC8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="B977E849-A2F2-4BE4-AD31-136787C4A781" BACKGROUND_COLOR="#FFFFFF" TEXT="Compares the predictions with the true predictions (called ground-truth). A True value means that the value predicted by our classifier is identical to the real value, while a False means that our classifier made a mistake. " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="4EB40028-5707-4C99-9596-D480FCF69467" BACKGROUND_COLOR="#FFFFFF" TEXT="Accuracy: how many times the classifier was right, divide by the number of samples in the set." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="5E0C0485-2325-431D-9FD9-D0079F0E234B" BACKGROUND_COLOR="#FFFFFF" TEXT="Does not take into account the type of error the classifier makes.
—> Accuracy is an aggregate of the errors made by the classifier" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FF9559" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
</node>
<node ID="DD17E26F-6A60-4872-9FAD-3BD37643E263" BACKGROUND_COLOR="#FFFFFF" TEXT="Classification" COLOR="#AF4FC8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="9A01DB07-5B6E-4B7A-A569-425F691A3DA3" BACKGROUND_COLOR="#FFFFFF" TEXT="Confusion matrix" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="5D38B3FD-6948-46E6-BE79-B7AAC50E0E5D" BACKGROUND_COLOR="#FFFFFF" TEXT="Knowing independently what the error is for each of the cases" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="3FAB9E7E-37C4-4BA9-B049-295014D3D2BB" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="4D2A1B5B-986F-4337-B1AD-BFCF2E1C08C6" BACKGROUND_COLOR="#FFFFFF" TEXT="In-diagonal numbers: correct predictions
Off-diagonal numbers: misclassifications. " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="520B0316-8F58-4FF1-92E5-5B914FE3C50D" BACKGROUND_COLOR="#FFFFFF" TEXT="Top left: the true positives (TP)
Bottom right: the true negatives (TN)
Top right: the false negatives (FN) —> ex: people who gave blood but were predicted to not have given blood
Bottom left: the false positives (FP) —> ex: people who did not give blood but were predicted to have given blood" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="DFD1716B-F129-4EBE-B165-9E5BD546D45E" BACKGROUND_COLOR="#FFFFFF" TEXT="Precision
TP / (TP + FP)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="CEE4CA04-E2F6-4F00-916F-A9E702A41337" BACKGROUND_COLOR="#FFFFFF" TEXT="« How likely the person actually gave blood when the classifier predicted that they did »" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /></node>
</node>
<node ID="442130DC-606E-4E47-9DAF-8F9EBC0BEEAB" BACKGROUND_COLOR="#FFFFFF" TEXT="Recall
TP / (TP + FN) " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="91B57594-C505-485C-9159-47DE2A8A98C7" BACKGROUND_COLOR="#FFFFFF" TEXT="« How well the classifier is able to correctly identify people who did give blood »" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /></node>
</node>
<node ID="3B50DBB6-482C-4E2E-8A21-CC14F3F68B0A" BACKGROUND_COLOR="#FFFFFF" TEXT="Specificity
TN / (TN + FP)" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="7F09E197-4B3A-4C13-B4DC-55C2FD40B0E5" BACKGROUND_COLOR="#FFFFFF" TEXT="Measures the proportion of correctly classified samples in the negative class " COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
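<node ID="00000001-0000-4000-8000-000000000020" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn; toy labels with 1 = positive class):
from sklearn.metrics import confusion_matrix, precision_score, recall_score
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
# labels=[1, 0] puts TP in the top-left cell, as in the layout above
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>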
</node>
</node>
</node>
</node>
</node>
<node ID="A7A9C2A8-F93A-4C36-9DCF-36506F8EA807" BACKGROUND_COLOR="#FFFFFF" TEXT="Precision-recall curve" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="79E6AFDA-1765-4EAC-B03E-A4E715708F29" BACKGROUND_COLOR="#FFFFFF" TEXT="Each point corresponds to a level of probability which we used as a decision threshold. By varying this decision threshold, we get different precision vs. recall values.
The precision and recall metric focuses on the positive class." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="B5B993FA-EB1A-4053-B1C2-FB7513ADDF82" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="E86DE668-543F-4865-ABFF-1AD15C5EFCA0" BACKGROUND_COLOR="#FFFFFF" TEXT="A perfect classifier would have a precision of 1 for all recall values. A metric characterizing the curve is linked to the area under the curve (AUC) and is named average precision (AP). With an ideal classifier, the average precision would be 1." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
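<node ID="00000001-0000-4000-8000-000000000021" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn; toy scores):
from sklearn.metrics import average_precision_score, precision_recall_curve
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]   # predicted probabilities of the positive class
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(average_precision_score(y_true, y_score))   # AP; 1.0 for a perfect classifier" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>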
</node>
<node ID="96E8D4E7-74D3-4388-8F4B-BD3C4C85F8B6" BACKGROUND_COLOR="#FFFFFF" TEXT="Receiver Operating Characteristic curve" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="E6B89BF5-6704-42E2-85BE-53AF87E171AF" BACKGROUND_COLOR="#FFFFFF" TEXT="Highlight the compromise between accurately discriminating the positive class and accurately discriminating the negative classes" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="95319C07-F7E8-441D-AE8D-0846BCC0F033" BACKGROUND_COLOR="#FFFFFF" TEXT="The probability threshold varying for determining "hard" prediction and compute the metrics. It characterize the statistical performance of the classifier" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="252FC7EB-98E2-450B-85C3-1F5E056A7E99" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="AB64255A-204B-4123-A61A-4663BC9964AC" BACKGROUND_COLOR="#FFFFFF" TEXT="The dummy classifier line shows that even the worst statistical performance obtained will be above this line." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
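<node ID="00000001-0000-4000-8000-000000000022" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch (added for illustration; assumes scikit-learn; toy scores):
from sklearn.metrics import roc_auc_score, roc_curve
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))   # 0.5 = chance level, 1.0 = perfect" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#64C8CD" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>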
</node>
</node>
</node>
<node ID="B4FC9A8D-57E9-42BE-9BD8-288772D9EE44" BACKGROUND_COLOR="#FFFFFF" TEXT="Regression" COLOR="#AF4FC8" POSITION="right" STYLE="bubble"><edge COLOR="#000000" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="3BCF77F8-3257-4588-A14C-CB1C78772DA4" BACKGROUND_COLOR="#FFFFFF" TEXT="MSE" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="05A09BAF-7BE6-4AD6-B558-A0532169ADEC" BACKGROUND_COLOR="#FFFFFF" TEXT="A basic loss function used in regression. This metric is sometimes used to evaluate the model since it is optimized by said mode : it minimizes the mean squared error on the training set, so there is no other set of coefficients which will decrease the error." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="B8D5A417-EF8A-42AB-AA8B-9453B059339D" BACKGROUND_COLOR="#FFFFFF" TEXT="The raw MSE can be difficult to interpret —> rescale the MSE by the variance of the target. This score is known as the coefficient of determination R2." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
<node ID="A7CF8360-DF76-4DB0-9FF3-C0130FFCEE90" BACKGROUND_COLOR="#FFFFFF" TEXT="R2" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="1F04CC9F-0602-42AF-8DF1-28B90EE1DC8F" BACKGROUND_COLOR="#FFFFFF" TEXT="Represents the proportion of variance of the target that is explained by the independent variables in the model. Also, gives insight into the quality of the model's fit.
Best score is 1, but there is no lower bound.
A model that predicts the expected value of the target would get a score of 0." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="4CC9FE5C-5951-4F6B-8750-F7272F6156B8" BACKGROUND_COLOR="#FFFFFF" TEXT="— Cannot be compared from one dataset to another.
— The value obtained does not have a meaningful interpretation relative to the original unit of the target.
— Use the median or mean absolute error to get an interpretable score." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="95EF4206-BC2F-4F36-9D2E-7265E8F25D86" BACKGROUND_COLOR="#FFFFFF" TEXT="Default score used in scikit-learn by calling the method score" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica-Oblique" SIZE="20" ITALIC="true" /></node>
</node>
</node>
</node>
<node ID="CCA29547-2C88-418C-996A-5E52CEA72A9C" BACKGROUND_COLOR="#FFFFFF" TEXT="Absolute Errors" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica-Bold" SIZE="20" BOLD="true" /><node ID="DCAE1655-5650-48A5-B7E6-9B16CBD04A65" BACKGROUND_COLOR="#FFFFFF" TEXT="Have a meaningful interpretation. But the mean can be impacted by large error.
If we don’t want to have such a big influence on the metric we should use the Median Absolute Error." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="CFFD80D9-5BDF-4444-A636-4D93A9C8C3F1" BACKGROUND_COLOR="#FFFFFF" TEXT="Those 2 metrics are not relative i.e. committing an error of 50k for an house value at 50k has the same impact than committing an error of 50k for an house value at 500k." COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /><node ID="F3700CF9-80CA-48D7-85C6-BEBA294EF263" BACKGROUND_COLOR="#FFFFFF" TEXT="The Mean Absolute Percentage Error introduce this relative scaling" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</node>
</node>
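<node ID="00000001-0000-4000-8000-000000000023" BACKGROUND_COLOR="#FFFFFF" TEXT="A minimal sketch of the regression metrics above (added for illustration; assumes scikit-learn >= 0.24 for MAPE; toy values):
from sklearn.metrics import (mean_absolute_percentage_error, mean_squared_error,
                             median_absolute_error, r2_score)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mean_squared_error(y_true, y_pred))                # loss in squared target units
print(r2_score(y_true, y_pred))                          # MSE rescaled by target variance
print(median_absolute_error(y_true, y_pred))             # robust to large errors
print(mean_absolute_percentage_error(y_true, y_pred))    # relative error" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="6" /><font NAME="Helvetica" SIZE="20" /></node>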
</node>
</node>
<node ID="8651FADF-3D25-49F6-96E4-B313D2E8E837" BACKGROUND_COLOR="#FFFFFF" TEXT="" COLOR="#6D6D6D" POSITION="right" STYLE="bubble"><edge COLOR="#FFCD3B" WIDTH="4" /><font NAME="Helvetica" SIZE="20" /></node>
</node>
</map>