Merge pull request #48 from Advueu963/Jiawen
Clean up
werywjw authored Aug 10, 2024
2 parents d348b32 + 00bc892 commit c0304b2
Showing 6 changed files with 43 additions and 394 deletions.
92 changes: 37 additions & 55 deletions chapter-pe-calibration/calibration.ipynb
@@ -45,7 +45,7 @@
"source": [
"<div align=\"justify\">\n",
"\n",
"The predictions delivered by corresponding methods (see [Scoring](../chapter-pe-scoring/scoring) from the previous chapter) are at best \"pseudo-probabilities\" that are often not very accurate. \n",
"The predictions delivered by corresponding methods (see [Probability Estimation via Scoring](../chapter-pe-scoring/scoring) from the previous chapter) are at best \"pseudo-probabilities\" that are often not very accurate. \n",
"Besides, there are many methods that deliver natural scores, \n",
"intuitively expressing a degree of confidence \n",
"(like the distance from the [separating hyperplane in support vector machines](svm)), \n",
@@ -87,7 +87,7 @@
"\\mathbb{E}[Y\\vert f(X)=s_i] = \\frac{\\sum_{j=1}^n y_j \\cdot I[f(x_j)=s_i]}{\\sum_{j=1}^n I[f(x_j)=s_i]},\n",
"$$\n",
"\n",
"where $I[\\cdot]$ is the indicator function. \n",
"here $I[\\cdot]$ is the indicator function. \n",
"For any fixed model $f$ there exists a uniquely determined calibration map which produces perfectly calibrated probabilities on the given dataset. \n",
"That calibration map can be defined as $\\mu(s_i) = \\mathbb{E}[Y\\vert f(X)=s_i]$. \n",
"However, usually we do not want to learn perfect calibration maps on the training data, \n",
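The empirical calibration map defined above can be sketched in a few lines. This is a minimal illustration of the formula $\mu(s_i) = \mathbb{E}[Y\vert f(X)=s_i]$, not code from the notebook; the function name and toy data are assumptions:

```python
import numpy as np

def empirical_calibration_map(scores, y):
    """For each distinct score s_i, estimate E[Y | f(X) = s_i] as the
    mean label over the examples whose score equals s_i (the ratio of
    indicator sums from the formula above)."""
    scores = np.asarray(scores)
    y = np.asarray(y)
    return {float(s): float(y[scores == s].mean()) for s in np.unique(scores)}

# Toy example: two examples share the score 0.7, one of them positive,
# so the map sends 0.7 to 0.5; the single example with score 0.2 is
# negative, so 0.2 maps to 0.0.
mu = empirical_calibration_map([0.7, 0.7, 0.2], [1, 0, 0])
```

On the full training data this map would be perfectly calibrated by construction, which is exactly why it is usually not what one wants to fit.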
@@ -69525,49 +69525,6 @@
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"result = []\n",
"\n",
"for i, bins in enumerate(bins_list):\n",
" prob_true_svm, prob_pred_svm = calibration_curve(y_test, y_prob, n_bins=bins)\n",
" prob_true_platt, prob_pred_platt = calibration_curve(y_test, platt_probs, n_bins=bins)\n",
" \n",
" bs_svm = brier_score_loss(y_test, y_prob)\n",
" bs_platt = brier_score_loss(y_test, platt_probs)\n",
"\n",
" ll_svm = log_loss(y_test, y_prob)\n",
" ll_platt = log_loss(y_test, platt_probs)\n",
"\n",
" ece_svm = calculate_ece(prob_true_svm, prob_pred_svm, n_bins=bins)\n",
" ece_platt = calculate_ece(prob_true_platt, prob_pred_platt, n_bins=bins)\n",
"\n",
" mce_svm = calculate_mce(prob_true_svm, prob_pred_svm, n_bins=bins)\n",
" mce_platt = calculate_mce(prob_true_platt, prob_pred_platt, n_bins=bins)\n",
" \n",
" result.append({\n",
" 'Bins': bins,\n",
" 'Method': 'SVM',\n",
" 'Brier Score': bs_svm,\n",
" 'Log Loss': ll_svm,\n",
" 'ECE': ece_svm,\n",
" 'MCE': mce_svm\n",
" })\n",
" \n",
" result.append({\n",
" 'Bins': bins,\n",
" 'Method': 'SVM-Platt',\n",
" 'Brier Score': bs_platt,\n",
" 'Log Loss': ll_platt,\n",
" 'ECE': ece_platt,\n",
" 'MCE': mce_platt\n",
" })"
]
},
{
"cell_type": "code",
"execution_count": 10,
@@ -69681,17 +69638,42 @@
}
],
"source": [
"result = []\n",
"\n",
"# df_results = pd.DataFrame({\n",
"# 'Method': ['SVM', 'SVM-Platt'],\n",
"# 'Brier Score': [bs_svm, bs_platt],\n",
"# 'Log Loss': [ll_svm, ll_platt],\n",
"# 'ECE': [ece_svm, ece_platt],\n",
"# 'MCE': [mce_svm, mce_platt]\n",
"# })\n",
"for i, bins in enumerate(bins_list):\n",
" prob_true_svm, prob_pred_svm = calibration_curve(y_test, y_prob, n_bins=bins)\n",
" prob_true_platt, prob_pred_platt = calibration_curve(y_test, platt_probs, n_bins=bins)\n",
" \n",
" bs_svm = brier_score_loss(y_test, y_prob)\n",
" bs_platt = brier_score_loss(y_test, platt_probs)\n",
"\n",
" ll_svm = log_loss(y_test, y_prob)\n",
" ll_platt = log_loss(y_test, platt_probs)\n",
"\n",
" ece_svm = calculate_ece(prob_true_svm, prob_pred_svm, n_bins=bins)\n",
" ece_platt = calculate_ece(prob_true_platt, prob_pred_platt, n_bins=bins)\n",
"\n",
" mce_svm = calculate_mce(prob_true_svm, prob_pred_svm, n_bins=bins)\n",
" mce_platt = calculate_mce(prob_true_platt, prob_pred_platt, n_bins=bins)\n",
" \n",
" result.append({\n",
" 'Bins': bins,\n",
" 'Method': 'SVM',\n",
" 'Brier Score': bs_svm,\n",
" 'Log Loss': ll_svm,\n",
" 'ECE': ece_svm,\n",
" 'MCE': mce_svm\n",
" })\n",
" \n",
" result.append({\n",
" 'Bins': bins,\n",
" 'Method': 'SVM-Platt',\n",
" 'Brier Score': bs_platt,\n",
" 'Log Loss': ll_platt,\n",
" 'ECE': ece_platt,\n",
" 'MCE': mce_platt\n",
" })\n",
"\n",
"# df_results[\"SVM\"] = df_results[\"SVM\"].round(4)\n",
"# df_results[\"SVM-Platt\"] = df_results[\"SVM-Platt\"].round(4)\n",
"df_results = pd.DataFrame(result)\n",
"df_results"
]
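The `calculate_ece` and `calculate_mce` helpers called in the cell above are not shown in this diff. Given that they receive the per-bin outputs of `calibration_curve`, one plausible implementation is the (unweighted) mean and maximum absolute gap between observed frequency and mean predicted probability per bin — a sketch under that assumption, not necessarily the notebook's exact code:

```python
import numpy as np

def calculate_ece(prob_true, prob_pred, n_bins=10):
    # Unweighted expected calibration error over the non-empty bins
    # returned by sklearn's calibration_curve. A weighted variant would
    # multiply each gap by the fraction of samples falling in the bin.
    # n_bins is accepted only to match the call sites above; the binning
    # has already been done by calibration_curve.
    gaps = np.abs(np.asarray(prob_true) - np.asarray(prob_pred))
    return float(np.mean(gaps))

def calculate_mce(prob_true, prob_pred, n_bins=10):
    # Maximum calibration error: the worst per-bin gap.
    gaps = np.abs(np.asarray(prob_true) - np.asarray(prob_pred))
    return float(np.max(gaps))

# Example with three bins: gaps are 0.1, 0.0, 0.2.
ece = calculate_ece([0.1, 0.5, 0.9], [0.2, 0.5, 0.7])
mce = calculate_mce([0.1, 0.5, 0.9], [0.2, 0.5, 0.7])
```

Note that because `calibration_curve` drops empty bins and does not return bin counts, a sample-weighted ECE would need the raw predictions rather than the curve outputs.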
@@ -150864,7 +150846,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.9.6"
}
},
"nbformat": 4,
4 changes: 2 additions & 2 deletions chapter-pe-ensemble/ensemble.ipynb
@@ -12382,7 +12382,7 @@
"As shown in the above figures, \n",
"we make the following interpretations and hypotheses.\n",
"\n",
"- *Single Tree*: The uncertainty is concentrated around 0.5, indicating the uniform uncertainty for all predictions. This is expected as a single tree lacks ensemble variability.\n",
"- *Single Tree Uncertainty*: The uncertainty is concentrated around 0.5, indicating uniform uncertainty across all predictions. This is expected, as a single tree lacks ensemble variability.\n",
"- *Bagging Uncertainty*: The histogram shows a more varied distribution of uncertainty values, \n",
"with a significant number of predictions having low uncertainty (around 0.0), indicating good overall performance with a balanced approach to uncertainty, \n",
"even though some predictions have moderate uncertainty, \n",
@@ -12426,7 +12426,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
"version": "3.9.6"
}
},
"nbformat": 4,
8 changes: 4 additions & 4 deletions chapter-prelude/prelude.ipynb
@@ -29,7 +29,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Even though the importance of uncertainty quantification steadily grows with the spread of machine learning, there are still few resources that make the topic accessible to a wider range of people. Most of the resources that can be found online are scientific publications and unlike for machine learning textbooks for a broad audience without scientific background are not available. The aim of this book is to change that by providing a comprehensive overview of the topic in an interactive format. Inspired by the [dive into deep learning](https://d2l.ai/index.html) book we tried to include code examples in all chapters of the book that make it easier to understand abstracts concepts and encourage the reader to play around with the examples. The underlying content is based on a joint work of Eyke Hüllermeeir and Willem Waegeman {cite:p}`DBLP:journals/ml/HullermeierW21`. The project has been developed by students at the Institute of Informatics at LMU Munich."
"Even though the importance of uncertainty quantification steadily grows with the spread of machine learning, there are still few resources that make the topic accessible to a wider audience. Most of the resources available online are scientific publications; unlike for machine learning itself, textbooks for a broad audience without a scientific background are not available. The aim of this book is to change that by providing a comprehensive overview of the topic in an interactive format. Inspired by the [Dive into Deep Learning](https://d2l.ai/index.html) book, we tried to include code examples in all chapters that make it easier to understand abstract concepts and encourage the reader to play around with the examples. The underlying content is based on joint work of Eyke Hüllermeier and Willem Waegeman {cite:p}`DBLP:journals/ml/HullermeierW21`. The project has been developed by students at the Institute of Informatics at LMU Munich."
]
},
{
@@ -67,14 +67,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Most of the chapters include code examples implemented in Python. The readers are encouraged to change and play around with them. To do this a chapter can be either opened in google colab (rocket in top right corner) or found as a notebook in the [Github repository](https://github.com/Advueu963/PageTest). The needed inputs are listed in the beginning cell of each chapter, all required libraries can be found in the [requirements.txt](https://github.com/Advueu963/PageTest/blob/main/requirements.txt) file."
"Most of the chapters include code examples implemented in Python. Readers are encouraged to change and play around with them. To do this, a chapter can either be opened in Google Colab (rocket in the top right corner) or found as a notebook in the [GitHub repository](https://github.com/Advueu963/PageTest). The needed inputs are listed in the beginning cell of each chapter; all required libraries can be found in the [requirements.txt](https://github.com/Advueu963/PageTest/blob/main/requirements.txt) file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please note that the code examples have been developed for illustration purposes only. While some of the examples use well established libraries like scikit learn for machine learning, we often found ourselves implementing concepts from scratch because there were no existing libraries to the best of our knowledge. While the authors put a lot of effort into developing these code examples a lot of the code is not thoroughly tested for reliability and efficiency in every scenario and should never be used in production environments without performing additional quality controls first."
"Please note that the code examples have been developed for illustration purposes only. While some of the examples use well-established libraries like scikit-learn for machine learning, we often found ourselves implementing concepts from scratch because, to the best of our knowledge, no existing libraries were available. While the authors put a lot of effort into developing these code examples, much of the code has not been thoroughly tested for reliability and efficiency in every scenario and should never be used in production environments without additional quality controls."
]
},
{
@@ -89,7 +89,7 @@
"metadata": {},
"source": [
"The project was built using [Jupyter Book](https://jupyterbook.org/en/stable/intro.html#) and can be downloaded as PDF or [Jupyter Notebook](https://jupyter.org). \n",
"Additionally the whole project including all individual notebooks can be found on the projects [Github Page](https://github.com/Advueu963/PageTest). Readers are encouraged to leave feedback and comments in the discussion forum that can be found at the bottom of each page."
"Additionally, the whole project including all individual notebooks can be found on the project's [GitHub Page](https://github.com/Advueu963/PageTest). Readers are encouraged to leave feedback and comments in the discussion forum at the bottom of each page."
]
},
{
136 changes: 0 additions & 136 deletions jupyter_try.ipynb

This file was deleted.

53 changes: 0 additions & 53 deletions markdown-notebooks.md

This file was deleted.

144 changes: 0 additions & 144 deletions notebooks.ipynb

This file was deleted.
