Commit

Merge pull request #80 from CMU-17313Q/Jupyter_notebook
update fairness methods
Reyali619 authored Nov 12, 2023
2 parents 2b7c7dc + fee94df commit 719f565
Showing 1 changed file with 138 additions and 66 deletions.
204 changes: 138 additions & 66 deletions Model Performance & Fairness.ipynb
@@ -29,7 +29,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 15,
"id": "62355f99",
"metadata": {},
"outputs": [],
@@ -48,6 +48,7 @@
"## Load Model & Data\n",
"model = joblib.load('model.pkl')\n",
"test_data = pd.read_csv('student_data.csv')\n",
"test_data_with_gender = test_data.copy()\n",
"\n",
"## Data Preprocessing\n",
"test_data = test_data.drop(columns=['Student ID', 'Gender'])\n",
@@ -70,7 +71,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 16,
"id": "dd979c2a",
"metadata": {},
"outputs": [
@@ -94,7 +95,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 17,
"id": "cc15ab62",
"metadata": {},
"outputs": [
@@ -130,7 +131,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 18,
"id": "ddb28bc0",
"metadata": {},
"outputs": [
@@ -168,7 +169,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 19,
"id": "c11d9bc8",
"metadata": {},
"outputs": [
@@ -210,7 +211,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 20,
"id": "97c8641b",
"metadata": {},
"outputs": [
@@ -260,64 +261,15 @@
"\n",
"We will evaluate our model using several fairness strategies and corresponding metrics:\n",
"\n",
"1. **Disparate Impact Analysis**\n",
"2. **Equality of Opportunity**\n",
"3. **Predictive Parity**\n",
"1. **Equality of Opportunity**\n",
"2. **Predictive Parity**\n",
"3. **Group Unaware**\n",
"4. **Demographic Parity**\n",
"5. **Equalized odds**\n",
"\n",
"Let's implement these evaluations step-by-step.\n"
]
},
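Before stepping through the individual checks, here is the one computation they all build on: the per-group mean of a binary column. A minimal sketch, not a cell from the notebook, assuming `test_data_with_gender` carries the `Gender`, `Good Candidate`, and model `Predictions` columns exactly as in the cells below.

```python
import pandas as pd

def group_rates(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Mean of a binary column per group -- the building block for every check below."""
    return df.groupby(group_col)[outcome_col].mean()

# Example: predicted-positive rate per gender (assumes 'Predictions' is attached):
# group_rates(test_data_with_gender, 'Gender', 'Predictions')
```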
{
"cell_type": "markdown",
"id": "8576cd73",
"metadata": {},
"source": [
"### Disparate Impact Analysis\n",
"For Disparate Impact Analysis, we'll consider the ratio of positive outcomes (Good Candidate) between different genders."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "660c3043",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selection Rates by Gender:\n",
"Gender\n",
"F 0.460076\n",
"M 0.489451\n",
"Name: Good Candidate, dtype: float64\n",
"\n",
"Disparate Impact (Selection Rate Ratio):\n",
"Gender\n",
"F 0.939983\n",
"M 1.000000\n",
"Name: Good Candidate, dtype: float64\n"
]
}
],
"source": [
"# Re-add Gender to calculate the selection rate for each gender\n",
"test_data_with_gender = pd.read_csv('student_data.csv')\n",
"\n",
"# Calculate selection rates\n",
"selection_rates = test_data_with_gender.groupby('Gender')['Good Candidate'].mean()\n",
"\n",
"print(\"Selection Rates by Gender:\")\n",
"print(selection_rates)\n",
"\n",
"# Check for disparate impact\n",
"most_selected_rate = selection_rates.max()\n",
"disparate_impact = selection_rates / most_selected_rate\n",
"\n",
"print(\"\\nDisparate Impact (Selection Rate Ratio):\")\n",
"print(disparate_impact)"
]
},
{
"cell_type": "markdown",
"id": "5c88d791",
@@ -329,7 +281,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 28,
"id": "fd38b2e8",
"metadata": {},
"outputs": [
@@ -340,8 +292,8 @@
"\n",
"True Positive Rates by Gender:\n",
"Gender\n",
"F 0.512397\n",
"M 0.439655\n",
"F 0.735537\n",
"M 0.913793\n",
"Name: Predictions, dtype: float64\n"
]
}
@@ -368,7 +320,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 29,
"id": "a7ab3da3",
"metadata": {},
"outputs": [
Expand All @@ -379,8 +331,8 @@
"\n",
"Precision by Gender:\n",
"Gender\n",
"F 0.492063\n",
"M 0.459459\n",
"F 0.978022\n",
"M 0.726027\n",
"dtype: float64\n"
]
}
@@ -395,6 +347,126 @@
"print(\"\\nPrecision by Gender:\")\n",
"print(precision_gender)"
]
},
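As a cross-check on the groupby precision above, the same numbers can be recomputed per group with scikit-learn's `precision_score`; a sketch assuming scikit-learn is available (plausible, given the pickled model) and that `Predictions` holds hard 0/1 labels:

```python
from sklearn.metrics import precision_score

# Precision (positive predictive value) per gender group; should match the
# groupby figures printed above if the columns line up.
for gender, grp in test_data_with_gender.groupby('Gender'):
    print(gender, precision_score(grp['Good Candidate'], grp['Predictions']))
```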
{
"cell_type": "markdown",
"id": "2e711356",
"metadata": {},
"source": [
"### Group Unaware\n",
"\n",
"This approach assesses whether the model is group unaware with respect to certain features. A group unaware model does not use sensitive attributes (like gender, race) as input features during training. This is crucial to ensure that the model does not perpetuate or amplify biases based on these attributes.\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "ae9fcae3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The model is Group Unaware with respect to Gender.\n"
]
}
],
"source": [
"is_gender_used = 'Gender' in X_test.columns\n",
"group_unaware_status = \"Group Unaware\" if not is_gender_used else \"Group Aware\"\n",
"print(f\"The model is {group_unaware_status} with respect to Gender.\")"
]
},
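Passing this check only confirms that the sensitive column itself is excluded; other features can still act as proxies for gender. A coarse screen for that, sketched here under the assumptions that `X_test` shares the row index of `test_data_with_gender` and that gender is coded 'F'/'M' as in the outputs above:

```python
# Encode gender as 0/1 and correlate it with each numeric model feature.
# Large absolute correlations flag potential proxy features; this is a coarse
# screen, not a proof of fairness or unfairness either way.
gender_numeric = (test_data_with_gender['Gender'] == 'F').astype(int)
proxy_corr = X_test.select_dtypes('number').corrwith(gender_numeric)
print(proxy_corr.abs().sort_values(ascending=False))
```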
{
"cell_type": "markdown",
"id": "7ad30c7e",
"metadata": {},
"source": [
"### Demographic Parity\n",
"\n",
"Demographic Parity is achieved when the decision outcome is independent of a given sensitive attribute. This means that each group (e.g., different genders) should have an equal probability of being predicted as a positive outcome. We calculate the positive rates for each group and compare them.\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "4cceebc5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Positive Rates by Gender:\n",
"Gender\n",
"F 0.460076\n",
"M 0.489451\n",
"Name: Good Candidate, dtype: float64\n",
"\n",
"Demographic Parity Difference:\n",
"0.029375431165872545\n"
]
}
],
"source": [
"# Calculate positive rates by sensitive group (e.g., Gender)\n",
"positive_rates = test_data_with_gender.groupby('Gender')['Good Candidate'].mean()\n",
"print(\"Positive Rates by Gender:\")\n",
"print(positive_rates)\n",
"\n",
"# Check for Demographic Parity\n",
"demographic_parity = positive_rates.max() - positive_rates.min()\n",
"print(\"\\nDemographic Parity Difference:\")\n",
"print(demographic_parity)\n"
]
},
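One caveat: the cell above averages the ground-truth `Good Candidate` column, so it measures base rates in the data rather than the model's behavior. Demographic parity for the model itself is computed on `Predictions`; a sketch of that, using fairlearn's `demographic_parity_difference` on the assumption that fairlearn is installed:

```python
from fairlearn.metrics import demographic_parity_difference

# Demographic parity gap of the model's decisions (not the labels).
dpd = demographic_parity_difference(
    test_data_with_gender['Good Candidate'],   # y_true (required by the API)
    test_data_with_gender['Predictions'],      # y_pred: the model's decisions
    sensitive_features=test_data_with_gender['Gender'],
)
print('Demographic Parity Difference (on predictions):', dpd)
```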
{
"cell_type": "markdown",
"id": "21dcbd6c",
"metadata": {},
"source": [
"### Equalized Odds\n",
"\n",
"Equalized Odds is a fairness criterion that is satisfied when the model's predictions are conditionally independent of the sensitive attribute, given the true outcome. This means that the model should have equal true positive rates and false positive rates across different groups.\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "2288d81d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"True Positive Rates by Gender:\n",
"Gender\n",
"F 0.735537\n",
"M 0.913793\n",
"Name: Predictions, dtype: float64\n",
"\n",
"False Positive Rates by Gender:\n",
"Gender\n",
"F 0.014085\n",
"M 0.330579\n",
"Name: Predictions, dtype: float64\n"
]
}
],
"source": [
"# Calculate True Positive Rates by Gender\n",
"tpr_gender = test_data_with_gender[test_data_with_gender['Good Candidate'] == 1].groupby('Gender')['Predictions'].mean()\n",
"print(\"\\nTrue Positive Rates by Gender:\")\n",
"print(tpr_gender)\n",
"\n",
"# Calculate False Positive Rates by Gender\n",
"fpr_gender = test_data_with_gender[test_data_with_gender['Good Candidate'] == 0].groupby('Gender')['Predictions'].mean()\n",
"print(\"\\nFalse Positive Rates by Gender:\")\n",
"print(fpr_gender)\n"
]
}
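The TPR and FPR gaps above can be summarized in one number. If fairlearn is installed (an assumption, not a notebook dependency), `equalized_odds_difference` returns the larger of the two between-group gaps:

```python
from fairlearn.metrics import equalized_odds_difference

# Max of the between-group TPR gap and FPR gap; 0.0 means equalized odds holds exactly.
eod = equalized_odds_difference(
    test_data_with_gender['Good Candidate'],
    test_data_with_gender['Predictions'],
    sensitive_features=test_data_with_gender['Gender'],
)
print('Equalized Odds Difference:', eod)
```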
],
"metadata": {
