Commit

Merge pull request #80 from CMU-17313Q/Jupyter_notebook
update fairness methods
Reyali619 authored Nov 12, 2023
2 parents 2b7c7dc + fee94df commit 719f565
Showing 1 changed file with 138 additions and 66 deletions.
204 changes: 138 additions & 66 deletions Model Performance & Fairness.ipynb
@@ -29,7 +29,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 15,
"id": "62355f99",
"metadata": {},
"outputs": [],
@@ -48,6 +48,7 @@
"## Load Model & Data\n",
"model = joblib.load('model.pkl')\n",
"test_data = pd.read_csv('student_data.csv')\n",
"test_data_with_gender = test_data.copy()\n",
"\n",
"## Data Preprocessing\n",
"test_data = test_data.drop(columns=['Student ID', 'Gender'])\n",
@@ -70,7 +71,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 16,
"id": "dd979c2a",
"metadata": {},
"outputs": [
@@ -94,7 +95,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 17,
"id": "cc15ab62",
"metadata": {},
"outputs": [
@@ -130,7 +131,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 18,
"id": "ddb28bc0",
"metadata": {},
"outputs": [
@@ -168,7 +169,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 19,
"id": "c11d9bc8",
"metadata": {},
"outputs": [
@@ -210,7 +211,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 20,
"id": "97c8641b",
"metadata": {},
"outputs": [
@@ -260,64 +261,15 @@
"\n",
"We will evaluate our model using several fairness strategies and corresponding metrics:\n",
"\n",
"1. **Disparate Impact Analysis**\n",
"2. **Equality of Opportunity**\n",
"3. **Predictive Parity**\n",
"1. **Equality of Opportunity**\n",
"2. **Predictive Parity**\n",
"3. **Group Unaware**\n",
"4. **Demographic Parity**\n",
"5. **Equalized odds**\n",
"\n",
"Let's implement these evaluations step-by-step.\n"
]
},
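Before stepping through the individual checks, here is the one computation they all build on: the per-group mean of a binary column. A minimal sketch, not a cell from the notebook, assuming `test_data_with_gender` carries the `Gender`, `Good Candidate`, and model `Predictions` columns exactly as in the cells below.

```python
import pandas as pd

def group_rates(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Mean of a binary column per group -- the building block for every check below."""
    return df.groupby(group_col)[outcome_col].mean()

# Example: predicted-positive rate per gender (assumes 'Predictions' is attached):
# group_rates(test_data_with_gender, 'Gender', 'Predictions')
```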
{
"cell_type": "markdown",
"id": "8576cd73",
"metadata": {},
"source": [
"### Disparate Impact Analysis\n",
"For Disparate Impact Analysis, we'll consider the ratio of positive outcomes (Good Candidate) between different genders."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "660c3043",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selection Rates by Gender:\n",
"Gender\n",
"F 0.460076\n",
"M 0.489451\n",
"Name: Good Candidate, dtype: float64\n",
"\n",
"Disparate Impact (Selection Rate Ratio):\n",
"Gender\n",
"F 0.939983\n",
"M 1.000000\n",
"Name: Good Candidate, dtype: float64\n"
]
}
],
"source": [
"# Re-add Gender to calculate the selection rate for each gender\n",
"test_data_with_gender = pd.read_csv('student_data.csv')\n",
"\n",
"# Calculate selection rates\n",
"selection_rates = test_data_with_gender.groupby('Gender')['Good Candidate'].mean()\n",
"\n",
"print(\"Selection Rates by Gender:\")\n",
"print(selection_rates)\n",
"\n",
"# Check for disparate impact\n",
"most_selected_rate = selection_rates.max()\n",
"disparate_impact = selection_rates / most_selected_rate\n",
"\n",
"print(\"\\nDisparate Impact (Selection Rate Ratio):\")\n",
"print(disparate_impact)"
]
},
{
"cell_type": "markdown",
"id": "5c88d791",
@@ -329,7 +281,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 28,
"id": "fd38b2e8",
"metadata": {},
"outputs": [
@@ -340,8 +292,8 @@
"\n",
"True Positive Rates by Gender:\n",
"Gender\n",
"F 0.512397\n",
"M 0.439655\n",
"F 0.735537\n",
"M 0.913793\n",
"Name: Predictions, dtype: float64\n"
]
}
@@ -368,7 +320,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 29,
"id": "a7ab3da3",
"metadata": {},
"outputs": [
Expand All @@ -379,8 +331,8 @@
"\n",
"Precision by Gender:\n",
"Gender\n",
"F 0.492063\n",
"M 0.459459\n",
"F 0.978022\n",
"M 0.726027\n",
"dtype: float64\n"
]
}
@@ -395,6 +347,126 @@
"print(\"\\nPrecision by Gender:\")\n",
"print(precision_gender)"
]
},
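As a cross-check on the groupby precision above, the same numbers can be recomputed per group with scikit-learn's `precision_score`; a sketch assuming scikit-learn is available (plausible, given the pickled model) and that `Predictions` holds hard 0/1 labels:

```python
from sklearn.metrics import precision_score

# Precision (positive predictive value) per gender group; should match the
# groupby figures printed above if the columns line up.
for gender, grp in test_data_with_gender.groupby('Gender'):
    print(gender, precision_score(grp['Good Candidate'], grp['Predictions']))
```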
{
"cell_type": "markdown",
"id": "2e711356",
"metadata": {},
"source": [
"### Group Unaware\n",
"\n",
"This approach assesses whether the model is group unaware with respect to certain features. A group unaware model does not use sensitive attributes (like gender, race) as input features during training. This is crucial to ensure that the model does not perpetuate or amplify biases based on these attributes.\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "ae9fcae3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The model is Group Unaware with respect to Gender.\n"
]
}
],
"source": [
"is_gender_used = 'Gender' in X_test.columns\n",
"group_unaware_status = \"Group Unaware\" if not is_gender_used else \"Group Aware\"\n",
"print(f\"The model is {group_unaware_status} with respect to Gender.\")"
]
},
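Passing this check only confirms that the sensitive column itself is excluded; other features can still act as proxies for gender. A coarse screen for that, sketched here under the assumptions that `X_test` shares the row index of `test_data_with_gender` and that gender is coded 'F'/'M' as in the outputs above:

```python
# Encode gender as 0/1 and correlate it with each numeric model feature.
# Large absolute correlations flag potential proxy features; this is a coarse
# screen, not a proof of fairness or unfairness either way.
gender_numeric = (test_data_with_gender['Gender'] == 'F').astype(int)
proxy_corr = X_test.select_dtypes('number').corrwith(gender_numeric)
print(proxy_corr.abs().sort_values(ascending=False))
```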
{
"cell_type": "markdown",
"id": "7ad30c7e",
"metadata": {},
"source": [
"### Demographic Parity\n",
"\n",
"Demographic Parity is achieved when the decision outcome is independent of a given sensitive attribute. This means that each group (e.g., different genders) should have an equal probability of being predicted as a positive outcome. We calculate the positive rates for each group and compare them.\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "4cceebc5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Positive Rates by Gender:\n",
"Gender\n",
"F 0.460076\n",
"M 0.489451\n",
"Name: Good Candidate, dtype: float64\n",
"\n",
"Demographic Parity Difference:\n",
"0.029375431165872545\n"
]
}
],
"source": [
"# Calculate positive rates by sensitive group (e.g., Gender)\n",
"positive_rates = test_data_with_gender.groupby('Gender')['Good Candidate'].mean()\n",
"print(\"Positive Rates by Gender:\")\n",
"print(positive_rates)\n",
"\n",
"# Check for Demographic Parity\n",
"demographic_parity = positive_rates.max() - positive_rates.min()\n",
"print(\"\\nDemographic Parity Difference:\")\n",
"print(demographic_parity)\n"
]
},
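One caveat: the cell above averages the ground-truth `Good Candidate` column, so it measures base rates in the data rather than the model's behavior. Demographic parity for the model itself is computed on `Predictions`; a sketch of that, using fairlearn's `demographic_parity_difference` on the assumption that fairlearn is installed:

```python
from fairlearn.metrics import demographic_parity_difference

# Demographic parity gap of the model's decisions (not the labels).
dpd = demographic_parity_difference(
    test_data_with_gender['Good Candidate'],   # y_true (required by the API)
    test_data_with_gender['Predictions'],      # y_pred: the model's decisions
    sensitive_features=test_data_with_gender['Gender'],
)
print('Demographic Parity Difference (on predictions):', dpd)
```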
{
"cell_type": "markdown",
"id": "21dcbd6c",
"metadata": {},
"source": [
"### Equalized Odds\n",
"\n",
"Equalized Odds is a fairness criterion that is satisfied when the model's predictions are conditionally independent of the sensitive attribute, given the true outcome. This means that the model should have equal true positive rates and false positive rates across different groups.\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "2288d81d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"True Positive Rates by Gender:\n",
"Gender\n",
"F 0.735537\n",
"M 0.913793\n",
"Name: Predictions, dtype: float64\n",
"\n",
"False Positive Rates by Gender:\n",
"Gender\n",
"F 0.014085\n",
"M 0.330579\n",
"Name: Predictions, dtype: float64\n"
]
}
],
"source": [
"# Calculate True Positive Rates by Gender\n",
"tpr_gender = test_data_with_gender[test_data_with_gender['Good Candidate'] == 1].groupby('Gender')['Predictions'].mean()\n",
"print(\"\\nTrue Positive Rates by Gender:\")\n",
"print(tpr_gender)\n",
"\n",
"# Calculate False Positive Rates by Gender\n",
"fpr_gender = test_data_with_gender[test_data_with_gender['Good Candidate'] == 0].groupby('Gender')['Predictions'].mean()\n",
"print(\"\\nFalse Positive Rates by Gender:\")\n",
"print(fpr_gender)\n"
]
}
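The TPR and FPR gaps above can be summarized in one number. If fairlearn is installed (an assumption, not a notebook dependency), `equalized_odds_difference` returns the larger of the two between-group gaps:

```python
from fairlearn.metrics import equalized_odds_difference

# Max of the between-group TPR gap and FPR gap; 0.0 means equalized odds holds exactly.
eod = equalized_odds_difference(
    test_data_with_gender['Good Candidate'],
    test_data_with_gender['Predictions'],
    sensitive_features=test_data_with_gender['Gender'],
)
print('Equalized Odds Difference:', eod)
```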
],
"metadata": {
