A machine learning model makes predictions of an outcome for a particular instance. (Given an instance of a loan application, predict if the applicant will repay the loan.) The model makes these predictions based on a training dataset, where many other instances (other loan applications) and actual outcomes (whether they repaid) are provided. Thus, a machine learning algorithm will attempt to find patterns, or generalizations, in the training dataset to use when a prediction for a new instance is needed. (For example, one pattern it might discover is "if a person has salary > USD 40K and has outstanding debt < USD 5, they will repay the loan".) In many domains this technique, called supervised machine learning, has worked very well.
However, sometimes the patterns that are found may not be desirable or may even be illegal. For example, a loan repayment model may determine that age plays a significant role in predicting repayment because the training dataset happened to show better repayment for one age group than for another. This raises two problems: 1) the training dataset may not be representative of the true population of people of all age groups, and 2) even if it is representative, it is illegal to base any decision on an applicant's age, regardless of whether age is a good predictor based on historical data.
AI Fairness 360 is designed to help address this problem with fairness metrics and bias mitigators. Fairness metrics can be used to check for bias in machine learning workflows. Bias mitigators can be used to overcome bias in the workflow to produce a fairer outcome.
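
To make the idea of a fairness metric concrete, here is a minimal sketch in plain Python. It does not use the AIF360 API. The toy data, the field names `age_group` and `repaid`, and the helper functions are all hypothetical and exist only for illustration. It computes two common group fairness measures: the difference and the ratio of favorable-outcome rates between an unprivileged and a privileged group.

```python
from collections import defaultdict


def favorable_rates(records, group_key, label_key, favorable=1):
    """Return {group value: fraction of its records with the favorable label}."""
    totals = defaultdict(int)
    favorable_counts = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[label_key] == favorable:
            favorable_counts[group] += 1
    return {g: favorable_counts[g] / totals[g] for g in totals}


def mean_difference(records, group_key, label_key, unprivileged, privileged):
    """Favorable rate of the unprivileged group minus that of the privileged group.

    0 means both groups receive the favorable outcome at the same rate;
    a negative value means the unprivileged group is worse off.
    """
    rates = favorable_rates(records, group_key, label_key)
    return rates[unprivileged] - rates[privileged]


def disparate_impact(records, group_key, label_key, unprivileged, privileged):
    """Ratio of favorable rates (unprivileged / privileged); 1 means parity."""
    rates = favorable_rates(records, group_key, label_key)
    return rates[unprivileged] / rates[privileged]


# Hypothetical toy dataset: 8 "old" applicants (6 repaid), 4 "young" (2 repaid).
LOANS = (
    [{"age_group": "old", "repaid": 1}] * 6
    + [{"age_group": "old", "repaid": 0}] * 2
    + [{"age_group": "young", "repaid": 1}] * 2
    + [{"age_group": "young", "repaid": 0}] * 2
)

md = mean_difference(LOANS, "age_group", "repaid", "young", "old")
di = disparate_impact(LOANS, "age_group", "repaid", "young", "old")
print(round(md, 3))  # -0.25
print(round(di, 3))  # 0.667
```

In this toy data the "young" group has a favorable rate of 0.5 against 0.75 for the "old" group, so both measures flag a gap. Which group counts as privileged is an assumption made for the example.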
When the reader has completed this Code Pattern, they will understand how to:
- Compute a fairness metric on original data using AI Fairness 360
- Mitigate bias by transforming the original dataset
- Compute a fairness metric on the transformed training dataset
- User interacts with Watson Studio to create a Jupyter notebook
- Notebook imports the AIF360 toolkit.
- Data is loaded into the notebook.
- User runs the notebook, which uses the AIF360 toolkit to assess the fairness of the machine learning model.
- Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
- TensorFlow: An open source software library for numerical computation using data flow graphs.
- Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
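
The three learning objectives listed earlier (measure, transform, measure again) can be sketched end to end in plain Python. This sketch assumes reweighing as the mitigation step: each record gets a weight so that group membership and outcome become statistically independent in the weighted data. It is an illustration of that technique, not AIF360's implementation, and the toy data, field names, and helper functions are hypothetical.

```python
from collections import Counter


def reweighing_weights(records, group_key, label_key):
    """Weight for each (group, label) pair: P(group) * P(label) / P(group, label)."""
    n = len(records)
    group_counts = Counter(r[group_key] for r in records)
    label_counts = Counter(r[label_key] for r in records)
    joint_counts = Counter((r[group_key], r[label_key]) for r in records)
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }


def weighted_rate(records, group_key, label_key, group, weights=None, favorable=1):
    """Weighted fraction of favorable labels within one group (weight 1 if None)."""
    total = 0.0
    favorable_total = 0.0
    for r in records:
        if r[group_key] != group:
            continue
        w = 1.0 if weights is None else weights[(r[group_key], r[label_key])]
        total += w
        if r[label_key] == favorable:
            favorable_total += w
    return favorable_total / total


# Hypothetical toy dataset: 8 "old" applicants (6 repaid), 4 "young" (2 repaid).
LOANS = (
    [{"age_group": "old", "repaid": 1}] * 6
    + [{"age_group": "old", "repaid": 0}] * 2
    + [{"age_group": "young", "repaid": 1}] * 2
    + [{"age_group": "young", "repaid": 0}] * 2
)

# Step 1: fairness metric on the original data (unprivileged minus privileged).
before = (weighted_rate(LOANS, "age_group", "repaid", "young")
          - weighted_rate(LOANS, "age_group", "repaid", "old"))

# Step 2: mitigate bias by attaching reweighing weights to the records.
weights = reweighing_weights(LOANS, "age_group", "repaid")

# Step 3: the same metric on the transformed (weighted) data.
after = (weighted_rate(LOANS, "age_group", "repaid", "young", weights)
         - weighted_rate(LOANS, "age_group", "repaid", "old", weights))

print(round(before, 3))  # -0.25
print(abs(after) < 1e-9)  # True: the gap is zero up to floating-point error
```

Reweighing leaves the features and labels untouched and only changes how much each record counts, which is why the weighted rates of both groups converge to the overall repayment rate.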
Either run locally:
- Clone the repo locally. In a terminal, run:
    git clone https://github.com/IBM/ensure-loan-fairness-aif360
- Start your Jupyter Notebook process. Running in the ensure-loan-fairness-aif360 cloned repo directory will help you find the notebook and the output as described below. The jupyter notebook process will open your browser.
    cd ensure-loan-fairness-aif360
    jupyter notebook
- Navigate to the notebooks directory and open the notebook file named credit_scoring.ipynb.
Or run in Watson Studio:
- Log into IBM's Watson Studio. Once in, you'll land on the dashboard.
- Create a new project by clicking + New project and choosing Data Science.
- Enter a name for the project and click Create.
  NOTE: By creating a project in Watson Studio, a free tier Object Storage service and Watson Machine Learning service will be created in your IBM Cloud account. Select the Free storage type to avoid fees.
- Upon a successful project creation, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs; we'll be using them to associate our project with any external assets (datasets and notebooks) and any IBM Cloud services.
- From the new project Overview panel, click + Add to project on the top right and choose the Notebook asset type.
- Fill in the following information:
  - Select the From URL tab.
  - Enter a Name for the notebook and optionally a description.
  - Under Notebook URL provide the following URL: https://raw.githubusercontent.com/IBM/ensure-loan-fairness-aif360/master/notebooks/credit_scoring.ipynb
  - For Runtime select the Python 3.5 option.
- Click the Create button.
  TIP: Once successfully imported, the notebook should appear in the Notebooks section of the Assets tab.
- Use the menu pull-down Cell > Run All to run the notebook, or run the cells one at a time top-down using the play button.
- As the cells run, watch the output for results or errors. A running cell will have a label like In [*]. A completed cell will have a run sequence number instead of the asterisk.
See examples/example_notebook.ipynb.
- AI Fairness 360 Toolkit (AIF360)
- Live Demo of AI Fairness 360
- Contact AIF360 team on Slack
- IBM launches tools to detect AI fairness, bias and open sources some code
- Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
- Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns
- AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
- Watson Studio: Master the art of data science with IBM's Watson Studio
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.