[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from odtlearn.fair_oct import (
    FairSPOCT,
    FairPEOCT,
)
FairOCT
Example#
First we generate the data for our example. The diagram within the code block visualizes the training data. We have two binary features (X1, X2) and two class labels (+1 and -1). The protected feature is race, and it has two levels (B and W). In the visualization of the training data, we see that, for example, there are 7 instances with (X1, X2) = (0, 1), and among these 7 instances, 5 are from race W and 2 are from race B. We also show the breakdown of the instances based on their class label.
[2]:
'''
    X2                    |
    |                     |
    1    5W: 4(-) 1(+)    |     2W: 1(-) 1(+)
    |    2B: 2(-)         |     5B: 3(-) 2(+)
    |                     |
    |                     |
    |---------------------|------------------------
    |                     |
    0    4W: 3(-) 1(+)    |     3W: 1(-) 2(+)
    |    1B: 1(+)         |     6B: 1(-) 5(+)
    |                     |
    |___________0_________|__________1_____________X1
'''
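# Features (X1, X2), grouped by cell of the diagram above
X = np.array([[0,0],[0,0],[0,0],[0,0],[0,0],
              [1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],
              [1,1],[1,1],[1,1],[1,1],[1,1],[1,1],[1,1],
              [0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1]])
# Protected feature (race); matching the diagram, 0 encodes W and 1 encodes B
P = np.array([0,0,0,0,1,
              0,0,0,1,1,1,1,1,1,
              0,0,1,1,1,1,1,
              0,0,0,0,0,1,1])
# Class labels; matching the diagram, 1 encodes (+) and 0 encodes (-)
y = np.array([0,0,0,1,1,
              0,1,1,0,1,1,1,1,1,
              0,1,0,0,0,1,1,
              0,0,0,0,1,0,0])
P = P.reshape(-1,1)  # column vector with one protected value per row
l = X[:,1]  # X2, passed as the fourth argument to fit() in the cells below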
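As a quick check that the arrays agree with the diagram, the counts per cell can be tabulated with pandas. This is a minimal sketch, not part of the original notebook: it rebuilds the same arrays compactly, and the mapping of `P` to W/B and of `y` to (+)/(-) is inferred by matching the arrays against the diagram.

```python
import numpy as np
import pandas as pd

# Same training data as in the cell above, rebuilt compactly.
X = np.array([[0, 0]] * 5 + [[1, 0]] * 9 + [[1, 1]] * 7 + [[0, 1]] * 7)
P = np.array([0, 0, 0, 0, 1,
              0, 0, 0, 1, 1, 1, 1, 1, 1,
              0, 0, 1, 1, 1, 1, 1,
              0, 0, 0, 0, 0, 1, 1])
y = np.array([0, 0, 0, 1, 1,
              0, 1, 1, 0, 1, 1, 1, 1, 1,
              0, 1, 0, 0, 0, 1, 1,
              0, 0, 0, 0, 1, 0, 0])

# Assumed encoding (inferred from the diagram): P=0 is W, P=1 is B; y=1 is (+), y=0 is (-).
df = pd.DataFrame({
    "X1": X[:, 0],
    "X2": X[:, 1],
    "race": np.where(P == 1, "B", "W"),
    "label": np.where(y == 1, "+", "-"),
})

# Number of instances for each (X1, X2, race, label) combination.
counts = df.groupby(["X1", "X2", "race", "label"]).size()
print(counts)
```

For example, `counts.loc[(0, 1, "W", "-")]` gives the number of W instances with label (-) in the cell (X1, X2) = (0, 1), which the diagram lists as 4.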
Let’s investigate the following scenarios:
- We evaluate statistical parity (SP) when we don’t enforce any fairness constraint.
- We evaluate SP when we add an SP constraint with a fairness bound of 0.1.
- We evaluate predictive equality (PE) when we don’t enforce any fairness constraint.
- We evaluate PE when we add a PE constraint with a fairness bound of 0.04.
We add a helper function for displaying the results from each scenario
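The helper itself does not appear in this section. A minimal sketch of what such a helper might look like, based only on the pattern the later cells repeat (a metric dictionary turned into a DataFrame, plus an in-sample accuracy), is given below. The names `metric_table` and `in_sample_accuracy` are hypothetical and are not part of odtlearn.

```python
import numpy as np
import pandas as pd


def metric_table(metric, columns):
    """Turn a {key: value} fairness summary into a two-column DataFrame."""
    return pd.DataFrame(list(metric.items()), columns=columns)


def in_sample_accuracy(y_true, y_pred):
    """Fraction of instances whose prediction matches the label."""
    return float(np.mean(np.asarray(y_pred) == np.asarray(y_true)))


# Hypothetical inputs, used only to show the call pattern.
toy_metric = {(0, 1): 0.25, (1, 1): 0.75}
table = metric_table(toy_metric, ["(p,y)", "P(Y=y|P=p)"])
accuracy = in_sample_accuracy([0, 1, 1, 0], [0, 1, 0, 0])

print(table)
print("The in-sample accuracy is {}".format(accuracy))
```

With a fitted model, the dictionary would come from a call such as `calc_metric(...)` and the predictions from `predict(X)`, as in the cells that follow.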
Evaluating Statistical Parity without Fairness Constraint#
[3]:
fcl_wo_SP = FairSPOCT(
    solver="gurobi",
    positive_class=1,
    depth=2,
    _lambda=0.01,
    time_limit=100,
    fairness_bound=1,  # a bound of 1 cannot bind, so fairness is effectively unconstrained
    num_threads=None,
    obj_mode="acc",
    verbose=False,
)
fcl_wo_SP.fit(X, y, P, l)
Set parameter Username
Academic license - for non-commercial use only - expires 2024-06-27
Set parameter TimeLimit to value 100
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])
CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1192 rows, 615 columns and 3124 nonzeros
Model fingerprint: 0xe9eb7e11
Variable types: 14 continuous, 601 integer (601 binary)
Coefficient statistics:
Matrix range [7e-02, 1e+00]
Objective range [1e-02, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective 12.8700000
Presolve removed 734 rows and 294 columns
Presolve time: 0.01s
Presolved: 458 rows, 321 columns, 1450 nonzeros
Variable types: 14 continuous, 307 integer (305 binary)
Root relaxation: objective 2.274000e+01, 309 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 22.74000 0 38 12.87000 22.74000 76.7% - 0s
H 0 0 19.7700000 22.74000 15.0% - 0s
0 0 21.76500 0 91 19.77000 21.76500 10.1% - 0s
H 0 0 19.7800000 21.76500 10.0% - 0s
Cutting planes:
Gomory: 12
Clique: 186
MIR: 5
Flow cover: 1
Zero half: 4
RLT: 1
Explored 1 nodes (491 simplex iterations) in 0.03 seconds (0.04 work units)
Thread count was 8 (of 8 available processors)
Solution count 3: 19.78 19.77 12.87
Optimal solution found (tolerance 1.00e-04)
Best objective 1.978000000000e+01, best bound 1.978000000000e+01, gap 0.0000%
[3]:
FairSPOCT(solver=gurobi,depth=2,time_limit=100,num_threads=None,verbose=False)
Next we calculate a summary of the fairness metric and the in-sample accuracy.
[4]:
print(
    pd.DataFrame(
        fcl_wo_SP.calc_metric(P, fcl_wo_SP.predict(X)).items(),
        columns=["(p,y)", "P(Y=y|P=p)"],
    )
)
    (p,y)  P(Y=y|P=p)
0  (0, 0)    0.785714
1  (1, 0)    0.571429
2  (0, 1)    0.214286
3  (1, 1)    0.428571
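The table can be condensed into a single number: the difference between the groups' positive-prediction rates. The sketch below is one way to compute that gap with NumPy. The function `statistical_parity_gap` is an illustrative helper, not an odtlearn API, and the prediction vector is hypothetical, constructed only to reproduce the rates in the table (3 of 14 positive predictions for group 0 and 6 of 14 for group 1).

```python
import numpy as np


def statistical_parity_gap(protected, y_pred, positive_class=1):
    """Return (gap, rates), where rates[g] is the share of positive predictions in group g."""
    protected = np.asarray(protected).ravel()
    y_pred = np.asarray(y_pred).ravel()
    rates = {
        int(g): float(np.mean(y_pred[protected == g] == positive_class))
        for g in np.unique(protected)
    }
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical predictions that reproduce the rates reported in the table:
# 14 instances per group, with 3 and 6 positive predictions respectively.
protected = np.repeat([0, 1], 14)
y_pred = np.concatenate([
    np.repeat([1, 0], [3, 11]),  # group 0: 3 positive, 11 negative
    np.repeat([1, 0], [6, 8]),   # group 1: 6 positive, 8 negative
])

gap, rates = statistical_parity_gap(protected, y_pred)
print(rates, round(gap, 6))
```

Under these rates the gap is 3/14, roughly 0.214, which is more than twice the fairness bound of 0.1 used in the next scenario.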
[5]:
print(
    "The in-sample accuracy is {}".format(
        np.sum(fcl_wo_SP.predict(X) == y) / y.shape[0]
    )
)
The in-sample accuracy is 0.7142857142857143
[6]:
fig, ax = plt.subplots(figsize=(10, 5))
fcl_wo_SP.plot_tree()
plt.show()
Evaluating Statistical Parity with Fairbound=0.1#
[7]:
fcl_w_SP = FairSPOCT(
    solver="gurobi",
    positive_class=1,
    depth=2,
    _lambda=0.01,
    time_limit=100,
    fairness_bound=0.1,
    num_threads=None,
    obj_mode="acc",
    verbose=False,
)
fcl_w_SP.fit(X, y, P, l)
Set parameter Username
Academic license - for non-commercial use only - expires 2024-06-27
Set parameter TimeLimit to value 100
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])
CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1192 rows, 615 columns and 3124 nonzeros
Model fingerprint: 0x0654e912
Variable types: 14 continuous, 601 integer (601 binary)
Coefficient statistics:
Matrix range [7e-02, 1e+00]
Objective range [1e-02, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e-01, 1e+00]
Found heuristic solution: objective 12.8700000
Presolve removed 734 rows and 294 columns
Presolve time: 0.01s
Presolved: 458 rows, 321 columns, 1450 nonzeros
Variable types: 14 continuous, 307 integer (305 binary)
Root relaxation: objective 2.241667e+01, 401 iterations, 0.00 seconds (0.01 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 22.41667 0 77 12.87000 22.41667 74.2% - 0s
H 0 0 14.8300000 22.41667 51.2% - 0s
H 0 0 18.7800000 19.78500 5.35% - 0s
0 0 19.78500 0 60 18.78000 19.78500 5.35% - 0s
0 0 19.77500 0 62 18.78000 19.77500 5.30% - 0s
H 0 0 18.7900000 19.77500 5.24% - 0s
H 0 1 18.8000000 19.77500 5.19% - 0s
Cutting planes:
Gomory: 4
Clique: 120
RLT: 1
Explored 1 nodes (700 simplex iterations) in 0.04 seconds (0.04 work units)
Thread count was 8 (of 8 available processors)
Solution count 5: 18.8 18.79 18.78 ... 12.87
Optimal solution found (tolerance 1.00e-04)
Best objective 1.880000000000e+01, best bound 1.880000000000e+01, gap 0.0000%
[7]:
FairSPOCT(solver=gurobi,depth=2,time_limit=100,num_threads=None,verbose=False)
[8]:
print(
    pd.DataFrame(
        fcl_w_SP.calc_metric(P, fcl_w_SP.predict(X)).items(),
        columns=["(p,y)", "P(Y=y|P=p)"],
    )
)
    (p,y)  P(Y=y|P=p)
0  (0, 1)         0.5
1  (1, 1)         0.5
2  (0, 0)         0.5
3  (1, 0)         0.5
[9]:
print(
    "The in-sample accuracy is {}".format(
        np.sum(fcl_w_SP.predict(X) == y) / y.shape[0]
    )
)
The in-sample accuracy is 0.6785714285714286
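To put the two statistical parity scenarios side by side, the sketch below converts the reported in-sample accuracies back into counts of correctly classified instances out of the 28 training points. The accuracy values are copied from the outputs above; the variable names are illustrative only.

```python
n_samples = 28  # number of training instances in X

# In-sample accuracies reported by the two FairSPOCT fits above.
acc_without_sp = 0.7142857142857143  # fairness_bound=1
acc_with_sp = 0.6785714285714286     # fairness_bound=0.1

# Convert accuracies back to counts of correctly classified instances.
correct_without = round(acc_without_sp * n_samples)
correct_with = round(acc_with_sp * n_samples)

print(correct_without, correct_with, correct_without - correct_with)
```

On this data set, enforcing the bound of 0.1 costs one additional misclassified instance (20 correct versus 19), while the positive-prediction rates of the two groups become equal at 0.5.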
[10]:
fig, ax = plt.subplots(figsize=(10, 5))
fcl_w_SP.plot_tree()
plt.show()
Evaluating PE Without Fairness Constraint#
[11]:
fcl_wo_PE = FairPEOCT(
    solver="gurobi",
    positive_class=1,
    depth=2,
    _lambda=0.01,
    time_limit=100,
    fairness_bound=1,  # a bound of 1 cannot bind, so fairness is effectively unconstrained
    num_threads=None,
    obj_mode="acc",
    verbose=False,
)
fcl_wo_PE.fit(X, y, P, l)
Set parameter Username
Academic license - for non-commercial use only - expires 2024-06-27
Set parameter TimeLimit to value 100
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])
CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1192 rows, 615 columns and 2942 nonzeros
Model fingerprint: 0x6c429b62
Variable types: 14 continuous, 601 integer (601 binary)
Coefficient statistics:
Matrix range [1e-01, 1e+00]
Objective range [1e-02, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective 12.8700000
Presolve removed 1192 rows and 615 columns
Presolve time: 0.01s
Presolve: All rows and columns removed
Explored 0 nodes (0 simplex iterations) in 0.01 seconds (0.01 work units)
Thread count was 1 (of 8 available processors)
Solution count 2: 19.78 12.87
Optimal solution found (tolerance 1.00e-04)
Best objective 1.978000000000e+01, best bound 1.978000000000e+01, gap 0.0000%
[11]:
FairPEOCT(solver=gurobi,depth=2,time_limit=100,num_threads=None,verbose=False)
[12]:
print(
    pd.DataFrame(
        fcl_wo_PE.calc_metric(P, y, fcl_wo_PE.predict(X)).items(),
        columns=["(p, y, y_pred)", "P(Y_pred=y_pred|P=p, Y=y)"],
    )
)
  (p, y, y_pred)  P(Y_pred=y_pred|P=p, Y=y)
0      (0, 0, 0)                   0.888889
1      (1, 0, 0)                   0.833333
2      (0, 0, 1)                   0.111111
3      (1, 0, 1)                   0.166667
4      (0, 1, 0)                   0.600000
5      (1, 1, 0)                   0.375000
6      (0, 1, 1)                   0.400000
7      (1, 1, 1)                   0.625000
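Predictive equality is commonly stated as equal false positive rates across groups, which corresponds to the rows (p, 0, 1) of the table. The sketch below computes those rates with NumPy under that reading. `false_positive_rates` is an illustrative helper, not an odtlearn API, and the arrays are hypothetical, built only to reproduce the reported rates (1 false positive among 9 negatives in group 0, and 1 among 6 negatives in group 1).

```python
import numpy as np


def false_positive_rates(protected, y_true, y_pred, positive_class=1):
    """Per-group share of true negatives that are predicted positive."""
    protected = np.asarray(protected).ravel()
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    rates = {}
    for g in np.unique(protected):
        negatives = (protected == g) & (y_true != positive_class)
        rates[int(g)] = float(np.mean(y_pred[negatives] == positive_class))
    return rates


# Hypothetical arrays restricted to the true negatives of each group:
# 9 negatives in group 0 and 6 negatives in group 1.
protected = np.repeat([0, 1], [9, 6])
y_true = np.zeros(15, dtype=int)
y_pred = np.zeros(15, dtype=int)
y_pred[0] = 1  # one false positive in group 0
y_pred[9] = 1  # one false positive in group 1

rates = false_positive_rates(protected, y_true, y_pred)
pe_gap = abs(rates[0] - rates[1])
print(rates, round(pe_gap, 6))
```

The resulting gap is 1/18, roughly 0.056, so the unconstrained tree would not satisfy a fairness bound of 0.04.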
[13]:
print(
    "The in-sample accuracy is {}".format(
        np.sum(fcl_wo_PE.predict(X) == y) / y.shape[0]
    )
)
The in-sample accuracy is 0.7142857142857143
[14]:
fig, ax = plt.subplots(figsize=(10, 5))
fcl_wo_PE.plot_tree()
plt.show()
Evaluating PE with Fairbound=0.04#
[15]:
fcl_w_PE = FairPEOCT(
    solver="gurobi",
    positive_class=1,
    depth=2,
    _lambda=0.01,
    time_limit=100,
    fairness_bound=0.04,
    num_threads=None,
    obj_mode="acc",
    verbose=False,
)
fcl_w_PE.fit(X, y, P, l)
Set parameter Username
Academic license - for non-commercial use only - expires 2024-06-27
Set parameter TimeLimit to value 100
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])
CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1192 rows, 615 columns and 2942 nonzeros
Model fingerprint: 0x244097c5
Variable types: 14 continuous, 601 integer (601 binary)
Coefficient statistics:
Matrix range [1e-01, 1e+00]
Objective range [1e-02, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [4e-02, 1e+00]
Found heuristic solution: objective 12.8700000
Presolve removed 1181 rows and 593 columns
Presolve time: 0.01s
Presolved: 11 rows, 22 columns, 61 nonzeros
Found heuristic solution: objective 14.8500000
Variable types: 0 continuous, 22 integer (22 binary)
Root relaxation: objective 1.961667e+01, 12 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 19.61667 0 3 14.85000 19.61667 32.1% - 0s
0 0 19.61500 0 1 14.85000 19.61500 32.1% - 0s
0 0 cutoff 0 14.85000 14.85000 0.00% - 0s
Cutting planes:
Gomory: 1
Cover: 1
GUB cover: 1
Explored 1 nodes (17 simplex iterations) in 0.01 seconds (0.01 work units)
Thread count was 8 (of 8 available processors)
Solution count 2: 14.85 12.87
Optimal solution found (tolerance 1.00e-04)
Best objective 1.485000000000e+01, best bound 1.485000000000e+01, gap 0.0000%
[15]:
FairPEOCT(solver=gurobi,depth=2,time_limit=100,num_threads=None,verbose=False)
[16]:
print(
    pd.DataFrame(
        fcl_w_PE.calc_metric(P, y, fcl_w_PE.predict(X)).items(),
        columns=["(p, y, y_pred)", "P(Y_pred=y_pred|P=p, Y=y)"],
    )
)
  (p, y, y_pred)  P(Y_pred=y_pred|P=p, Y=y)
0      (0, 0, 0)                        1.0
1      (1, 0, 0)                        1.0
2      (0, 0, 1)                        0.0
3      (1, 0, 1)                        0.0
4      (0, 1, 0)                        1.0
5      (1, 1, 0)                        1.0
6      (0, 1, 1)                        0.0
7      (1, 1, 1)                        0.0
[17]:
print(
    "The in-sample accuracy is {}".format(
        np.sum(fcl_w_PE.predict(X) == y) / y.shape[0]
    )
)
The in-sample accuracy is 0.5357142857142857
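Every row with y_pred = 0 in the table above has probability 1.0, which suggests that the constrained tree assigns the negative class to all instances. As a sanity check under that reading, the sketch below scores a constant all-zeros prediction against the training labels from the data cell. `y_pred_constant` is an illustrative name, not something produced by odtlearn.

```python
import numpy as np

# Training labels, copied from the data cell at the top of the example.
y = np.array([0, 0, 0, 1, 1,
              0, 1, 1, 0, 1, 1, 1, 1, 1,
              0, 1, 0, 0, 0, 1, 1,
              0, 0, 0, 0, 1, 0, 0])

# Assumed behaviour of the constrained tree: predict the negative class everywhere.
y_pred_constant = np.zeros_like(y)

n_negative = int(np.sum(y == 0))
accuracy = float(np.mean(y_pred_constant == y))
print(n_negative, accuracy)
```

The constant prediction is correct on the 15 instances labeled 0, giving 15/28, which matches the in-sample accuracy of about 0.536 reported above.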
[18]:
fig, ax = plt.subplots(figsize=(2.5, 1.25))
fcl_w_PE.plot_tree()
plt.show()