odtlearn.fair_oct

Classes

FairConstrainedOCT

Base class for fair constrained optimal classification trees.

FairSPOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring statistical parity (SP) between protected groups.

FairCSPOCT

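An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring conditional statistical parity (CSP) between protected groups.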

FairPEOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring predictive equality (PE) between protected groups.

FairEOppOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring equality of opportunity (EOpp) between protected groups.

FairEOddsOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring equalized odds (EOdds) between protected groups.

FairOCT

An optimal and fair classification tree fitted on a given binary-valued data set.

Module Contents

class odtlearn.fair_oct.FairConstrainedOCT(solver: str, positive_class: int, _lambda: float, obj_mode: str, fairness_bound: float, depth: int, time_limit: int, num_threads: None, verbose: bool)[source]

Bases: odtlearn.constrained_oct.ConstrainedOCT

Base class for fair constrained optimal classification trees.

This class extends the ConstrainedOCT class and provides a framework for implementing fair constrained optimal classification trees. It includes methods for adding fairness constraints, extracting metadata from the input data, and defining the objective function.

Parameters:
solver : str

The name of the solver to use for solving the MIP problem.

positive_class : int

The class label value that corresponds to the desired outcome.

_lambda : float

The regularization parameter in the objective. Must be in the interval [0, 1).

obj_mode : {‘acc’, ‘balance’, ‘weighted’}, optional (default=‘acc’)

The objective mode to use: ‘acc’ for accuracy, ‘balance’ for balanced accuracy, or ‘weighted’ for user-defined weights.

fairness_bound : float in (0, 1], default=1

The bound of the fairness constraint. Smaller values impose a stricter constraint; a value of 1 means no fairness constraint is enforced.

depth : int

The maximum depth of the tree.

time_limit : int

The time limit (in seconds) for solving the MIP problem.

num_threads : int, optional

The number of threads to use for solving the MIP problem. If None, all available threads are used.

verbose : bool, optional

Whether to display verbose output during the solving process.

Notes

This is a base class and should not be instantiated directly. Instead, use one of the derived classes that implement a specific fairness constraint, such as FairSPOCT, FairCSPOCT, FairPEOCT, FairEOppOCT, or FairEOddsOCT.

The fit method expects the input data X, target labels y, protected features protect_feat, and legitimate factors legit_factor (if applicable) to be provided. The protected features should be binary-valued, and the legitimate factors should be numeric.

The predict method expects the input data X to have the same columns as the data used for fitting the model.

Attributes:
_obj_mode : str

The objective mode used for learning the optimal tree. One of ‘acc’, ‘balance’, or ‘weighted’.

_positive_class : int

The value of the positive class label.

_fairness_bound : float

The bound of the fairness constraint. Must be in the interval (0, 1].

_protect_feat_col_labels : list of str

The column labels of the protected features.

_protect_feat_col_dtypes : list of dtype

The data types of the protected feature columns.

Methods

_add_fairness_constraint(p_df, p_prime_df)

Add the fairness constraint to the MIP problem for the given protected groups.

_extract_metadata(X, y, protect_feat)

Extract metadata from the input data.

_define_objective()

Define the objective function for the MIP problem.

fit(X, y, protect_feat, legit_factor)

Fit the fair constrained optimal classification tree on the given data.

predict(X)

Predict the class labels for the given input data using the fitted model.

fit(X: numpy.ndarray, y: numpy.ndarray, protect_feat: numpy.ndarray, legit_factor: numpy.ndarray, weights: None = None) FairCSPOCT | FairSPOCT | FairEOddsOCT | FairEOppOCT | FairPEOCT[source]

Fit the Fair Constrained Optimal Classification Tree (FairConstrainedOCT) model to the given training data.

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples. Each feature should be binary (0 or 1).

y : array-like of shape (n_samples,)

The target values (class labels) for the training samples.

protect_feat : array-like of shape (n_samples, n_protected_features)

The protected feature columns (e.g., race, gender). Can have one or more columns. Each protected feature should be binary (0 or 1).

legit_factor : array-like of shape (n_samples,)

The legitimate factor column (e.g., prior number of criminal acts). This should be a numeric column.

weights : array-like of shape (n_samples,), optional (default=None)

Sample weights. If None, then samples are equally weighted when obj_mode is ‘acc’, or weights are automatically calculated when obj_mode is ‘balance’. Must be provided when obj_mode is ‘weighted’.

Returns:
self : object

Returns self.

Raises:
ValueError

If X or protect_feat contains non-binary values, or if inputs have inconsistent numbers of samples. Also raised if weights are not provided when obj_mode is ‘weighted’, or if the number of weights doesn’t match the number of samples.

AssertionError

If the fairness bound is not in the range (0, 1].
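
The checks listed under Raises can be sketched in plain Python as follows. This is only an illustration of the documented behavior, not odtlearn's own code: the helper name check_fair_inputs is hypothetical, and it works on nested lists rather than arrays.

```python
def check_fair_inputs(X, y, protect_feat, fairness_bound):
    """Hypothetical helper mirroring the documented checks; not odtlearn's code."""
    n_samples = len(X)
    # All inputs must describe the same number of samples.
    if len(y) != n_samples or len(protect_feat) != n_samples:
        raise ValueError("X, y and protect_feat must have the same number of rows")
    # X and protect_feat may contain only the values 0 and 1.
    for name, rows in (("X", X), ("protect_feat", protect_feat)):
        for row in rows:
            if any(value not in (0, 1) for value in row):
                raise ValueError(f"{name} must be binary (0 or 1)")
    # The documented range for the bound is the half-open interval (0, 1].
    assert 0 < fairness_bound <= 1, "fairness_bound must be in (0, 1]"


# Valid inputs pass silently.
check_fair_inputs([[0, 1], [1, 0]], [0, 1], [[1], [0]], fairness_bound=0.1)
```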

Notes

This method fits the FairConstrainedOCT model using mixed-integer optimization while considering fairness constraints. It sets up the optimization problem, solves it, and stores the results.

The fairness constraints are applied based on the specific fairness metric defined in the subclass (e.g., Statistical Parity, Conditional Statistical Parity, Predictive Equality, or Equal Opportunity).

The optimization problem aims to maximize accuracy (or balanced accuracy, depending on the obj_mode) while satisfying the fairness constraints within the specified fairness_bound.

The resulting tree structure is stored in the model and can be used for prediction or visualization.

The behavior of this method depends on the obj_mode specified during initialization:

- If obj_mode is ‘acc’, equal weights are used (the weights parameter is ignored).
- If obj_mode is ‘balance’, weights are automatically calculated to balance class importance.
- If obj_mode is ‘weighted’, the provided weights are used.

When obj_mode is not ‘weighted’ and weights are provided, a warning is issued and the weights are ignored.
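
The weight-handling rules described above can be sketched as a small standalone function. The name resolve_weights is hypothetical, and the inverse-class-frequency formula used for ‘balance’ is an assumption made for illustration; the documentation does not state how odtlearn computes balanced weights.

```python
import warnings
from collections import Counter


def resolve_weights(y, obj_mode, weights=None):
    """Hypothetical helper; the 'balance' formula is an assumption."""
    n = len(y)
    if obj_mode == "weighted":
        if weights is None:
            raise ValueError("weights are required when obj_mode='weighted'")
        if len(weights) != n:
            raise ValueError("need exactly one weight per sample")
        return list(weights)
    if weights is not None:
        # Documented behavior: warn and ignore the supplied weights.
        warnings.warn("weights are ignored unless obj_mode='weighted'")
    if obj_mode == "acc":
        return [1.0] * n
    if obj_mode == "balance":
        # Assumed scheme: inverse class frequency, so every class
        # contributes the same total weight.
        counts = Counter(y)
        return [n / (len(counts) * counts[label]) for label in y]
    raise ValueError(f"unknown obj_mode: {obj_mode!r}")


balanced = resolve_weights([0, 0, 0, 1], "balance")
print(balanced)
```

Under this assumed scheme, the three samples of class 0 together carry the same total weight as the single sample of class 1.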

Examples

>>> from odtlearn.fair_oct import FairSPOCT
>>> import numpy as np
>>> X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
>>> y = np.array([0, 1, 1, 0])
>>> protect_feat = np.array([[1], [0], [1], [0]])
>>> legit_factor = np.array([0.1, 0.2, 0.3, 0.4])
>>> # Use a concrete subclass; FairConstrainedOCT is a base class and should not be instantiated directly.
>>> model = FairSPOCT(solver="cbc", positive_class=1, depth=2, fairness_bound=0.1)
>>> model.fit(X, y, protect_feat, legit_factor)
predict(X: pandas.core.frame.DataFrame | numpy.ndarray) numpy.ndarray[source]

Predict class labels for samples in X using the fitted Fair Constrained Optimal Classification Tree model.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for which to make predictions. Each feature should be binary (0 or 1).

Returns:
y_pred : ndarray of shape (n_samples,)

The predicted class labels for each sample in X.

Raises:
NotFittedError

If the model has not been fitted yet.

ValueError

If X contains non-binary values or has a different number of features than the training data.

Notes

This method uses the fair decision tree learned during the fit process to classify new samples. It traverses the tree for each sample in X, following the branching decisions until reaching a leaf node, and returns the corresponding class prediction.

The predictions made by this method satisfy the fairness constraints that were imposed during the training process. However, note that the fairness guarantees only hold for the distribution of the training data. When applying the model to new data with a different distribution, the fairness properties may not be preserved.
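
The traversal described above can be sketched on a toy tree. The nested-dictionary layout and the convention that a feature value of 1 goes right and 0 goes left are assumptions made for this illustration; they are not odtlearn's internal representation.

```python
def predict_one(node, x):
    """Walk from the root to a leaf for a single binary sample x."""
    # Branching nodes hold a "feature" index; leaves hold a "label".
    while "label" not in node:
        node = node["right"] if x[node["feature"]] == 1 else node["left"]
    return node["label"]


def predict(tree, X):
    """Predict one label per row of X."""
    return [predict_one(tree, x) for x in X]


# Hypothetical depth-1 tree that branches on feature 0.
tree = {"feature": 0, "left": {"label": 0}, "right": {"label": 1}}
print(predict(tree, [[1, 1], [0, 0]]))  # [1, 0]
```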

Examples

>>> from odtlearn.fair_oct import FairSPOCT
>>> import numpy as np
>>> X_train = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
>>> y_train = np.array([0, 1, 1, 0])
>>> protect_feat = np.array([0, 1, 1, 0])
>>> legit_factor = np.array([0, 1, 0, 1])
>>> clf = FairSPOCT(solver="cbc", positive_class=1, depth=2, fairness_bound=0.1)
>>> clf.fit(X_train, y_train, protect_feat, legit_factor)
>>> X_test = np.array([[1, 1], [0, 0]])
>>> y_pred = clf.predict(X_test)
>>> print(y_pred)
[1 0]
class odtlearn.fair_oct.FairSPOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)[source]

Bases: FairConstrainedOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring statistical parity (SP) between protected groups.

Parameters:
solver : str

The name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi is used by default.

positive_class : int

The class label value that corresponds to the desired outcome.

depth : int, default=1

The maximum depth of the tree.

time_limit : int, default=60

The time limit (in seconds) for solving the MIP problem.

_lambda : float, default=0

The regularization parameter in the objective. Must be in the interval [0, 1).

obj_mode : {‘acc’, ‘balance’, ‘weighted’}, optional (default=‘acc’)

The objective mode to use: ‘acc’ for accuracy, ‘balance’ for balanced accuracy, or ‘weighted’ for user-defined weights.

fairness_bound : float in (0, 1], default=1

The bound of the fairness constraint. Smaller values impose a stricter constraint; a value of 1 means no fairness constraint is enforced.

num_threads : int, default=None

The number of threads the solver should use. If None, all available threads are used.

calc_metric(protect_feat: pandas.core.frame.DataFrame | numpy.ndarray, y: pandas.core.series.Series | numpy.ndarray)[source]

Calculate the statistical parity metric for the given data.

Parameters:
protect_feat : array-like of shape (n_samples, n_protected_features)

The protected feature columns (e.g., race, gender). Can have one or more columns.

y : array-like of shape (n_samples,)

The target values or predicted values.

Returns:
sp_dict : dict

A dictionary with key (p, t) and value P(Y=t|P=p), where p is a protected level and t is an outcome value.

Notes

This method calculates the statistical parity metric, which measures the difference in prediction rates across different protected groups.
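
As a rough illustration of the returned dictionary, the sketch below computes P(Y=t|P=p) in plain Python for a single protected column. The function names are hypothetical and this is not the library's implementation. Comparing the largest gap in positive rates against fairness_bound is an assumption about how the bound is interpreted.

```python
from collections import Counter


def statistical_parity(protect_feat, y):
    """Return {(p, t): P(Y=t | P=p)} for one protected column."""
    group_sizes = Counter(protect_feat)
    joint = Counter(zip(protect_feat, y))
    outcomes = sorted(set(y))
    return {
        (p, t): joint[(p, t)] / group_sizes[p]
        for p in sorted(group_sizes)
        for t in outcomes
    }


def max_parity_gap(sp_dict, positive_class):
    """Largest difference in positive-outcome rate between any two groups."""
    rates = [v for (p, t), v in sp_dict.items() if t == positive_class]
    return max(rates) - min(rates)


groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
preds = [1, 1, 0, 0, 1, 0, 0, 0]
sp = statistical_parity(groups, preds)
print(sp[("a", 1)], sp[("b", 1)])    # 0.5 0.25
print(max_parity_gap(sp, 1) <= 0.1)  # False
```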

class odtlearn.fair_oct.FairCSPOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)[source]

Bases: FairConstrainedOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring conditional statistical parity (CSP) between protected groups.

Parameters:
solver : str

The name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi is used by default.

positive_class : int

The class label value that corresponds to the desired outcome.

depth : int, default=1

The maximum depth of the tree.

time_limit : int, default=60

The time limit (in seconds) for solving the MIP problem.

_lambda : float, default=0

The regularization parameter in the objective. Must be in the interval [0, 1).

obj_mode : {‘acc’, ‘balance’, ‘weighted’}, optional (default=‘acc’)

The objective mode to use: ‘acc’ for accuracy, ‘balance’ for balanced accuracy, or ‘weighted’ for user-defined weights.

fairness_bound : float in (0, 1], default=1

The bound of the fairness constraint. Smaller values impose a stricter constraint; a value of 1 means no fairness constraint is enforced.

num_threads : int, default=None

The number of threads the solver should use. If None, all available threads are used.

calc_metric(protect_feat: pandas.core.frame.DataFrame | numpy.ndarray, legit_factor: pandas.core.frame.DataFrame | numpy.ndarray, y: pandas.core.series.Series | numpy.ndarray)[source]

Calculate the conditional statistical parity metric for the given data.

Parameters:
protect_feat : array-like of shape (n_samples, n_protected_features)

The protected feature columns (e.g., race, gender). Can have one or more columns.

legit_factor : array-like of shape (n_samples,)

The legitimate factor column (e.g., prior number of criminal acts).

y : array-like of shape (n_samples,)

The target values or predicted values.

Returns:
csp_dict : dict

A dictionary with key (p, f, t) and value P(Y=t|P=p, L=f), where p is a protected level, f is the value of the legitimate feature, and t is an outcome value.

Notes

This method calculates the conditional statistical parity metric, which measures the difference in prediction rates across different protected groups, conditioned on the legitimate factor.
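
Conditioning on the legitimate factor means the rates are computed within each (protected level, legitimate value) stratum. A plain-Python sketch for a single protected column follows. The function name is hypothetical, this is not the library's implementation, and only strata that actually occur in the data are reported.

```python
from collections import Counter


def conditional_statistical_parity(protect_feat, legit_factor, y):
    """Return {(p, f, t): P(Y=t | P=p, L=f)} for strata present in the data."""
    strata = Counter(zip(protect_feat, legit_factor))
    joint = Counter(zip(protect_feat, legit_factor, y))
    outcomes = sorted(set(y))
    return {
        (p, f, t): joint[(p, f, t)] / strata[(p, f)]
        for (p, f) in sorted(strata)
        for t in outcomes
    }


groups = ["a", "a", "a", "b", "b", "b"]
priors = [0, 0, 1, 0, 1, 1]
preds = [1, 0, 1, 0, 1, 1]
csp = conditional_statistical_parity(groups, priors, preds)
# Positive rate among samples with legitimate factor 0, per group.
print(csp[("a", 0, 1)], csp[("b", 0, 1)])  # 0.5 0.0
```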

class odtlearn.fair_oct.FairPEOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)[source]

Bases: FairConstrainedOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring predictive equality (PE) between protected groups.

Parameters:
solver : str

The name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi is used by default.

positive_class : int

The class label value that corresponds to the desired outcome.

depth : int, default=1

The maximum depth of the tree.

time_limit : int, default=60

The time limit (in seconds) for solving the MIP problem.

_lambda : float, default=0

The regularization parameter in the objective. Must be in the interval [0, 1).

obj_mode : str, default=“acc”

The objective used to learn the optimal decision tree. The two options are “acc” and “balance”. The accuracy objective maximizes prediction accuracy, while the balance objective aims to learn a balanced optimal decision tree that generalizes better to out-of-sample data.

fairness_bound : float in (0, 1], default=1

The bound of the fairness constraint. Smaller values impose a stricter constraint; a value of 1 means no fairness constraint is enforced.

num_threads : int, default=None

The number of threads the solver should use. If None, all available threads are used.

calc_metric(protect_feat: pandas.core.frame.DataFrame | numpy.ndarray, y: pandas.core.series.Series | numpy.ndarray, y_pred: pandas.core.series.Series | numpy.ndarray)[source]

Calculate the predictive equality metric for the given data.

Parameters:
protect_feat : array-like of shape (n_samples, n_protected_features)

The protected feature columns (e.g., race, gender). Can have one or more columns.

y : array-like of shape (n_samples,)

The true target values.

y_pred : array-like of shape (n_samples,)

The predicted values.

Returns:
eq_dict : dict

A dictionary with key (p, t, t_pred) and value P(Y_pred=t_pred|P=p, Y=t), where p is a protected level, t is a true outcome value, and t_pred is a predicted outcome value.

Notes

This method calculates the predictive equality metric, which measures the difference in false positive rates across different protected groups.
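
The false positive rate of group p corresponds to the dictionary entry with t = 0 and t_pred = 1 when the labels are 0 and 1. The sketch below computes the dictionary in plain Python for a single protected column. The function name is hypothetical and this is not the library's implementation.

```python
from collections import Counter


def predictive_equality(protect_feat, y, y_pred):
    """Return {(p, t, t_pred): P(Y_pred=t_pred | P=p, Y=t)}."""
    strata = Counter(zip(protect_feat, y))
    joint = Counter(zip(protect_feat, y, y_pred))
    predicted = sorted(set(y_pred))
    return {
        (p, t, t_pred): joint[(p, t, t_pred)] / strata[(p, t)]
        for (p, t) in sorted(strata)
        for t_pred in predicted
    }


groups = ["a", "a", "a", "b", "b", "b"]
y_true = [0, 0, 1, 0, 0, 1]
y_hat = [1, 0, 1, 0, 0, 1]
pe = predictive_equality(groups, y_true, y_hat)
# False positive rate per group: P(Y_pred=1 | P=p, Y=0).
print(pe[("a", 0, 1)], pe[("b", 0, 1)])  # 0.5 0.0
```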

class odtlearn.fair_oct.FairEOppOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)[source]

Bases: FairConstrainedOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring equality of opportunity (EOpp) between protected groups.

Parameters:
solver : str

The name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi is used by default.

positive_class : int

The class label value that corresponds to the desired outcome.

depth : int, default=1

The maximum depth of the tree.

time_limit : int, default=60

The time limit (in seconds) for solving the MIP problem.

_lambda : float, default=0

The regularization parameter in the objective. Must be in the interval [0, 1).

obj_mode : str, default=“acc”

The objective used to learn the optimal decision tree. The two options are “acc” and “balance”. The accuracy objective maximizes prediction accuracy, while the balance objective aims to learn a balanced optimal decision tree that generalizes better to out-of-sample data.

fairness_bound : float in (0, 1], default=1

The bound of the fairness constraint. Smaller values impose a stricter constraint; a value of 1 means no fairness constraint is enforced.

num_threads : int, default=None

The number of threads the solver should use. If None, all available threads are used.

abstract calc_metric()[source]
class odtlearn.fair_oct.FairEOddsOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)[source]

Bases: FairConstrainedOCT

An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring equalized odds (EOdds) between protected groups.

Parameters:
solver : str

The name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi is used by default.

positive_class : int

The class label value that corresponds to the desired outcome.

depth : int, default=1

The maximum depth of the tree.

time_limit : int, default=60

The time limit (in seconds) for solving the MIP problem.

_lambda : float, default=0

The regularization parameter in the objective. Must be in the interval [0, 1).

obj_mode : str, default=“acc”

The objective used to learn the optimal decision tree. The two options are “acc” and “balance”. The accuracy objective maximizes prediction accuracy, while the balance objective aims to learn a balanced optimal decision tree that generalizes better to out-of-sample data.

fairness_bound : float in (0, 1], default=1

The bound of the fairness constraint. Smaller values impose a stricter constraint; a value of 1 means no fairness constraint is enforced.

num_threads : int, default=None

The number of threads the solver should use. If None, all available threads are used.

class odtlearn.fair_oct.FairOCT(solver, positive_class, _lambda=0, depth=1, obj_mode='acc', fairness_type=None, fairness_bound=1, time_limit=60, num_threads=None, verbose=False)[source]

Bases: odtlearn.flow_oct_ms.FlowOCTMultipleSink

An optimal and fair classification tree fitted on a given binary-valued data set. The fairness criterion enforced during training is one of statistical parity (SP), conditional statistical parity (CSP), predictive equality (PE), equal opportunity (EOpp), or equalized odds (EOdds).

Parameters:
solver : str

The name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi is used by default.

positive_class : int

The class label value that corresponds to the desired outcome.

depth : int, default=1

The maximum depth of the tree.

time_limit : int, default=60

The time limit (in seconds) for solving the MIP problem.

_lambda : float, default=0

The regularization parameter in the objective. Must be in the interval [0, 1).

num_threads : int, default=None

The number of threads the solver should use. If None, all available threads are used.

fairness_type : {None, ‘SP’, ‘CSP’, ‘PE’, ‘EOpp’, ‘EOdds’}, default=None

The fairness criterion to enforce.

fairness_bound : float in (0, 1], default=1

The bound of the fairness constraint. Smaller values impose a stricter constraint; a value of 1 means no fairness constraint is enforced.

fit(X, y, protect_feat, legit_factor)[source]

Fit the FairOCT model to the given training data.

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples. Each feature should be binary (0 or 1).

y : array-like of shape (n_samples,)

The target values (class labels) for the training samples.

protect_feat : array-like of shape (n_samples, n_protected_features)

The protected feature columns (e.g., race, gender). Can have one or more columns.

legit_factor : array-like of shape (n_samples,)

The legitimate factor column (e.g., prior number of criminal acts).

Returns:
self : object

Returns self.

Raises:
ValueError

If X contains non-binary values or if inputs have inconsistent numbers of samples.

Notes

This method fits the FairOCT model using mixed-integer optimization while considering fairness constraints. It sets up the optimization problem, solves it, and stores the results.

predict(X)[source]

Predict class labels for samples in X using the fitted FairOCT model.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for which to make predictions. Each feature should be binary (0 or 1).

Returns:
y_pred : ndarray of shape (n_samples,)

The predicted class labels for each sample in X.

Raises:
NotFittedError

If the model has not been fitted yet.

ValueError

If X contains non-binary values or has a different number of features than the training data.

Notes

This method uses the fair decision tree learned during the fit process to classify new samples. It traverses the tree for each sample in X, following the branching decisions until reaching a leaf node, and returns the corresponding class prediction.

get_SP(protect_feat, y)[source]

Return the statistical parity value for each protected level and outcome value.

Parameters:
  • protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender); there can be one or more columns.

  • y – array-like of shape (n_samples,). The target values (class labels in classification).

Return sp_dict:

A dictionary with key (p, t) and value P(Y=t|P=p), where p is a protected level and t is an outcome value.

get_CSP(protect_feat, legit_factor, y)[source]

Return the conditional statistical parity value for each protected level, legitimate feature value, and outcome value.

Parameters:
  • protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender); there can be one or more columns.

  • legit_factor – array-like of shape (n_samples,). The legitimate factor column (e.g., prior number of criminal acts).

  • y – array-like of shape (n_samples,). The target values (class labels in classification).

Return csp_dict:

A dictionary with key (p, f, t) and value P(Y=t|P=p, L=f), where p is a protected level, f is the value of the legitimate feature, and t is an outcome value.

get_EqOdds(protect_feat, y, y_pred)[source]

Return the false positive and true positive rates for each protected level, outcome value, and prediction value.

Parameters:
  • protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender); there can be one or more columns.

  • y – array-like of shape (n_samples,). The true target values (class labels in classification).

  • y_pred – array-like of shape (n_samples,). The predicted values (class labels in classification).

Return eq_dict:

A dictionary with key (p, t, t_pred) and value P(Y_pred=t_pred|P=p, Y=t).

get_CondEqOdds(protect_feat, legit_factor, y, y_pred)[source]

Return the conditional false negative and true positive rates for each protected level, outcome value, prediction value, and legitimate feature value.

Parameters:
  • protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender); there can be one or more columns.

  • legit_factor – array-like of shape (n_samples,). The legitimate factor column (e.g., prior number of criminal acts).

  • y – array-like of shape (n_samples,). The true target values (class labels in classification).

  • y_pred – array-like of shape (n_samples,). The predicted values (class labels in classification).

Return ceq_dict:

A dictionary with key (p, f, t, t_pred) and value P(Y_pred=t_pred|P=p, Y=t, L=f).

fairness_metric_summary(metric, new_data=None)[source]

Summarize the specified fairness metric for the fitted model.

Parameters:
metric : str

The name of the fairness metric to summarize. Must be one of ‘SP’, ‘CSP’, ‘PE’, or ‘CPE’.

new_data : array-like of shape (n_samples,), optional

The new predicted data to use for calculating the fairness metric. If None, the predict method is called on the training data to obtain the predicted values. Default is None.

Returns:
None

The method prints the fairness metric summary as a pandas DataFrame.

Raises:
ValueError

If the specified metric is not one of the supported options.

Notes

This method summarizes the specified fairness metric for the fitted model. The supported fairness metrics are:

- ‘SP’: Statistical Parity
- ‘CSP’: Conditional Statistical Parity
- ‘PE’: Predictive Equality
- ‘CPE’: Conditional Predictive Equality

The method checks if the model has been fitted and raises an error if not. If new_data is not provided, the predict method is called on the training data to obtain the predicted values.

The fairness metric summary is printed as a pandas DataFrame, showing the metric values for each combination of protected attribute, legitimate factor (if applicable), true label, and predicted label (if applicable), depending on the selected metric.

Examples

>>> model.fit(X_train, y_train, protect_feat_train, legit_factor_train)
>>> model.fairness_metric_summary('SP')
            (p,y)  P(Y=y|P=p)
0     (Male, False)    0.752475
1      (Male, True)    0.247525
2   (Female, False)    0.742574
3    (Female, True)    0.257426