odtlearn.fair_oct
Classes
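FairConstrainedOCT | Base class for fair constrained optimal classification trees.
FairSPOCT | An optimal classification tree fit on a given binary-valued data set with a statistical parity (SP) fairness side-constraint.
FairCSPOCT | An optimal classification tree fit on a given binary-valued data set with a conditional statistical parity (CSP) fairness side-constraint.
FairPEOCT | An optimal classification tree fit on a given binary-valued data set with a predictive equality (PE) fairness side-constraint.
FairEOppOCT | An optimal classification tree fit on a given binary-valued data set with an equality of opportunity (EOpp) fairness side-constraint.
FairEOddsOCT | An optimal classification tree fit on a given binary-valued data set with an equalized odds (EOdds) fairness side-constraint.
FairOCT | An optimal and fair classification tree fitted on a given binary-valued data set.
Module Contents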
- class odtlearn.fair_oct.FairConstrainedOCT(solver: str, positive_class: int, _lambda: float, obj_mode: str, fairness_bound: float, depth: int, time_limit: int, num_threads: None, verbose: bool)
Bases:
odtlearn.constrained_oct.ConstrainedOCT
Base class for fair constrained optimal classification trees.
This class extends the ConstrainedOCT class and provides a framework for implementing fair constrained optimal classification trees. It includes methods for adding fairness constraints, extracting metadata from the input data, and defining the objective function.
- Parameters:
- solverstr
The name of the solver to use for solving the MIP problem.
- positive_class: int
The class label value that corresponds to the desired outcome.
- _lambdafloat
The regularization parameter in the objective. Must be in the interval [0, 1).
- obj_mode{‘acc’, ‘balance’, ‘weighted’}, optional (default=’acc’)
The objective mode to use. ‘acc’ for accuracy, ‘balance’ for balanced accuracy, ‘weighted’ for user-defined weights.
- fairness_bound: float (0,1], default=1
The bound of the fairness constraint. The smaller the value the stricter the fairness constraint and 1 corresponds to no fairness constraint enforced
- depthint
The maximum depth of the tree.
- time_limitint
The time limit (in seconds) for solving the MIP problem.
- num_threadsint, optional
The number of threads to use for solving the MIP problem. If None, all available threads are used.
- verbosebool, optional
Whether to display verbose output during the solving process.
Notes
This is a base class and should not be instantiated directly. Instead, use one of the derived classes that implement a specific fairness constraint, such as FairSPOCT, FairCSPOCT, FairPEOCT, FairEOppOCT, or FairEOddsOCT.
The fit method expects the input data X, target labels y, protected features protect_feat, and legitimate factors legit_factor (if applicable) to be provided. The protected features should be binary-valued, and the legitimate factors should be numeric.
The predict method expects the input data X to have the same columns as the data used for fitting the model.
- Attributes:
- _obj_mode: str
The objective mode used for learning the optimal tree. Must be one of ‘acc’, ‘balance’, or ‘weighted’.
- _positive_classint
The value of the positive class label.
- _fairness_boundfloat
The bound of the fairness constraint. Must be in the interval (0, 1].
- _protect_feat_col_labelslist of str
The column labels of the protected features.
- _protect_feat_col_dtypeslist of dtype
The data types of the protected feature columns.
Methods
_add_fairness_constraint(p_df, p_prime_df)
Add the fairness constraint to the MIP problem for the given protected groups.
_extract_metadata(X, y, protect_feat)
Extract metadata from the input data.
_define_objective()
Define the objective function for the MIP problem.
fit(X, y, protect_feat, legit_factor)
Fit the fair constrained optimal classification tree on the given data.
predict(X)
Predict the class labels for the given input data using the fitted model.
- fit(X: numpy.ndarray, y: numpy.ndarray, protect_feat: numpy.ndarray, legit_factor: numpy.ndarray, weights: None = None) -> FairCSPOCT | FairSPOCT | FairEOddsOCT | FairEOppOCT | FairPEOCT
Fit the Fair Constrained Optimal Classification Tree (FairConstrainedOCT) model to the given training data.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
The training input samples. Each feature should be binary (0 or 1).
- yarray-like of shape (n_samples,)
The target values (class labels) for the training samples.
- protect_featarray-like of shape (n_samples, n_protected_features)
The protected feature columns (e.g., race, gender). Can have one or more columns. Each protected feature should be binary (0 or 1).
- legit_factorarray-like of shape (n_samples,)
The legitimate factor column (e.g., prior number of criminal acts). This should be a numeric column.
- weightsarray-like of shape (n_samples,), optional (default=None)
Sample weights. If None, then samples are equally weighted when obj_mode is ‘acc’, or weights are automatically calculated when obj_mode is ‘balance’. Must be provided when obj_mode is ‘weighted’.
- Returns:
- selfobject
Returns self.
- Raises:
- ValueError
If X or protect_feat contains non-binary values, or if inputs have inconsistent numbers of samples. Also raised if weights are not provided when obj_mode is ‘weighted’, or if the number of weights doesn’t match the number of samples.
- AssertionError
If the fairness bound is not in the range (0, 1].
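The error conditions listed above can be illustrated with a small stand-alone check. This is a sketch only: `validate_fit_inputs` is a hypothetical helper written for this page, not a function exposed by odtlearn, and it uses plain Python lists so it runs without any dependencies.

```python
def validate_fit_inputs(X, y, protect_feat, fairness_bound):
    """Hypothetical pre-fit checks mirroring the documented error conditions."""
    n_samples = len(X)
    if len(y) != n_samples or len(protect_feat) != n_samples:
        raise ValueError("X, y, and protect_feat must have the same number of samples")
    for name, rows in (("X", X), ("protect_feat", protect_feat)):
        for row in rows:
            if any(value not in (0, 1) for value in row):
                raise ValueError(f"{name} must contain only binary values (0 or 1)")
    # The documentation states that the bound is checked with an assertion.
    assert 0 < fairness_bound <= 1, "fairness_bound must be in (0, 1]"


X = [[0, 1], [1, 0], [1, 1], [0, 0]]
y = [0, 1, 1, 0]
protect_feat = [[1], [0], [1], [0]]
validate_fit_inputs(X, y, protect_feat, fairness_bound=0.1)  # passes silently
```

Running a check like this before calling fit can surface data problems before the solver starts.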
Notes
This method fits the FairConstrainedOCT model using mixed-integer optimization while considering fairness constraints. It sets up the optimization problem, solves it, and stores the results.
The fairness constraints are applied based on the specific fairness metric defined in the subclass (e.g., Statistical Parity, Conditional Statistical Parity, Predictive Equality, or Equal Opportunity).
The optimization problem aims to maximize accuracy (or balanced accuracy, depending on the obj_mode) while satisfying the fairness constraints within the specified fairness_bound.
The resulting tree structure is stored in the model and can be used for prediction or visualization.
The behavior of this method depends on the obj_mode specified during initialization:
- If obj_mode is ‘acc’, equal weights are used (weights parameter is ignored).
- If obj_mode is ‘balance’, weights are automatically calculated to balance class importance.
- If obj_mode is ‘weighted’, the provided weights are used.
When obj_mode is not ‘weighted’ and weights are provided, a warning is issued and the weights are ignored.
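The weight-handling rules above can be sketched in plain Python. `resolve_weights` is a hypothetical helper, and the inverse-class-frequency formula used for ‘balance’ is an assumption made for illustration; the exact scheme odtlearn uses is not stated here.

```python
import warnings
from collections import Counter


def resolve_weights(y, obj_mode, weights=None):
    """Illustrative only: choose per-sample weights according to obj_mode."""
    n = len(y)
    if obj_mode == "weighted":
        if weights is None:
            raise ValueError("weights must be provided when obj_mode is 'weighted'")
        if len(weights) != n:
            raise ValueError("number of weights must match number of samples")
        return list(weights)
    if weights is not None:
        warnings.warn("weights are ignored unless obj_mode is 'weighted'")
    if obj_mode == "acc":
        return [1.0] * n
    if obj_mode == "balance":
        # Assumed scheme: inverse class frequency, so every class carries
        # the same total weight regardless of how many samples it has.
        counts = Counter(y)
        return [n / (len(counts) * counts[label]) for label in y]
    raise ValueError(f"unknown obj_mode: {obj_mode!r}")


y = [0, 0, 0, 1]
print(resolve_weights(y, "acc"))
print(resolve_weights(y, "balance"))
```

Under the assumed scheme, the single sample of class 1 receives as much total weight as the three samples of class 0 combined.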
Examples
>>> from odtlearn.fair_oct import FairSPOCT
>>> import numpy as np
>>> X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
>>> y = np.array([0, 1, 1, 0])
>>> protect_feat = np.array([[1], [0], [1], [0]])
>>> legit_factor = np.array([0.1, 0.2, 0.3, 0.4])
>>> # FairConstrainedOCT is a base class, so fit through a concrete subclass.
>>> model = FairSPOCT(solver="cbc", positive_class=1, depth=2, fairness_bound=0.1)
>>> model.fit(X, y, protect_feat, legit_factor)
- predict(X: pandas.core.frame.DataFrame | numpy.ndarray) -> numpy.ndarray
Predict class labels for samples in X using the fitted Fair Constrained Optimal Classification Tree model.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
The input samples for which to make predictions. Each feature should be binary (0 or 1).
- Returns:
- y_predndarray of shape (n_samples,)
The predicted class labels for each sample in X.
- Raises:
- NotFittedError
If the model has not been fitted yet.
- ValueError
If X contains non-binary values or has a different number of features than the training data.
Notes
This method uses the fair decision tree learned during the fit process to classify new samples. It traverses the tree for each sample in X, following the branching decisions until reaching a leaf node, and returns the corresponding class prediction.
The predictions made by this method satisfy the fairness constraints that were imposed during the training process. However, note that the fairness guarantees only hold for the distribution of the training data. When applying the model to new data with a different distribution, the fairness properties may not be preserved.
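Because the guarantee holds only on the training distribution, it can be worth re-checking the constraint on new predictions. The sketch below assumes a statistical parity constraint that bounds the difference in positive-prediction rates between protected groups; `positive_rate_gap` and the sample data are hypothetical and not part of odtlearn.

```python
def positive_rate_gap(protect_feat, y_pred, positive_class):
    """Largest difference in positive-prediction rate between protected groups."""
    totals, positives = {}, {}
    for group, label in zip(protect_feat, y_pred):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (label == positive_class)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Hypothetical predictions on held-out data with one binary protected feature.
protect_new = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_new = [1, 1, 1, 0, 1, 0, 0, 0]
fairness_bound = 0.1

gap = positive_rate_gap(protect_new, y_pred_new, positive_class=1)
print(f"gap={gap:.2f}, within bound: {gap <= fairness_bound}")
```

In this made-up sample the positive rates are 0.75 and 0.25, so the gap of 0.5 exceeds the bound of 0.1 even if the same tree satisfied the constraint on its training data.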
Examples
>>> from odtlearn.fair_oct import FairSPOCT
>>> import numpy as np
>>> X_train = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
>>> y_train = np.array([0, 1, 1, 0])
>>> protect_feat = np.array([0, 1, 1, 0])
>>> legit_factor = np.array([0, 1, 0, 1])
>>> clf = FairSPOCT(solver="cbc", positive_class=1, depth=2, fairness_bound=0.1)
>>> clf.fit(X_train, y_train, protect_feat, legit_factor)
>>> X_test = np.array([[1, 1], [0, 0]])
>>> y_pred = clf.predict(X_test)
>>> print(y_pred)
[1 0]
- class odtlearn.fair_oct.FairSPOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)
Bases:
FairConstrainedOCT
An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring statistical parity (SP) between protected groups.
- Parameters:
- solver: str
A string specifying the name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi will be used by default.
- positive_class: int
The class label value that corresponds to the desired outcome.
- depthint, default = 1
A parameter specifying the depth of the tree
- time_limitint, default= 60
The given time limit (in seconds) for solving the MIO problem
- _lambdafloat, default = 0
The regularization parameter in the objective. _lambda is in the interval [0,1)
- obj_mode{‘acc’, ‘balance’, ‘weighted’}, optional (default=’acc’)
The objective mode to use. ‘acc’ for accuracy, ‘balance’ for balanced accuracy, ‘weighted’ for user-defined weights.
- fairness_bound: float (0,1], default=1
The bound of the fairness constraint. The smaller the value the stricter the fairness constraint and 1 corresponds to no fairness constraint enforced
- num_threads: int, default=None
The number of threads the solver should use. If None, all available threads are used.
- calc_metric(protect_feat: pandas.core.frame.DataFrame | numpy.ndarray, y: pandas.core.series.Series | numpy.ndarray)
Calculate the statistical parity metric for the given data.
- Parameters:
- protect_featarray-like of shape (n_samples, n_protected_features)
The protected feature columns (e.g., race, gender). Can have one or more columns.
- yarray-like of shape (n_samples,)
The target values or predicted values.
- Returns:
- sp_dictdict
A dictionary with key (p,t) and value P(Y=t|P=p), where p is a protected level and t is an outcome value.
Notes
This method calculates the statistical parity metric, which measures the difference in prediction rates across different protected groups.
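The returned dictionary can be understood through a minimal empirical estimate. This is a sketch with a hypothetical `statistical_parity` function, not the library's implementation; it takes one hashable protected level per sample (use tuples to combine several protected columns).

```python
from collections import Counter


def statistical_parity(protect_feat, y):
    """Return {(p, t): P(Y=t | P=p)} estimated from empirical frequencies."""
    group_counts = Counter(protect_feat)
    joint_counts = Counter(zip(protect_feat, y))
    outcomes = sorted(set(y))
    return {
        (p, t): joint_counts[(p, t)] / group_counts[p]
        for p in sorted(group_counts)
        for t in outcomes
    }


protect_feat = ["Male", "Male", "Male", "Female", "Female"]
y = [True, False, False, True, False]
sp = statistical_parity(protect_feat, y)
for key, value in sp.items():
    print(key, round(value, 3))
```

For each protected level p, the values over all outcomes t sum to 1, so comparing entries with the same t across levels shows the parity gap.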
- class odtlearn.fair_oct.FairCSPOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)
Bases:
FairConstrainedOCT
An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring conditional statistical parity (CSP) between protected groups.
- Parameters:
- solver: str
A string specifying the name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi will be used by default.
- positive_class: int
The class label value that corresponds to the desired outcome.
- depthint, default = 1
A parameter specifying the depth of the tree
- time_limitint, default= 60
The given time limit (in seconds) for solving the MIO problem
- _lambdafloat, default = 0
The regularization parameter in the objective. _lambda is in the interval [0,1)
- obj_mode{‘acc’, ‘balance’, ‘weighted’}, optional (default=’acc’)
The objective mode to use. ‘acc’ for accuracy, ‘balance’ for balanced accuracy, ‘weighted’ for user-defined weights.
- fairness_bound: float (0,1], default=1
The bound of the fairness constraint. The smaller the value the stricter the fairness constraint and 1 corresponds to no fairness constraint enforced
- num_threads: int, default=None
The number of threads the solver should use. If None, all available threads are used.
- calc_metric(protect_feat: pandas.core.frame.DataFrame | numpy.ndarray, legit_factor: pandas.core.frame.DataFrame | numpy.ndarray, y: pandas.core.series.Series | numpy.ndarray)
Calculate the conditional statistical parity metric for the given data.
- Parameters:
- protect_featarray-like of shape (n_samples, n_protected_features)
The protected feature columns (e.g., race, gender). Can have one or more columns.
- legit_factorarray-like of shape (n_samples,)
The legitimate factor column (e.g., prior number of criminal acts).
- yarray-like of shape (n_samples,)
The target values or predicted values.
- Returns:
- csp_dictdict
A dictionary with key (p, f, t) and value P(Y=t|P=p, L=f), where p is a protected level, t is an outcome value, and f is the value of the legitimate feature.
Notes
This method calculates the conditional statistical parity metric, which measures the difference in prediction rates across different protected groups, conditioned on the legitimate factor.
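Conditioning on the legitimate factor means the rates are computed within each (protected level, legitimate value) stratum. The following sketch shows that computation with a hypothetical `conditional_statistical_parity` function and made-up data; it is not the library's code.

```python
from collections import Counter


def conditional_statistical_parity(protect_feat, legit_factor, y):
    """Return {(p, f, t): P(Y=t | P=p, L=f)} from empirical frequencies."""
    stratum_counts = Counter(zip(protect_feat, legit_factor))
    joint_counts = Counter(zip(protect_feat, legit_factor, y))
    outcomes = sorted(set(y))
    return {
        (p, f, t): joint_counts[(p, f, t)] / stratum_counts[(p, f)]
        for (p, f) in sorted(stratum_counts)
        for t in outcomes
    }


protect_feat = [0, 0, 0, 1, 1, 1]
legit_factor = [0, 0, 1, 0, 1, 1]
y = [1, 0, 1, 1, 1, 0]

csp = conditional_statistical_parity(protect_feat, legit_factor, y)
# Compare groups only within the same legitimate-factor value f.
print(csp[(0, 0, 1)], csp[(1, 0, 1)])
```

Only strata that actually occur in the data appear as keys, so sparse legitimate-factor values yield few (and noisy) entries.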
- class odtlearn.fair_oct.FairPEOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)
Bases:
FairConstrainedOCT
An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring predictive equality (PE) between protected groups.
- Parameters:
- solver: str
A string specifying the name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi will be used by default.
- positive_class: int
The class label value that corresponds to the desired outcome.
- depthint, default = 1
A parameter specifying the depth of the tree
- time_limitint, default= 60
The given time limit (in seconds) for solving the MIO problem
- _lambdafloat, default = 0
The regularization parameter in the objective. _lambda is in the interval [0,1)
- obj_mode: str, default=“acc”
The objective used to learn the optimal decision tree. The two options are “acc” and “balance”. The accuracy objective maximizes prediction accuracy, while the balance objective aims to learn a balanced optimal decision tree that generalizes better to out-of-sample data.
- fairness_bound: float (0,1], default=1
The bound of the fairness constraint. The smaller the value the stricter the fairness constraint and 1 corresponds to no fairness constraint enforced
- num_threads: int, default=None
The number of threads the solver should use. If None, all available threads are used.
- calc_metric(protect_feat: pandas.core.frame.DataFrame | numpy.ndarray, y: pandas.core.series.Series | numpy.ndarray, y_pred: pandas.core.series.Series | numpy.ndarray)
Calculate the predictive equality metric for the given data.
- Parameters:
- protect_featarray-like of shape (n_samples, n_protected_features)
The protected feature columns (e.g., race, gender). Can have one or more columns.
- yarray-like of shape (n_samples,)
The true target values.
- y_predarray-like of shape (n_samples,)
The predicted values.
- Returns:
- eq_dictdict
A dictionary with key (p, t, t_pred) and value P(Y_pred=t_pred|P=p, Y=t), where p is a protected level, t is a true outcome value, and t_pred is a predicted outcome value.
Notes
This method calculates the predictive equality metric, which measures the difference in false positive rates across different protected groups.
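The false positive rate for a group is the entry P(Y_pred=positive | P=p, Y=negative) of the returned dictionary. The sketch below estimates the full dictionary and then reads off the gap between two groups; `predictive_equality` and the data are hypothetical, and class 1 is assumed to be the positive class.

```python
from collections import Counter


def predictive_equality(protect_feat, y, y_pred):
    """Return {(p, t, t_pred): P(Y_pred=t_pred | P=p, Y=t)}."""
    cond_counts = Counter(zip(protect_feat, y))
    joint_counts = Counter(zip(protect_feat, y, y_pred))
    predictions = sorted(set(y_pred))
    return {
        (p, t, t_pred): joint_counts[(p, t, t_pred)] / cond_counts[(p, t)]
        for (p, t) in sorted(cond_counts)
        for t_pred in predictions
    }


protect_feat = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

eq = predictive_equality(protect_feat, y, y_pred)
# False positive rate per group: P(Y_pred=1 | P=p, Y=0).
fpr_gap = abs(eq[(0, 0, 1)] - eq[(1, 0, 1)])
print(fpr_gap)
```

The same dictionary also contains the true positive rates, P(Y_pred=1 | P=p, Y=1), which are the quantities compared under equal opportunity.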
- class odtlearn.fair_oct.FairEOppOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)
Bases:
FairConstrainedOCT
An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring equality of opportunity (EOpp) between protected groups.
- Parameters:
- solver: str
A string specifying the name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi will be used by default.
- positive_class: int
The class label value that corresponds to the desired outcome.
- depthint, default = 1
A parameter specifying the depth of the tree
- time_limitint, default= 60
The given time limit (in seconds) for solving the MIO problem
- _lambdafloat, default = 0
The regularization parameter in the objective. _lambda is in the interval [0,1)
- obj_mode: str, default=“acc”
The objective used to learn the optimal decision tree. The two options are “acc” and “balance”. The accuracy objective maximizes prediction accuracy, while the balance objective aims to learn a balanced optimal decision tree that generalizes better to out-of-sample data.
- fairness_bound: float (0,1], default=1
The bound of the fairness constraint. The smaller the value the stricter the fairness constraint and 1 corresponds to no fairness constraint enforced
- num_threads: int, default=None
The number of threads the solver should use. If None, all available threads are used.
- class odtlearn.fair_oct.FairEOddsOCT(solver: str, positive_class: int, depth: int = 1, time_limit: int = 60, _lambda: float = 0, obj_mode: str = 'acc', fairness_bound: float = 1, num_threads: None | int = None, verbose: bool = False)
Bases:
FairConstrainedOCT
An optimal classification tree fit on a given binary-valued data set with a fairness side-constraint requiring equalized odds (EOdds) between protected groups.
- Parameters:
- solver: str
A string specifying the name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi will be used by default.
- positive_class: int
The class label value that corresponds to the desired outcome.
- depthint, default = 1
A parameter specifying the depth of the tree
- time_limitint, default= 60
The given time limit (in seconds) for solving the MIO problem
- _lambdafloat, default = 0
The regularization parameter in the objective. _lambda is in the interval [0,1)
- obj_mode: str, default=“acc”
The objective used to learn the optimal decision tree. The two options are “acc” and “balance”. The accuracy objective maximizes prediction accuracy, while the balance objective aims to learn a balanced optimal decision tree that generalizes better to out-of-sample data.
- fairness_bound: float (0,1], default=1
The bound of the fairness constraint. The smaller the value the stricter the fairness constraint and 1 corresponds to no fairness constraint enforced
- num_threads: int, default=None
The number of threads the solver should use. If None, all available threads are used.
- class odtlearn.fair_oct.FairOCT(solver, positive_class, _lambda=0, depth=1, obj_mode='acc', fairness_type=None, fairness_bound=1, time_limit=60, num_threads=None, verbose=False)
Bases:
odtlearn.flow_oct_ms.FlowOCTMultipleSink
An optimal and fair classification tree fitted on a given binary-valued data set. The fairness criterion enforced during training is one of statistical parity (SP), conditional statistical parity (CSP), predictive equality (PE), equal opportunity (EOpp), or equalized odds (EOdds).
- Parameters:
- solver: str
A string specifying the name of the solver to use to solve the MIP. Options are “Gurobi” and “CBC”. If the CBC binaries are not found, Gurobi will be used by default.
- positive_class: int
The class label value that corresponds to the desired outcome.
- depthint, default= 1
A parameter specifying the depth of the tree
- time_limitint, default= 60
The given time limit (in seconds) for solving the MIO problem
- _lambdafloat, default= 0
The regularization parameter in the objective. _lambda is in the interval [0,1)
- num_threads: int, default=None
The number of threads the solver should use. If None, all available threads are used.
- fairness_type: [None, ‘SP’, ‘CSP’, ‘PE’, ‘EOpp’, ‘EOdds’], default=None
The fairness criterion to enforce.
- fairness_bound: float (0,1], default=1
The bound of the fairness constraint. The smaller the value the stricter the fairness constraint and 1 corresponds to no fairness constraint enforced
- fit(X, y, protect_feat, legit_factor)
Fit the FairOCT model to the given training data.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
The training input samples. Each feature should be binary (0 or 1).
- yarray-like of shape (n_samples,)
The target values (class labels) for the training samples.
- protect_featarray-like of shape (n_samples, n_protected_features)
The protected feature columns (e.g., race, gender). Can have one or more columns.
- legit_factorarray-like of shape (n_samples,)
The legitimate factor column (e.g., prior number of criminal acts).
- Returns:
- selfobject
Returns self.
- Raises:
- ValueError
If X contains non-binary values or if inputs have inconsistent numbers of samples.
Notes
This method fits the FairOCT model using mixed-integer optimization while considering fairness constraints. It sets up the optimization problem, solves it, and stores the results.
- predict(X)
Predict class labels for samples in X using the fitted FairOCT model.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
The input samples for which to make predictions. Each feature should be binary (0 or 1).
- Returns:
- y_predndarray of shape (n_samples,)
The predicted class labels for each sample in X.
- Raises:
- NotFittedError
If the model has not been fitted yet.
- ValueError
If X contains non-binary values or has a different number of features than the training data.
Notes
This method uses the fair decision tree learned during the fit process to classify new samples. It traverses the tree for each sample in X, following the branching decisions until reaching a leaf node, and returns the corresponding class prediction.
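The traversal described above can be sketched on a toy tree. The nested-dict layout and the convention that feature value 1 branches right are assumptions made for this illustration; they are not the internal representation used by odtlearn.

```python
def predict_one(tree, sample):
    """Follow branching decisions from the root until a leaf is reached."""
    node = tree
    while "label" not in node:
        # Binary feature: value 0 goes left, value 1 goes right (assumed convention).
        node = node["right"] if sample[node["feature"]] == 1 else node["left"]
    return node["label"]


# Hypothetical depth-2 tree; leaves carry a class label, inner nodes a feature index.
tree = {
    "feature": 0,
    "left": {"label": 0},
    "right": {
        "feature": 1,
        "left": {"label": 0},
        "right": {"label": 1},
    },
}

X_test = [[1, 1], [0, 0], [1, 0]]
y_pred = [predict_one(tree, row) for row in X_test]
print(y_pred)
```

Each prediction visits at most depth + 1 nodes, so prediction cost is independent of the training set size.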
- get_SP(protect_feat, y)
Return the statistical parity value for each protected level and outcome value.
- Parameters:
protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender). Can have one or more columns.
y – array-like of shape (n_samples,). The target values (class labels in classification).
- Return sp_dict:
A dictionary with key (p, t) and value P(Y=t|P=p), where p is a protected level and t is an outcome value.
- get_CSP(protect_feat, legit_factor, y)
Return the conditional statistical parity value for each protected level, legitimate feature value, and outcome value.
- Parameters:
protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender). Can have one or more columns.
legit_factor – array-like of shape (n_samples,). The legitimate factor column (e.g., prior number of criminal acts).
y – array-like of shape (n_samples,). The target values (class labels in classification).
- Return csp_dict:
A dictionary with key (p, f, t) and value P(Y=t|P=p, L=f), where p is a protected level, f is the value of the legitimate feature, and t is an outcome value.
- get_EqOdds(protect_feat, y, y_pred)
Return the false positive and true positive rate values for each protected level, outcome value, and prediction value.
- Parameters:
protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender). Can have one or more columns.
y – array-like of shape (n_samples,). The true target values (class labels in classification).
y_pred – array-like of shape (n_samples,). The predicted values (class labels in classification).
- Return eq_dict:
A dictionary with key (p, t, t_pred) and value P(Y_pred=t_pred|P=p, Y=t).
- get_CondEqOdds(protect_feat, legit_factor, y, y_pred)
Return the conditional false negative and true positive rate values for each protected level, outcome value, prediction value, and legitimate feature value.
- Parameters:
protect_feat – array-like of shape (n_samples, 1) or (n_samples, n_p). The protected feature columns (e.g., race, gender). Can have one or more columns.
legit_factor – array-like of shape (n_samples,). The legitimate factor column (e.g., prior number of criminal acts).
y – array-like of shape (n_samples,). The true target values (class labels in classification).
y_pred – array-like of shape (n_samples,). The predicted values (class labels in classification).
- Return ceq_dict:
A dictionary with key (p, f, t, t_pred) and value P(Y_pred=t_pred|P=p, Y=t, L=f).
- fairness_metric_summary(metric, new_data=None)
Summarize the specified fairness metric for the fitted model.
- Parameters:
- metricstr
The name of the fairness metric to summarize. Must be one of ‘SP’, ‘CSP’, ‘PE’, or ‘CPE’.
- new_dataarray-like of shape (n_samples,), optional
The new predicted data to use for calculating the fairness metric. If None, the predict method is called on the training data to obtain the predicted values. Default is None.
- Returns:
- None
The method prints the fairness metric summary as a pandas DataFrame.
- Raises:
- ValueError
If the specified metric is not one of the supported options.
Notes
This method summarizes the specified fairness metric for the fitted model. The supported fairness metrics are:
- ‘SP’: Statistical Parity
- ‘CSP’: Conditional Statistical Parity
- ‘PE’: Predictive Equality
- ‘CPE’: Conditional Predictive Equality
The method checks if the model has been fitted and raises an error if not. If new_data is not provided, the predict method is called on the training data to obtain the predicted values.
The fairness metric summary is printed as a pandas DataFrame, showing the metric values for each combination of protected attribute, legitimate factor (if applicable), true label, and predicted label (if applicable), depending on the selected metric.
Examples
>>> model.fit(X_train, y_train, protect_feat_train, legit_factor_train)
>>> model.fairness_metric_summary('SP')
             (p,y)  P(Y=y|P=p)
0    (Male, False)    0.752475
1     (Male, True)    0.247525
2  (Female, False)    0.742574
3   (Female, True)    0.257426