odtlearn.datasets
Module Contents
Functions
| Function | Description |
|---|---|
| prescriptive_ex_data() | Return a tuple of the train and test dataframes from the prescriptive tree example notebook. |
| balance_scale_data() | Return a dataframe containing the balance-scale data set from the UCI ML repository. |
| flow_oct_example() | Return a tuple of two numpy arrays containing the data used in the first example of the Flow OCT example notebook in the ODTlearn documentation. |
| robustness_example() | Return a tuple of three numpy arrays containing the data used in example 1 of the RobustTree example notebook in the ODTlearn documentation. |
| example_2_data() | An example data set used to demonstrate usage of Flow OCT. |
| fairness_example() | A simulated data set used in the FairOCT example notebook. |
| robust_example() | Return a dataframe containing the data set for MONK's second problem from the UCI ML repository. |
- odtlearn.datasets.prescriptive_ex_data()
Return a tuple of the train and test dataframes from the prescriptive tree example notebook.
- odtlearn.datasets.balance_scale_data()
Return a dataframe containing the balance-scale data set from the UCI ML repository. For attribute information, see https://archive.ics.uci.edu/ml/datasets/Balance+Scale
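The UCI page linked above explains how each instance is labeled. There are four attributes (left weight, left distance, right weight, right distance), and each takes an integer value from 1 to 5. The scale tips left (L) or right (R) toward the side with the larger weight times distance product, and it is balanced (B) when the two products are equal. The sketch below encodes that rule using only the standard library and enumerates all 5^4 = 625 configurations, which matches the 625 instances in the UCI data set. It is illustrative only. The helper balance_class is hypothetical and not part of ODTlearn, and it makes no assumption about the column names or label encoding of the dataframe that balance_scale_data() returns.

```python
from collections import Counter
from itertools import product


def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Hypothetical helper (not part of odtlearn): UCI balance-scale label.

    Returns 'L' if the left torque is larger, 'R' if the right torque is
    larger, and 'B' if the two torques are equal.
    """
    left = left_weight * left_distance
    right = right_weight * right_distance
    if left > right:
        return "L"
    if right > left:
        return "R"
    return "B"


# Each attribute takes an integer value from 1 to 5, giving 5**4 = 625 rows.
counts = Counter(balance_class(*row) for row in product(range(1, 6), repeat=4))
print(counts["L"], counts["B"], counts["R"])  # 288 49 288
```

These counts show that the balanced class is small (49 of 625). Keep this in mind when reading accuracy figures for models trained on the data.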
- odtlearn.datasets.flow_oct_example()
Returns a tuple of two numpy arrays containing the data used in the first example of the Flow OCT example notebook in the ODTlearn documentation. The diagram within the code block shows the training dataset. Our dataset has two binary features (X1 and X2) and two class labels (+1 and -1).
    X2
    |               |
    |               |
    1    + +        |    -
    |               |
    |---------------|-------------
    |               |
    0    - - - -    |    + + +
    |    - - -      |
    |______0________|_______1_______X1
- Returns:
- X: numpy array of covariates from training set
- y: numpy array of responses from training set
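With ODTlearn installed, the data is loaded as X, y = odtlearn.datasets.flow_oct_example(). To make the diagram concrete, the sketch below rebuilds an equivalent training set from the counts in each cell of the diagram: 7 negatives at (0,0), 2 positives at (0,1), 3 positives at (1,0), and 1 negative at (1,1). It uses plain Python lists and the +1/-1 labels from the description. The row order and label encoding of the arrays the library actually returns are assumptions here, so compare them against the real output before relying on them.

```python
# Cell counts read off the diagram: (X1, X2) -> {label: count}.
# Labels follow the +1 / -1 convention from the description above.
CELLS = {
    (0, 0): {-1: 7},
    (0, 1): {+1: 2},
    (1, 0): {+1: 3},
    (1, 1): {-1: 1},
}


def expand(cells):
    """Turn per-cell counts into parallel lists of covariate rows and labels."""
    X, y = [], []
    for (x1, x2), labels in sorted(cells.items()):
        for label, count in sorted(labels.items()):
            X.extend([x1, x2] for _ in range(count))
            y.extend([label] * count)
    return X, y


X, y = expand(CELLS)
print(len(X), y.count(1), y.count(-1))  # 13 5 8

# Every cell is pure, and the label is +1 exactly when X1 != X2 (an XOR
# pattern), so a depth-2 tree splitting on X1 and then X2 fits it perfectly.
assert all((label == 1) == (x1 != x2) for (x1, x2), label in zip(X, y))
```

The XOR structure is why no single split separates the classes. At least two levels of splits are needed.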
- odtlearn.datasets.robustness_example()
Returns a tuple of three numpy arrays containing the data used in example 1 of the RobustTree example notebook in the ODTlearn documentation. The diagram within the code block shows the training dataset. Our dataset has two binary features (X1 and X2) and two class labels (+1 and -1).
    X2
    |               |
    |               |
    1    + +        |    -
    |               |
    |---------------|-------------
    |               |
    0    - - - -    |    + + +
    |    - - -      |
    |______0________|_______1_______X1
The third array returned contains a cost vector with the following form:
- Uncertainty in 5 points at [0,0] on X1 can cause them to flip to [1,0] if needed to misclassify them.
- Uncertainty in 1 point at [1,1] on X2 can cause it to flip to [1,0] if needed to misclassify it.
- All other points are certain.
- Returns:
- X: numpy array of covariates from training set
- y: numpy array of responses from training set
- costs: numpy array of costs for each observation in the training set
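The cost-vector description for robustness_example() can be made concrete with a small worst-case error count. The sketch uses a hypothetical encoding rather than the library's cost format. Each point is a tuple (x1, x2, label, flip), where flip is the index of the feature that uncertainty may toggle, or None if the point is certain. A point counts as an error if it is misclassified at its nominal position or at the position its uncertain feature can reach. This is a simplification: it lets every uncertain point flip at once and ignores how RobustOCT weighs the costs. So it illustrates the effect of the uncertainty and does not reproduce the method. The functions worst_case_errors, xor_tree, and cautious_tree are hypothetical.

```python
# Hypothetical encoding (not odtlearn's cost format): (x1, x2, label, flip),
# where flip is 0 (X1) or 1 (X2) if uncertainty may toggle that feature,
# or None if the point is certain.
POINTS = (
    [(0, 0, -1, 0)] * 5        # 5 of the 7 points at [0,0] may flip X1 -> [1,0]
    + [(0, 0, -1, None)] * 2   # the other 2 points at [0,0] are certain
    + [(0, 1, +1, None)] * 2
    + [(1, 0, +1, None)] * 3
    + [(1, 1, -1, 1)]          # 1 point at [1,1] may flip X2 -> [1,0]
)


def worst_case_errors(predict, points):
    """Count points misclassified at their nominal position or, if uncertain,
    at the position reached by flipping their uncertain feature."""
    errors = 0
    for x1, x2, label, flip in points:
        positions = [(x1, x2)]
        if flip is not None:
            flipped = [x1, x2]
            flipped[flip] = 1 - flipped[flip]
            positions.append(tuple(flipped))
        # An adversary picks whichever reachable position is misclassified.
        if any(predict(p) != label for p in positions):
            errors += 1
    return errors


def xor_tree(p):
    # Fits the nominal data perfectly: +1 exactly when X1 != X2.
    return 1 if p[0] != p[1] else -1


def cautious_tree(p):
    # Predicts +1 only in the [0,1] cell.
    return 1 if p == (0, 1) else -1


print(worst_case_errors(xor_tree, POINTS))       # 6
print(worst_case_errors(cautious_tree, POINTS))  # 3
```

The tree that fits the nominal data perfectly is the more fragile one under this count. The perturbed negatives all land in the [1,0] cell, so predicting -1 there costs fewer worst-case errors.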
- odtlearn.datasets.example_2_data()
An example data set used to demonstrate usage of Flow OCT. The diagram within the code block shows the training dataset. Our dataset has two binary features (X1 and X2) and two class labels (+1 and -1). Here the data is imbalanced with the positive class being the minority class.
    X2
    |               |
    |               |
    1    + - -      |    -
    |               |
    |---------------|--------------
    |               |
    0    - - - +    |    - - -
    |    - - - -    |
    |______0________|_______1_______X1
- Returns:
- X: numpy array of covariates from training set
- y: numpy array of responses from training set
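The imbalance in example_2_data() can be quantified by reading the class counts off the diagram: 2 positives and 13 negatives. The sketch below tallies them and compares plain accuracy with balanced accuracy (the mean of the per-class recalls) for a model that always predicts the majority class. Balanced accuracy is chosen here only to illustrate why imbalance matters. This section of the documentation does not prescribe it.

```python
from collections import Counter
from fractions import Fraction

# Cell counts read off the diagram: (X1, X2) -> {label: count}.
CELLS = {
    (0, 0): {-1: 7, +1: 1},
    (0, 1): {-1: 2, +1: 1},
    (1, 0): {-1: 3},
    (1, 1): {-1: 1},
}

totals = Counter()
for labels in CELLS.values():
    totals.update(labels)  # adds the per-cell counts label by label

n = sum(totals.values())
print(totals[+1], totals[-1], n)  # 2 13 15

# Always predicting the majority class (-1) scores 13/15 accuracy but never
# recovers a positive, so its balanced accuracy is (0 + 1) / 2 = 1/2.
accuracy = Fraction(totals[-1], n)
recall_pos = Fraction(0, totals[+1])
recall_neg = Fraction(totals[-1], totals[-1])
balanced_accuracy = (recall_pos + recall_neg) / 2
print(accuracy, balanced_accuracy)  # 13/15 1/2
```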
- odtlearn.datasets.fairness_example()
A simulated data set used in the FairOCT example notebook. The diagram within the code block visualizes the training data. We have two binary features (X1, X2) and two class labels (+1 and -1). The protected feature is race, which has two levels (B and W). For example, the visualization shows 7 instances with (X1, X2) = (0, 1); among these 7 instances, 5 are from race W and 2 are from race B. We also show the breakdown of the instances by class label.
    X2
    |                     |
    1    5W: 4(-) 1(+)    |    2W: 1(-) 1(+)
    |    2B: 2(-)         |    5B: 3(-) 2(+)
    |                     |
    |                     |
    |---------------------|------------------------
    |                     |
    0    4W: 3(-) 1(+)    |    3W: 1(-) 2(+)
    |    1B: 1(+)         |    6B: 1(-) 5(+)
    |                     |
    |___________0_________|__________1_____________X1
- Returns:
- X: numpy array of covariates from training set
- y: numpy array of responses from training set
- protect_feat: numpy array of the protected feature
- legit_factor: numpy array of the legitimate factor feature
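One way to read the breakdown in the fairness diagram is to compute the share of positive labels within each race. The gap between these two rates is the quantity that a statistical-parity constraint targets. The sketch below uses only the counts shown in the diagram. It ignores the legitimate factor, which the diagram does not show, and it does not use the arrays returned by fairness_example(), whose encoding is not described here. The helper positive_rate is hypothetical.

```python
from fractions import Fraction

# Counts read off the diagram: (X1, X2, race) -> (negatives, positives).
CELLS = {
    (0, 1, "W"): (4, 1),
    (0, 1, "B"): (2, 0),
    (1, 1, "W"): (1, 1),
    (1, 1, "B"): (3, 2),
    (0, 0, "W"): (3, 1),
    (0, 0, "B"): (0, 1),
    (1, 0, "W"): (1, 2),
    (1, 0, "B"): (1, 5),
}


def positive_rate(race):
    """Hypothetical helper: share of instances of `race` labeled +1."""
    neg = sum(n for (_, _, r), (n, _) in CELLS.items() if r == race)
    pos = sum(p for (_, _, r), (_, p) in CELLS.items() if r == race)
    return Fraction(pos, neg + pos)


# The (X1, X2) = (0, 1) cell holds 7 instances: 5 from W and 2 from B.
assert sum(sum(CELLS[(0, 1, r)]) for r in "WB") == 7

for race in "WB":
    print(race, positive_rate(race))
# W 5/14
# B 4/7
```

Each race has 14 instances. The positive rate is 5/14 for W and 8/14 = 4/7 for B.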
- odtlearn.datasets.robust_example()
Return a dataframe containing the data set for MONK’s second problem from the UCI ML repository. For attribute information, see https://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems
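According to the UCI page linked above, the six MONK's attributes a1 to a6 take values in {1,2,3}, {1,2,3}, {1,2}, {1,2,3}, {1,2,3,4}, and {1,2}. The second problem's target concept is "exactly two of a1 = 1, ..., a6 = 1". The sketch below encodes that concept and enumerates the full 432-example attribute space. The UCI training split is a subset of this space. Which rows robust_example() returns, and how its columns are encoded, is not stated here, so the helper monk2_label is hypothetical and is not applied to the dataframe.

```python
from itertools import product

# Value ranges of the MONK's attributes a1..a6, as listed on the UCI page.
DOMAINS = [range(1, 4), range(1, 4), range(1, 3), range(1, 4), range(1, 5), range(1, 3)]


def monk2_label(example):
    """Hypothetical helper: 1 if exactly two attributes equal 1, else 0."""
    return int(sum(value == 1 for value in example) == 2)


space = list(product(*DOMAINS))
positives = sum(monk2_label(x) for x in space)
print(len(space), positives)  # 432 142
```

Because the concept depends on a count over all six attributes, it is known to be hard for axis-aligned trees to represent compactly.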