[1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from odtlearn.flow_oct import FlowOCT, BendersOCT
from odtlearn.utils.binarize import Binarizer

`FlowOCT` Examples¶

Example 0: Binarization¶

The following example shows how to binarize a dataset with categorical, integer, and continuous features using the built-in Binarizer class. This class follows the scikit-learn fit-transform paradigm, making it easy to integrate into your preprocessing pipeline.

[2]:

number_of_child_list = [1, 2, 4, 3, 1, 2, 4, 3, 2, 1]
age_list = [10, 20, 40, 30, 10, 20, 40, 30, 20, 10]
race_list = [
    "Black",
    "White",
    "Hispanic",
    "Black",
    "White",
    "Black",
    "White",
    "Hispanic",
    "Black",
    "White",
]
sex_list = ["M", "F", "M", "M", "F", "M", "F", "M", "M", "F"]
income_list = [50000, 75000, 100000, 60000, 80000, 55000, 90000, 70000, 85000, 65000]

df = pd.DataFrame(
    list(zip(sex_list, race_list, number_of_child_list, age_list, income_list)),
    columns=["sex", "race", "num_child", "age", "income"],
)

print(df)

  sex      race  num_child  age  income
0   M     Black          1   10   50000
1   F     White          2   20   75000
2   M  Hispanic          4   40  100000
3   M     Black          3   30   60000
4   F     White          1   10   80000
5   M     Black          2   20   55000
6   F     White          4   40   90000
7   M  Hispanic          3   30   70000
8   M     Black          2   20   85000
9   F     White          1   10   65000

[3]:

binarizer = Binarizer(
    categorical_cols=["sex", "race"],
    integer_cols=["num_child", "age"],
    real_cols=["income"],
    n_bins=5  # Number of bins for continuous features
)

# Fit and transform the data
df_enc = binarizer.fit_transform(df)

print(df_enc)

   sex_M  race_Black  race_Hispanic  race_White  num_child_1  num_child_2  \
0    1.0         1.0            0.0         0.0          1.0          1.0
1    0.0         0.0            0.0         1.0          0.0          1.0
2    1.0         0.0            1.0         0.0          0.0          0.0
3    1.0         1.0            0.0         0.0          0.0          0.0
4    0.0         0.0            0.0         1.0          1.0          1.0
5    1.0         1.0            0.0         0.0          0.0          1.0
6    0.0         0.0            0.0         1.0          0.0          0.0
7    1.0         0.0            1.0         0.0          0.0          0.0
8    1.0         1.0            0.0         0.0          0.0          1.0
9    0.0         0.0            0.0         1.0          1.0          1.0

   num_child_3  num_child_4  age_10  age_20  age_30  age_40  income_0  \
0          1.0          1.0     1.0     1.0     1.0     1.0       1.0
1          1.0          1.0     0.0     1.0     1.0     1.0       0.0
2          0.0          1.0     0.0     0.0     0.0     1.0       0.0
3          1.0          1.0     0.0     0.0     1.0     1.0       0.0
4          1.0          1.0     1.0     1.0     1.0     1.0       0.0
5          1.0          1.0     0.0     1.0     1.0     1.0       1.0
6          0.0          1.0     0.0     0.0     0.0     1.0       0.0
7          1.0          1.0     0.0     0.0     1.0     1.0       0.0
8          1.0          1.0     0.0     1.0     1.0     1.0       0.0
9          1.0          1.0     1.0     1.0     1.0     1.0       0.0

   income_1  income_2  income_3  income_4
0       1.0       1.0       1.0       1.0
1       0.0       1.0       1.0       1.0
2       0.0       0.0       0.0       1.0
3       1.0       1.0       1.0       1.0
4       0.0       0.0       1.0       1.0
5       1.0       1.0       1.0       1.0
6       0.0       0.0       0.0       1.0
7       0.0       1.0       1.0       1.0
8       0.0       0.0       1.0       1.0
9       1.0       1.0       1.0       1.0

The Binarizer class follows the scikit-learn fit-transform paradigm:

We initialize the Binarizer with our desired parameters, specifying which columns are categorical, integer, and real-valued. We call the fit_transform method, which first fits the binarizer to our data (learning the necessary encoding schemes) and then transforms the data using those learned encodings.

The resulting df_enc DataFrame contains the binarized version of our original data:

Categorical columns (sex, race) are one-hot encoded. Integer columns (num_child, age) are binary encoded, where each column represents “greater than or equal to” a certain value. The continuous column (income) is first discretized into 5 bins, then binary encoded similar to the integer columns.

[4]:

print(df_enc.columns)

Index(['sex_M', 'race_Black', 'race_Hispanic', 'race_White', 'num_child_1',
       'num_child_2', 'num_child_3', 'num_child_4', 'age_10', 'age_20',
       'age_30', 'age_40', 'income_0', 'income_1', 'income_2', 'income_3',
       'income_4'],
      dtype='object')

This binarized data is now ready to be used with any of the models in ODTlearn, which require binary input features. If you need to transform new data using the same encoding scheme, you can use the transform method of the fitted binarizer:

[5]:

new_data = pd.DataFrame({
    "sex": ["F", "M"],
    "race": ["Hispanic", "White"],
    "num_child": [2, 3],
    "age": [20, 30],
    "income": [70000, 80000]
})

new_data_enc = binarizer.transform(new_data)
print(new_data_enc)

   sex_M  race_Black  race_Hispanic  race_White  num_child_1  num_child_2  \
0    0.0         0.0            1.0         0.0          0.0          1.0
1    1.0         0.0            0.0         1.0          0.0          0.0

   num_child_3  num_child_4  age_10  age_20  age_30  age_40  income_0  \
0          1.0          1.0     0.0     1.0     1.0     1.0         0
1          1.0          1.0     0.0     0.0     1.0     1.0         0

   income_1  income_2  income_3  income_4
0         0       1.0       1.0         0
1         0       0.0       1.0         0

Note that for categorical features, if the new data contains categories not seen during fitting, the transform method will raise an error. In such cases, you might need to refit the binarizer on a dataset that includes all possible categories.

Example 1: Varying `depth` and `_lambda`¶

In this part, we study a simple example and investigate different parameter combinations to provide intuition on how they affect the structure of the tree.

First we generate the data for our example. The diagram within the code block shows the training dataset. Our dataset has two binary features (X1 and X2) and two class labels (+1 and -1).

[6]:

from odtlearn.datasets import flow_oct_example

"""
    X2
    |               |
    |               |
    1    + +        |    -
    |               |
    |---------------|-------------
    |               |
    0    - - - -    |    + + +
    |    - - -      |
    |______0________|_______1_______X1
"""


X, y = flow_oct_example()

Tree with `depth = 1`¶

In the following, we fit a classification tree of depth 1, i.e., a tree with a single branching node and two leaf nodes.

[7]:

stcl = FlowOCT(depth=1, solver="gurobi", time_limit=100)
stcl.store_search_progress_log = True
stcl.fit(X, y)

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 100
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 110 rows, 89 columns and 250 nonzeros
Model fingerprint: 0x5e9209d1
Variable types: 84 continuous, 5 integer (5 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 102 rows and 82 columns
Presolve time: 0.00s
Presolved: 8 rows, 7 columns, 16 nonzeros
Variable types: 6 continuous, 1 integer (1 binary)
Found heuristic solution: objective 9.0000000

Root relaxation: objective 1.000000e+01, 2 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

*    0     0               0      10.0000000   10.00000  0.00%     -    0s

Explored 1 nodes (2 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 8 (of 8 available processors)

Solution count 2: 10 9

Optimal solution found (tolerance 1.00e-04)
Best objective 1.000000000000e+01, best bound 1.000000000000e+01, gap 0.0000%

User-callback calls 573, time in user-callback 0.00 sec

[7]:

FlowOCT(solver=gurobi,depth=1,time_limit=100,num_threads=None,verbose=False)

[8]:

predictions = stcl.predict(X)
print(f'Optimality gap is {stcl.optim_gap}')
print(f"In-sample accuracy is {np.sum(predictions==y)/y.shape[0]}")

Optimality gap is 0.0
In-sample accuracy is 0.7692307692307693

Users can access statistics from the optimization run such as optimality gap, number of nodes, number of constraints, etc. Directly as properties of the initialized class or through the _solver object. For example, one can access the optimality gap and the number of solutions after fitting the optimal decision tree through the optim_gap and num_solutions properties.

[9]:

print(f'Optimality gap is {stcl.optim_gap}')
print(f'Number of solutions {stcl.num_solutions}')

Optimality gap is 0.0
Number of solutions 2

As we can see above, we find the optimal tree and the in-sample accuracy is 76%.

ODTlearn provides two different ways of visualizing the structure of the tree. The first method prints the structure of the tree in the console:

[10]:

stcl.print_tree()

#########node  1
branch on X_0
#########node  2
leaf 0
#########node  3
leaf 1

The second method plots the structure of the tree using matplotlib:

[11]:

fig, ax = plt.subplots(figsize=(5, 5))
stcl.plot_tree(ax=ax)
plt.show()

We also provide a simple function allowing users to plot the search progress log over time. Note that you must set the attribute store_search_progress_log to True before calling the fit method to ensure that the bound information is stored.

[12]:

stcl_progress_log = FlowOCT(depth=1, solver="gurobi", time_limit=100)
stcl_progress_log.store_search_progress_log = True
stcl_progress_log.fit(X, y)

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 100
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 110 rows, 89 columns and 250 nonzeros
Model fingerprint: 0x5e9209d1
Variable types: 84 continuous, 5 integer (5 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 102 rows and 82 columns
Presolve time: 0.00s
Presolved: 8 rows, 7 columns, 16 nonzeros
Variable types: 6 continuous, 1 integer (1 binary)
Found heuristic solution: objective 9.0000000

Root relaxation: objective 1.000000e+01, 2 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

*    0     0               0      10.0000000   10.00000  0.00%     -    0s

Explored 1 nodes (2 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 8 (of 8 available processors)

Solution count 2: 10 9

Optimal solution found (tolerance 1.00e-04)
Best objective 1.000000000000e+01, best bound 1.000000000000e+01, gap 0.0000%

User-callback calls 573, time in user-callback 0.00 sec

[12]:

FlowOCT(solver=gurobi,depth=1,time_limit=100,num_threads=None,verbose=False)

[13]:

stcl.plot_search_progress()

[13]:

<AxesSubplot:title={'center':'Search Progress'}, xlabel='Time (s)', ylabel='Objective Bound'>

Tree with `depth = 2`¶

Now we increase the depth of the tree to achieve higher accuracy.

[14]:

stcl = FlowOCT(depth=2, solver="gurobi")
stcl.fit(X, y)

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 60
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 274 rows, 209 columns and 642 nonzeros
Model fingerprint: 0x51470803
Variable types: 196 continuous, 13 integer (13 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 250 rows and 187 columns
Presolve time: 0.00s
Presolved: 24 rows, 22 columns, 76 nonzeros
Found heuristic solution: objective 8.0000000
Variable types: 16 continuous, 6 integer (6 binary)

Root relaxation: objective 1.300000e+01, 14 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

*    0     0               0      13.0000000   13.00000  0.00%     -    0s

Explored 1 nodes (14 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 8 (of 8 available processors)

Solution count 3: 13 8 -0

Optimal solution found (tolerance 1.00e-04)
Best objective 1.300000000000e+01, best bound 1.300000000000e+01, gap 0.0000%

[14]:

FlowOCT(solver=gurobi,depth=2,time_limit=60,num_threads=None,verbose=False)

[15]:

predictions = stcl.predict(X)
print(f"In-sample accuracy is {np.sum(predictions==y)/y.shape[0]}")

In-sample accuracy is 1.0

As we can see, with depth 2, we can achieve 100% in-sample accuracy.

[16]:

fig, ax = plt.subplots(figsize=(10, 5))
stcl.plot_tree(ax=ax, fontsize=20)
plt.show()

Tree with `depth=2` and Positive `_lambda`¶

As we saw in the above example, with depth 2, we can fully classify the training data. However if we add a regularization term with a high enough value of _lambda, we can justify pruning one of the branching nodes to get a sparser tree. In the following, we observe that as we increase _lambda from 0 to 0.51, one of the branching nodes gets pruned and as a result, the in-sample accuracy drops to 92%.

[17]:

stcl = FlowOCT(solver="gurobi", depth=2, _lambda=0.51)
stcl.fit(X, y)

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 60
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 274 rows, 209 columns and 642 nonzeros
Model fingerprint: 0x3e3f455b
Variable types: 196 continuous, 13 integer (13 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [5e-01, 5e-01]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 250 rows and 187 columns
Presolve time: 0.00s
Presolved: 24 rows, 22 columns, 76 nonzeros
Found heuristic solution: objective 3.9200000
Variable types: 16 continuous, 6 integer (6 binary)

Root relaxation: objective 5.105000e+00, 14 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0    5.10500    0    2    3.92000    5.10500  30.2%     -    0s
H    0     0                       4.8600000    5.10500  5.04%     -    0s

Explored 1 nodes (14 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 8 (of 8 available processors)

Solution count 3: 4.86 3.92 -0

Optimal solution found (tolerance 1.00e-04)
Best objective 4.860000000000e+00, best bound 4.860000000000e+00, gap 0.0000%

[17]:

FlowOCT(solver=gurobi,depth=2,time_limit=60,num_threads=None,verbose=False)

[18]:

predictions = stcl.predict(X)
print(f"In-sample accuracy is {np.sum(predictions==y)/y.shape[0]}")

In-sample accuracy is 0.9230769230769231

[19]:

fig, ax = plt.subplots(figsize=(10, 5))
stcl.plot_tree(ax=ax, fontsize=20)
plt.show()

Example 2: Different Objective Functions¶

In the following, we have a toy example with an imbalanced data, with the positive class being the minority class.

[20]:

'''
    X2
    |               |
    |               |
    1    + - -      |    -
    |               |
    |---------------|--------------
    |               |
    0    - - - +    |    - - -
    |    - - - -    |
    |______0________|_______1_______X1
'''
X = np.array([[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],
              [1,0],[1,0],[1,0],
              [1,1],
              [0,1],[0,1],[0,1]])
y = np.array([0,0,0,0,0,0,0,1,
              0,0,0,
              0,
              1,0,0])

Tree with classification accuracy objective¶

[21]:

stcl_acc = FlowOCT(solver="gurobi", depth=2, obj_mode="acc")
stcl_acc.fit(X, y)

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 60
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 314 rows, 237 columns and 734 nonzeros
Model fingerprint: 0xa5b9a46b
Variable types: 224 continuous, 13 integer (13 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 269 rows and 203 columns
Presolve time: 0.00s
Presolved: 45 rows, 34 columns, 134 nonzeros
Found heuristic solution: objective 13.0000000
Variable types: 27 continuous, 7 integer (7 binary)

Root relaxation: objective 1.400000e+01, 32 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   14.00000    0    1   13.00000   14.00000  7.69%     -    0s

Cutting planes:
  MIR: 1
  Flow cover: 1

Explored 1 nodes (32 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 8 (of 8 available processors)

Solution count 2: 13 -0

Optimal solution found (tolerance 1.00e-04)
Best objective 1.300000000000e+01, best bound 1.300000000000e+01, gap 0.0000%

[21]:

FlowOCT(solver=gurobi,depth=2,time_limit=60,num_threads=None,verbose=False)

[22]:

predictions = stcl_acc.predict(X)
print(f"In-sample accuracy is {np.sum(predictions==y)/y.shape[0]}")

In-sample accuracy is 0.8666666666666667

[23]:

stcl_acc.print_tree()

#########node  1
leaf 0
#########node  2
pruned
#########node  3
pruned
#########node  4
pruned
#########node  5
pruned
#########node  6
pruned
#########node  7
pruned

[24]:

fig, ax = plt.subplots(figsize=(10, 5))
stcl_acc.plot_tree(ax=ax, fontsize=20)
plt.show()

Tree with Balanced Classification Accuracy Objective¶

[25]:

stcl_balance = FlowOCT(
    solver="gurobi",
    depth=2,
    obj_mode="balance",
    _lambda=0,
    verbose=False,
)
stcl_balance.fit(X, y)
predictions = stcl_balance.predict(X)
print(f"In-sample accuracy is {np.sum(predictions==y)/y.shape[0]}")

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 60
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 314 rows, 237 columns and 734 nonzeros
Model fingerprint: 0x3bc6df12
Variable types: 224 continuous, 13 integer (13 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [4e-02, 2e-01]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 269 rows and 203 columns
Presolve time: 0.01s
Presolved: 45 rows, 34 columns, 134 nonzeros
Found heuristic solution: objective 0.5000000
Variable types: 27 continuous, 7 integer (7 binary)

Root relaxation: objective 7.500000e-01, 31 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0    0.75000    0    1    0.50000    0.75000  50.0%     -    0s
H    0     0                       0.5288462    0.75000  41.8%     -    0s
H    0     0                       0.6730769    0.75000  11.4%     -    0s
     0     0    0.67308    0    4    0.67308    0.67308  0.00%     -    0s

Cutting planes:
  MIR: 1
  Flow cover: 1

Explored 1 nodes (36 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 8 (of 8 available processors)

Solution count 3: 0.673077 0.5 -0
No other solutions better than 0.673077

Optimal solution found (tolerance 1.00e-04)
Best objective 6.730769230769e-01, best bound 6.730769230769e-01, gap 0.0000%
In-sample accuracy is 0.8

[26]:

fig, ax = plt.subplots(figsize=(10, 5))
stcl_balance.plot_tree(ax=ax, fontsize=20)
plt.show()

As we can see, when we maximize accuracy, i.e., when obj_mode = 'acc', the optimal tree is just a single node without branching, predicting the majority class for the whole dataset. But when we change the objective mode to balanced accuracy, we account for the minority class by sacrificing the overal accuracy.

Example 3: UCI Data Example¶

In this section, we fit a tree of depth 3 on a real world dataset called the `balance dataset <https://archive.ics.uci.edu/ml/datasets/Balance+Scale>`__ from the UCI Machine Learning repository.

[27]:

import pandas as pd
from sklearn.model_selection import train_test_split
from odtlearn.datasets import balance_scale_data

[28]:

# read data
data = balance_scale_data()
print(f"shape{data.shape}")
data.columns

shape(625, 21)

[28]:

Index(['V2.1', 'V2.2', 'V2.3', 'V2.4', 'V2.5', 'V3.1', 'V3.2', 'V3.3', 'V3.4',
       'V3.5', 'V4.1', 'V4.2', 'V4.3', 'V4.4', 'V4.5', 'V5.1', 'V5.2', 'V5.3',
       'V5.4', 'V5.5', 'target'],
      dtype='object')

[29]:

y = data.pop("target")

X_train, X_test, y_train, y_test = train_test_split(
    data, y, test_size=0.33, random_state=42
)

[30]:

stcl = BendersOCT(solver="gurobi", depth=3, time_limit=200, obj_mode="acc", verbose=True)

stcl.fit(X_train, y_train)

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 200
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter LazyConstraints to value 1
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 30 rows, 618 columns and 249 nonzeros
Model fingerprint: 0xaeff5467
Variable types: 463 continuous, 155 integer (155 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 8 rows and 8 columns
Presolve time: 0.00s
Presolved: 22 rows, 610 columns, 233 nonzeros
Variable types: 463 continuous, 147 integer (147 binary)
Root relaxation presolved: 412 rows, 610 columns, 7100 nonzeros

Concurrent LP optimizer: primal simplex, dual simplex, and barrier
Showing barrier log only...

Root barrier log...

Ordering time: 0.00s

Barrier performed 0 iterations in 0.18 seconds (0.00 work units)
Barrier solve interrupted - model solved by another algorithm


Solved with dual simplex

Root relaxation: objective 4.180000e+02, 77 iterations, 0.01 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0  418.00000    0    4          -  418.00000      -     -    0s
H    0     0                     269.0000000  418.00000  55.4%     -    0s
     0     0  418.00000    0    4  269.00000  418.00000  55.4%     -    0s
     0     2  413.60000    0    9  269.00000  413.60000  53.8%     -    0s
H  308   294                     272.0000000  413.60000  52.1%  75.5    1s
H  483   460                     274.0000000  413.60000  50.9%  75.6    1s
*  533   475              54     290.0000000  413.60000  42.6%  73.6    1s
  1143   887     cutoff   64       290.00000  412.40000  42.2%  78.6    5s
H 1310   970                     291.0000000  412.40000  41.7%  75.5    5s
  1556  1132  316.00000   35   26  291.00000  409.65561  40.8%  70.5   10s
  1629  1180  371.73684   22   25  291.00000  403.03327  38.5%  67.3   15s
  1797  1290  367.57918   36   10  291.00000  402.21668  38.2%  92.8   20s
H 2163  1398                     293.0000000  400.54472  36.7%  92.9   21s
* 2375  1361             111     296.0000000  399.64465  35.0%  91.6   22s
H 3419  1561                     299.0000000  398.14472  33.2%  86.5   24s
  3861  1733  358.57143   67   20  299.00000  398.14472  33.2%  87.2   25s
H 4454  1024                     312.0000000  398.14472  27.6%  85.7   25s
  8785  2721     cutoff   62       312.00000  358.67679  15.0%  82.5   30s
 16715  6109  313.50000   67   16  312.00000  338.60887  8.53%  69.7   35s
H17827  5595                     314.0000000  338.16497  7.70%  67.8   35s
 26793  9027  315.00000   77    8  314.00000  330.38947  5.22%  59.2   40s
H31789  9084                     315.0000000  329.54167  4.62%  57.5   43s
 33926  9710  318.00000   71    8  315.00000  328.75000  4.37%  56.7   45s
H35037  8771                     316.0000000  328.02174  3.80%  56.5   45s
H36500  6760                     318.0000000  327.50000  2.99%  56.0   46s
 42743  6716  322.50000   59   12  318.00000  325.00000  2.20%  54.1   50s
 54532  4635     cutoff   48       318.00000  321.00000  0.94%  50.5   55s
 65931  1611     cutoff   74       318.00000  319.48148  0.47%  47.7   60s

Cutting planes:
  MIR: 120
  Flow cover: 193
  Lazy constraints: 236

Explored 70775 nodes (3292092 simplex iterations) in 61.92 seconds (141.79 work units)
Thread count was 8 (of 8 available processors)

Solution count 10: 318 316 315 ... 290

Optimal solution found (tolerance 1.00e-04)
Best objective 3.180000000000e+02, best bound 3.180000000000e+02, gap 0.0000%

User-callback calls 150674, time in user-callback 1.85 sec

[30]:

BendersOCT(solver=gurobi,depth=3,time_limit=200,num_threads=None,verbose=True)

[31]:

stcl.print_tree()

#########node  1
branch on V3.2
#########node  2
branch on V3.1
#########node  3
branch on V5.1
#########node  4
branch on V2.1
#########node  5
branch on V5.1
#########node  6
branch on V4.1
#########node  7
branch on V2.1
#########node  8
leaf 2
#########node  9
leaf 3
#########node  10
leaf 3
#########node  11
leaf 2
#########node  12
leaf 3
#########node  13
leaf 2
#########node  14
leaf 2
#########node  15
leaf 3

[32]:

fig, ax = plt.subplots(figsize=(20, 10))
stcl.plot_tree(ax=ax, fontsize=20, color_dict={"node": None, "leaves": []})
plt.show()

[33]:

test_pred = stcl.predict(X_test)
print('The out-of-sample accuracy is {}'.format(np.sum(test_pred==y_test)/y_test.shape[0]))

The out-of-sample accuracy is 0.6859903381642513

Example 4: User-defined Weights¶

In this example, we’ll demonstrate how to use user-defined weights with FlowOCT and BendersOCT. We’ll use a small binary classification dataset and show how user-defined weights can affect the learned tree.

First, let’s create our small binary classification dataset:

[34]:

np.random.seed(42)
X = np.random.randint(0, 2, size=(20, 5))
y = np.random.randint(0, 2, size=20)

print("Dataset shape:", X.shape)
print("Class distribution:", np.bincount(y))

Dataset shape: (20, 5)
Class distribution: [ 6 14]

Now, let’s create weights that heavily favor class 1:

[35]:

weights = np.ones_like(y)
weights[y == 1] = 10

print("Weight distribution:")
print("Class 0:", weights[y == 0].mean())
print("Class 1:", weights[y == 1].mean())

Weight distribution:
Class 0: 1.0
Class 1: 10.0

FlowOCT with User-defined Weights¶

Let’s fit a FlowOCT model with user-defined weights and compare it to a model without the accuracy objective:

[36]:

# FlowOCT without custom weights
flow_oct_default = FlowOCT(solver="gurobi", obj_mode="acc", depth=2, time_limit=10)
flow_oct_default.fit(X, y)

# FlowOCT with custom weights
flow_oct_custom = FlowOCT(solver="gurobi", obj_mode="weighted", depth=2, time_limit=10)
flow_oct_custom.fit(X, y, weights=weights)

print("Default FlowOCT predictions:", flow_oct_default.predict(X))
print("User-defined weights FlowOCT predictions:", flow_oct_custom.predict(X))

print("Default FlowOCT accuracy:", (flow_oct_default.predict(X) == y).mean())
print("User-defined weights FlowOCT accuracy:", (flow_oct_custom.predict(X) == y).mean())
print("User-defined weights FlowOCT weighted accuracy:", np.average(flow_oct_custom.predict(X) == y, weights=weights))

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 10
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 414 rows, 316 columns and 1153 nonzeros
Model fingerprint: 0x1ad65f5a
Variable types: 294 continuous, 22 integer (22 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 205 rows and 186 columns
Presolve time: 0.00s
Presolved: 209 rows, 130 columns, 660 nonzeros
Found heuristic solution: objective 14.0000000
Variable types: 112 continuous, 18 integer (18 binary)

Root relaxation: objective 1.966667e+01, 166 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   19.66667    0    7   14.00000   19.66667  40.5%     -    0s
H    0     0                      16.0000000   19.66667  22.9%     -    0s
H    0     0                      17.0000000   19.00000  11.8%     -    0s
     0     0   18.50000    0    9   17.00000   18.50000  8.82%     -    0s

Cutting planes:
  Gomory: 13
  MIR: 6
  RLT: 20

Explored 1 nodes (198 simplex iterations) in 0.01 seconds (0.01 work units)
Thread count was 8 (of 8 available processors)

Solution count 4: 17 16 14 -0

Optimal solution found (tolerance 1.00e-04)
Best objective 1.700000000000e+01, best bound 1.700000000000e+01, gap 0.0000%
Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 10
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 414 rows, 316 columns and 1153 nonzeros
Model fingerprint: 0xf6c20781
Variable types: 294 continuous, 22 integer (22 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+01]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 205 rows and 186 columns
Presolve time: 0.00s
Presolved: 209 rows, 130 columns, 660 nonzeros
Found heuristic solution: objective 140.0000000
Variable types: 112 continuous, 18 integer (18 binary)

Root relaxation: objective 1.456667e+02, 173 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0  145.66667    0    7  140.00000  145.66667  4.05%     -    0s
H    0     0                     143.0000000  144.66667  1.17%     -    0s
     0     0  144.00000    0   11  143.00000  144.00000  0.70%     -    0s
     0     0  143.00000    0    6  143.00000  143.00000  0.00%     -    0s

Cutting planes:
  Gomory: 1
  MIR: 1
  RLT: 7

Explored 1 nodes (233 simplex iterations) in 0.01 seconds (0.01 work units)
Thread count was 8 (of 8 available processors)

Solution count 3: 143 140 -0

Optimal solution found (tolerance 1.00e-04)
Best objective 1.430000000000e+02, best bound 1.430000000000e+02, gap 0.0000%
Default FlowOCT predictions: [1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0]
Custom weights FlowOCT predictions: [1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0]

Default FlowOCT accuracy: 0.85
Custom weights FlowOCT accuracy: 0.85
Custom weights FlowOCT weighted accuracy: 0.9794520547945206

Let’s visualize both trees:

[37]:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

flow_oct_default.plot_tree(ax=ax1, fontsize=10)
ax1.set_title("Default FlowOCT")

flow_oct_custom.plot_tree(ax=ax2, fontsize=10)
ax2.set_title("User-defined Weights FlowOCT")

plt.tight_layout()
plt.show()

BendersOCT with User-defined Weights¶

Now let’s do the same with BendersOCT:

[38]:

# BendersOCT without custom weights
benders_oct_default = BendersOCT(solver="gurobi", obj_mode="acc", depth=2, time_limit=10)
benders_oct_default.fit(X, y)

# BendersOCT with custom weights
benders_oct_custom = BendersOCT(solver="gurobi", obj_mode="weighted", depth=2, time_limit=10, verbose=False)
benders_oct_custom.fit(X, y, weights=weights)

print("Default BendersOCT predictions:", benders_oct_default.predict(X))
print("User-defined weights BendersOCT predictions:", benders_oct_custom.predict(X))

print("Default BendersOCT accuracy:", (benders_oct_default.predict(X) == y).mean())
print("User-defined weights BendersOCT accuracy:", (benders_oct_custom.predict(X) == y).mean())
print("User-defined weights BendersOCT weighted accuracy:", np.average(benders_oct_custom.predict(X) == y, weights=weights))

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 10
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter LazyConstraints to value 1
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 14 rows, 56 columns and 53 nonzeros
Model fingerprint: 0x28902962
Variable types: 34 continuous, 22 integer (22 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 4 rows and 4 columns
Presolve time: 0.00s
Presolved: 10 rows, 52 columns, 45 nonzeros
Variable types: 34 continuous, 18 integer (18 binary)
Root relaxation presolved: 22 rows, 50 columns, 140 nonzeros


Root relaxation: objective 2.000000e+01, 2 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   20.00000    0    -          -   20.00000      -     -    0s
     0     0   20.00000    0    -          -   20.00000      -     -    0s
     0     0   20.00000    0    -          -   20.00000      -     -    0s
     0     0   20.00000    0    7          -   20.00000      -     -    0s
H    0     0                      14.0000000   20.00000  42.9%     -    0s
H    0     0                      16.0000000   20.00000  25.0%     -    0s
     0     0   19.50000    0    2   16.00000   19.50000  21.9%     -    0s
     0     0   19.26563    0    9   16.00000   19.26563  20.4%     -    0s
H    0     0                      17.0000000   19.26563  13.3%     -    0s
     0     0   19.00000    0    8   17.00000   19.00000  11.8%     -    0s
     0     0   18.96429    0    9   17.00000   18.96429  11.6%     -    0s
     0     0   18.95833    0    9   17.00000   18.95833  11.5%     -    0s
     0     0   18.85000    0    9   17.00000   18.85000  10.9%     -    0s
     0     0   18.85000    0    9   17.00000   18.85000  10.9%     -    0s
     0     0   18.76563    0   11   17.00000   18.76563  10.4%     -    0s
     0     0   18.74194    0   10   17.00000   18.74194  10.2%     -    0s
     0     0   18.73913    0    9   17.00000   18.73913  10.2%     -    0s
     0     0   18.59091    0    9   17.00000   18.59091  9.36%     -    0s
     0     0   18.50000    0    9   17.00000   18.50000  8.82%     -    0s
     0     0   18.50000    0    9   17.00000   18.50000  8.82%     -    0s
     0     0   18.37500    0    8   17.00000   18.37500  8.09%     -    0s
     0     0   18.33333    0    8   17.00000   18.33333  7.84%     -    0s
     0     0   18.33333    0    9   17.00000   18.33333  7.84%     -    0s
     0     0   18.31250    0    7   17.00000   18.31250  7.72%     -    0s
     0     0   18.30000    0    7   17.00000   18.30000  7.65%     -    0s
     0     0   18.16667    0    7   17.00000   18.16667  6.86%     -    0s
     0     0   18.14706    0    8   17.00000   18.14706  6.75%     -    0s
     0     0   18.09524    0   10   17.00000   18.09524  6.44%     -    0s
     0     0   18.07059    0   10   17.00000   18.07059  6.30%     -    0s
     0     0   18.03478    0   10   17.00000   18.03478  6.09%     -    0s
     0     0   18.01765    0   11   17.00000   18.01765  5.99%     -    0s
     0     0   18.01333    0   11   17.00000   18.01333  5.96%     -    0s
     0     0   18.00532    0   12   17.00000   18.00532  5.91%     -    0s
     0     0   18.00463    0   12   17.00000   18.00463  5.91%     -    0s
     0     0   17.95833    0    9   17.00000   17.95833  5.64%     -    0s
     0     0   17.91304    0    8   17.00000   17.91304  5.37%     -    0s
     0     0   17.91304    0    8   17.00000   17.91304  5.37%     -    0s
     0     0   17.90909    0    8   17.00000   17.90909  5.35%     -    0s
     0     0   17.90909    0   11   17.00000   17.90909  5.35%     -    0s
     0     0   17.89873    0   11   17.00000   17.89873  5.29%     -    0s
     0     0   17.89873    0   10   17.00000   17.89873  5.29%     -    0s
     0     2   17.89873    0   10   17.00000   17.89873  5.29%     -    0s

Cutting planes:
  Gomory: 1
  MIR: 11
  Flow cover: 5
  Lazy constraints: 46

Explored 16 nodes (312 simplex iterations) in 0.05 seconds (0.02 work units)
Thread count was 8 (of 8 available processors)

Solution count 3: 17 16 14

Optimal solution found (tolerance 1.00e-04)
Best objective 1.700000000000e+01, best bound 1.700000000000e+01, gap 0.0000%

User-callback calls 366, time in user-callback 0.02 sec
Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-28
Set parameter TimeLimit to value 10
Set parameter NodeLimit to value 1073741824
Set parameter SolutionLimit to value 1073741824
Set parameter LazyConstraints to value 1
Set parameter IntFeasTol to value 1e-06
Set parameter Method to value 3
Gurobi Optimizer version 10.0.2 build v10.0.2rc0 (mac64[arm])

CPU model: Apple M1 Pro
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 14 rows, 56 columns and 53 nonzeros
Model fingerprint: 0x75b5cba0
Variable types: 34 continuous, 22 integer (22 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 1e+01]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 4 rows and 4 columns
Presolve time: 0.00s
Presolved: 10 rows, 52 columns, 45 nonzeros
Variable types: 34 continuous, 18 integer (18 binary)
Root relaxation presolved: 22 rows, 50 columns, 140 nonzeros


Root relaxation: objective 1.460000e+02, 2 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0  146.00000    0    -          -  146.00000      -     -    0s
     0     0  146.00000    0    -          -  146.00000      -     -    0s
     0     0  146.00000    0    -          -  146.00000      -     -    0s
     0     0  146.00000    0    7          -  146.00000      -     -    0s
H    0     0                     140.0000000  146.00000  4.29%     -    0s
     0     0  145.50000    0    4  140.00000  145.50000  3.93%     -    0s
     0     0  145.37500    0    7  140.00000  145.37500  3.84%     -    0s
H    0     0                     143.0000000  145.37500  1.66%     -    0s
     0     0  144.88542    0   10  143.00000  144.88542  1.32%     -    0s
     0     0  144.88194    0   10  143.00000  144.88194  1.32%     -    0s
     0     0  144.33333    0    7  143.00000  144.33333  0.93%     -    0s
     0     0  144.30000    0    7  143.00000  144.30000  0.91%     -    0s
     0     0  144.16667    0    7  143.00000  144.16667  0.82%     -    0s
     0     0  144.12500    0    9  143.00000  144.12500  0.79%     -    0s
     0     0  144.12500    0    9  143.00000  144.12500  0.79%     -    0s
     0     0  144.08333    0   11  143.00000  144.08333  0.76%     -    0s
     0     0  144.08036    0   11  143.00000  144.08036  0.76%     -    0s
     0     0  144.08036    0   11  143.00000  144.08036  0.76%     -    0s
     0     0  144.02737    0   10  143.00000  144.02737  0.72%     -    0s
     0     0  144.02737    0   11  143.00000  144.02737  0.72%     -    0s
     0     0  144.02632    0   11  143.00000  144.02632  0.72%     -    0s
     0     0  144.02632    0   10  143.00000  144.02632  0.72%     -    0s
     0     2  144.02632    0    9  143.00000  144.02632  0.72%     -    0s

Cutting planes:
  Gomory: 1
  MIR: 3
  Flow cover: 9
  Lazy constraints: 47

Explored 14 nodes (157 simplex iterations) in 0.03 seconds (0.01 work units)
Thread count was 8 (of 8 available processors)

Solution count 2: 143 140

Optimal solution found (tolerance 1.00e-04)
Best objective 1.430000000000e+02, best bound 1.430000000000e+02, gap 0.0000%
Default BendersOCT predictions: [1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0]
Custom weights BendersOCT predictions: [1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0]

Default BendersOCT accuracy: 0.85
Custom weights BendersOCT accuracy: 0.85

User-callback calls 283, time in user-callback 0.01 sec
Custom weights BendersOCT weighted accuracy: 0.9794520547945206

[39]:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

benders_oct_default.plot_tree(ax=ax1, fontsize=10)
ax1.set_title("Default BendersOCT")

benders_oct_custom.plot_tree(ax=ax2, fontsize=10)
ax2.set_title("User-defined Weights BendersOCT")

plt.tight_layout()
plt.show()

In this example, we’ve demonstrated how to use user-defined weights with both FlowOCT and BendersOCT. By setting obj_mode="weighted" and providing weights during the fit method call, we can influence the importance of different samples in the training process. The user-defined weights in this example heavily favor class 1, which may result in trees that are more likely to predict class 1, potentially at the cost of overall accuracy. However, this can be useful in scenarios where misclassifying one class is more costly than misclassifying the other, or when dealing with imbalanced datasets. Note that the actual results may vary due to the random nature of the dataset and the optimization process. You may want to run the code multiple times or with different random seeds to get a better understanding of the effects of user-defined weights.

References¶

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
Aghaei, S., Gómez, A., & Vayanos, P. (2021). Strong optimal classification trees. arXiv preprint arXiv:2103.15965. https://arxiv.org/abs/2103.15965.

FlowOCT Examples¶