# Robust Optimal Classification Trees

In many applications, full control over data collection is rare in both training and deployment. For example, the data collection mechanism may change between training and deployment, or the environment may change the distribution of the data over time. This corresponds to a *distribution shift*: the distribution of the training data does not match the distribution of the deployment data. As a result, a trained model can perform poorly in the testing/deployment phase when a distribution shift is present. `RobustTreeClassifier` is an MIO-based method for building optimal classification trees that are robust to such distribution shifts for data with integer-valued features. Details on the method can be found in the paper (Justin et al. 2021).

*(Figure: robust_shift)*

## Specifying the Distribution Shift

To fit a `RobustTreeClassifier`, the expected distribution shift must be specified. This is done through the `costs` and `budget` parameters of the `RobustTreeClassifier.fit()` function.

`RobustTreeClassifier` provides a helper function, `probabilities_to_costs`, for generating the values of `costs` and `budget` from application knowledge. Its `prob` parameter is a matrix (of the same shape as the training covariates) in which each entry is the estimated probability that feature $f$ of sample $i$ will *not* be shifted; these probabilities can be set based on domain knowledge. Its `threshold` parameter tunes how much robustness to distribution shifts is desired (in exchange for solving time). It takes a value from 0 (exclusive) to 1 (inclusive), where 1 represents no robustness to uncertainty and values near 0 represent complete robustness (i.e., a tree that does not branch). In most settings, a reasonable range for this parameter is between 0.7 and 1; it is advised to tune it by trying several values. For details on how the costs and budget are mathematically derived from the specified `prob` and `threshold` values, see the paper (Justin et al. 2021).

## References

* Justin, N., Aghaei, S., Gómez, A., & Vayanos, P. (2021). Optimal robust classification trees. *The AAAI-2022 Workshop on Adversarial Machine Learning and Beyond*.
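As a concrete illustration, the sketch below builds a `prob` matrix for a toy dataset with NumPy. The matrix construction is runnable as written; the `RobustTreeClassifier` calls at the end are shown only as comments, because the import path and constructor arguments are assumptions on our part and not part of what this page documents (only `probabilities_to_costs(prob, threshold)` and `fit(..., costs=..., budget=...)` are described above).

```python
import numpy as np

# Toy training data with integer-valued features (5 samples, 2 features).
X = np.array([[1, 0], [2, 1], [0, 2], [3, 1], [1, 3]])
y = np.array([0, 0, 1, 1, 1])

# Estimated probability that each entry of X will NOT be shifted at
# deployment. It must have the same shape as X. Here we assume (for the
# sake of the example) that feature 0 is stable (p = 0.95) while feature 1
# is more likely to shift (p = 0.80); in practice these values come from
# domain knowledge.
prob = np.tile([0.95, 0.80], (X.shape[0], 1))

# Sanity checks: same shape as the covariates, probabilities in (0, 1].
assert prob.shape == X.shape
assert np.all((prob > 0) & (prob <= 1))

# Hypothetical fitting workflow (import path and constructor arguments
# below are assumptions, not documented API):
#
# from odtlearn import RobustTreeClassifier
# clf = RobustTreeClassifier(depth=2)
# costs, budget = clf.probabilities_to_costs(prob, threshold=0.9)
# clf.fit(X, y, costs=costs, budget=budget)
```

A `threshold` of 0.9 sits in the 0.7–1 range suggested above; lowering it would trade solving time for more robustness to the specified shift.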