odtlearn.utils.validation

Functions

check_ipw(X, ipw) → numpy.ndarray

Check and validate inverse probability weights (IPW).

check_y_hat(X, treatments, y_hat) → numpy.ndarray

Check and validate counterfactual predictions (y_hat).

check_y(X, y) → numpy.ndarray

Check and validate target values (y).

check_columns_match(original_columns, new_data) → None

Check if the columns in new_data match the original_columns.

check_binary(df) → None

Check if all values in the DataFrame are binary (0 or 1).

check_integer(df)

Check if all values in the DataFrame are integers.

check_same_as_X(X, X_col_labels, G, G_label)

Check if a DataFrame G has the same structure as X.

Module Contents

odtlearn.utils.validation.check_ipw(X: numpy.ndarray, ipw: numpy.ndarray | pandas.core.series.Series) → numpy.ndarray

Check and validate inverse probability weights (IPW).

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples.

ipw : array-like of shape (n_samples,)

The inverse probability weights to be checked.

Returns:
ipw : ndarray of shape (n_samples,)

The validated and potentially converted inverse probability weights.

Raises:
ValueError

If ipw has inconsistent number of samples with X.

AssertionError

If any value in ipw is not in the range (0, 1].

Examples

>>> import numpy as np
>>> from odtlearn.utils.validation import check_ipw
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> ipw = np.array([0.5, 0.7, 0.3])
>>> validated_ipw = check_ipw(X, ipw)
>>> print(validated_ipw)
[0.5 0.7 0.3]
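
The two failure modes listed under Raises can be illustrated with a small stand-in. The sketch below assumes only NumPy; `sketch_check_ipw` and `outcome` are hypothetical helpers that mirror the documented contract and are not odtlearn's implementation.

```python
import numpy as np


def sketch_check_ipw(X, ipw):
    """Hypothetical stand-in that mirrors the documented contract of check_ipw."""
    X = np.asarray(X)
    ipw = np.asarray(ipw, dtype=float)  # also accepts a list or a pandas Series
    if ipw.shape[0] != X.shape[0]:
        raise ValueError(f"ipw has {ipw.shape[0]} samples but X has {X.shape[0]}")
    # Weights must lie in the half-open interval (0, 1].
    assert np.all((ipw > 0) & (ipw <= 1)), "ipw values must be in (0, 1]"
    return ipw


def outcome(X, ipw):
    """Return 'ok' or the name of the exception raised by the sketch."""
    try:
        sketch_check_ipw(X, ipw)
    except (ValueError, AssertionError) as exc:
        return type(exc).__name__
    return "ok"


X = np.array([[1, 2], [3, 4], [5, 6]])
print(outcome(X, [0.5, 1.0, 0.25]))  # the upper bound 1.0 is allowed
print(outcome(X, [0.5, 0.7]))        # length differs from the number of rows in X
print(outcome(X, [0.5, 0.0, 0.3]))   # 0 is excluded from (0, 1]
```

Because the range check is documented as an AssertionError rather than a ValueError, callers that want to handle both problems need to catch both exception types, as `outcome` does here.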

odtlearn.utils.validation.check_y_hat(X: numpy.ndarray, treatments: numpy.ndarray, y_hat: pandas.core.frame.DataFrame | numpy.ndarray) → numpy.ndarray

Check and validate counterfactual predictions (y_hat).

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples.

treatments : array-like

The unique treatment values.

y_hat : array-like of shape (n_samples, n_treatments)

The counterfactual predictions to be checked.

Returns:
y_hat : ndarray of shape (n_samples, n_treatments)

The validated and potentially converted counterfactual predictions.

Raises:
ValueError

If y_hat has inconsistent dimensions with X or treatments.

AssertionError

If y_hat is None.

Examples

>>> import numpy as np
>>> from odtlearn.utils.validation import check_y_hat
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> treatments = [0, 1]
>>> y_hat = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
>>> validated_y_hat = check_y_hat(X, treatments, y_hat)
>>> print(validated_y_hat)
[[0.1 0.2]
 [0.3 0.4]
 [0.5 0.6]]
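
The dimension requirement is that y_hat has one row per sample in X and one column per treatment. A minimal sketch of that check, assuming only NumPy, follows; `sketch_check_y_hat` is a hypothetical stand-in, not odtlearn's implementation.

```python
import numpy as np


def sketch_check_y_hat(X, treatments, y_hat):
    """Hypothetical stand-in that mirrors the documented contract of check_y_hat."""
    assert y_hat is not None, "y_hat must be provided"
    y_hat = np.asarray(y_hat, dtype=float)  # also accepts a pandas DataFrame
    expected_shape = (np.asarray(X).shape[0], len(treatments))
    if y_hat.shape != expected_shape:
        raise ValueError(f"y_hat has shape {y_hat.shape}, expected {expected_shape}")
    return y_hat


X = np.array([[1, 2], [3, 4], [5, 6]])
treatments = [0, 1, 2]

# Three samples and three treatments: a (3, 3) array passes.
good = sketch_check_y_hat(X, treatments, np.zeros((3, 3)))

# One column short: the sketch reports the mismatch.
try:
    sketch_check_y_hat(X, treatments, np.zeros((3, 2)))
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```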

odtlearn.utils.validation.check_y(X: numpy.ndarray, y: numpy.ndarray | pandas.core.series.Series) → numpy.ndarray

Check and validate target values (y).

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples.

y : array-like of shape (n_samples,)

The target values to be checked.

Returns:
y : ndarray of shape (n_samples,)

The validated and potentially converted target values.

Raises:
ValueError

If y has inconsistent number of samples with X.

Examples

>>> import numpy as np
>>> from odtlearn.utils.validation import check_y
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> y = np.array([0, 1, 0])
>>> validated_y = check_y(X, y)
>>> print(validated_y)
[0. 1. 0.]

odtlearn.utils.validation.check_columns_match(original_columns: numpy.ndarray | pandas.core.indexes.base.Index, new_data: numpy.ndarray | pandas.core.frame.DataFrame) → None

Check if the columns in new_data match the original_columns.

Parameters:
original_columns : list, numpy.ndarray, or pandas.Index

The list of column names from the original data.

new_data : array-like or pandas.DataFrame

The new data to be checked.

Returns:
bool

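True if the columns match. A mismatch is reported by raising one of the exceptions listed under Raises rather than by returning False. Note that the type annotation in the signature gives the return type as None.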

Raises:
ValueError

If new_data is a DataFrame and contains columns not present in original_columns.

AssertionError

If new_data is not a DataFrame and has a different number of columns than original_columns.

Notes

This function performs different checks based on whether new_data is a pandas DataFrame or not:

- For DataFrames: it checks that all columns in new_data are present in original_columns.
- For non-DataFrames: it checks that the number of columns matches the length of original_columns.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_columns_match
>>> original_cols = ['A', 'B', 'C']
>>> new_data = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
>>> result = check_columns_match(original_cols, new_data)
>>> print(result)
True
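
The two branches described in the Notes behave differently in one respect worth seeing: the DataFrame branch compares names, while the array branch can only compare widths. The sketch below assumes NumPy and pandas; `sketch_columns_match` is a hypothetical stand-in, not odtlearn's implementation, and it deliberately returns nothing.

```python
import numpy as np
import pandas as pd


def sketch_columns_match(original_columns, new_data):
    """Hypothetical stand-in for the two branches described in the Notes."""
    known = set(original_columns)
    if isinstance(new_data, pd.DataFrame):
        # DataFrame branch: every column of new_data must be a known column.
        unknown = [c for c in new_data.columns if c not in known]
        if unknown:
            raise ValueError(f"Columns {unknown} not found in the original data")
    else:
        # Array branch: only the number of columns can be compared.
        n_cols = np.asarray(new_data).shape[1]
        assert n_cols == len(original_columns), (
            f"expected {len(original_columns)} columns, got {n_cols}"
        )


original_cols = ["A", "B", "C"]

# A subset of the original columns satisfies the DataFrame rule as worded above.
sketch_columns_match(original_cols, pd.DataFrame({"A": [1], "B": [2]}))

# A two-dimensional array passes when its width matches.
sketch_columns_match(original_cols, np.zeros((2, 3)))

# An unknown column name is rejected.
try:
    sketch_columns_match(original_cols, pd.DataFrame({"A": [1], "D": [2]}))
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```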

odtlearn.utils.validation.check_binary(df: pandas.core.frame.DataFrame | numpy.ndarray) → None

Check if all values in the DataFrame are binary (0 or 1).

Parameters:
df : pandas.DataFrame or array-like

The data to be checked.

Raises:
ValueError

If df is a DataFrame and contains columns with non-binary values.

AssertionError

If df is not a DataFrame and contains non-binary values.

Notes

This function performs different checks based on whether df is a pandas DataFrame or not:

- For DataFrames: it identifies columns containing non-binary values.
- For non-DataFrames: it checks that all values are either 0 or 1.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_binary
>>> df = pd.DataFrame({'A': [0, 1, 0], 'B': [1, 1, 0]})
>>> check_binary(df)  # This will not raise an error
>>> df['C'] = [0, 1, 2]
>>> check_binary(df)  # This will raise a ValueError
Traceback (most recent call last):
    ...
ValueError: Found columns (['C']) that contain values other than 0 or 1.
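
One way to locate the offending columns before calling the validator is sketched below, assuming NumPy and pandas. `non_binary_columns` is a hypothetical helper written for illustration; it is not part of odtlearn.

```python
import numpy as np
import pandas as pd


def non_binary_columns(df):
    """Hypothetical helper: names of columns holding values other than 0 or 1."""
    return [col for col in df.columns if not df[col].isin([0, 1]).all()]


df = pd.DataFrame({"A": [0, 1, 0], "B": [1, 1, 0], "C": [0, 1, 2]})
bad = non_binary_columns(df)
print(bad)

# A plain array has no column names, so only a single yes/no answer is available.
arr = np.array([[0, 1], [1, 0]])
all_binary = bool(np.isin(arr, [0, 1]).all())
print(all_binary)
```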

odtlearn.utils.validation.check_integer(df)

Check if all values in the DataFrame are integers.

Parameters:
df : pandas.DataFrame or array-like

The data to be checked.

Raises:
ValueError

If df contains non-integer values.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_integer
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> check_integer(df)  # This will not raise an error
>>> df['C'] = [1.5, 2.0, 3.0]
>>> check_integer(df)  # This will raise a ValueError
Traceback (most recent call last):
    ...
ValueError: Found non-integer values.
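
A value-based integrality test can be sketched as follows, assuming NumPy and pandas. `has_only_integers` is a hypothetical helper, and it makes one assumption that is not stated on this page: whole numbers stored as floats (such as 2.0) count as integers. Whether check_integer treats them the same way is not documented here.

```python
import numpy as np
import pandas as pd


def has_only_integers(data):
    """Hypothetical helper: True when no value has a fractional part."""
    values = np.asarray(data, dtype=float)
    # Assumption of this sketch: 2.0 counts as an integer, 1.5 does not.
    return bool(np.all(np.mod(values, 1) == 0))


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
before = has_only_integers(df)

df["C"] = [1.5, 2.0, 3.0]
after = has_only_integers(df)
print(before, after)
```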

odtlearn.utils.validation.check_same_as_X(X, X_col_labels, G, G_label)

Check if a DataFrame G has the same structure as X.

Parameters:
X : pandas.DataFrame

The reference DataFrame.

X_col_labels : array-like

The column labels of X.

G : pandas.DataFrame or array-like

The DataFrame or array to be checked against X.

G_label : str

A label for G to be used in error messages.

Returns:
pandas.DataFrame

G converted to a DataFrame if it wasn’t already.

Raises:
ValueError

If G has a different number of columns than X.

KeyError

If G is a DataFrame and its columns don’t match X_col_labels.

TypeError

If G is not a DataFrame and X has non-default column labels.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_same_as_X
>>> X = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> G = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
>>> result = check_same_as_X(X, X.columns, G, 'Test DataFrame')
>>> print(result)
   A  B
0  5  7
1  6  8
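
The three documented error cases can be laid out in a short stand-in, assuming NumPy and pandas. `sketch_same_as_X` is hypothetical and is not odtlearn's implementation. It makes two assumptions that this page does not confirm: "default" column labels are taken to mean the positional integers 0..n-1, and DataFrame columns are compared as sets, ignoring order.

```python
import numpy as np
import pandas as pd


def sketch_same_as_X(X, X_col_labels, G, G_label):
    """Hypothetical stand-in that mirrors the documented contract of check_same_as_X."""
    labels = list(X_col_labels)
    n_cols = X.shape[1]
    if isinstance(G, pd.DataFrame):
        if G.shape[1] != n_cols:
            raise ValueError(f"{G_label} has {G.shape[1]} columns, X has {n_cols}")
        # Assumption: column order is ignored when comparing labels.
        if set(G.columns) != set(labels):
            raise KeyError(f"{G_label} columns do not match those of X")
        return G
    # Assumption: "default" labels are the positional integers 0..n-1.
    if labels != list(range(n_cols)):
        raise TypeError(f"{G_label} must be a DataFrame because X has named columns")
    G = np.asarray(G)
    if G.shape[1] != n_cols:
        raise ValueError(f"{G_label} has {G.shape[1]} columns, X has {n_cols}")
    return pd.DataFrame(G, columns=labels)


# With default labels on X, an array is converted to a DataFrame.
X_default = pd.DataFrame([[1, 2], [3, 4]])
converted = sketch_same_as_X(X_default, X_default.columns, np.array([[5, 6], [7, 8]]), "G")

# With named columns on X, an array cannot be matched to labels.
X_named = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
try:
    sketch_same_as_X(X_named, X_named.columns, np.array([[5, 6], [7, 8]]), "G")
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__
print(error_type)
```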