odtlearn.utils.validation

Functions

`check_ipw`(→ numpy.ndarray)	Check and validate inverse probability weights (IPW).
`check_y_hat`(→ numpy.ndarray)	Check and validate counterfactual predictions (y_hat).
`check_y`(→ numpy.ndarray)	Check and validate target values (y).
`check_columns_match`(→ None)	Check if the columns in new_data match the original_columns.
`check_binary`(→ None)	Check if all values in the DataFrame are binary (0 or 1).
`check_integer`(df)	Check if all values in the DataFrame are integers.
`check_same_as_X`(X, X_col_labels, G, G_label)	Check if a DataFrame G has the same structure as X.

Module Contents

odtlearn.utils.validation.check_ipw(X: numpy.ndarray, ipw: numpy.ndarray | pandas.core.series.Series) → numpy.ndarray

Check and validate inverse probability weights (IPW).

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.
ipw (array-like of shape (n_samples,)): The inverse probability weights to be checked.

Returns:

ipw (ndarray of shape (n_samples,)): The validated and potentially converted inverse probability weights.

Raises:

ValueError: If ipw and X have different numbers of samples.
AssertionError: If any value in ipw is not in the range (0, 1].

Examples

>>> import numpy as np
>>> from odtlearn.utils.validation import check_ipw
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> ipw = np.array([0.5, 0.7, 0.3])
>>> validated_ipw = check_ipw(X, ipw)
>>> print(validated_ipw)
[0.5 0.7 0.3]
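
The two conditions listed under Raises can be illustrated without the library. The following is a minimal sketch, assuming NumPy is available; `sketch_check_ipw` is a hypothetical stand-in written for illustration, not odtlearn's implementation, and it mirrors only the documented rules (matching sample count, every weight in (0, 1]).

```python
import numpy as np


def sketch_check_ipw(X, ipw):
    """Hypothetical stand-in mirroring the documented rules of check_ipw."""
    X = np.asarray(X)
    ipw = np.asarray(ipw, dtype=float).ravel()  # Series or list -> 1-D ndarray

    # Documented ValueError: sample counts must agree.
    if ipw.shape[0] != X.shape[0]:
        raise ValueError(f"ipw has {ipw.shape[0]} samples, but X has {X.shape[0]}.")

    # Documented AssertionError: every weight must lie in (0, 1].
    assert np.all((ipw > 0) & (ipw <= 1)), "ipw values must be in (0, 1]."
    return ipw


X = np.array([[1, 2], [3, 4], [5, 6]])
valid = sketch_check_ipw(X, [0.5, 0.7, 0.3])

try:
    sketch_check_ipw(X, [0.5, 0.0, 0.3])  # 0.0 is outside (0, 1]
    zero_rejected = False
except AssertionError:
    zero_rejected = True

try:
    sketch_check_ipw(X, [0.5, 0.7])  # two weights for three samples
    length_rejected = False
except ValueError:
    length_rejected = True
```

Note that the interval is open at 0, so a weight of exactly 0 fails the range check while a weight of exactly 1 passes.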

odtlearn.utils.validation.check_y_hat(X: numpy.ndarray, treatments: numpy.ndarray, y_hat: pandas.core.frame.DataFrame | numpy.ndarray) → numpy.ndarray

Check and validate counterfactual predictions (y_hat).

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.
treatments (array-like): The unique treatment values.
y_hat (array-like of shape (n_samples, n_treatments)): The counterfactual predictions to be checked.

Returns:

y_hat (ndarray of shape (n_samples, n_treatments)): The validated and potentially converted counterfactual predictions.

Raises:

ValueError: If the dimensions of y_hat are inconsistent with X or treatments.
AssertionError: If y_hat is None.

Examples

>>> import numpy as np
>>> from odtlearn.utils.validation import check_y_hat
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> treatments = [0, 1]
>>> y_hat = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
>>> validated_y_hat = check_y_hat(X, treatments, y_hat)
>>> print(validated_y_hat)
[[0.1 0.2]
 [0.3 0.4]
 [0.5 0.6]]
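
The shape requirement, one row per sample in X and one column per entry in treatments, can be sketched as follows. This assumes NumPy is available; `sketch_check_y_hat` is a hypothetical stand-in for illustration only and is not odtlearn's implementation.

```python
import numpy as np


def sketch_check_y_hat(X, treatments, y_hat):
    """Hypothetical stand-in mirroring the documented rules of check_y_hat."""
    # Documented AssertionError: y_hat must be supplied.
    assert y_hat is not None, "y_hat must not be None."

    X = np.asarray(X)
    y_hat = np.asarray(y_hat, dtype=float)  # DataFrame or list -> ndarray

    # Documented ValueError: one row per sample, one column per treatment.
    expected = (X.shape[0], len(treatments))
    if y_hat.shape != expected:
        raise ValueError(f"y_hat has shape {y_hat.shape}, expected {expected}.")
    return y_hat


X = np.array([[1, 2], [3, 4], [5, 6]])
treatments = [0, 1]
ok = sketch_check_y_hat(X, treatments, [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

try:
    # Three prediction columns for two treatments.
    sketch_check_y_hat(X, treatments, np.zeros((3, 3)))
    shape_rejected = False
except ValueError:
    shape_rejected = True
```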

odtlearn.utils.validation.check_y(X: numpy.ndarray, y: numpy.ndarray | pandas.core.series.Series) → numpy.ndarray

Check and validate target values (y).

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.
y (array-like of shape (n_samples,)): The target values to be checked.

Returns:

y (ndarray of shape (n_samples,)): The validated and potentially converted target values.

Raises:

ValueError: If y and X have different numbers of samples.

Examples

>>> import numpy as np
>>> from odtlearn.utils.validation import check_y
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> y = np.array([0, 1, 0])
>>> validated_y = check_y(X, y)
>>> print(validated_y)
[0. 1. 0.]

odtlearn.utils.validation.check_columns_match(original_columns: numpy.ndarray | pandas.core.indexes.base.Index, new_data: numpy.ndarray | pandas.core.frame.DataFrame) → None

Check if the columns in new_data match the original_columns.

Parameters:

original_columns (list): The list of column names from the original data.
new_data (array-like or pandas.DataFrame): The new data to be checked.

Returns:

bool: True if the columns match, False otherwise.

Raises:

ValueError: If new_data is a DataFrame and contains columns not present in original_columns.
AssertionError: If new_data is not a DataFrame and has a different number of columns than original_columns.

Notes

This function performs different checks based on whether new_data is a pandas DataFrame or not:

- For DataFrames: It checks if all columns in new_data are present in original_columns.
- For non-DataFrames: It checks if the number of columns matches the length of original_columns.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_columns_match
>>> original_cols = ['A', 'B', 'C']
>>> new_data = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
>>> result = check_columns_match(original_cols, new_data)
>>> print(result)
True
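
The two branches described in the Notes can be sketched as below. This assumes NumPy and pandas are available; `sketch_check_columns_match` is a hypothetical stand-in, not odtlearn's implementation. It only reproduces the documented raising behavior and makes no claim about the real function's return value.

```python
import numpy as np
import pandas as pd


def sketch_check_columns_match(original_columns, new_data):
    """Hypothetical stand-in mirroring the documented checks only."""
    if isinstance(new_data, pd.DataFrame):
        # DataFrame branch: every column of new_data must be a known column.
        known = set(original_columns)
        unknown = [col for col in new_data.columns if col not in known]
        if unknown:
            raise ValueError(f"Columns {unknown} are not in the original columns.")
    else:
        # Array branch: only the number of columns can be compared.
        n_cols = np.asarray(new_data).shape[1]
        assert n_cols == len(original_columns), (
            f"Expected {len(original_columns)} columns, got {n_cols}."
        )


original_cols = ["A", "B", "C"]

try:
    # Column 'D' is not part of the original data.
    sketch_check_columns_match(original_cols, pd.DataFrame({"A": [1], "D": [2]}))
    extra_rejected = False
except ValueError:
    extra_rejected = True

try:
    # Two columns where three are expected.
    sketch_check_columns_match(original_cols, np.zeros((2, 2)))
    width_rejected = False
except AssertionError:
    width_rejected = True
```

Under the rule as worded in the Notes, a DataFrame holding only a subset of the original columns would pass the DataFrame branch, while an array is compared by width alone, so column order cannot be checked for arrays.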

odtlearn.utils.validation.check_binary(df: pandas.core.frame.DataFrame | numpy.ndarray) → None

Check if all values in the DataFrame are binary (0 or 1).

Parameters:

df (pandas.DataFrame or array-like): The data to be checked.

Raises:

ValueError: If df is a DataFrame and contains columns with non-binary values.
AssertionError: If df is not a DataFrame and contains non-binary values.

Notes

This function performs different checks based on whether df is a pandas DataFrame or not:

- For DataFrames: It identifies columns containing non-binary values.
- For non-DataFrames: It checks if all values are either 0 or 1.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_binary
>>> df = pd.DataFrame({'A': [0, 1, 0], 'B': [1, 1, 0]})
>>> check_binary(df)  # This will not raise an error
>>> df['C'] = [0, 1, 2]
>>> check_binary(df)  # This will raise a ValueError
ValueError: Found columns (['C']) that contain values other than 0 or 1.
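
A minimal sketch of the two branches described in the Notes, assuming NumPy and pandas are available. `sketch_check_binary` is a hypothetical stand-in, not odtlearn's implementation; the error message is copied from the example above, and the per-column `isin` test is one possible way to find the offending columns.

```python
import numpy as np
import pandas as pd


def sketch_check_binary(df):
    """Hypothetical stand-in mirroring the documented checks of check_binary."""
    if isinstance(df, pd.DataFrame):
        # DataFrame branch: collect the columns that hold anything besides 0 or 1.
        bad = [col for col in df.columns if not df[col].isin([0, 1]).all()]
        if bad:
            raise ValueError(
                f"Found columns ({bad}) that contain values other than 0 or 1."
            )
    else:
        # Array branch: every element must be 0 or 1.
        assert np.isin(np.asarray(df), [0, 1]).all(), "Expected only 0 or 1."


df = pd.DataFrame({"A": [0, 1, 0], "B": [1, 1, 0]})
sketch_check_binary(df)  # passes silently

df["C"] = [0, 1, 2]
try:
    sketch_check_binary(df)
    message = None
except ValueError as err:
    message = str(err)
```

Reporting the offending column names in the DataFrame branch makes the failure actionable, which a plain array cannot offer because it carries no labels.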

odtlearn.utils.validation.check_integer(df)

Check if all values in the DataFrame are integers.

Parameters:

df (pandas.DataFrame or array-like): The data to be checked.

Raises:

ValueError: If df contains non-integer values.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_integer
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> check_integer(df)  # This will not raise an error
>>> df['C'] = [1.5, 2.0, 3.0]
>>> check_integer(df)  # This will raise a ValueError
ValueError: Found non-integer values.

odtlearn.utils.validation.check_same_as_X(X, X_col_labels, G, G_label)

Check if a DataFrame G has the same structure as X.

Parameters:

X (pandas.DataFrame): The reference DataFrame.
X_col_labels (array-like): The column labels of X.
G (pandas.DataFrame or array-like): The DataFrame or array to be checked against X.
G_label (str): A label for G to be used in error messages.

Returns:

pandas.DataFrame: G converted to a DataFrame if it wasn’t already.

Raises:

ValueError: If G has a different number of columns than X.
KeyError: If G is a DataFrame and its columns don’t match X_col_labels.
TypeError: If G is not a DataFrame and X has non-default column labels.

Examples

>>> import pandas as pd
>>> from odtlearn.utils.validation import check_same_as_X
>>> X = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> G = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
>>> result = check_same_as_X(X, X.columns, G, 'Test DataFrame')
>>> print(result)
   A  B
0  5  7
1  6  8