GLANCE: Global Actions in a Nutshell for Counterfactual Explainability

Global Counterfactual Methods 

C-GLANCE (Iterative Merges)

class humancompatible.explain.glance.iterative_merges.iterative_merges.C_GLANCE(model: Any, initial_clusters: int = 100, final_clusters: int = 10, num_local_counterfactuals: int = 5, heuristic_weights: Tuple[float, float] = (0.5, 0.5), alternative_merges: bool = True, random_seed: int = 13, verbose=True)[source]

Bases: GlobalCounterfactualMethod

A class for generating global counterfactual explanations using an iterative merging approach.

It allows the user to control the number of clusters and the methods used for clustering and generating counterfactuals.

Attributes:

model (Any): The predictive model used for generating counterfactuals.
initial_clusters (int): The initial number of clusters to form.
final_clusters (int): The target number of clusters after merging.
num_local_counterfactuals (int): The number of local counterfactuals to generate for each cluster.
heuristic_weights (Tuple[float, float]): Weights used in the heuristic for merging clusters.
alternative_merges (bool): If True, allows alternative merging strategies.
random_seed (int): Seed for random number generation.
verbose (bool): If True, enables verbose output during processing.
final_clustering (Optional[Dict[int, pd.DataFrame]]): The final clustering of instances after merging.
cluster_results (Optional[Dict[int, Dict[str, Any]]]): Results of the clustering including effectiveness and cost metrics.

Methods:

_set_features_names(X, numerical_names, categorical_names): Sets the feature names for numerical and categorical features.
fit(X, y, train_dataset, feat_to_vary, numeric_features_names, categorical_features_names, clustering_method, cf_generator, cluster_action_choice_algo, …): Fits the clustering and counterfactual generation model to the provided dataset.
explain_group(instances): Explains the group of instances by generating counterfactuals based on clustering.
global_actions(): Retrieves the global actions derived from the clustered results.

Initializes the C_GLANCE instance.

Parameters:

model (Any): The predictive model used for generating counterfactuals.
initial_clusters (int, optional): The initial number of clusters to form. Default is 100.
final_clusters (int, optional): The target number of clusters after merging. Default is 10.
num_local_counterfactuals (int, optional): The number of local counterfactuals to generate for each cluster. Default is 5.
heuristic_weights (Tuple[float, float], optional): Weights used in the heuristic for merging clusters. Default is (0.5, 0.5).
alternative_merges (bool, optional): If True, allows alternative merging strategies. Default is True.
random_seed (int, optional): Seed for random number generation. Default is 13.
verbose (bool, optional): If True, enables verbose output during processing. Default is True.

explain_group(instances: DataFrame) → Tuple[int, float][source]

Explains the group of instances by generating counterfactuals based on clustering.

Parameters:

instances (pd.DataFrame): The group of instances to explain.

Returns:

Tuple[int, float]: A tuple containing the total effectiveness and total cost of the generated counterfactuals.

fit(X: DataFrame, y: Series, train_dataset: DataFrame, feat_to_vary: List[str] | str | None = 'all', numeric_features_names: List[str] | None = None, categorical_features_names: List[str] | None = None, clustering_method: ClusteringMethod | Literal['KMeans'] = 'KMeans', cf_generator: LocalCounterfactualMethod | Literal['Dice', 'NearestNeighbors', 'RandomSampling'] = 'Dice', cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost'] = 'max-eff', nns__n_scalars: int | None = None, rs__n_most_important: int | None = None, rs__n_categorical_most_frequent: int | None = None, lowcost__action_threshold: int | None = None, lowcost__num_low_cost: int | None = None, min_cost_eff_thres__effectiveness_threshold: float | None = None, min_cost_eff_thres_combinations__num_min_cost: int | None = None, eff_thres_hybrid__max_n_actions_full_combinations: int | None = None) → C_GLANCE[source]

Fits the clustering and counterfactual generation model to the provided dataset.

Parameters:

X (pd.DataFrame): Features of the dataset.
y (pd.Series): Target variable.
train_dataset (pd.DataFrame): The training dataset used for local counterfactual generation methods.
feat_to_vary (Optional[Union[List[str], str]], optional): Features to vary in counterfactual generation. Default is "all".
numeric_features_names (Optional[List[str]], optional): List of numeric feature names. If None, they will be inferred from X.
categorical_features_names (Optional[List[str]], optional): List of categorical feature names. If None, they will be inferred from X.
clustering_method (Union[ClusteringMethod, Literal["KMeans"]], optional): The clustering method to use. Default is "KMeans".
cf_generator (Union[LocalCounterfactualMethod, Literal["Dice", "NearestNeighbors", "RandomSampling"]], optional): The local counterfactual generation method to use. Default is "Dice".
cluster_action_choice_algo (Literal["max-eff", "mean-act", "low-cost"], optional): The algorithm for selecting actions from clusters. Default is "max-eff".
nns__n_scalars (Optional[int], optional): Number of scalar features to use for nearest neighbors. Default is None.
rs__n_most_important (Optional[int], optional): Number of most important features for random sampling. Default is None.
rs__n_categorical_most_frequent (Optional[int], optional): Number of most frequent categorical features for random sampling. Default is None.
lowcost__action_threshold (Optional[int], optional): Action threshold for low-cost methods. Default is None.
lowcost__num_low_cost (Optional[int], optional): Number of low-cost actions to consider. Default is None.
min_cost_eff_thres__effectiveness_threshold (Optional[float], optional): Effectiveness threshold for minimum cost methods. Default is None.
min_cost_eff_thres_combinations__num_min_cost (Optional[int], optional): Number of minimum cost combinations to evaluate. Default is None.
eff_thres_hybrid__max_n_actions_full_combinations (Optional[int], optional): Maximum number of actions for full combinations in hybrid thresholding. Default is None.

Returns:

C_GLANCE: Returns the fitted instance of C_GLANCE.

global_actions()[source]

humancompatible.explain.glance.iterative_merges.iterative_merges.action_fake_cost(action: Series, numerical_features_names: List[str], categorical_features_names: List[str])[source]

humancompatible.explain.glance.iterative_merges.iterative_merges.actions_cumulative_eff_cost(model: Any, X: DataFrame, actions_with_costs: List[Tuple[Series, float]], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: Any) → Tuple[float, float][source]

Evaluates the cumulative effectiveness and cost of applying a sequence of actions to a dataset using a predictive model.

This function applies each action from the sorted list of actions with their costs, predicts the outcomes, and calculates the total number of predictions that were flipped as well as the total recourse cost incurred from the actions.

Parameters:

model (Any): A machine learning model used for making predictions on the modified instances.
X (pd.DataFrame): The original DataFrame of instances to which actions will be applied.
actions_with_costs (List[Tuple[pd.Series, float]]): A list of tuples where each tuple contains: - A pandas Series representing the action to apply. - A float representing the cost associated with the action.
dist_func_dataframe (Callable[[pd.DataFrame, pd.DataFrame], pd.Series]): A function that computes the distance or cost between two DataFrames.
numerical_columns (List[str]): A list of names for the numerical columns in the DataFrame.
categorical_columns (List[str]): A list of names for the categorical columns in the DataFrame.
categorical_no_action_token (Any): A token used to represent the absence of an action for categorical features.

Returns:

Tuple[float, float]: A tuple containing: - The total number of predictions flipped across all actions applied. - The total recourse cost incurred from applying the actions.
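
The description above leaves the exact bookkeeping open, so the following is only a minimal sketch of one plausible reading: actions are tried from cheapest to most expensive, and an instance stops being considered once some action flips its prediction. `ThresholdModel`, `cumulative_eff_cost_sketch`, the numeric-only action application, and charging each flipped instance the cost attached to its action are all assumptions made for illustration; the library function additionally takes `dist_func_dataframe` and the categorical arguments listed above.

```python
import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for a fitted classifier: predicts 1 when income >= 1500."""

    def predict(self, X):
        return (X["income"] >= 1500).astype(int).to_numpy()


def cumulative_eff_cost_sketch(model, X, actions_with_costs):
    """Toy cumulative evaluation (assumed semantics, numeric features only)."""
    remaining = X.copy()
    flipped_total, cost_total = 0, 0.0
    # Try cheaper actions first; the key avoids comparing the Series themselves.
    for action, cost in sorted(actions_with_costs, key=lambda pair: pair[1]):
        if remaining.empty:
            break
        candidates = remaining + action  # Series index aligns with the columns
        flipped = model.predict(candidates) == 1
        n_new = int(flipped.sum())
        flipped_total += n_new
        cost_total += cost * n_new  # assumption: each flipped instance pays the action's cost
        remaining = remaining[~flipped]  # only unflipped instances see the next action
    return flipped_total, cost_total


X = pd.DataFrame({"income": [1000.0, 1300.0, 1450.0]})
actions_with_costs = [
    (pd.Series({"income": 300.0}), 3.0),
    (pd.Series({"income": 100.0}), 1.0),
]
n_flipped, total_cost = cumulative_eff_cost_sketch(ThresholdModel(), X, actions_with_costs)
```

In this toy run the cheap action flips one instance and the expensive one flips a second, while the third instance is never flipped.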

humancompatible.explain.glance.iterative_merges.iterative_merges.cluster_results(model: Any, instances: DataFrame, clusters: Dict[int, DataFrame], cluster_expl_actions: Dict[int, DataFrame], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_features_names: List[str], categorical_features_names: List[str], cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost', 'min-cost-eff-thres', 'eff-thres-hybrid'] = 'max-eff', action_threshold: int = 2, num_low_cost: int = 20, effectiveness_threshold: float = 0.1, num_min_cost: int | None = None, max_n_actions_full_combinations: int = 50) → Tuple[Dict[int, Dict[str, Any]], float, float][source]

Evaluates and selects actions for each cluster based on a specified action choice algorithm.

This function iterates through each cluster of instances, applying the specified algorithm to select the best action for achieving recourse while minimizing costs. It calculates the total effectiveness and mean recourse costs across all clusters.

Parameters:

model (Any): A machine learning model used for making predictions on modified instances.
instances (pd.DataFrame): The DataFrame of original instances to which actions will be applied.
clusters (Dict[int, pd.DataFrame]): A dictionary mapping cluster IDs to DataFrames of instances belonging to each cluster.
cluster_expl_actions (Dict[int, pd.DataFrame]): A dictionary mapping cluster IDs to DataFrames of candidate actions for each cluster.
dist_func_dataframe (Callable[[pd.DataFrame, pd.DataFrame], pd.Series]): A function that computes the distance or cost between two DataFrames.
numerical_features_names (List[str]): A list of names for the numerical columns in the DataFrames.
categorical_features_names (List[str]): A list of names for the categorical columns in the DataFrames.
cluster_action_choice_algo (Literal["max-eff", "mean-act", "low-cost", "min-cost-eff-thres", "eff-thres-hybrid"]): The algorithm to use for selecting actions from candidate actions. Options include: - "max-eff": Select the action with maximum effectiveness. - "mean-act": Select the mean action from candidate actions. - "low-cost": Select actions based on low cost.
action_threshold (int): Minimum threshold for the number of flipped predictions required to consider an action effective.
num_low_cost (int): The number of low-cost actions to consider (used when the low-cost algorithm is selected).
effectiveness_threshold (float): Minimum effectiveness required for actions (used when the min-cost-eff-thres algorithm is selected).
num_min_cost (Optional[int]): Number of minimum cost actions to consider (used when the min-cost-eff-thres algorithm is selected).
max_n_actions_full_combinations (int): Maximum number of actions to evaluate in full combinations (not currently used in the function).

Returns:

Tuple[Dict[int, Dict[str, Any]], float, float]: A tuple containing: - A dictionary where each key is a cluster ID and each value is another dictionary with the selected action, its effectiveness, and cost. - Total effectiveness percentage across all clusters. - Total mean recourse cost across all clusters.
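
To make the "max-eff" option concrete, here is a small sketch of choosing, for a single cluster, the candidate action that flips the most predictions. It is not the library's implementation: `ThresholdModel` and `select_max_eff_action` are hypothetical names, only numeric features are handled, and ties are broken by taking the first candidate.

```python
import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for a fitted classifier: predicts 1 when income >= 1500."""

    def predict(self, X):
        return (X["income"] >= 1500).astype(int).to_numpy()


def select_max_eff_action(model, cluster, candidate_actions):
    """Return the candidate action with the most flipped predictions and its flip count."""
    best_idx, best_flips = None, -1
    for idx, action in candidate_actions.iterrows():
        # Apply the action to every instance in the cluster (numeric-only toy).
        flips = int((model.predict(cluster + action) == 1).sum())
        if flips > best_flips:
            best_idx, best_flips = idx, flips
    return candidate_actions.loc[best_idx], best_flips


cluster = pd.DataFrame({"income": [1000.0, 1200.0, 1400.0]})
candidate_actions = pd.DataFrame({"income": [100.0, 500.0, 300.0]})
best_action, best_flips = select_max_eff_action(ThresholdModel(), cluster, candidate_actions)
```

Dividing `best_flips` by the cluster size would give a per-cluster effectiveness ratio; how the library aggregates these into the returned totals is not shown here.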

humancompatible.explain.glance.iterative_merges.iterative_merges.cumulative(model, instances, actions, dist_func_dataframe, numeric_features_names, categorical_features_names, categorical_no_action_token)[source]

Computes the cumulative effectiveness and cost of applying a set of actions to a given set of instances using a predictive model.

Parameters:

model (Any): A predictive model with a predict method. This model will be used to predict outcomes after applying actions to the input instances.
instances (pd.DataFrame): A DataFrame containing the instances for which actions are to be applied.
actions (List[dict]): A list of actions, where each action is represented as a dictionary that specifies how to modify the instances.
dist_func_dataframe (Callable[[pd.DataFrame, pd.DataFrame], pd.Series]): A distance function that takes two DataFrames and returns a Series of distances between corresponding rows.
numeric_features_names (List[str]): A list of names for the numeric features in the instances DataFrame.
categorical_features_names (List[str]): A list of names for the categorical features in the instances DataFrame.
categorical_no_action_token (Any): A token used to represent a no-action state for categorical features.

Returns:

Tuple[int, float]: A tuple containing: - effectiveness: An integer count of how many actions were effective (i.e., resulted in a finite cost). - cost: A float representing the total cost incurred by the effective actions.

humancompatible.explain.glance.iterative_merges.iterative_merges.format_glance_output(cluster_stats: Dict[int, Dict[str, Number]], categorical_columns: List[str])[source]

humancompatible.explain.glance.iterative_merges.iterative_merges.print_results(clusters_stats: Dict[int, Dict[str, Number]], total_effectiveness: float, total_cost: float)[source]

Prints the statistics for each cluster, including effectiveness and cost.

This function takes the results of cluster analysis and formats them for easy viewing. It displays the size of each cluster, the actions taken, and the effectiveness and cost of those actions.

Parameters:

clusters_stats (Dict[int, Dict[str, numbers.Number]]): A dictionary where keys are cluster IDs (integers) and values are dictionaries containing statistics for each cluster. Each value dictionary must contain the following keys:
- "size": The size of the cluster.
- "action": The actions taken for the cluster.
- "effectiveness": The effectiveness of the actions in the cluster.
- "cost": The cost associated with the actions.
total_effectiveness (float): The total effectiveness percentage across all clusters, represented as a decimal (e.g., 0.75 for 75%).
total_cost (float): The total cost associated with the actions taken across all clusters.
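
As an illustration of the input shape described above, the sketch below builds a `clusters_stats` dictionary with the four required keys and formats it. `print_results_sketch` is a hypothetical stand-in and its output layout is an assumption, not the layout produced by the library's `print_results`.

```python
def print_results_sketch(clusters_stats, total_effectiveness, total_cost):
    """Format per-cluster statistics plus totals; returns the printed lines."""
    lines = []
    for cluster_id, stats in clusters_stats.items():
        lines.append(
            f"Cluster {cluster_id}: size={stats['size']}, action={stats['action']}, "
            f"effectiveness={stats['effectiveness']}, cost={stats['cost']}"
        )
    # total_effectiveness is a decimal fraction, e.g. 0.75 for 75%.
    lines.append(f"Total effectiveness: {total_effectiveness:.2%}")
    lines.append(f"Total cost: {total_cost:.2f}")
    print("\n".join(lines))
    return lines


clusters_stats = {
    0: {"size": 40, "action": {"income": 300.0}, "effectiveness": 32, "cost": 96.0},
    1: {"size": 60, "action": {"income": 500.0}, "effectiveness": 43, "cost": 215.0},
}
lines = print_results_sketch(clusters_stats, total_effectiveness=0.75, total_cost=311.0)
```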

T-GLANCE (Counterfactual Tree)

class humancompatible.explain.glance.counterfactual_tree.counterfactual_tree.T_GLANCE(model: Any, split_features: List | int | None = None, partition_counterfactuals: int | None = None, child_count: int = 2, global_method: GlobalCounterfactualMethod | str | None = None, local_method: LocalCounterfactualMethod | str | None = None, num_local_counterfactuals: int = 100)[source]

Bases: object

A class to generate counterfactual explanations using a decision tree-like structure.

This class allows users to create a tree structure for counterfactual generation, optimizing effectiveness and cost based on specified features. It supports both local and global methods for generating counterfactuals.

Attributes:

model (Any): The predictive model used for generating counterfactuals.
split_features (Union[List, int]): Features to split the tree. Can be a list of feature names or an integer specifying the number of top features to use based on permutation importance.
partition_counterfactuals (int): The number of partitions to create for counterfactuals.
child_count (int): The number of children each node can have.
global_method (Union[GlobalCounterfactualMethod, str]): The global counterfactual generation method to use.
local_method (Union[LocalCounterfactualMethod, str]): The local counterfactual generation method to use.
num_local_counterfactuals (int): The number of local counterfactuals to generate.
node (Node): The root node of the counterfactual tree.
node_instances (pd.DataFrame): The instances that were used to build the counterfactual tree.
dist_func_dataframe (Callable): A distance function for calculating distances between instances.

Methods:

fit(X, y, train_dataset=None, feat_to_vary="all", random_seed=13, numeric_features_names=None, categorical_features_names=None): Fits the counterfactual tree to the provided data.
_local_group_eff_cost(instances): Calculates the effectiveness and cost of local counterfactuals for a group of instances.
_group_eff_cost(instances): Calculates the effectiveness and cost of counterfactuals for a group of instances, utilizing local or global methods.
partition_group(instances): Partitions the group of instances into a tree structure based on the specified features.
cumulative_leaf_actions(): Computes the total effectiveness and cost of actions taken from leaf nodes of the tree.

Initializes the T_GLANCE instance.

Parameters:

model (Any): The predictive model to use for generating counterfactuals.
split_features (Union[List, int], optional): Features to split the tree. If None, uses permutation importance to select. If an integer, selects the top N features.
partition_counterfactuals (int, optional): Number of partitions for counterfactual generation.
child_count (int, optional): Number of children for each node in the tree. Default is 2.
global_method (Union[GlobalCounterfactualMethod, str], optional): The global counterfactual generation method to use.
local_method (Union[LocalCounterfactualMethod, str], optional): The local counterfactual generation method to use.
num_local_counterfactuals (int, optional): Number of local counterfactuals to generate. Default is 100.

cumulative_leaf_actions()[source]

Computes the total effectiveness and cost of actions taken from leaf nodes of the tree.

Returns:

Tuple[float, float, int]: A tuple containing the total effectiveness, total cost, and the number of actions taken.

fit(X: DataFrame, y: Series, train_dataset: DataFrame | None = None, feat_to_vary: List[str] | str | None = 'all', random_seed: int = 13, numeric_features_names: List[str] | None = None, categorical_features_names: List[str] | None = None)[source]

Fits the counterfactual tree to the provided data.

Parameters:

X (pd.DataFrame): Features of the dataset.
y (pd.Series): Target variable.
train_dataset (Optional[pd.DataFrame], optional): The training dataset to use for local counterfactual generation methods.
feat_to_vary (Optional[Union[List[str], str]], optional): Features to vary in counterfactual generation. Default is "all".
random_seed (int, optional): Random seed for reproducibility. Default is 13.
numeric_features_names (Optional[List[str]], optional): List of numeric feature names. If None, they will be inferred from X.
categorical_features_names (Optional[List[str]], optional): List of categorical feature names. If None, they will be inferred from X.

partition_group(instances: DataFrame)[source]

Partitions the group of instances into a tree structure based on the specified features.

Parameters:

instances (pd.DataFrame): The group of instances to partition.

Returns:

Node: The root node of the partitioned tree.

Clustering Method Wrappers 

class humancompatible.explain.glance.clustering.kmeans.KMeansMethod(num_clusters, random_seed)[source]

Bases: ClusteringMethod

Implementation of a clustering method using KMeans.

This class provides an interface to apply KMeans clustering to a dataset.

Initializes the KMeansMethod class.

Parameters:

num_clusters (int): The number of clusters to form as well as the number of centroids to generate.
random_seed (int): A seed for the random number generator to ensure reproducibility.

fit(data)[source]

Fits the KMeans model on the provided dataset.

Parameters:

data (array-like or sparse matrix, shape (n_samples, n_features)): Training instances to cluster.

Returns:

None

predict(instances)[source]

Predicts the nearest cluster each sample in the provided data belongs to.

Parameters:

instances (array-like or sparse matrix, shape (n_samples, n_features)): New data to predict.

Returns:

labels (array, shape (n_samples,)): Index of the cluster each sample belongs to.

Local Counterfactual Methods 

class humancompatible.explain.glance.local_cfs.dice_method.DiceMethod[source]

Bases: LocalCounterfactualMethod

Implementation of the Dice method for generating counterfactual instances (https://interpret.ml/DiCE/).

The Dice method uses a specified machine learning model and data to generate counterfactual examples, providing insights into how changes in feature values can influence model predictions.

Methods:

__init__(): Initializes the DiceMethod instance.
fit(model, data, outcome_name, continuous_features, feat_to_vary, random_seed=13): Fits the DiceMethod to the provided dataset, preparing the counterfactual generator.
explain_instances(instances, num_counterfactuals): Generates counterfactual instances for the specified input instances.

Initializes a new instance of the DiceMethod class.

Attributes:

cf_generator (None or dice_ml.Dice): Counterfactual generator instance, initially set to None.

explain_instances(instances: DataFrame, num_counterfactuals: int) → DataFrame[source]

Generates counterfactual instances for the specified input instances.

Parameters:

instances (pd.DataFrame): DataFrame containing the instances for which counterfactuals are generated.
num_counterfactuals (int): The number of counterfactuals to generate for each instance.

Returns:

pd.DataFrame: A DataFrame containing the generated counterfactuals.

Raises:

ValueError: If the counterfactual generator has not been initialized (fit method not called).

fit(model, data, outcome_name, continuous_features, feat_to_vary, random_seed=13)[source]

Fits the DiceMethod to the provided dataset by creating a counterfactual generator.

Parameters:

model (object): A machine learning model used for predictions.
data (pd.DataFrame): The dataset containing features and the outcome variable.
outcome_name (str): The name of the outcome variable in the dataset.
continuous_features (List[str]): A list of names for continuous (numerical) features.
feat_to_vary (List[str]): A list of feature names that can be varied to generate counterfactuals.
random_seed (int, optional): Seed for random number generation to ensure reproducibility, by default 13.

class humancompatible.explain.glance.local_cfs.nearest_neighbor.NearestNeighborMethod[source]

Bases: LocalCounterfactualMethod

NearestNeighborMethod is a local counterfactual method that finds the nearest unaffected neighbors in the training dataset to explain instances by generating counterfactuals.

This method identifies instances in the training set where the model prediction remains unaffected, and uses the nearest neighbors (based on feature similarity) to generate counterfactual explanations for new instances.

Methods:

__init__(): Initializes the NearestNeighborMethod instance.
fit(model, data, outcome_name, continuous_features, feat_to_vary, random_seed=13): Fits the method to the training data by identifying unaffected instances based on model predictions and preparing the feature encoding for nearest neighbor searches.
explain_instances(instances, num_counterfactuals): Finds and returns the nearest unaffected neighbors for each instance, generating the specified number of counterfactual explanations.

Initializes a new instance of the NearestNeighborMethod class.

explain_instances(instances: DataFrame, num_counterfactuals: int) → DataFrame[source]

Generates counterfactual explanations for the provided instances by finding the nearest unaffected neighbors in the training data.

Parameters:

instances (pd.DataFrame): DataFrame containing the instances for which counterfactual explanations are needed.
num_counterfactuals (int): The number of counterfactuals to generate for each instance.

Returns:

pd.DataFrame: A DataFrame containing the nearest unaffected neighbors (counterfactuals) for each instance.

Notes:

If the requested number of counterfactuals exceeds the number of available unaffected instances, a warning is issued and all unaffected instances are used.
Nearest neighbors are determined using a one-hot encoded feature representation.
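
The two notes above can be sketched as follows. This is an illustrative approximation rather than the library's code: `nearest_unaffected_sketch` is a hypothetical helper, the pool of unaffected instances is passed in directly instead of being derived from a fitted model, and plain Euclidean distance on unscaled one-hot features is an assumption.

```python
import warnings

import numpy as np
import pandas as pd


def nearest_unaffected_sketch(instances, unaffected, num_counterfactuals):
    """Return, for each instance, its k nearest rows from the unaffected pool."""
    k = num_counterfactuals
    if k > len(unaffected):
        warnings.warn("Fewer unaffected instances than requested; using all of them.")
        k = len(unaffected)
    # One-hot encode queries and pool together so both share the same columns.
    combined = pd.get_dummies(pd.concat([instances, unaffected], keys=["query", "pool"]))
    query = combined.loc["query"].to_numpy(dtype=float)
    pool = combined.loc["pool"].to_numpy(dtype=float)
    # Pairwise Euclidean distances, shape (n_queries, n_pool).
    dists = np.linalg.norm(query[:, None, :] - pool[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return pd.concat([unaffected.iloc[row] for row in nearest], ignore_index=True)


unaffected = pd.DataFrame({"age": [30, 50, 31], "job": ["clerk", "nurse", "nurse"]})
instances = pd.DataFrame({"age": [29], "job": ["clerk"]})
cfs = nearest_unaffected_sketch(instances, unaffected, num_counterfactuals=2)
```

Because the numeric column is not scaled here, `age` dominates the distance; a real implementation would normally normalise numerical features first.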

fit(model, data: DataFrame, outcome_name: str, continuous_features: List[str], feat_to_vary: List[str], random_seed=13)[source]

Fits the NearestNeighborMethod by identifying unaffected instances in the training dataset and preparing feature encodings for counterfactual search.

Parameters:

model (object): A machine learning model with a predict method that outputs binary predictions (0 or 1).
data (pd.DataFrame): A dataset containing the features and outcome variable used for fitting the method.
outcome_name (str): The name of the outcome column in the dataset.
continuous_features (List[str]): A list of continuous (numerical) feature column names.
feat_to_vary (List[str]): A list of features allowed to vary when generating counterfactuals.
random_seed (int, optional): Seed for random number generation to ensure reproducibility, by default 13.

class humancompatible.explain.glance.local_cfs.random_sampling.RandomSampling(model, n_most_important, n_categorical_most_frequent, numerical_features, categorical_features, random_state=None)[source]

Bases: LocalCounterfactualMethod

RandomSampling is a local counterfactual method that generates counterfactual instances through random sampling based on the distribution of features in the unaffected training data.

This method identifies the most important features and the most frequent categories within the unaffected training data to generate counterfactuals by sampling from these distributions.

Methods:

__init__(model, n_most_important, n_categorical_most_frequent, numerical_features, categorical_features, random_state=None): Initializes the RandomSampling instance with the specified parameters.
fit(X, y): Fits the RandomSampling method to the provided training data by calculating feature importances and identifying unaffected instances.
_sample_instances(n_samples, fixed_feature_values, random_state=None): Samples instances based on the specified feature distributions, fixing certain feature values while sampling others.
explain(instance, num_counterfactuals, n_samples=1000, random_state=None): Generates counterfactual explanations for a given instance by sampling and modifying feature values.
explain_instances(instances, num_counterfactuals, n_samples=1000, random_state=None): Generates counterfactuals for multiple instances by calling the explain method for each instance.

Initializes a new instance of the RandomSampling class.

Parameters:

model (object): A machine learning model used for predictions and feature importance evaluation.
n_most_important (int): The number of most important features to consider when generating counterfactuals.
n_categorical_most_frequent (int): The number of most frequent categories to consider for categorical features.
numerical_features (List[str]): A list of continuous (numerical) feature names.
categorical_features (List[str]): A list of categorical feature names.
random_state (int, optional): Seed for random number generation to ensure reproducibility, by default None.

explain(instance, num_counterfactuals, n_samples=1000, random_state=None)[source]

Generates counterfactual explanations for a given instance by sampling and modifying feature values.

Parameters:

instance (pd.DataFrame): A single-row DataFrame representing the instance for which counterfactuals are generated.
num_counterfactuals (int): The number of counterfactuals to generate.
n_samples (int, optional): The number of samples to draw for generating counterfactuals, by default 1000.
random_state (int, optional): Seed for random number generation, by default None.

Returns:

pd.DataFrame: A DataFrame containing the generated counterfactuals for the provided instance.

Raises:

ValueError: If the input instance is not a single-row DataFrame or if its columns do not match the training dataset’s columns.

explain_instances(instances: DataFrame, num_counterfactuals: int, n_samples=1000, random_state=None) → DataFrame[source]

Generates counterfactuals for multiple instances by calling the explain method for each instance.

Parameters:

instances (pd.DataFrame): DataFrame containing instances for which counterfactual explanations are needed.
num_counterfactuals (int): The number of counterfactuals to generate for each instance.
n_samples (int, optional): The number of samples to draw for generating counterfactuals, by default 1000.
random_state (int, optional): Seed for random number generation, by default None.

Returns:

pd.DataFrame: A DataFrame containing the generated counterfactuals for all provided instances.

fit(X: DataFrame, y: Series)[source]

Fits the RandomSampling method to the provided training data by calculating feature importances and identifying unaffected instances.

Parameters:

X (pd.DataFrame): The training dataset containing feature columns.
y (pd.Series): The target variable corresponding to the training dataset.

Returns:

self (RandomSampling): Returns the fitted instance of RandomSampling.

Utility Functions 

humancompatible.explain.glance.utils.action.actions_mean_pandas(actions: DataFrame, numerical_features: List[str], categorical_features: List[str], categorical_no_action_token: Any) → Series[source]

Computes the mean action for numerical features and the most frequent action for categorical features from a given actions DataFrame.

For numerical features, the function calculates the mean of the actions across all instances. For categorical features, it determines the most frequent value in the actions DataFrame, unless all values are equal to the categorical_no_action_token, in which case the token is returned.

Parameters:

actions (pd.DataFrame): A DataFrame where each row represents an instance, and each column represents an action for a feature (either numerical or categorical).
numerical_features (List[str]): List of columns in actions that are numerical features.
categorical_features (List[str]): List of columns in actions that are categorical features.
categorical_no_action_token (Any): A token or value that indicates no action is needed for categorical features.

Returns:

pd.Series: A Series where: - For numerical features, the values are the mean of the actions for each numerical column. - For categorical features, the values are the most frequent action in each categorical column, or the categorical_no_action_token if no action was needed.
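
A minimal sketch of the aggregation described above, written independently of the library source. `actions_mean_sketch` is a hypothetical name, and computing the most frequent categorical value over the non-token entries only is an assumed reading of the "unless all values are equal to the token" rule.

```python
import pandas as pd


def actions_mean_sketch(actions, numerical_features, categorical_features, categorical_no_action_token):
    """Mean for numerical columns, most frequent non-token value for categorical ones."""
    result = {}
    for col in numerical_features:
        result[col] = actions[col].mean()
    for col in categorical_features:
        # Assumption: the no-action token is ignored when looking for the mode.
        real = actions.loc[actions[col] != categorical_no_action_token, col]
        result[col] = categorical_no_action_token if real.empty else real.mode().iloc[0]
    return pd.Series(result)


actions = pd.DataFrame(
    {
        "income": [100.0, 300.0, 200.0],
        "job": ["-", "doctor", "doctor"],
        "city": ["-", "-", "-"],
    }
)
mean_action = actions_mean_sketch(actions, ["income"], ["job", "city"], categorical_no_action_token="-")
```

Here `income` averages to 200.0, `job` resolves to "doctor", and `city` stays at the token because no row requested a change.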

humancompatible.explain.glance.utils.action.apply_action_numpy(X: ndarray[Any, dtype[number]], action: ndarray[Any, dtype[number]], numerical_columns: List[int], categorical_columns: List[int], categorical_no_action_token: number) → ndarray[Any, dtype[number]][source]

Apply action to all rows of X. For numerical columns, add the respective component from action. For categorical columns, set the component of all rows to the value of action, unless it is equal to the categorical_no_action_token, in which case do nothing for this column.

Note: input array should have a numeric dtype. Thus, categorical columns should be encoded by numbers (e.g. Ordinal Encoding).

Parameters:

X (npt.NDArray[np.number]) – matrix of observations
action (npt.NDArray[np.number]) – for each column / feature, the action to be applied
numerical_columns (List[int]) – numerical column indices
categorical_columns (List[int]) – categorical column indices
categorical_no_action_token (np.number) – special value signifying no-action (i.e. equivalent to 0 for numerical columns)

Returns:

new observations resulting from the action application.

Return type:

npt.NDArray[np.number]
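
The documented behaviour can be reproduced in a few lines of NumPy. The sketch below is written from the description above rather than from the library source; `apply_action_sketch` and the choice of -1 as the no-action token are illustrative assumptions.

```python
import numpy as np


def apply_action_sketch(X, action, numerical_columns, categorical_columns, categorical_no_action_token):
    """Apply one action vector to every row of X and return a modified copy."""
    X_new = X.copy()
    # Numerical columns: add the action component to every row.
    X_new[:, numerical_columns] += action[numerical_columns]
    # Categorical columns: overwrite every row unless the action is the no-action token.
    for col in categorical_columns:
        if action[col] != categorical_no_action_token:
            X_new[:, col] = action[col]
    return X_new


# Column 0 is numerical; columns 1 and 2 are ordinal-encoded categoricals.
X = np.array([[30.0, 1.0, 0.0],
              [45.0, 2.0, 1.0]])
action = np.array([5.0, -1.0, 2.0])  # +5 on column 0, leave column 1, set column 2 to 2
result = apply_action_sketch(
    X, action, numerical_columns=[0], categorical_columns=[1, 2], categorical_no_action_token=-1
)
```

The token must be a value that never appears as a real category code, otherwise a genuine "set to this category" action would be silently skipped.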

humancompatible.explain.glance.utils.action.apply_action_pandas(X: DataFrame, action: Series, numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: Any, numerical_no_action_token: Any | None = None) → DataFrame[source]

Apply action to all rows of X. For numerical columns, add the respective component from action. For categorical columns, set the component of all rows to the value of action, unless it is equal to the categorical_no_action_token, in which case do nothing for this column.

Parameters:

X (pd.DataFrame) – matrix of observations
action (pd.Series) – for each column / feature, the action to be applied
numerical_columns (List[str]) – numerical column names
categorical_columns (List[str]) – categorical column names
categorical_no_action_token (Any) – special value signifying no-action (i.e. equivalent to 0 for numerical columns)

Returns:

new observations resulting from the action application.

Return type:

pd.DataFrame
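
A small pandas sketch of the same rules may help. The helper name apply_action_df_sketch, the column names and the token "-" are all assumptions for illustration; the sketch does not call the library and ignores the numerical_no_action_token argument, whose semantics are not described here.

```python
import pandas as pd

def apply_action_df_sketch(X, action, numerical_columns, categorical_columns, no_action_token):
    """Hypothetical re-implementation of the documented rules (not the library code)."""
    X_new = X.copy()
    # Numerical columns: add the per-feature offset to every row.
    for col in numerical_columns:
        X_new[col] = X_new[col] + action[col]
    # Categorical columns: set every row to the action value, unless it is the token.
    for col in categorical_columns:
        if action[col] != no_action_token:
            X_new[col] = action[col]
    return X_new

X = pd.DataFrame({
    "age": [30, 45],
    "hours": [40, 20],
    "education": ["HS", "BSc"],
    "marital": ["single", "married"],
})
# One action for the whole group: +5 hours, education set to "MSc", marital unchanged.
action = pd.Series({"age": 0, "hours": 5, "education": "MSc", "marital": "-"})

result = apply_action_df_sketch(X, action, ["age", "hours"], ["education", "marital"], "-")
print(result)
```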

humancompatible.explain.glance.utils.action.apply_actions_pandas_rows(X: DataFrame, actions: DataFrame, numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: object) → DataFrame[source]

Applies a set of actions to transform the original dataset X based on the actions specified in the actions DataFrame.

For numerical columns, the function adds the values from the actions DataFrame to the corresponding columns in X. For categorical columns, if the action for a column is not equal to the categorical_no_action_token, the value from the actions DataFrame is used to update X. Otherwise, the original value from X is retained.

Parameters:

X (pd.DataFrame) – The original dataset, where each row represents an instance, and each column is a feature.
actions (pd.DataFrame) – A DataFrame of the same shape as X, containing the actions to apply to each feature.
    - For numerical columns: the values to add to the corresponding features in X.
    - For categorical columns: either the new value to apply or the categorical_no_action_token.
numerical_columns (List[str]) – List of columns in X and actions that are numerical.
categorical_columns (List[str]) – List of columns in X and actions that are categorical.
categorical_no_action_token (object) – A token or value indicating that no action should be taken for a categorical feature.

Returns:

pd.DataFrame: A DataFrame of the same shape as X where the actions have been applied:
- For numerical columns: each value is updated by adding the corresponding action from actions.
- For categorical columns: updated values from actions are used where applicable; otherwise, the original values from X are retained.
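
The row-wise variant differs from the single-action functions in that every instance gets its own action. A minimal sketch of that rule, assuming a hypothetical helper apply_row_actions_sketch, made-up column names and "-" as the token (none of which come from the library):

```python
import pandas as pd

def apply_row_actions_sketch(X, actions, numerical_columns, categorical_columns, no_action_token):
    """Hypothetical re-implementation of the documented rules (not the library code)."""
    X_new = X.copy()
    # Numerical columns: element-wise addition of the per-row offsets.
    X_new[numerical_columns] = X[numerical_columns] + actions[numerical_columns]
    # Categorical columns: take the action value only where it is not the token.
    for col in categorical_columns:
        changed = actions[col] != no_action_token
        X_new[col] = X[col].where(~changed, actions[col])
    return X_new

X = pd.DataFrame({"income": [1000.0, 2000.0], "job": ["clerk", "nurse"]})
# Row 0: raise income by 500, keep the job. Row 1: keep income, change the job.
actions = pd.DataFrame({"income": [500.0, 0.0], "job": ["-", "manager"]})

result = apply_row_actions_sketch(X, actions, ["income"], ["job"], "-")
print(result)
```

The sketch relies on X and actions sharing the same index, since pandas aligns rows by label during the addition and in where.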

humancompatible.explain.glance.utils.action.extract_actions_pandas(X: DataFrame, cfs: DataFrame, categorical_features: List[str], numerical_features: List[str], categorical_no_action_token: Any)[source]

Extracts the actions needed to convert the original dataset X into the counterfactual dataset cfs.

For categorical features, the function identifies changes between X and cfs. If no change is observed in a categorical feature, a specified categorical_no_action_token is used to denote that no action is needed. For numerical features, the function computes the difference between the counterfactual and the original values.

Parameters:

X (pd.DataFrame) – The original dataset, where each row represents an instance, and each column is a feature.
cfs (pd.DataFrame) – The counterfactual dataset, which has the same structure as X. It represents the desired state after some action is applied.
categorical_features (List[str]) – List of columns in X and cfs that are categorical.
numerical_features (List[str]) – List of columns in X and cfs that are numerical.
categorical_no_action_token (Any) – A token or value to insert into categorical features where no change is needed (i.e., the feature value in X is the same as in cfs).

Returns:

pd.DataFrame: A DataFrame of the same shape as X and cfs where each value indicates the action required to transform X into cfs:
- For categorical features: the value in cfs if it differs from X, otherwise categorical_no_action_token.
- For numerical features: the difference between cfs and X.
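
Action extraction, as described, is a difference for numerical features and a "changed or token" lookup for categorical ones. The following sketch shows this under stated assumptions: extract_actions_sketch is a hypothetical helper, and the data and the token "-" are invented for the example rather than taken from the library.

```python
import pandas as pd

def extract_actions_sketch(X, cfs, categorical_features, numerical_features, no_action_token):
    """Hypothetical re-implementation of the documented rules (not the library code)."""
    actions = cfs.copy()
    # Numerical features: the action is the offset from the original to the counterfactual.
    actions[numerical_features] = cfs[numerical_features] - X[numerical_features]
    # Categorical features: keep the counterfactual value where it differs, else the token.
    for col in categorical_features:
        actions[col] = cfs[col].where(cfs[col] != X[col], no_action_token)
    return actions

X = pd.DataFrame({"income": [1000.0, 2000.0], "job": ["clerk", "nurse"]})
cfs = pd.DataFrame({"income": [1500.0, 2000.0], "job": ["clerk", "manager"]})

actions = extract_actions_sketch(X, cfs, ["job"], ["income"], "-")
print(actions)
```

Here the first row yields an income offset of 500 with no job change, and the second row yields a zero offset with the job action "manager".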

humancompatible.explain.glance.utils.centroid.centroid_numpy(X: ndarray[Any, dtype[number]], numerical_columns: List[int], categorical_columns: List[int]) → ndarray[Any, dtype[number]][source]

Calculates the centroid of the rows of a 2D NumPy array. For each column in numerical_columns, the centroid takes the mean over all rows; for each column in categorical_columns, it takes the mode over all rows.

Parameters:

X (npt.NDArray[np.number]) – matrix of observations
numerical_columns (List[int]) – numerical column indices
categorical_columns (List[int]) – categorical column indices

Returns:

2D NumPy array whose single row is the centroid

Return type:

npt.NDArray[np.number]
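
A mixed mean/mode centroid can be sketched in a few lines of NumPy. The helper centroid_sketch below is hypothetical and written only from the description above. In particular, breaking ties in the mode by picking the smallest value is an assumption of this sketch, since the tie-breaking rule is not stated here.

```python
import numpy as np

def centroid_sketch(X, numerical_columns, categorical_columns):
    """Hypothetical re-implementation of the documented rules (not the library code)."""
    # Columns listed in neither argument stay NaN in this sketch.
    centroid = np.full((1, X.shape[1]), np.nan)
    # Numerical columns: column-wise mean.
    centroid[0, numerical_columns] = X[:, numerical_columns].mean(axis=0)
    # Categorical columns: most frequent value (ties go to the smallest value).
    for col in categorical_columns:
        values, counts = np.unique(X[:, col], return_counts=True)
        centroid[0, col] = values[np.argmax(counts)]
    return centroid

# Column 0 is numerical; column 1 holds ordinal-encoded categories.
X = np.array([[1.0, 0.0],
              [3.0, 2.0],
              [5.0, 2.0]])

centroid = centroid_sketch(X, [0], [1])
print(centroid)  # [[3. 2.]]
```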

humancompatible.explain.glance.utils.centroid.centroid_pandas(X: DataFrame, numerical_columns: List[str], categorical_columns: List[str]) → DataFrame[source]

Calculates the centroid of the rows of a pandas DataFrame. For each column in numerical_columns, the centroid takes the mean over all rows; for each column in categorical_columns, it takes the mode over all rows.

Parameters:

X (pd.DataFrame) – matrix of observations
numerical_columns (List[str]) – numerical column names
categorical_columns (List[str]) – categorical column names

Returns:

DataFrame whose single row is the centroid

Return type:

pd.DataFrame
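
The pandas variant can be sketched the same way. The helper name centroid_df_sketch and the sample columns are assumptions for illustration. Taking the first entry of Series.mode() to break ties is also a choice of this sketch, not a documented property of the library.

```python
import pandas as pd

def centroid_df_sketch(X, numerical_columns, categorical_columns):
    """Hypothetical re-implementation of the documented rules (not the library code)."""
    row = {}
    # Numerical columns: mean over all rows.
    for col in numerical_columns:
        row[col] = X[col].mean()
    # Categorical columns: most frequent value; mode() returns sorted candidates.
    for col in categorical_columns:
        row[col] = X[col].mode().iloc[0]
    # Single-row DataFrame with the same column order as X.
    return pd.DataFrame([row], columns=X.columns)

X = pd.DataFrame({"age": [20, 30, 40], "city": ["Rome", "Oslo", "Rome"]})

centroid = centroid_df_sketch(X, ["age"], ["city"])
print(centroid)
```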
