GLANCE: Global Actions in a Nutshell for Counterfactual Explainability

Global Counterfactual Methods

C-GLANCE (Iterative Merges)

class humancompatible.explain.glance.iterative_merges.iterative_merges.C_GLANCE(model: Any, initial_clusters: int = 100, final_clusters: int = 10, num_local_counterfactuals: int = 5, heuristic_weights: Tuple[float, float] = (0.5, 0.5), alternative_merges: bool = True, random_seed: int = 13, verbose=True)[source]

Bases: GlobalCounterfactualMethod

A class for generating global counterfactual explanations using an iterative merging approach.

It allows the user to control the number of clusters and the methods used for clustering and generating counterfactuals.

Attributes:

modelAny

The predictive model used for generating counterfactuals.

initial_clustersint

The initial number of clusters to form.

final_clustersint

The target number of clusters after merging.

num_local_counterfactualsint

The number of local counterfactuals to generate for each cluster.

heuristic_weightsTuple[float, float]

Weights used in the heuristic for merging clusters.

alternative_mergesbool

If True, allows alternative merging strategies.

random_seedint

Seed for random number generation.

verbosebool

If True, enables verbose output during processing.

final_clusteringOptional[Dict[int, pd.DataFrame]]

The final clustering of instances after merging.

cluster_resultsOptional[Dict[int, Dict[str, Any]]]

Results of the clustering including effectiveness and cost metrics.

Methods:

_set_features_names(X, numerical_names, categorical_names):

Sets the feature names for numerical and categorical features.

fit(X, y, train_dataset, feat_to_vary, numeric_features_names, categorical_features_names, clustering_method, cf_generator, cluster_action_choice_algo, …):

Fits the clustering and counterfactual generation model to the provided dataset.

explain_group(instances):

Explains the group of instances by generating counterfactuals based on clustering.

global_actions():

Retrieves the global actions derived from the clustered results.

Initializes the C_GLANCE instance.

Parameters:

modelAny

The predictive model used for generating counterfactuals.

initial_clustersint, optional

The initial number of clusters to form. Default is 100.

final_clustersint, optional

The target number of clusters after merging. Default is 10.

num_local_counterfactualsint, optional

The number of local counterfactuals to generate for each cluster. Default is 5.

heuristic_weightsTuple[float, float], optional

Weights used in the heuristic for merging clusters. Default is (0.5, 0.5).

alternative_mergesbool, optional

If True, allows alternative merging strategies. Default is True.

random_seedint, optional

Seed for random number generation. Default is 13.

verbosebool, optional

If True, enables verbose output during processing. Default is True.
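The constructor arguments suggest a loop that starts from initial_clusters groups and merges them until final_clusters remain. The following is a minimal sketch of such a loop, not the library's implementation. It assumes each cluster is summarized by a centroid and a representative action (both numeric vectors), that the merge score is a weighted sum of the centroid distance and the action distance using heuristic_weights, and that merged summaries are size-weighted means. The function name merge_until and the dictionary keys are hypothetical.

```python
import math
from itertools import combinations


def merge_until(clusters, final_clusters, heuristic_weights=(0.5, 0.5)):
    """Greedily merge cluster summaries until `final_clusters` remain.

    `clusters` is a list of dicts with keys "centroid", "action", "size"
    (a hypothetical layout chosen for this sketch).
    """
    w_centroid, w_action = heuristic_weights
    clusters = [dict(c) for c in clusters]  # work on shallow copies

    def score(pair):
        # Assumed heuristic: weighted sum of centroid and action distances.
        a, b = clusters[pair[0]], clusters[pair[1]]
        return (w_centroid * math.dist(a["centroid"], b["centroid"])
                + w_action * math.dist(a["action"], b["action"]))

    def weighted_mean(u, v, nu, nv):
        return [(x * nu + y * nv) / (nu + nv) for x, y in zip(u, v)]

    while len(clusters) > max(final_clusters, 1):
        # Pick the pair of clusters with the smallest merge score.
        i, j = min(combinations(range(len(clusters)), 2), key=score)
        a, b = clusters[i], clusters[j]
        merged = {
            "centroid": weighted_mean(a["centroid"], b["centroid"], a["size"], b["size"]),
            "action": weighted_mean(a["action"], b["action"], a["size"], b["size"]),
            "size": a["size"] + b["size"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

With heuristic_weights=(1.0, 0.0) this sketch merges purely on centroid proximity; with (0.0, 1.0) it merges clusters whose representative actions are most alike.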

explain_group(instances: DataFrame) → Tuple[int, float][source]

Explains the group of instances by generating counterfactuals based on clustering.

Parameters:
instancespd.DataFrame

The group of instances to explain.

Returns:
Tuple[int, float]

A tuple containing the total effectiveness and total cost of the generated counterfactuals.

fit(X: DataFrame, y: Series, train_dataset: DataFrame, feat_to_vary: List[str] | str | None = 'all', numeric_features_names: List[str] | None = None, categorical_features_names: List[str] | None = None, clustering_method: ClusteringMethod | Literal['KMeans'] = 'KMeans', cf_generator: LocalCounterfactualMethod | Literal['Dice', 'NearestNeighbors', 'RandomSampling'] = 'Dice', cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost'] = 'max-eff', nns__n_scalars: int | None = None, rs__n_most_important: int | None = None, rs__n_categorical_most_frequent: int | None = None, lowcost__action_threshold: int | None = None, lowcost__num_low_cost: int | None = None, min_cost_eff_thres__effectiveness_threshold: float | None = None, min_cost_eff_thres_combinations__num_min_cost: int | None = None, eff_thres_hybrid__max_n_actions_full_combinations: int | None = None) → C_GLANCE[source]

Fits the clustering and counterfactual generation model to the provided dataset.

Parameters:
Xpd.DataFrame

Features of the dataset.

ypd.Series

Target variable.

train_datasetpd.DataFrame

The training dataset used for local counterfactual generation methods.

feat_to_varyOptional[Union[List[str], str]], optional

Features to vary in counterfactual generation. Default is “all”.

numeric_features_namesOptional[List[str]], optional

List of numeric feature names. If None, they will be inferred from X.

categorical_features_namesOptional[List[str]], optional

List of categorical feature names. If None, they will be inferred from X.

clustering_methodUnion[ClusteringMethod, Literal[“KMeans”]], optional

The clustering method to use. Default is “KMeans”.

cf_generatorUnion[LocalCounterfactualMethod, Literal[“Dice”, “NearestNeighbors”, “RandomSampling”]], optional

The local counterfactual generation method to use. Default is “Dice”.

cluster_action_choice_algoLiteral[“max-eff”, “mean-act”, “low-cost”], optional

The algorithm for selecting actions from clusters. Default is “max-eff”.

nns__n_scalarsOptional[int], optional

Number of scalar features to use for nearest neighbors. Default is None.

rs__n_most_importantOptional[int], optional

Number of most important features for random sampling. Default is None.

rs__n_categorical_most_frequentOptional[int], optional

Number of most frequent categorical features for random sampling. Default is None.

lowcost__action_thresholdOptional[int], optional

Action threshold for low-cost methods. Default is None.

lowcost__num_low_costOptional[int], optional

Number of low-cost actions to consider. Default is None.

min_cost_eff_thres__effectiveness_thresholdOptional[float], optional

Effectiveness threshold for minimum cost methods. Default is None.

min_cost_eff_thres_combinations__num_min_costOptional[int], optional

Number of minimum cost combinations to evaluate. Default is None.

eff_thres_hybrid__max_n_actions_full_combinationsOptional[int], optional

Maximum number of actions for full combinations in hybrid thresholding. Default is None.

Returns:
C_GLANCE

Returns the fitted instance of C_GLANCE.

global_actions()[source]

Retrieves the global actions derived from the clustered results.

humancompatible.explain.glance.iterative_merges.iterative_merges.action_fake_cost(action: Series, numerical_features_names: List[str], categorical_features_names: List[str])[source]

humancompatible.explain.glance.iterative_merges.iterative_merges.actions_cumulative_eff_cost(model: Any, X: DataFrame, actions_with_costs: List[Tuple[Series, float]], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: Any) → Tuple[float, float][source]

Evaluates the cumulative effectiveness and cost of applying a sequence of actions to a dataset using a predictive model.

This function applies each action from the sorted list of actions with their costs, predicts the outcomes, and calculates the total number of predictions that were flipped as well as the total recourse cost incurred from the actions.

Parameters:

modelAny

A machine learning model used for making predictions on the modified instances.

Xpd.DataFrame

The original DataFrame of instances to which actions will be applied.

actions_with_costsList[Tuple[pd.Series, float]]

A list of tuples, where each tuple contains:

  • A pandas Series representing the action to apply.

  • A float representing the cost associated with the action.

dist_func_dataframeCallable[[pd.DataFrame, pd.DataFrame], pd.Series]

A function that computes the distance or cost between two DataFrames.

numerical_columnsList[str]

A list of names for the numerical columns in the DataFrame.

categorical_columnsList[str]

A list of names for the categorical columns in the DataFrame.

categorical_no_action_tokenAny

A token used to represent the absence of an action for categorical features.

Returns:

Tuple[float, float]

A tuple containing:

  • The total number of predictions flipped across all actions applied.

  • The total recourse cost incurred from applying the actions.
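The cumulative evaluation described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: instances are dictionaries of numeric features, an action is a dictionary of additive shifts, the favorable prediction is 1, and each newly flipped instance is charged the fixed cost attached to the action (standing in for dist_func_dataframe). The name cumulative_eff_cost is hypothetical.

```python
def cumulative_eff_cost(predict, instances, actions_with_costs):
    """Apply actions from cheapest to most expensive and accumulate results.

    An instance counts as flipped the first time an action moves its
    prediction to 1; it is then skipped for all later actions.
    """
    flipped = [False] * len(instances)
    total_cost = 0.0
    # Cheapest actions are tried first (assumed ordering for this sketch).
    for action, cost in sorted(actions_with_costs, key=lambda ac: ac[1]):
        for idx, row in enumerate(instances):
            if flipped[idx]:
                continue
            candidate = {k: v + action.get(k, 0.0) for k, v in row.items()}
            if predict(candidate) == 1:
                flipped[idx] = True
                total_cost += cost
    return sum(flipped), total_cost
```

Because cheaper actions are tried first, an instance that several actions could flip is charged only for the cheapest of them.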

humancompatible.explain.glance.iterative_merges.iterative_merges.cluster_results(model: Any, instances: DataFrame, clusters: Dict[int, DataFrame], cluster_expl_actions: Dict[int, DataFrame], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_features_names: List[str], categorical_features_names: List[str], cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost', 'min-cost-eff-thres', 'eff-thres-hybrid'] = 'max-eff', action_threshold: int = 2, num_low_cost: int = 20, effectiveness_threshold: float = 0.1, num_min_cost: int | None = None, max_n_actions_full_combinations: int = 50) → Tuple[Dict[int, Dict[str, Any]], float, float][source]

Evaluates and selects actions for each cluster based on a specified action choice algorithm.

This function iterates through each cluster of instances, applying the specified algorithm to select the best action for achieving recourse while minimizing costs. It calculates the total effectiveness and mean recourse costs across all clusters.

Parameters:

modelAny

A machine learning model used for making predictions on modified instances.

instancespd.DataFrame

The DataFrame of original instances to which actions will be applied.

clustersDict[int, pd.DataFrame]

A dictionary mapping cluster IDs to DataFrames of instances belonging to each cluster.

cluster_expl_actionsDict[int, pd.DataFrame]

A dictionary mapping cluster IDs to DataFrames of candidate actions for each cluster.

dist_func_dataframeCallable[[pd.DataFrame, pd.DataFrame], pd.Series]

A function that computes the distance or cost between two DataFrames.

numerical_features_namesList[str]

A list of names for the numerical columns in the DataFrames.

categorical_features_namesList[str]

A list of names for the categorical columns in the DataFrames.

cluster_action_choice_algoLiteral[“max-eff”, “mean-act”, “low-cost”, “min-cost-eff-thres”, “eff-thres-hybrid”]

The algorithm to use for selecting actions from candidate actions. Options include:

  • “max-eff”: Select the action with maximum effectiveness.

  • “mean-act”: Select the mean action from candidate actions.

  • “low-cost”: Select actions based on low cost.

action_thresholdint

Minimum threshold for the number of flipped predictions required to consider an action effective.

num_low_costint

The number of low-cost actions to consider (used when the low-cost algorithm is selected).

effectiveness_thresholdfloat

Minimum effectiveness required for actions (used when the min-cost-eff-thres algorithm is selected).

num_min_costOptional[int]

Number of minimum cost actions to consider (used when the min-cost-eff-thres algorithm is selected).

max_n_actions_full_combinationsint

Maximum number of actions to evaluate in full combinations (not currently used in the function).

Returns:

Tuple[Dict[int, Dict[str, Any]], float, float]

A tuple containing:

  • A dictionary where each key is a cluster ID and each value is another dictionary with the selected action, its effectiveness, and cost.

  • Total effectiveness percentage across all clusters.

  • Total mean recourse cost across all clusters.
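As an illustration of the “max-eff” option, the sketch below picks, for each cluster, the candidate action that flips the most predictions, then aggregates an effectiveness rate and a mean cost. It is a simplified stand-in, not the library's implementation: features are numeric only, actions are additive dictionaries, the favorable prediction is 1, every cluster has at least one candidate, and cost_of is a hypothetical per-action cost function replacing dist_func_dataframe.

```python
def select_max_eff(predict, clusters, candidate_actions, cost_of):
    """Pick the most effective candidate action per cluster.

    Returns (per-cluster results, share of instances flipped,
    mean cost per flipped instance).
    """
    results = {}
    total_flipped, total_cost, total_size = 0, 0.0, 0
    for cluster_id, rows in clusters.items():
        best = None
        for action in candidate_actions[cluster_id]:
            shifted = [{k: v + action.get(k, 0.0) for k, v in row.items()}
                       for row in rows]
            flipped = sum(1 for row in shifted if predict(row) == 1)
            # Keep the first action that achieves the highest flip count.
            if best is None or flipped > best["effectiveness"]:
                best = {
                    "action": action,
                    "effectiveness": flipped,
                    "cost": flipped * cost_of(action),
                    "size": len(rows),
                }
        results[cluster_id] = best
        total_flipped += best["effectiveness"]
        total_cost += best["cost"]
        total_size += best["size"]
    eff_rate = total_flipped / total_size if total_size else 0.0
    mean_cost = total_cost / total_flipped if total_flipped else 0.0
    return results, eff_rate, mean_cost
```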

humancompatible.explain.glance.iterative_merges.iterative_merges.cumulative(model, instances, actions, dist_func_dataframe, numeric_features_names, categorical_features_names, categorical_no_action_token)[source]

Computes the cumulative effectiveness and cost of applying a set of actions to a given set of instances using a predictive model.

Parameters:

modelAny

A predictive model with a predict method. This model will be used to predict outcomes after applying actions to the input instances.

instancespd.DataFrame

A DataFrame containing the instances for which actions are to be applied.

actionsList[dict]

A list of actions, where each action is represented as a dictionary that specifies how to modify the instances.

dist_func_dataframeCallable[[pd.DataFrame, pd.DataFrame], pd.Series]

A distance function that takes two DataFrames and returns a Series of distances between corresponding rows.

numeric_features_namesList[str]

A list of names for the numeric features in the instances DataFrame.

categorical_features_namesList[str]

A list of names for the categorical features in the instances DataFrame.

categorical_no_action_tokenAny

A token used to represent a no-action state for categorical features.

Returns:

Tuple[int, float]

A tuple containing:

  • effectiveness: An integer count of how many actions were effective (i.e., resulted in a finite cost).

  • cost: A float representing the total cost incurred by the effective actions.
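One way to read the "finite cost" wording is that an instance no action can flip is assigned an infinite cost and is then excluded from both totals. The sketch below follows that reading, which is an assumption and not confirmed by the source; cumulative_sketch, the additive action dictionaries, and cost_of are hypothetical.

```python
import math


def cumulative_sketch(predict, instances, actions, cost_of):
    """Give each instance its cheapest flipping action, or inf if none flips it."""
    best_costs = []
    for row in instances:
        best = math.inf
        for action in actions:
            candidate = {k: v + action.get(k, 0.0) for k, v in row.items()}
            if predict(candidate) == 1:
                best = min(best, cost_of(row, action))
        best_costs.append(best)
    # Only finite costs count as effective recourse.
    finite = [c for c in best_costs if math.isfinite(c)]
    return len(finite), sum(finite)
```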

humancompatible.explain.glance.iterative_merges.iterative_merges.format_glance_output(cluster_stats: Dict[int, Dict[str, Number]], categorical_columns: List[str])[source]

humancompatible.explain.glance.iterative_merges.iterative_merges.print_results(clusters_stats: Dict[int, Dict[str, Number]], total_effectiveness: float, total_cost: float)[source]

Prints the statistics for each cluster, including effectiveness and cost.

This function takes the results of cluster analysis and formats them for easy viewing. It displays the size of each cluster, the actions taken, and the effectiveness and cost of those actions.

Parameters:

clusters_statsDict[int, Dict[str, numbers.Number]]

A dictionary where keys are cluster IDs (integers) and values are dictionaries containing statistics for each cluster. Each value dictionary must contain the following keys:

  • “size”: The size of the cluster.

  • “action”: The actions taken for the cluster.

  • “effectiveness”: The effectiveness of the actions in the cluster.

  • “cost”: The cost associated with the actions.

total_effectivenessfloat

The total effectiveness percentage across all clusters, represented as a decimal (e.g., 0.75 for 75%).

total_costfloat

The total cost associated with the actions taken across all clusters.
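To make the expected shape of clusters_stats concrete, here is a small formatter that consumes the four required keys. It is only a sketch of the kind of report described above; the exact layout printed by print_results is not specified in this reference, and the name format_results is hypothetical.

```python
def format_results(clusters_stats, total_effectiveness, total_cost):
    """Build a plain-text report from per-cluster statistics."""
    lines = []
    for cluster_id, stats in sorted(clusters_stats.items()):
        lines.append(
            f"Cluster {cluster_id}: size={stats['size']}, "
            f"action={stats['action']}, "
            f"effectiveness={stats['effectiveness']}, cost={stats['cost']}"
        )
    # total_effectiveness is a decimal fraction, e.g. 0.75 for 75%.
    lines.append(f"Total effectiveness: {total_effectiveness:.2%}")
    lines.append(f"Total cost: {total_cost:.2f}")
    return "\n".join(lines)
```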

T-GLANCE (Counterfactual Tree)

class humancompatible.explain.glance.counterfactual_tree.counterfactual_tree.T_GLANCE(model: Any, split_features: List | int | None = None, partition_counterfactuals: int | None = None, child_count: int = 2, global_method: GlobalCounterfactualMethod | str | None = None, local_method: LocalCounterfactualMethod | str | None = None, num_local_counterfactuals: int = 100)[source]

Bases: object

A class to generate counterfactual explanations using a decision tree-like structure.

This class allows users to create a tree structure for counterfactual generation, optimizing effectiveness and cost based on specified features. It supports both local and global methods for generating counterfactuals.

Attributes:

modelAny

The predictive model used for generating counterfactuals.

split_featuresUnion[List, int]

Features to split the tree. Can be a list of feature names or an integer specifying the number of top features to use based on permutation importance.

partition_counterfactualsint

The number of partitions to create for counterfactuals.

child_countint

The number of children each node can have.

global_methodUnion[GlobalCounterfactualMethod, str]

The global counterfactual generation method to use.

local_methodUnion[LocalCounterfactualMethod, str]

The local counterfactual generation method to use.

num_local_counterfactualsint

The number of local counterfactuals to generate.

nodeNode

The root node of the counterfactual tree.

node_instancespd.DataFrame

The instances that were used to build the counterfactual tree.

dist_func_dataframeCallable

A distance function for calculating distances between instances.

Methods:

fit(X, y, train_dataset=None, feat_to_vary=“all”, random_seed=13, numeric_features_names=None, categorical_features_names=None):

Fits the counterfactual tree to the provided data.

_local_group_eff_cost(instances):

Calculates the effectiveness and cost of local counterfactuals for a group of instances.

_group_eff_cost(instances):

Calculates the effectiveness and cost of counterfactuals for a group of instances, utilizing local or global methods.

partition_group(instances):

Partitions the group of instances into a tree structure based on the specified features.

cumulative_leaf_actions():

Computes the total effectiveness and cost of actions taken from leaf nodes of the tree.

Initializes the T_GLANCE instance.

Parameters:

modelAny

The predictive model to use for generating counterfactuals.

split_featuresUnion[List, int], optional

Features to split the tree. If None, uses permutation importance to select. If an integer, selects the top N features.

partition_counterfactualsint, optional

Number of partitions for counterfactual generation.

child_countint, optional

Number of children for each node in the tree. Default is 2.

global_methodUnion[GlobalCounterfactualMethod, str], optional

The global counterfactual generation method to use.

local_methodUnion[LocalCounterfactualMethod, str], optional

The local counterfactual generation method to use.

num_local_counterfactualsint, optional

Number of local counterfactuals to generate. Default is 100.

cumulative_leaf_actions()[source]

Computes the total effectiveness and cost of actions taken from leaf nodes of the tree.

Returns:
Tuple[float, float, int]

A tuple containing the total effectiveness, total cost, and the number of actions taken.
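The leaf aggregation can be pictured as a tree traversal that sums the statistics stored on leaves. The sketch below assumes a hypothetical TreeNode with effectiveness, cost, and children fields and one action per leaf; the library's actual Node class may be structured differently.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TreeNode:
    """Hypothetical stand-in for a node of the counterfactual tree."""
    effectiveness: float = 0.0
    cost: float = 0.0
    children: List["TreeNode"] = field(default_factory=list)


def sum_leaf_actions(root: TreeNode) -> Tuple[float, float, int]:
    """Sum effectiveness and cost over leaves and count one action per leaf."""
    total_eff, total_cost, n_actions = 0.0, 0.0, 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)  # inner node: descend
        else:
            total_eff += node.effectiveness
            total_cost += node.cost
            n_actions += 1
    return total_eff, total_cost, n_actions
```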

fit(X: DataFrame, y: Series, train_dataset: DataFrame | None = None, feat_to_vary: List[str] | str | None = 'all', random_seed: int = 13, numeric_features_names: List[str] | None = None, categorical_features_names: List[str] | None = None)[source]

Fits the counterfactual tree to the provided data.

Parameters:
Xpd.DataFrame

Features of the dataset.

ypd.Series

Target variable.

train_datasetOptional[pd.DataFrame], optional

The training dataset to use for local counterfactual generation methods.

feat_to_varyOptional[Union[List[str], str]], optional

Features to vary in counterfactual generation. Default is “all”.

random_seedint, optional

Random seed for reproducibility. Default is 13.

numeric_features_namesOptional[List[str]], optional

List of numeric feature names. If None, they will be inferred from X.

categorical_features_namesOptional[List[str]], optional

List of categorical feature names. If None, they will be inferred from X.

partition_group(instances: DataFrame)[source]

Partitions the group of instances into a tree structure based on the specified features.

Parameters:
instancespd.DataFrame

The group of instances to partition.

Returns:
Node

The root node of the partitioned tree.

Clustering Method Wrappers

class humancompatible.explain.glance.clustering.kmeans.KMeansMethod(num_clusters, random_seed)[source]

Bases: ClusteringMethod

Implementation of a clustering method using KMeans.

This class provides an interface to apply KMeans clustering to a dataset.

Initializes the KMeansMethod class.

Parameters:

num_clustersint

The number of clusters to form as well as the number of centroids to generate.

random_seedint

A seed for the random number generator to ensure reproducibility.

fit(data)[source]

Fits the KMeans model on the provided dataset.

Parameters:

dataarray-like or sparse matrix, shape (n_samples, n_features)

Training instances to cluster.

Returns:

None

predict(instances)[source]

Predicts the nearest cluster each sample in the provided data belongs to.

Parameters:

instancesarray-like or sparse matrix, shape (n_samples, n_features)

New data to predict.

Returns:

labelsarray, shape (n_samples,)

Index of the cluster each sample belongs to.
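The predict step of k-means assigns each sample to its nearest centroid. A minimal plain-Python sketch of that rule, assuming Euclidean distance and centroids that have already been fitted (the helper name predict_nearest is hypothetical):

```python
import math


def predict_nearest(centroids, instances):
    """Return, for each instance, the index of the closest centroid."""
    labels = []
    for row in instances:
        distances = [math.dist(row, centroid) for centroid in centroids]
        labels.append(distances.index(min(distances)))
    return labels
```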

Local Counterfactual Methods

class humancompatible.explain.glance.local_cfs.dice_method.DiceMethod[source]

Bases: LocalCounterfactualMethod

Implementation of the Dice method for generating counterfactual instances (https://interpret.ml/DiCE/).

The Dice method uses a specified machine learning model and data to generate counterfactual examples, providing insights into how changes in feature values can influence model predictions.

Methods:

__init__():

Initializes the DiceMethod instance.

fit(model, data, outcome_name, continuous_features, feat_to_vary, random_seed=13):

Fits the DiceMethod to the provided dataset, preparing the counterfactual generator.

explain_instances(instances, num_counterfactuals):

Generates counterfactual instances for the specified input instances.

Initializes a new instance of the DiceMethod class.

Attributes:

cf_generatorNone or dice_ml.Dice

Counterfactual generator instance, initially set to None.

explain_instances(instances: DataFrame, num_counterfactuals: int) → DataFrame[source]

Generates counterfactual instances for the specified input instances.

Parameters:

instancespd.DataFrame

DataFrame containing the instances for which counterfactuals are generated.

num_counterfactualsint

The number of counterfactuals to generate for each instance.

Returns:

pd.DataFrame

A DataFrame containing the generated counterfactuals.

Raises:

ValueError

If the counterfactual generator has not been initialized (fit method not called).

fit(model, data, outcome_name, continuous_features, feat_to_vary, random_seed=13)[source]

Fits the DiceMethod to the provided dataset by creating a counterfactual generator.

Parameters:

modelobject

A machine learning model used for predictions.

datapd.DataFrame

The dataset containing features and the outcome variable.

outcome_namestr

The name of the outcome variable in the dataset.

continuous_featuresList[str]

A list of names for continuous (numerical) features.

feat_to_varyList[str]

A list of feature names that can be varied to generate counterfactuals.

random_seedint, optional

Seed for random number generation to ensure reproducibility, by default 13.

class humancompatible.explain.glance.local_cfs.nearest_neighbor.NearestNeighborMethod[source]

Bases: LocalCounterfactualMethod

NearestNeighborMethod is a local counterfactual method that finds the nearest unaffected neighbors in the training dataset to explain instances by generating counterfactuals.

This method identifies instances in the training set where the model prediction remains unaffected, and uses the nearest neighbors (based on feature similarity) to generate counterfactual explanations for new instances.

Methods:

__init__():

Initializes the NearestNeighborMethod instance.

fit(model, data, outcome_name, continuous_features, feat_to_vary, random_seed=13):

Fits the method to the training data by identifying unaffected instances based on model predictions and preparing the feature encoding for nearest neighbor searches.

explain_instances(instances, num_counterfactuals):

Finds and returns the nearest unaffected neighbors for each instance, generating the specified number of counterfactual explanations.

Initializes a new instance of the NearestNeighborMethod class.

explain_instances(instances: DataFrame, num_counterfactuals: int) → DataFrame[source]

Generates counterfactual explanations for the provided instances by finding the nearest unaffected neighbors in the training data.

Parameters:

instancespd.DataFrame

DataFrame containing the instances for which counterfactual explanations are needed.

num_counterfactualsint

The number of counterfactuals to generate for each instance.

Returns:

pd.DataFrame

A DataFrame containing the nearest unaffected neighbors (counterfactuals) for each instance.

Notes:

  • If the requested number of counterfactuals exceeds the number of available unaffected instances, a warning is issued, and all unaffected instances are used.

  • Nearest neighbors are determined using a one-hot encoded feature representation.
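The two notes above can be illustrated with a plain-Python sketch. It assumes that "unaffected" means training rows the model already predicts as 1, that distance is Euclidean over numeric values plus one-hot encoded categories, and that rows are dictionaries. The helper names one_hot and nearest_unaffected are hypothetical and do not come from the library.

```python
import math
import warnings


def one_hot(row, numeric, categories):
    """Encode a row as numeric values followed by one-hot category flags."""
    vec = [float(row[f]) for f in numeric]
    for feature, values in categories.items():
        vec.extend(1.0 if row[feature] == v else 0.0 for v in values)
    return vec


def nearest_unaffected(instance, train_rows, predict, numeric, categories, k):
    """Return up to k unaffected training rows closest to `instance`."""
    unaffected = [r for r in train_rows if predict(r) == 1]
    if k > len(unaffected):
        warnings.warn("Fewer unaffected instances than requested; using all of them.")
        k = len(unaffected)
    target = one_hot(instance, numeric, categories)
    ranked = sorted(
        unaffected,
        key=lambda r: math.dist(target, one_hot(r, numeric, categories)),
    )
    return ranked[:k]
```

In this sketch numeric features are not rescaled, so a feature with a large range dominates the distance; a real pipeline would usually normalize first.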

fit(model, data: DataFrame, outcome_name: str, continuous_features: List[str], feat_to_vary: List[str], random_seed=13)[source]

Fits the NearestNeighborMethod by identifying unaffected instances in the training dataset and preparing feature encodings for counterfactual search.

Parameters:

modelobject

A machine learning model with a predict method that outputs binary predictions (0 or 1).

datapd.DataFrame

A dataset containing the features and outcome variable used for fitting the method.

outcome_namestr

The name of the outcome column in the dataset.

continuous_featuresList[str]

A list of continuous (numerical) feature column names.

feat_to_varyList[str]

A list of features allowed to vary when generating counterfactuals.

random_seedint, optional

Seed for random number generation to ensure reproducibility, by default 13.

class humancompatible.explain.glance.local_cfs.random_sampling.RandomSampling(model, n_most_important, n_categorical_most_frequent, numerical_features, categorical_features, random_state=None)[source]

Bases: LocalCounterfactualMethod

RandomSampling is a local counterfactual method that generates counterfactual instances through random sampling based on the distribution of features in the unaffected training data.

This method identifies the most important features and the most frequent categories within the unaffected training data to generate counterfactuals by sampling from these distributions.

Methods:

__init__(model, n_most_important, n_categorical_most_frequent, numerical_features, categorical_features, random_state=None):

Initializes the RandomSampling instance with the specified parameters.

fit(X, y):

Fits the RandomSampling method to the provided training data by calculating feature importances and identifying unaffected instances.

_sample_instances(n_samples, fixed_feature_values, random_state=None):

Samples instances based on the specified feature distributions, fixing certain feature values while sampling others.

explain(instance, num_counterfactuals, n_samples=1000, random_state=None):

Generates counterfactual explanations for a given instance by sampling and modifying feature values.

explain_instances(instances, num_counterfactuals, n_samples=1000, random_state=None):

Generates counterfactuals for multiple instances by calling the explain method for each instance.

Initializes a new instance of the RandomSampling class.

Parameters:

modelobject

A machine learning model used for predictions and feature importance evaluation.

n_most_importantint

The number of most important features to consider when generating counterfactuals.

n_categorical_most_frequentint

The number of most frequent categories to consider for categorical features.

numerical_featuresList[str]

A list of continuous (numerical) feature names.

categorical_featuresList[str]

A list of categorical feature names.

random_stateint, optional

Seed for random number generation to ensure reproducibility, by default None.

explain(instance, num_counterfactuals, n_samples=1000, random_state=None)[source]

Generates counterfactual explanations for a given instance by sampling and modifying feature values.

Parameters:

instancepd.DataFrame

A single row DataFrame representing the instance for which counterfactuals are generated.

num_counterfactualsint

The number of counterfactuals to generate.

n_samplesint, optional

The number of samples to draw for generating counterfactuals, by default 1000.

random_stateint, optional

Seed for random number generation, by default None.

Returns:

pd.DataFrame

A DataFrame containing the generated counterfactuals for the provided instance.

Raises:

ValueError

If the input instance is not a single-row DataFrame or if its columns do not match the training dataset’s columns.
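The sampling idea behind this method can be sketched as follows. The sketch makes several assumptions that the reference does not confirm: numeric values are drawn from the values observed in the unaffected rows, categorical values are drawn uniformly from the n most frequent categories, and features listed in fixed keep their given value. The function name sample_candidates and its parameters are hypothetical.

```python
import random
from collections import Counter


def sample_candidates(unaffected_rows, numeric, categorical,
                      n_most_frequent, fixed, n_samples, seed=None):
    """Draw candidate rows from the empirical distribution of unaffected data."""
    rng = random.Random(seed)
    # Most frequent categories per categorical feature.
    top_categories = {
        f: [value for value, _ in
            Counter(r[f] for r in unaffected_rows).most_common(n_most_frequent)]
        for f in categorical
    }
    samples = []
    for _ in range(n_samples):
        row = dict(fixed)  # fixed feature values are never resampled
        for f in numeric:
            if f not in fixed:
                row[f] = rng.choice(unaffected_rows)[f]
        for f in categorical:
            if f not in fixed:
                row[f] = rng.choice(top_categories[f])
        samples.append(row)
    return samples
```

A full explain step would then keep only the sampled rows that the model predicts favorably and return the closest num_counterfactuals of them.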

explain_instances(instances: DataFrame, num_counterfactuals: int, n_samples=1000, random_state=None) → DataFrame[source]

Generates counterfactuals for multiple instances by calling the explain method for each instance.

Parameters:

instancespd.DataFrame

DataFrame containing instances for which counterfactual explanations are needed.

num_counterfactualsint

The number of counterfactuals to generate for each instance.

n_samplesint, optional

The number of samples to draw for generating counterfactuals, by default 1000.

random_stateint, optional

Seed for random number generation, by default None.

Returns:

pd.DataFrame

A DataFrame containing the generated counterfactuals for all provided instances.

fit(X: DataFrame, y: Series)[source]

Fits the RandomSampling method to the provided training data by calculating feature importances and identifying unaffected instances.

Parameters:

Xpd.DataFrame

The training dataset containing feature columns.

ypd.Series

The target variable corresponding to the training dataset.

Returns:

selfRandomSampling

Returns the fitted instance of RandomSampling.

Utility Functions

humancompatible.explain.glance.utils.action.actions_mean_pandas(actions: DataFrame, numerical_features: List[str], categorical_features: List[str], categorical_no_action_token: Any) → Series[source]

Computes the mean action for numerical features and the most frequent action for categorical features from a given actions DataFrame.

For numerical features, the function calculates the mean of the actions across all instances. For categorical features, it determines the most frequent value in the actions DataFrame, unless all values are equal to the categorical_no_action_token, in which case the token is returned.

Parameters:

actionspd.DataFrame

A DataFrame where each row represents an instance, and each column represents an action for a feature (either numerical or categorical).

numerical_featuresList[str]

List of columns in actions that are numerical features.

categorical_featuresList[str]

List of columns in actions that are categorical features.

categorical_no_action_tokenAny

A token or value that indicates no action is needed for categorical features.

Returns:

pd.Series

A Series where:

  • For numerical features, the values are the mean of the actions for each numerical column.

  • For categorical features, the values are the most frequent action in each categorical column, or the categorical_no_action_token if no action was needed.
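A minimal pandas sketch of the rule described above. It reads "most frequent value" as the mode among entries that differ from the no-action token, which is an interpretation of the text and not a statement about the library's code; the function name actions_mean is hypothetical.

```python
import pandas as pd


def actions_mean(actions, numerical_features, categorical_features, no_action_token):
    """Average numerical actions; take the most frequent categorical action."""
    result = {}
    for col in numerical_features:
        result[col] = actions[col].mean()
    for col in categorical_features:
        # Ignore rows where no categorical change was requested.
        changed = actions.loc[actions[col] != no_action_token, col]
        result[col] = no_action_token if changed.empty else changed.mode().iloc[0]
    return pd.Series(result)
```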

humancompatible.explain.glance.utils.action.apply_action_numpy(X: ndarray[Any, dtype[number]], action: ndarray[Any, dtype[number]], numerical_columns: List[int], categorical_columns: List[int], categorical_no_action_token: number) → ndarray[Any, dtype[number]][source]

Apply action to all rows of X. For numerical columns, add the respective component from action. For categorical columns, set the component of all rows to the value of action, unless it is equal to the categorical_no_action_token, in which case do nothing for this column.

Note: input array should have a numeric dtype. Thus, categorical columns should be encoded by numbers (e.g. Ordinal Encoding).

Parameters:
  • X (npt.NDArray[np.number]) – matrix of observations

  • action (npt.NDArray[np.number]) – for each column / feature, the action to be applied

  • numerical_columns (List[int]) – numerical column indices

  • categorical_columns (List[int]) – categorical column indices

  • categorical_no_action_token (np.number) – special value signifying no-action (i.e. equivalent to 0 for numerical columns)

Returns:

new observations resulting from the action application.

Return type:

npt.NDArray[np.number]
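The rule above (add for numerical columns, overwrite for categorical columns unless the token is present) can be sketched with NumPy as follows. This is an illustrative re-implementation under the assumption of float arrays, not the library's source; the name apply_action_sketch is hypothetical.

```python
import numpy as np


def apply_action_sketch(X, action, numerical_columns, categorical_columns,
                        categorical_no_action_token):
    """Apply one action vector to every row of X and return a new array."""
    result = X.copy()
    # Numerical columns: shift every row by the action component.
    result[:, numerical_columns] += action[numerical_columns]
    # Categorical columns: overwrite unless the action is the no-action token.
    for col in categorical_columns:
        if action[col] != categorical_no_action_token:
            result[:, col] = action[col]
    return result
```

For example, with ordinal-encoded categories in columns 1 and 2 and -1 as the token, the action [0.5, 2.0, -1.0] shifts column 0 by 0.5, sets column 1 to 2.0 in every row, and leaves column 2 untouched.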

humancompatible.explain.glance.utils.action.apply_action_pandas(X: DataFrame, action: Series, numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: Any, numerical_no_action_token: Any | None = None) → DataFrame[source]

Apply action to all rows of X. For numerical columns, add the respective component from action. For categorical columns, set the component of all rows to the value of action, unless it is equal to the categorical_no_action_token, in which case do nothing for this column.

Parameters:
  • X (pd.DataFrame) – matrix of observations

  • action (pd.Series) – for each column / feature, the action to be applied

  • numerical_columns (List[str]) – numerical column names

  • categorical_columns (List[str]) – categorical column names

  • categorical_no_action_token (Any) – special value signifying that no action is taken for a categorical column (the counterpart of adding 0 to a numerical column)

Returns:

new observations resulting from the action application.

Return type:

pd.DataFrame
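
A minimal pandas sketch of the same behaviour follows. It is an assumption-laden illustration rather than the library's implementation: the name apply_action_pandas_sketch, the sample columns, and the "-" token are hypothetical, and the sketch ignores numerical_no_action_token because its behaviour is not described here.

```python
import pandas as pd

def apply_action_pandas_sketch(X, action, numerical_columns, categorical_columns, no_action_token):
    """Hypothetical helper mirroring the documented behaviour (not the library code)."""
    X_new = X.copy()
    # Numerical columns: add the single action value to every row.
    for col in numerical_columns:
        X_new[col] = X_new[col] + action[col]
    # Categorical columns: set every row to the action value unless it is the token.
    for col in categorical_columns:
        if action[col] != no_action_token:
            X_new[col] = action[col]
    return X_new

X = pd.DataFrame({"age": [30, 45], "job": ["clerk", "nurse"], "city": ["Oslo", "Rome"]})
action = pd.Series({"age": 5, "job": "engineer", "city": "-"})  # "-" is the assumed token

out = apply_action_pandas_sketch(X, action, ["age"], ["job", "city"], no_action_token="-")
print(out)
```

Here every row ends up with job "engineer" and age increased by 5, while city is left as it was.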

humancompatible.explain.glance.utils.action.apply_actions_pandas_rows(X: DataFrame, actions: DataFrame, numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: object) → DataFrame[source]

Applies a set of actions to transform the original dataset X based on the actions specified in the actions DataFrame.

For numerical columns, the function adds the values from the actions DataFrame to the corresponding columns in X. For categorical columns, if the action for a column is not equal to the categorical_no_action_token, the value from the actions DataFrame is used to update X. Otherwise, the original value from X is retained.

Parameters:

Xpd.DataFrame

The original dataset, where each row represents an instance, and each column is a feature.

actionspd.DataFrame

A DataFrame of the same shape as X, containing the actions to apply to each feature.
  • For numerical columns: contains the values to add to the corresponding features in X.

  • For categorical columns: contains either the new value to apply or the categorical_no_action_token.

numerical_columnsList[str]

List of columns in X and actions that are numerical.

categorical_columnsList[str]

List of columns in X and actions that are categorical.

categorical_no_action_tokenobject

A token or value indicating that no action should be taken for a categorical feature.

Returns:

pd.DataFrame

A DataFrame of the same shape as X where the actions have been applied:
  • For numerical columns: each value is updated by adding the corresponding action from actions.

  • For categorical columns: updated values from actions are used where applicable; otherwise, the original values from X are retained.
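
The row-wise variant differs from the single-action functions in that each instance receives its own action. A rough sketch under stated assumptions (the name apply_actions_rows_sketch, the sample data, and the "-" token are hypothetical; X and actions are assumed to share the same index):

```python
import pandas as pd

def apply_actions_rows_sketch(X, actions, numerical_columns, categorical_columns, no_action_token):
    """Hypothetical helper mirroring the documented behaviour (not the library code)."""
    X_new = X.copy()
    # Numerical columns: element-wise addition, row by row.
    for col in numerical_columns:
        X_new[col] = X[col] + actions[col]
    # Categorical columns: take the action value where one is given,
    # otherwise keep the original value from X.
    for col in categorical_columns:
        has_action = actions[col] != no_action_token
        X_new[col] = actions[col].where(has_action, X[col])
    return X_new

X = pd.DataFrame({"age": [30, 45], "job": ["clerk", "nurse"]})
actions = pd.DataFrame({"age": [5, -2], "job": ["-", "doctor"]})  # "-" is the assumed token

applied = apply_actions_rows_sketch(X, actions, ["age"], ["job"], no_action_token="-")
print(applied)
```

The first row keeps job "clerk" because its action is the token; the second row changes to "doctor".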

humancompatible.explain.glance.utils.action.extract_actions_pandas(X: DataFrame, cfs: DataFrame, categorical_features: List[str], numerical_features: List[str], categorical_no_action_token: Any)[source]

Extracts the actions needed to convert the original dataset X into the counterfactual dataset cfs.

For categorical features, the function identifies changes between X and cfs. If no change is observed in a categorical feature, a specified categorical_no_action_token is used to denote that no action is needed. For numerical features, the function computes the difference between the counterfactual and the original values.

Parameters:

Xpd.DataFrame

The original dataset, where each row represents an instance, and each column is a feature.

cfspd.DataFrame

The counterfactual dataset, which has the same structure as X. It represents the desired state after some action is applied.

categorical_featuresList[str]

List of columns in X and cfs that are categorical.

numerical_featuresList[str]

List of columns in X and cfs that are numerical.

categorical_no_action_tokenAny

A token or value to insert into categorical features where no change is needed (i.e., the feature value in X is the same as in cfs).

Returns:

pd.DataFrame

A DataFrame of the same shape as X and cfs where each value indicates the action required to transform X into cfs:
  • For categorical features: the value in cfs if it differs from X, otherwise categorical_no_action_token.

  • For numerical features: the difference between cfs and X.
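
Action extraction is the inverse of action application: numerical actions are differences, and categorical actions record only the values that changed. The following is a sketch of that idea, not the library's code; the name extract_actions_sketch, the sample data, and the "-" token are hypothetical, and X and cfs are assumed to share index and columns.

```python
import pandas as pd

def extract_actions_sketch(X, cfs, categorical_features, numerical_features, no_action_token):
    """Hypothetical helper mirroring the documented behaviour (not the library code)."""
    actions = cfs.copy()
    # Numerical features: the action is the counterfactual minus the original.
    for col in numerical_features:
        actions[col] = cfs[col] - X[col]
    # Categorical features: keep the counterfactual value only where it differs
    # from the original; elsewhere insert the no-action token.
    for col in categorical_features:
        actions[col] = cfs[col].where(cfs[col] != X[col], no_action_token)
    return actions

X = pd.DataFrame({"age": [30, 45], "job": ["clerk", "nurse"]})
cfs = pd.DataFrame({"age": [35, 45], "job": ["clerk", "doctor"]})

extracted = extract_actions_sketch(X, cfs, ["job"], ["age"], no_action_token="-")
print(extracted)
```

In this example the first instance needs only an age increase of 5, and the second needs only a job change.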

humancompatible.explain.glance.utils.centroid.centroid_numpy(X: ndarray[Any, dtype[number]], numerical_columns: List[int], categorical_columns: List[int]) → ndarray[Any, dtype[number]][source]

Calculates the centroid of the rows of a 2D NumPy array. For each column in numerical_columns, the centroid takes the mean over all rows; for each column in categorical_columns, it takes the mode over all rows.

Parameters:
  • X (npt.NDArray[np.number]) – matrix of observations

  • numerical_columns (List[int]) – numerical column indices

  • categorical_columns (List[int]) – categorical column indices

Returns:

2D NumPy array whose single row is the centroid

Return type:

npt.NDArray[np.number]
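
A mean-and-mode centroid of this kind can be sketched as follows. This is an illustration only: the name centroid_sketch and the sample data are hypothetical, the result is always returned as float, and ties in the mode are broken in favour of the smallest value, which is an assumption of the sketch and may differ from the library.

```python
import numpy as np

def centroid_sketch(X, numerical_columns, categorical_columns):
    """Hypothetical helper mirroring the documented behaviour (not the library code)."""
    centroid = np.zeros((1, X.shape[1]), dtype=float)
    # Numerical columns: column-wise mean over all rows.
    centroid[0, numerical_columns] = X[:, numerical_columns].mean(axis=0)
    # Categorical columns: most frequent value. np.unique returns sorted values,
    # so argmax picks the smallest value among ties.
    for col in categorical_columns:
        values, counts = np.unique(X[:, col], return_counts=True)
        centroid[0, col] = values[np.argmax(counts)]
    return centroid

# Column 0 is numerical; column 1 is an ordinal-encoded categorical.
X = np.array([[1.0, 0.0],
              [3.0, 2.0],
              [5.0, 2.0]])

c = centroid_sketch(X, numerical_columns=[0], categorical_columns=[1])
print(c)  # [[3. 2.]]
```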

humancompatible.explain.glance.utils.centroid.centroid_pandas(X: DataFrame, numerical_columns: List[str], categorical_columns: List[str]) → DataFrame[source]

Calculates the centroid of the rows of a pandas DataFrame. For each column in numerical_columns, the centroid takes the mean over all rows; for each column in categorical_columns, it takes the mode over all rows.

Parameters:
  • X (pd.DataFrame) – matrix of observations

  • numerical_columns (List[str]) – numerical column names

  • categorical_columns (List[str]) – categorical column names

Returns:

DataFrame whose single row is the centroid

Return type:

pd.DataFrame
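
The pandas counterpart can be sketched in the same way. Again, this is a hedged illustration rather than the library's implementation: the name centroid_pandas_sketch and the sample data are hypothetical, and when several values tie for the mode the sketch keeps the first one returned by Series.mode, which is an assumption.

```python
import pandas as pd

def centroid_pandas_sketch(X, numerical_columns, categorical_columns):
    """Hypothetical helper mirroring the documented behaviour (not the library code)."""
    row = {}
    for col in X.columns:
        if col in numerical_columns:
            # Numerical column: mean over all rows.
            row[col] = X[col].mean()
        elif col in categorical_columns:
            # Categorical column: most frequent value (first one on ties).
            row[col] = X[col].mode().iloc[0]
    # Wrap the single centroid row in a one-row DataFrame.
    return pd.DataFrame([row])

X = pd.DataFrame({"age": [30, 40, 50], "job": ["clerk", "nurse", "nurse"]})

center = centroid_pandas_sketch(X, ["age"], ["job"])
print(center)
```

For this input the centroid row has age 40.0 and job "nurse".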