humancompatible.explain.glance.iterative_merges package

Submodules

humancompatible.explain.glance.iterative_merges.iterative_merges module

class humancompatible.explain.glance.iterative_merges.iterative_merges.C_GLANCE(model: Any, initial_clusters: int = 100, final_clusters: int = 10, num_local_counterfactuals: int = 5, heuristic_weights: Tuple[float, float] = (0.5, 0.5), alternative_merges: bool = True, random_seed: int = 13, verbose=True)

Bases: GlobalCounterfactualMethod

A class for generating global counterfactual explanations using an iterative merging approach.

It allows the user to control the number of clusters and the methods used for clustering and generating counterfactuals.

Attributes:

model (Any): The predictive model used for generating counterfactuals.
initial_clusters (int): The initial number of clusters to form.
final_clusters (int): The target number of clusters after merging.
num_local_counterfactuals (int): The number of local counterfactuals to generate for each cluster.
heuristic_weights (Tuple[float, float]): Weights used in the heuristic for merging clusters.
alternative_merges (bool): If True, allows alternative merging strategies.
random_seed (int): Seed for random number generation.
verbose (bool): If True, enables verbose output during processing.
final_clustering (Optional[Dict[int, pd.DataFrame]]): The final clustering of instances after merging.
cluster_results (Optional[Dict[int, Dict[str, Any]]]): Results of the clustering, including effectiveness and cost metrics.

Methods:

_set_features_names(X, numerical_names, categorical_names): Sets the feature names for numerical and categorical features.
fit(X, y, train_dataset, feat_to_vary, numeric_features_names, categorical_features_names, clustering_method, cf_generator, cluster_action_choice_algo, …): Fits the clustering and counterfactual generation model to the provided dataset.
explain_group(instances): Explains the group of instances by generating counterfactuals based on clustering.
global_actions(): Retrieves the global actions derived from the clustered results.

Initializes the C_GLANCE instance.

Parameters:

model (Any): The predictive model used for generating counterfactuals.
initial_clusters (int, optional): The initial number of clusters to form. Default is 100.
final_clusters (int, optional): The target number of clusters after merging. Default is 10.
num_local_counterfactuals (int, optional): The number of local counterfactuals to generate for each cluster. Default is 5.
heuristic_weights (Tuple[float, float], optional): Weights used in the heuristic for merging clusters. Default is (0.5, 0.5).
alternative_merges (bool, optional): If True, allows alternative merging strategies. Default is True.
random_seed (int, optional): Seed for random number generation. Default is 13.
verbose (bool, optional): If True, enables verbose output during processing. Default is True.
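The constructor parameters describe a bottom-up procedure: the data is first split into initial_clusters groups, which are then merged until final_clusters remain. The following is a minimal sketch of such a merge loop, not the C_GLANCE implementation. It assumes purely numerical features, starts from one singleton cluster per row, and always merges the two clusters whose centroids are closest in Euclidean distance. The weighted merge heuristic controlled by heuristic_weights is not documented here and is not modelled; the helper iterative_merge is hypothetical.

```python
from typing import Dict

import numpy as np


def iterative_merge(X: np.ndarray, labels: np.ndarray, final_clusters: int) -> Dict[int, np.ndarray]:
    """Hypothetical helper: merge clusters pairwise until `final_clusters` remain.

    Assumption: at each step the two clusters with the closest centroids
    (Euclidean distance) are merged; C_GLANCE's weighted heuristic is not modelled.
    """
    # Map each starting cluster id to the row indices it contains.
    clusters = {int(c): np.flatnonzero(labels == c) for c in np.unique(labels)}
    while len(clusters) > final_clusters:
        ids = list(clusters)
        centroids = {i: X[clusters[i]].mean(axis=0) for i in ids}
        best_pair, best_dist = None, np.inf
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                dist = np.linalg.norm(centroids[a] - centroids[b])
                if dist < best_dist:
                    best_pair, best_dist = (a, b), dist
        a, b = best_pair
        # Absorb cluster b into cluster a.
        clusters[a] = np.concatenate([clusters[a], clusters.pop(b)])
    return clusters


rng = np.random.default_rng(13)
# Two well-separated groups of 5 points each.
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))])
labels = np.arange(len(X))  # one singleton cluster per row as the starting point
merged = iterative_merge(X, labels, final_clusters=2)
print(sorted(sorted(v.tolist()) for v in merged.values()))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

Recomputing every centroid on each pass keeps the sketch short; a real implementation would cache distances between merges.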

explain_group(instances: DataFrame) → Tuple[int, float]

Explains the group of instances by generating counterfactuals based on clustering.

Parameters:

instances (pd.DataFrame): The group of instances to explain.

Returns:

Tuple[int, float]: A tuple containing the total effectiveness and total cost of the generated counterfactuals.

fit(X: DataFrame, y: Series, train_dataset: DataFrame, feat_to_vary: List[str] | str | None = 'all', numeric_features_names: List[str] | None = None, categorical_features_names: List[str] | None = None, clustering_method: ClusteringMethod | Literal['KMeans'] = 'KMeans', cf_generator: LocalCounterfactualMethod | Literal['Dice', 'NearestNeighbors', 'RandomSampling'] = 'Dice', cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost'] = 'max-eff', nns__n_scalars: int | None = None, rs__n_most_important: int | None = None, rs__n_categorical_most_frequent: int | None = None, lowcost__action_threshold: int | None = None, lowcost__num_low_cost: int | None = None, min_cost_eff_thres__effectiveness_threshold: float | None = None, min_cost_eff_thres_combinations__num_min_cost: int | None = None, eff_thres_hybrid__max_n_actions_full_combinations: int | None = None) → C_GLANCE

Fits the clustering and counterfactual generation model to the provided dataset.

Parameters:

X (pd.DataFrame): Features of the dataset.
y (pd.Series): Target variable.
train_dataset (pd.DataFrame): The training dataset used for local counterfactual generation methods.
feat_to_vary (Optional[Union[List[str], str]], optional): Features to vary in counterfactual generation. Default is “all”.
numeric_features_names (Optional[List[str]], optional): List of numeric feature names. If None, they are inferred from X.
categorical_features_names (Optional[List[str]], optional): List of categorical feature names. If None, they are inferred from X.
clustering_method (Union[ClusteringMethod, Literal[“KMeans”]], optional): The clustering method to use. Default is “KMeans”.
cf_generator (Union[LocalCounterfactualMethod, Literal[“Dice”, “NearestNeighbors”, “RandomSampling”]], optional): The local counterfactual generation method to use. Default is “Dice”.
cluster_action_choice_algo (Literal[“max-eff”, “mean-act”, “low-cost”], optional): The algorithm for selecting actions from clusters. Default is “max-eff”.
nns__n_scalars (Optional[int], optional): Number of scalar features to use for nearest neighbors. Default is None.
rs__n_most_important (Optional[int], optional): Number of most important features for random sampling. Default is None.
rs__n_categorical_most_frequent (Optional[int], optional): Number of most frequent categorical features for random sampling. Default is None.
lowcost__action_threshold (Optional[int], optional): Action threshold for low-cost methods. Default is None.
lowcost__num_low_cost (Optional[int], optional): Number of low-cost actions to consider. Default is None.
min_cost_eff_thres__effectiveness_threshold (Optional[float], optional): Effectiveness threshold for minimum-cost methods. Default is None.
min_cost_eff_thres_combinations__num_min_cost (Optional[int], optional): Number of minimum-cost combinations to evaluate. Default is None.
eff_thres_hybrid__max_n_actions_full_combinations (Optional[int], optional): Maximum number of actions for full combinations in hybrid thresholding. Default is None.

Returns:

C_GLANCE: The fitted C_GLANCE instance.
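When numeric_features_names or categorical_features_names is None, fit infers the lists from X. The inference rule is not stated above; the sketch below assumes a common dtype-based split using pandas DataFrame.select_dtypes, where numeric columns are numerical and all remaining columns are categorical. infer_feature_names is a hypothetical helper, not part of the package.

```python
from typing import List, Optional, Tuple

import pandas as pd


def infer_feature_names(
    X: pd.DataFrame,
    numeric_features_names: Optional[List[str]] = None,
    categorical_features_names: Optional[List[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: fill in missing feature-name lists from column dtypes."""
    if numeric_features_names is None:
        # Assumed rule: every numeric dtype (int, float) is a numerical feature.
        numeric_features_names = X.select_dtypes(include="number").columns.tolist()
    if categorical_features_names is None:
        # Assumed rule: everything that is not numerical is categorical.
        categorical_features_names = [c for c in X.columns if c not in numeric_features_names]
    return numeric_features_names, categorical_features_names


X = pd.DataFrame({"age": [25, 40], "income": [1.5, 3.2], "job": ["clerk", "nurse"]})
print(infer_feature_names(X))  # (['age', 'income'], ['job'])
```

Passing either list explicitly overrides the inference for that list, which matters when integer-coded columns are really categorical.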

global_actions()

humancompatible.explain.glance.iterative_merges.iterative_merges.action_fake_cost(action: Series, numerical_features_names: List[str], categorical_features_names: List[str])

humancompatible.explain.glance.iterative_merges.iterative_merges.actions_cumulative_eff_cost(model: Any, X: DataFrame, actions_with_costs: List[Tuple[Series, float]], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: Any) → Tuple[float, float]

Evaluates the cumulative effectiveness and cost of applying a sequence of actions to a dataset using a predictive model.

This function applies each action from the sorted list of (action, cost) pairs, predicts the outcomes, and computes both the total number of flipped predictions and the total recourse cost incurred by the actions.

Parameters:

model (Any): A machine learning model used for making predictions on the modified instances.
X (pd.DataFrame): The original DataFrame of instances to which actions will be applied.
actions_with_costs (List[Tuple[pd.Series, float]]): A list of tuples, each containing:
- A pandas Series representing the action to apply.
- A float representing the cost associated with the action.
dist_func_dataframe (Callable[[pd.DataFrame, pd.DataFrame], pd.Series]): A function that computes the distance or cost between two DataFrames.
numerical_columns (List[str]): A list of names for the numerical columns in the DataFrame.
categorical_columns (List[str]): A list of names for the categorical columns in the DataFrame.
categorical_no_action_token (Any): A token used to represent the absence of an action for categorical features.

Returns:

Tuple[float, float]: A tuple containing:
- The total number of predictions flipped across all actions applied.
- The total recourse cost incurred from applying the actions.
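The following sketch illustrates one way to compute these two totals; it is not the package's implementation. It assumes that X holds only instances currently predicted as the unfavourable class 0, that numerical actions are additive shifts, that categorical actions overwrite the value unless they carry the no-action token, and that each instance is charged, via dist_func_dataframe, for the first (cheapest) action that flips it. apply_action, cumulative_eff_cost_sketch, the toy model, and the distance function are all hypothetical.

```python
from typing import Any, Callable, List, Tuple

import numpy as np
import pandas as pd


def apply_action(X: pd.DataFrame, action: pd.Series, numerical: List[str],
                 categorical: List[str], no_action_token: Any) -> pd.DataFrame:
    """Return a copy of X with `action` applied (assumed action semantics)."""
    X_new = X.copy()
    for col in numerical:
        X_new[col] = X_new[col] + action[col]  # numerical actions as additive shifts
    for col in categorical:
        if action[col] != no_action_token:  # categorical actions overwrite the value
            X_new[col] = action[col]
    return X_new


def cumulative_eff_cost_sketch(
    model: Any,
    X: pd.DataFrame,
    actions_with_costs: List[Tuple[pd.Series, float]],
    dist_func_dataframe: Callable[[pd.DataFrame, pd.DataFrame], pd.Series],
    numerical: List[str],
    categorical: List[str],
    no_action_token: Any,
) -> Tuple[float, float]:
    remaining = X  # instances not yet flipped
    flipped_total, cost_total = 0, 0.0
    # Cheapest actions first; each instance is charged for the first action that flips it.
    for action, _ in sorted(actions_with_costs, key=lambda pair: pair[1]):
        if remaining.empty:
            break
        candidate = apply_action(remaining, action, numerical, categorical, no_action_token)
        flipped = np.asarray(model.predict(candidate)) == 1  # assumed favourable class: 1
        flipped_total += int(flipped.sum())
        cost_total += float(dist_func_dataframe(remaining[flipped], candidate[flipped]).sum())
        remaining = remaining[~flipped]
    return float(flipped_total), cost_total


class IncomeThresholdModel:
    """Toy model: favourable outcome (1) once income reaches 3."""

    def predict(self, df: pd.DataFrame) -> np.ndarray:
        return (df["income"] >= 3.0).astype(int).to_numpy()


def l1_numeric(a: pd.DataFrame, b: pd.DataFrame) -> pd.Series:
    return (a["income"] - b["income"]).abs()


X = pd.DataFrame({"income": [1.0, 2.0, 2.5], "job": ["clerk", "clerk", "nurse"]})
actions = [
    (pd.Series({"income": 2.0, "job": "-"}), 2.0),
    (pd.Series({"income": 1.0, "job": "-"}), 1.0),
]
result = cumulative_eff_cost_sketch(IncomeThresholdModel(), X, actions, l1_numeric, ["income"], ["job"], "-")
print(result)  # (3.0, 4.0)
```

Here the cheaper +1 action flips two instances at a cost of 1 each, and the +2 action then flips the remaining one at a cost of 2.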

humancompatible.explain.glance.iterative_merges.iterative_merges.cluster_results(model: Any, instances: DataFrame, clusters: Dict[int, DataFrame], cluster_expl_actions: Dict[int, DataFrame], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_features_names: List[str], categorical_features_names: List[str], cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost', 'min-cost-eff-thres', 'eff-thres-hybrid'] = 'max-eff', action_threshold: int = 2, num_low_cost: int = 20, effectiveness_threshold: float = 0.1, num_min_cost: int | None = None, max_n_actions_full_combinations: int = 50) → Tuple[Dict[int, Dict[str, Any]], float, float]

Evaluates and selects actions for each cluster based on a specified action choice algorithm.

This function iterates through each cluster of instances, applying the specified algorithm to select the best action for achieving recourse while minimizing costs. It calculates the total effectiveness and mean recourse costs across all clusters.

Parameters:

model (Any): A machine learning model used for making predictions on modified instances.
instances (pd.DataFrame): The DataFrame of original instances to which actions will be applied.
clusters (Dict[int, pd.DataFrame]): A dictionary mapping cluster IDs to DataFrames of instances belonging to each cluster.
cluster_expl_actions (Dict[int, pd.DataFrame]): A dictionary mapping cluster IDs to DataFrames of candidate actions for each cluster.
dist_func_dataframe (Callable[[pd.DataFrame, pd.DataFrame], pd.Series]): A function that computes the distance or cost between two DataFrames.
numerical_features_names (List[str]): A list of names for the numerical columns in the DataFrames.
categorical_features_names (List[str]): A list of names for the categorical columns in the DataFrames.
cluster_action_choice_algo (Literal[“max-eff”, “mean-act”, “low-cost”, “min-cost-eff-thres”, “eff-thres-hybrid”]): The algorithm to use for selecting actions from candidate actions. Options include:
- “max-eff”: Select the action with maximum effectiveness.
- “mean-act”: Select the mean action from candidate actions.
- “low-cost”: Select actions based on low cost.
- “min-cost-eff-thres”: Select minimum-cost actions subject to effectiveness_threshold and num_min_cost.
action_threshold (int): Minimum threshold for the number of flipped predictions required to consider an action effective.
num_low_cost (int): The number of low-cost actions to consider (used when the low-cost algorithm is selected).
effectiveness_threshold (float): Minimum effectiveness required for actions (used when the min-cost-eff-thres algorithm is selected).
num_min_cost (Optional[int]): Number of minimum-cost actions to consider (used when the min-cost-eff-thres algorithm is selected).
max_n_actions_full_combinations (int): Maximum number of actions to evaluate in full combinations (not currently used in the function).

Returns:

Tuple[Dict[int, Dict[str, Any]], float, float]: A tuple containing:
- A dictionary where each key is a cluster ID and each value is another dictionary with the selected action, its effectiveness, and its cost.
- Total effectiveness percentage across all clusters.
- Total mean recourse cost across all clusters.
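To make the “max-eff” choice concrete, here is a simplified sketch on NumPy arrays rather than DataFrames. It assumes additive actions, an L1 action cost, ties broken in favour of the cheaper action, total effectiveness reported as a fraction of all instances, and the mean cost taken over flipped instances; action_threshold and the other algorithms are not modelled. max_eff_cluster_results is hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

import numpy as np


def max_eff_cluster_results(
    predict: Callable[[np.ndarray], np.ndarray],
    clusters: Dict[int, np.ndarray],
    candidate_actions: Dict[int, List[np.ndarray]],
) -> Tuple[Dict[int, Dict[str, Any]], float, float]:
    """Hypothetical "max-eff" sketch: per cluster, keep the candidate that flips the most members."""
    results: Dict[int, Dict[str, Any]] = {}
    total_size, total_flipped, total_cost = 0, 0, 0.0
    for cid, members in clusters.items():
        best = None
        for action in candidate_actions[cid]:
            flipped = int((predict(members + action) == 1).sum())  # assumed additive action
            cost = float(np.abs(action).sum())  # assumed L1 cost, same for every member
            # Prefer more flips; among equally effective actions, prefer the cheaper one.
            if best is None or (flipped, -cost) > (best["effectiveness"], -best["cost"]):
                best = {"action": action, "size": len(members), "effectiveness": flipped, "cost": cost}
        results[cid] = best
        total_size += len(members)
        total_flipped += best["effectiveness"]
        total_cost += best["effectiveness"] * best["cost"]
    total_effectiveness = total_flipped / total_size  # fraction of all instances flipped
    mean_cost = total_cost / total_flipped if total_flipped else float("inf")
    return results, total_effectiveness, mean_cost


def predict(Z: np.ndarray) -> np.ndarray:
    return (Z.sum(axis=1) >= 2.0).astype(int)


clusters = {0: np.array([[0.0, 0.0], [0.5, 0.0]]), 1: np.array([[1.0, 0.5], [1.0, 0.0]])}
candidates = {
    0: [np.array([1.0, 0.0]), np.array([1.0, 1.0])],
    1: [np.array([0.5, 0.0]), np.array([1.0, 0.0])],
}
results, total_eff, mean_cost = max_eff_cluster_results(predict, clusters, candidates)
print(total_eff, mean_cost)  # 1.0 1.5
```

In cluster 1 the cheaper action flips only one member, so “max-eff” picks the costlier action that flips both.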

humancompatible.explain.glance.iterative_merges.iterative_merges.cumulative(model, instances, actions, dist_func_dataframe, numeric_features_names, categorical_features_names, categorical_no_action_token)

Computes the cumulative effectiveness and cost of applying a set of actions to a given set of instances using a predictive model.

Parameters:

model (Any): A predictive model with a predict method, used to predict outcomes after applying actions to the input instances.
instances (pd.DataFrame): A DataFrame containing the instances to which actions are to be applied.
actions (List[dict]): A list of actions, where each action is represented as a dictionary that specifies how to modify the instances.
dist_func_dataframe (Callable[[pd.DataFrame, pd.DataFrame], pd.Series]): A distance function that takes two DataFrames and returns a Series of distances between corresponding rows.
numeric_features_names (List[str]): A list of names for the numeric features in the instances DataFrame.
categorical_features_names (List[str]): A list of names for the categorical features in the instances DataFrame.
categorical_no_action_token (Any): A token used to represent a no-action state for categorical features.

Returns:

Tuple[int, float]: A tuple containing:
- effectiveness: An integer count of how many actions were effective (i.e., resulted in a finite cost).
- cost: A float representing the total cost incurred by the effective actions.
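One reading of this description, assumed in the sketch below, is that every instance is matched with the cheapest action that flips its prediction (infinite cost if none does), so effectiveness counts the finite entries and cost sums them. The sketch uses NumPy arrays, additive actions, and an L1 distance in place of dist_func_dataframe; cumulative_sketch is hypothetical.

```python
from typing import Callable, List, Tuple

import numpy as np


def cumulative_sketch(
    predict: Callable[[np.ndarray], np.ndarray],
    instances: np.ndarray,
    actions: List[np.ndarray],
) -> Tuple[int, float]:
    """Hypothetical sketch: count instances with a finite recourse cost and sum those costs."""
    # costs[i, j] = cost of action j for instance i, or inf if j does not flip i.
    costs = np.full((len(instances), len(actions)), np.inf)
    for j, action in enumerate(actions):
        counterfactuals = instances + action  # assumed additive actions
        flips = predict(counterfactuals) == 1
        # Stand-in for dist_func_dataframe: L1 distance to the counterfactual.
        dist = np.abs(counterfactuals - instances).sum(axis=1)
        costs[flips, j] = dist[flips]
    best = costs.min(axis=1)  # cheapest flipping action per instance
    finite = np.isfinite(best)
    return int(finite.sum()), float(best[finite].sum())


def predict(Z: np.ndarray) -> np.ndarray:
    return (Z[:, 0] >= 3.0).astype(int)


instances = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]])
actions = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]
print(cumulative_sketch(predict, instances, actions))  # (2, 3.0)
```

The third instance is not flipped by either action, so its infinite cost is excluded from both totals.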

humancompatible.explain.glance.iterative_merges.iterative_merges.format_glance_output(cluster_stats: Dict[int, Dict[str, Number]], categorical_columns: List[str])

humancompatible.explain.glance.iterative_merges.iterative_merges.print_results(clusters_stats: Dict[int, Dict[str, Number]], total_effectiveness: float, total_cost: float)

Prints the statistics for each cluster, including effectiveness and cost.

This function takes the results of cluster analysis and formats them for easy viewing. It displays the size of each cluster, the actions taken, and the effectiveness and cost of those actions.

Parameters:

clusters_stats (Dict[int, Dict[str, numbers.Number]]): A dictionary where keys are cluster IDs (integers) and values are dictionaries containing statistics for each cluster. Each value dictionary must contain the following keys:
- “size”: The size of the cluster.
- “action”: The actions taken for the cluster.
- “effectiveness”: The effectiveness of the actions in the cluster.
- “cost”: The cost associated with the actions.
total_effectiveness (float): The total effectiveness percentage across all clusters, represented as a decimal (e.g., 0.75 for 75%).
total_cost (float): The total cost associated with the actions taken across all clusters.
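A minimal sketch of a formatter for the clusters_stats layout described above; it returns the text instead of only printing it so that the result can be checked. The output format is an assumption, not the one print_results actually produces, and format_results is hypothetical.

```python
from typing import Any, Dict


def format_results(clusters_stats: Dict[int, Dict[str, Any]],
                   total_effectiveness: float, total_cost: float) -> str:
    """Hypothetical formatter for per-cluster size, action, effectiveness, and cost."""
    lines = [
        f"Cluster {cid}: size={s['size']}, action={s['action']}, "
        f"effectiveness={s['effectiveness']}, cost={s['cost']:.2f}"
        for cid, s in clusters_stats.items()
    ]
    # total_effectiveness is a decimal fraction, so render it as a percentage.
    lines.append(f"Total effectiveness: {total_effectiveness:.2%}")
    lines.append(f"Total cost: {total_cost:.2f}")
    return "\n".join(lines)


stats = {0: {"size": 40, "action": {"income": 500}, "effectiveness": 30, "cost": 1.25}}
text = format_results(stats, 0.75, 1.25)
print(text)
```

The `.2%` format specifier multiplies by 100 and appends a percent sign, matching the decimal convention for total_effectiveness.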

humancompatible.explain.glance.iterative_merges.phase2 module

humancompatible.explain.glance.iterative_merges.phase2.generate_cluster_centroid_explanations(cluster_centroids: Dict[int, DataFrame], cf_generator: LocalCounterfactualMethod, num_local_counterfactuals: int, numerical_features_names: List[str], categorical_features_names: List[str]) → Tuple[Dict[int, DataFrame], Dict[int, DataFrame], Dict[int, DataFrame]]

Generates explanations for cluster centroids by creating counterfactual instances for each centroid and extracting corresponding actions and explanations.

Parameters:

cluster_centroids (Dict[int, pd.DataFrame]): A dictionary where keys are cluster identifiers and values are DataFrames representing the centroids of each cluster.
cf_generator (LocalCounterfactualMethod): An instance of a LocalCounterfactualMethod used to generate counterfactuals.
num_local_counterfactuals (int): The number of counterfactuals to generate for each cluster centroid.
numerical_features_names (List[str]): A list of names for numerical features in the dataset.
categorical_features_names (List[str]): A list of names for categorical features in the dataset.

Returns:

Tuple[Dict[int, pd.DataFrame], Dict[int, pd.DataFrame], Dict[int, pd.DataFrame]]: A tuple containing three dictionaries:
- cluster_explanations: A dictionary of counterfactuals for each cluster centroid.
- cluster_expl_actions: A dictionary of extracted actions for the generated counterfactuals.
- explanations_centroid: A dictionary of centroid explanations based on the generated counterfactuals.

Raises:

ValueError: If no counterfactuals are found for any of the centroids.
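The action-extraction step can be pictured with the sketch below, which turns the counterfactuals found for one centroid into actions. It assumes that numerical actions are the signed difference between counterfactual and centroid and that unchanged categorical features are marked with a no-action token, and it raises ValueError when a centroid has no counterfactuals. The generation step with cf_generator is omitted; extract_actions and NO_ACTION are hypothetical.

```python
from typing import List

import pandas as pd

NO_ACTION = "-"  # hypothetical token for "leave this categorical feature unchanged"


def extract_actions(centroid: pd.DataFrame, counterfactuals: pd.DataFrame,
                    numerical: List[str], categorical: List[str]) -> pd.DataFrame:
    """Hypothetical sketch: express counterfactuals for one centroid as actions."""
    if counterfactuals.empty:
        raise ValueError("No counterfactuals found for this centroid")
    origin = centroid.iloc[0]  # the centroid is a one-row DataFrame
    actions = pd.DataFrame(index=counterfactuals.index)
    for col in numerical:
        # Numerical action: signed change from the centroid to the counterfactual.
        actions[col] = counterfactuals[col] - origin[col]
    for col in categorical:
        # Categorical action: the new value, or NO_ACTION where it did not change.
        actions[col] = counterfactuals[col].where(counterfactuals[col] != origin[col], NO_ACTION)
    return actions


centroid = pd.DataFrame({"income": [2.0], "job": ["clerk"]})
cfs = pd.DataFrame({"income": [3.0, 2.5], "job": ["clerk", "nurse"]})
actions = extract_actions(centroid, cfs, ["income"], ["job"])
print(actions)
```

Expressing actions relative to the centroid lets the same action be replayed on every instance of the cluster, which is what the evaluation functions above consume.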

Module contents