humancompatible.explain.glance.iterative_merges package
Submodules
humancompatible.explain.glance.iterative_merges.iterative_merges module
- class humancompatible.explain.glance.iterative_merges.iterative_merges.C_GLANCE(model: Any, initial_clusters: int = 100, final_clusters: int = 10, num_local_counterfactuals: int = 5, heuristic_weights: Tuple[float, float] = (0.5, 0.5), alternative_merges: bool = True, random_seed: int = 13, verbose=True)
Bases: GlobalCounterfactualMethod
A class for generating global counterfactual explanations using an iterative merging approach.
It allows the user to control the number of clusters and the methods used for clustering and generating counterfactuals.
Attributes:
- model : Any
The predictive model used for generating counterfactuals.
- initial_clusters : int
The initial number of clusters to form.
- final_clusters : int
The target number of clusters after merging.
- num_local_counterfactuals : int
The number of local counterfactuals to generate for each cluster.
- heuristic_weights : Tuple[float, float]
Weights used in the heuristic for merging clusters.
- alternative_merges : bool
If True, allows alternative merging strategies.
- random_seed : int
Seed for random number generation.
- verbose : bool
If True, enables verbose output during processing.
- final_clustering : Optional[Dict[int, pd.DataFrame]]
The final clustering of instances after merging.
- cluster_results : Optional[Dict[int, Dict[str, Any]]]
Results of the clustering including effectiveness and cost metrics.
Methods:
- _set_features_names(X, numerical_names, categorical_names):
Sets the feature names for numerical and categorical features.
- fit(X, y, train_dataset, feat_to_vary, numeric_features_names, categorical_features_names, clustering_method, cf_generator, cluster_action_choice_algo, …):
Fits the clustering and counterfactual generation model to the provided dataset.
- explain_group(instances):
Explains the group of instances by generating counterfactuals based on clustering.
- global_actions():
Retrieves the global actions derived from the clustered results.
Initializes the C_GLANCE instance.
Parameters:
- model : Any
The predictive model used for generating counterfactuals.
- initial_clusters : int, optional
The initial number of clusters to form. Default is 100.
- final_clusters : int, optional
The target number of clusters after merging. Default is 10.
- num_local_counterfactuals : int, optional
The number of local counterfactuals to generate for each cluster. Default is 5.
- heuristic_weights : Tuple[float, float], optional
Weights used in the heuristic for merging clusters. Default is (0.5, 0.5).
- alternative_merges : bool, optional
If True, allows alternative merging strategies. Default is True.
- random_seed : int, optional
Seed for random number generation. Default is 13.
- verbose : bool, optional
If True, enables verbose output during processing. Default is True.
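The constructor parameters describe a merge loop that starts from `initial_clusters` groups and stops once `final_clusters` remain. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The `(centroid, action, size)` cluster summary, the `merge_score` and `merge_pair` helpers, and the reading of `heuristic_weights` as a balance between centroid distance and action distance are all assumptions made for illustration; the documentation above does not say what the two weights apply to.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical cluster summary: (centroid, representative action, size).
Cluster = Tuple[List[float], List[float], int]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_score(c1: Cluster, c2: Cluster, weights: Tuple[float, float]) -> float:
    """Assumed heuristic: weighted sum of centroid distance and action distance."""
    w_centroid, w_action = weights
    return w_centroid * euclidean(c1[0], c2[0]) + w_action * euclidean(c1[1], c2[1])


def merge_pair(c1: Cluster, c2: Cluster) -> Cluster:
    """Merge two clusters using size-weighted means of centroid and action."""
    n1, n2 = c1[2], c2[2]
    total = n1 + n2
    centroid = [(n1 * a + n2 * b) / total for a, b in zip(c1[0], c2[0])]
    action = [(n1 * a + n2 * b) / total for a, b in zip(c1[1], c2[1])]
    return (centroid, action, total)


def iterative_merge(
    clusters: List[Cluster],
    final_clusters: int,
    heuristic_weights: Tuple[float, float] = (0.5, 0.5),
) -> List[Cluster]:
    """Greedily merge the lowest-scoring pair until final_clusters remain."""
    clusters = list(clusters)
    while len(clusters) > max(final_clusters, 1):
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_score(clusters[p[0]], clusters[p[1]], heuristic_weights),
        )
        merged = merge_pair(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Illustrative data: two tight pairs of clusters that should merge pairwise.
initial = [
    ([0.0, 0.0], [1.0, 0.0], 10),
    ([0.1, 0.0], [1.0, 0.1], 5),
    ([5.0, 5.0], [0.0, 2.0], 8),
    ([5.2, 5.1], [0.0, 2.1], 7),
]
merged = iterative_merge(initial, final_clusters=2)
```

With equal weights, clusters that are close in feature space but need very different actions are merged no earlier than clusters that are farther apart but share an action, which is one plausible reason to expose the weights as a parameter.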
- explain_group(instances: DataFrame) → Tuple[int, float]
Explains the group of instances by generating counterfactuals based on clustering.
Parameters:
- instances : pd.DataFrame
The group of instances to explain.
Returns:
- Tuple[int, float]
A tuple containing the total effectiveness and total cost of the generated counterfactuals.
- fit(X: DataFrame, y: Series, train_dataset: DataFrame, feat_to_vary: List[str] | str | None = 'all', numeric_features_names: List[str] | None = None, categorical_features_names: List[str] | None = None, clustering_method: ClusteringMethod | Literal['KMeans'] = 'KMeans', cf_generator: LocalCounterfactualMethod | Literal['Dice', 'NearestNeighbors', 'RandomSampling'] = 'Dice', cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost'] = 'max-eff', nns__n_scalars: int | None = None, rs__n_most_important: int | None = None, rs__n_categorical_most_frequent: int | None = None, lowcost__action_threshold: int | None = None, lowcost__num_low_cost: int | None = None, min_cost_eff_thres__effectiveness_threshold: float | None = None, min_cost_eff_thres_combinations__num_min_cost: int | None = None, eff_thres_hybrid__max_n_actions_full_combinations: int | None = None) → C_GLANCE
Fits the clustering and counterfactual generation model to the provided dataset.
Parameters:
- X : pd.DataFrame
Features of the dataset.
- y : pd.Series
Target variable.
- train_dataset : pd.DataFrame
The training dataset used for local counterfactual generation methods.
- feat_to_vary : Optional[Union[List[str], str]], optional
Features to vary in counterfactual generation. Default is “all”.
- numeric_features_names : Optional[List[str]], optional
List of numeric feature names. If None, they will be inferred from X.
- categorical_features_names : Optional[List[str]], optional
List of categorical feature names. If None, they will be inferred from X.
- clustering_method : Union[ClusteringMethod, Literal[“KMeans”]], optional
The clustering method to use. Default is “KMeans”.
- cf_generator : Union[LocalCounterfactualMethod, Literal[“Dice”, “NearestNeighbors”, “RandomSampling”]], optional
The local counterfactual generation method to use. Default is “Dice”.
- cluster_action_choice_algo : Literal[“max-eff”, “mean-act”, “low-cost”], optional
The algorithm for selecting actions from clusters. Default is “max-eff”.
- nns__n_scalars : Optional[int], optional
Number of scalar features to use for nearest neighbors. Default is None.
- rs__n_most_important : Optional[int], optional
Number of most important features for random sampling. Default is None.
- rs__n_categorical_most_frequent : Optional[int], optional
Number of most frequent categorical features for random sampling. Default is None.
- lowcost__action_threshold : Optional[int], optional
Action threshold for low-cost methods. Default is None.
- lowcost__num_low_cost : Optional[int], optional
Number of low-cost actions to consider. Default is None.
- min_cost_eff_thres__effectiveness_threshold : Optional[float], optional
Effectiveness threshold for minimum cost methods. Default is None.
- min_cost_eff_thres_combinations__num_min_cost : Optional[int], optional
Number of minimum cost combinations to evaluate. Default is None.
- eff_thres_hybrid__max_n_actions_full_combinations : Optional[int], optional
Maximum number of actions for full combinations in hybrid thresholding. Default is None.
Returns:
- C_GLANCE
Returns the fitted instance of C_GLANCE.
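Several `fit` keyword arguments carry a double-underscore prefix (`nns__`, `rs__`, `lowcost__`, and so on) that names the component the option belongs to. The sketch below shows one way such options could be grouped by prefix. `group_prefixed_options` is a hypothetical helper, not part of the library, and the numeric values are illustrative only; how `fit` actually routes these arguments is not stated in this documentation.

```python
from collections import defaultdict
from typing import Any, Dict, Optional


def group_prefixed_options(options: Dict[str, Optional[Any]]) -> Dict[str, Dict[str, Any]]:
    """Group 'prefix__name' options by prefix, dropping options left at None."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for name, value in options.items():
        if value is None:
            continue  # unset options are left to the component's own default
        prefix, separator, param = name.partition("__")
        if not separator:
            continue  # not a prefixed option
        grouped[prefix][param] = value
    return dict(grouped)


# Illustrative values only; they are not recommended settings.
fit_options = {
    "nns__n_scalars": 1000,
    "rs__n_most_important": 15,
    "rs__n_categorical_most_frequent": None,
    "lowcost__action_threshold": 2,
    "lowcost__num_low_cost": 20,
}
grouped = group_prefixed_options(fit_options)
```

Grouping this way keeps each component's options together, so options for a generator or selection algorithm that is not in use can simply stay at `None`.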
- humancompatible.explain.glance.iterative_merges.iterative_merges.action_fake_cost(action: Series, numerical_features_names: List[str], categorical_features_names: List[str])
- humancompatible.explain.glance.iterative_merges.iterative_merges.actions_cumulative_eff_cost(model: Any, X: DataFrame, actions_with_costs: List[Tuple[Series, float]], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_columns: List[str], categorical_columns: List[str], categorical_no_action_token: Any) → Tuple[float, float]
Evaluates the cumulative effectiveness and cost of applying a sequence of actions to a dataset using a predictive model.
This function applies each action from the sorted list of actions with their costs, predicts the outcomes, and calculates the total number of predictions that were flipped as well as the total recourse cost incurred from the actions.
Parameters:
- model : Any
A machine learning model used for making predictions on the modified instances.
- X : pd.DataFrame
The original DataFrame of instances to which actions will be applied.
- actions_with_costs : List[Tuple[pd.Series, float]]
A list of tuples where each tuple contains:
  - A pandas Series representing the action to apply.
  - A float representing the cost associated with the action.
- dist_func_dataframe : Callable[[pd.DataFrame, pd.DataFrame], pd.Series]
A function that computes the distance or cost between two DataFrames.
- numerical_columns : List[str]
A list of names for the numerical columns in the DataFrame.
- categorical_columns : List[str]
A list of names for the categorical columns in the DataFrame.
- categorical_no_action_token : Any
A token used to represent the absence of an action for categorical features.
Returns:
- Tuple[float, float]
A tuple containing:
  - The total number of predictions flipped across all actions applied.
  - The total recourse cost incurred from applying the actions.
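To make the description above concrete, here is a minimal sketch of applying cost-sorted actions in sequence and accumulating flips and cost. It uses plain dictionaries instead of DataFrames and charges each flipped instance the cost attached to its action rather than calling a distance function. Both are simplifying assumptions, and `cumulative_eff_cost`, `apply_action`, and `approve` are hypothetical names, not the library's API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def apply_action(row: Row, action: Row) -> Row:
    """Return a copy of the row with the action's numeric deltas added."""
    return {name: value + action.get(name, 0.0) for name, value in row.items()}


def cumulative_eff_cost(
    predict: Callable[[Row], int],
    rows: List[Row],
    actions_with_costs: List[Tuple[Row, float]],
) -> Tuple[int, float]:
    """Try actions from cheapest to most expensive on instances not yet flipped."""
    remaining = list(rows)
    flipped = 0
    total_cost = 0.0
    for action, cost in sorted(actions_with_costs, key=lambda pair: pair[1]):
        still_unflipped = []
        for row in remaining:
            if predict(apply_action(row, action)) == 1:
                flipped += 1
                total_cost += cost
            else:
                still_unflipped.append(row)
        remaining = still_unflipped
    return flipped, total_cost


def approve(row: Row) -> int:
    """Toy model: favorable outcome (1) when income reaches 50."""
    return 1 if row["income"] >= 50 else 0


applicants = [{"income": 45.0}, {"income": 30.0}, {"income": 10.0}]
actions = [({"income": 25.0}, 2.5), ({"income": 5.0}, 0.5)]
flipped, total_cost = cumulative_eff_cost(approve, applicants, actions)
```

Processing the cheapest action first means each instance is charged for the least expensive action that flips it, so the total cost is not inflated by expensive actions covering instances that a cheaper one already handles.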
- humancompatible.explain.glance.iterative_merges.iterative_merges.cluster_results(model: Any, instances: DataFrame, clusters: Dict[int, DataFrame], cluster_expl_actions: Dict[int, DataFrame], dist_func_dataframe: Callable[[DataFrame, DataFrame], Series], numerical_features_names: List[str], categorical_features_names: List[str], cluster_action_choice_algo: Literal['max-eff', 'mean-act', 'low-cost', 'min-cost-eff-thres', 'eff-thres-hybrid'] = 'max-eff', action_threshold: int = 2, num_low_cost: int = 20, effectiveness_threshold: float = 0.1, num_min_cost: int | None = None, max_n_actions_full_combinations: int = 50) → Tuple[Dict[int, Dict[str, Any]], float, float]
Evaluates and selects actions for each cluster based on a specified action choice algorithm.
This function iterates through each cluster of instances, applying the specified algorithm to select the best action for achieving recourse while minimizing costs. It calculates the total effectiveness and mean recourse costs across all clusters.
Parameters:
- model : Any
A machine learning model used for making predictions on modified instances.
- instances : pd.DataFrame
The DataFrame of original instances to which actions will be applied.
- clusters : Dict[int, pd.DataFrame]
A dictionary mapping cluster IDs to DataFrames of instances belonging to each cluster.
- cluster_expl_actions : Dict[int, pd.DataFrame]
A dictionary mapping cluster IDs to DataFrames of candidate actions for each cluster.
- dist_func_dataframe : Callable[[pd.DataFrame, pd.DataFrame], pd.Series]
A function that computes the distance or cost between two DataFrames.
- numerical_features_names : List[str]
A list of names for the numerical columns in the DataFrames.
- categorical_features_names : List[str]
A list of names for the categorical columns in the DataFrames.
- cluster_action_choice_algo : Literal[“max-eff”, “mean-act”, “low-cost”, “min-cost-eff-thres”, “eff-thres-hybrid”]
The algorithm to use for selecting actions from candidate actions. Options include:
  - “max-eff”: Select the action with maximum effectiveness.
  - “mean-act”: Select the mean action from candidate actions.
  - “low-cost”: Select actions based on low cost.
- action_threshold : int
Minimum threshold for the number of flipped predictions required to consider an action effective.
- num_low_cost : int
The number of low-cost actions to consider (used when the low-cost algorithm is selected).
- effectiveness_threshold : float
Minimum effectiveness required for actions (used when the min-cost-eff-thres algorithm is selected).
- num_min_cost : Optional[int]
Number of minimum cost actions to consider (used when the min-cost-eff-thres algorithm is selected).
- max_n_actions_full_combinations : int
Maximum number of actions to evaluate in full combinations (not currently used in the function).
Returns:
- Tuple[Dict[int, Dict[str, Any]], float, float]
A tuple containing:
  - A dictionary where each key is a cluster ID and each value is another dictionary with the selected action, its effectiveness, and cost.
  - Total effectiveness percentage across all clusters.
  - Total mean recourse cost across all clusters.
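As an illustration of the “max-eff” option, the sketch below picks, for each cluster, the candidate action that flips the most predictions, then aggregates an overall effectiveness rate and a mean cost per flipped instance. It is a simplified stand-in rather than the library's code: rows and actions are plain dictionaries, the cost is an assumed L1 magnitude of the action, ties are broken by lower cost, and `select_max_eff` and the result keys are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, float]


def apply_action(row: Row, action: Row) -> Row:
    """Return a copy of the row with the action's numeric deltas added."""
    return {name: value + action.get(name, 0.0) for name, value in row.items()}


def action_cost(action: Row) -> float:
    """Assumed cost: L1 magnitude of the action."""
    return sum(abs(delta) for delta in action.values())


def select_max_eff(
    predict: Callable[[Row], int],
    clusters: Dict[int, List[Row]],
    candidate_actions: Dict[int, List[Row]],
) -> Tuple[Dict[int, Dict[str, Any]], float, float]:
    results: Dict[int, Dict[str, Any]] = {}
    total_flipped, total_cost, total_size = 0, 0.0, 0
    for cid, rows in clusters.items():
        # Start with "no action"; it is kept when no candidate flips anything.
        best = {"action": None, "effectiveness": 0, "cost": 0.0, "size": len(rows)}
        for action in candidate_actions.get(cid, []):
            eff = sum(1 for row in rows if predict(apply_action(row, action)) == 1)
            cost = eff * action_cost(action)
            same_eff_cheaper = eff == best["effectiveness"] and eff > 0 and cost < best["cost"]
            if eff > best["effectiveness"] or same_eff_cheaper:
                best = {"action": action, "effectiveness": eff, "cost": cost, "size": len(rows)}
        results[cid] = best
        total_flipped += best["effectiveness"]
        total_cost += best["cost"]
        total_size += len(rows)
    eff_rate = total_flipped / total_size if total_size else 0.0
    mean_cost = total_cost / total_flipped if total_flipped else 0.0
    return results, eff_rate, mean_cost


def approve(row: Row) -> int:
    """Toy model: favorable outcome (1) when income reaches 50."""
    return 1 if row["income"] >= 50 else 0


clusters = {
    0: [{"income": 45.0}, {"income": 40.0}],
    1: [{"income": 10.0}, {"income": 20.0}],
}
candidates = {
    0: [{"income": 5.0}, {"income": 10.0}],
    1: [{"income": 30.0}, {"income": 35.0}],
}
results, eff_rate, mean_cost = select_max_eff(approve, clusters, candidates)
```

In this toy data the larger action wins in cluster 0 because it flips both instances, while in cluster 1 both candidates flip one instance and the cheaper one is kept.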
- humancompatible.explain.glance.iterative_merges.iterative_merges.cumulative(model, instances, actions, dist_func_dataframe, numeric_features_names, categorical_features_names, categorical_no_action_token)
Computes the cumulative effectiveness and cost of applying a set of actions to a given set of instances using a predictive model.
Parameters:
- model : Any
A predictive model with a predict method. This model will be used to predict outcomes after applying actions to the input instances.
- instances : pd.DataFrame
A DataFrame containing the instances for which actions are to be applied.
- actions : List[dict]
A list of actions, where each action is represented as a dictionary that specifies how to modify the instances.
- dist_func_dataframe : Callable[[pd.DataFrame, pd.DataFrame], pd.Series]
A distance function that takes two DataFrames and returns a Series of distances between corresponding rows.
- numeric_features_names : List[str]
A list of names for the numeric features in the instances DataFrame.
- categorical_features_names : List[str]
A list of names for the categorical features in the instances DataFrame.
- categorical_no_action_token : Any
A token used to represent a no-action state for categorical features.
Returns:
- Tuple[int, float]
A tuple containing:
  - effectiveness: An integer count of how many actions were effective (i.e., resulted in a finite cost).
  - cost: A float representing the total cost incurred by the effective actions.
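The "finite cost" wording in the return value suggests a convention in which an instance with no effective action is assigned an infinite cost. The sketch below illustrates that convention under stated assumptions: each instance takes the cheapest action that flips its prediction, ineffective cases get `math.inf`, and only finite costs are counted and summed. The names `cumulative_sketch` and `l1_distance` are hypothetical, and the per-instance minimum is one interpretation, not confirmed behavior of `cumulative`.

```python
import math
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def apply_action(row: Row, action: Row) -> Row:
    """Return a copy of the row with the action's numeric deltas added."""
    return {name: value + action.get(name, 0.0) for name, value in row.items()}


def l1_distance(before: Row, after: Row) -> float:
    """Stand-in for a row-wise distance function."""
    return sum(abs(after[name] - before[name]) for name in before)


def cumulative_sketch(
    predict: Callable[[Row], int],
    rows: List[Row],
    actions: List[Row],
    cost_fn: Callable[[Row, Row], float],
) -> Tuple[int, float]:
    """Count instances with a finite best cost and sum those costs."""
    best_costs = []
    for row in rows:
        best = math.inf  # stays infinite if no action flips this instance
        for action in actions:
            changed = apply_action(row, action)
            if predict(changed) == 1:
                best = min(best, cost_fn(row, changed))
        best_costs.append(best)
    finite = [cost for cost in best_costs if math.isfinite(cost)]
    return len(finite), sum(finite)


def approve(row: Row) -> int:
    """Toy model: favorable outcome (1) when income reaches 50."""
    return 1 if row["income"] >= 50 else 0


applicants = [{"income": 45.0}, {"income": 30.0}, {"income": 10.0}]
actions = [{"income": 5.0}, {"income": 25.0}]
effectiveness, cost = cumulative_sketch(approve, applicants, actions, l1_distance)
```

Using infinity as the marker keeps the minimum well defined for every instance and lets the final filter separate effective from ineffective cases in a single pass.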
- humancompatible.explain.glance.iterative_merges.iterative_merges.format_glance_output(cluster_stats: Dict[int, Dict[str, Number]], categorical_columns: List[str])
- humancompatible.explain.glance.iterative_merges.iterative_merges.print_results(clusters_stats: Dict[int, Dict[str, Number]], total_effectiveness: float, total_cost: float)
Prints the statistics for each cluster, including effectiveness and cost.
This function takes the results of cluster analysis and formats them for easy viewing. It displays the size of each cluster, the actions taken, and the effectiveness and cost of those actions.
Parameters:
- clusters_stats : Dict[int, Dict[str, numbers.Number]]
A dictionary where keys are cluster IDs (integers) and values are dictionaries containing statistics for each cluster. Each value dictionary must contain the following keys:
  - “size”: The size of the cluster.
  - “action”: The actions taken for the cluster.
  - “effectiveness”: The effectiveness of the actions in the cluster.
  - “cost”: The cost associated with the actions.
- total_effectiveness : float
The total effectiveness percentage across all clusters, represented as a decimal (e.g., 0.75 for 75%).
- total_cost : float
The total cost associated with the actions taken across all clusters.
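The input structure described above can be illustrated with a small formatter. This sketch builds a report string from a `clusters_stats` dictionary with the four required keys. `format_results` is a hypothetical function and its layout is an assumption; the actual output of `print_results` is not shown in this documentation.

```python
from typing import Any, Dict


def format_results(
    clusters_stats: Dict[int, Dict[str, Any]],
    total_effectiveness: float,
    total_cost: float,
) -> str:
    """Build a plain-text report, one block per cluster plus overall totals."""
    lines = []
    for cluster_id in sorted(clusters_stats):
        stats = clusters_stats[cluster_id]
        lines.append(f"Cluster {cluster_id} (size {stats['size']})")
        lines.append(f"  action: {stats['action']}")
        lines.append(f"  effectiveness: {stats['effectiveness']}")
        lines.append(f"  cost: {stats['cost']:.2f}")
    # total_effectiveness is a decimal fraction, so format it as a percentage.
    lines.append(f"Total effectiveness: {total_effectiveness:.2%}")
    lines.append(f"Total cost: {total_cost:.2f}")
    return "\n".join(lines)


# Illustrative statistics for a single cluster.
stats = {
    0: {"size": 2, "action": {"income": 10.0}, "effectiveness": 2, "cost": 20.0},
}
report = format_results(stats, total_effectiveness=0.75, total_cost=16.67)
print(report)
```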
humancompatible.explain.glance.iterative_merges.phase2 module
- humancompatible.explain.glance.iterative_merges.phase2.generate_cluster_centroid_explanations(cluster_centroids: Dict[int, DataFrame], cf_generator: LocalCounterfactualMethod, num_local_counterfactuals: int, numerical_features_names: List[str], categorical_features_names: List[str]) → Tuple[Dict[int, DataFrame], Dict[int, DataFrame], Dict[int, DataFrame]]
Generates explanations for cluster centroids by creating counterfactual instances for each centroid and extracting corresponding actions and explanations.
Parameters:
- cluster_centroids : Dict[int, pd.DataFrame]
A dictionary where keys are cluster identifiers and values are DataFrames representing the centroids of each cluster.
- cf_generator : LocalCounterfactualMethod
An instance of a LocalCounterfactualMethod used to generate counterfactuals.
- num_local_counterfactuals : int
The number of counterfactuals to generate for each cluster centroid.
- numerical_features_names : List[str]
A list of names for numerical features in the dataset.
- categorical_features_names : List[str]
A list of names for categorical features in the dataset.
Returns:
- Tuple[Dict[int, pd.DataFrame], Dict[int, pd.DataFrame], Dict[int, pd.DataFrame]]
A tuple containing three dictionaries:
  - cluster_explanations: A dictionary of counterfactuals for each cluster centroid.
  - cluster_expl_actions: A dictionary of extracted actions for the generated counterfactuals.
  - explanations_centroid: A dictionary of centroid explanations based on the generated counterfactuals.
Raises:
- ValueError
If no counterfactuals are found for any of the centroids.
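One way to picture the "extracted actions" is as the difference between each counterfactual and its centroid. The sketch below assumes numeric features become deltas and categorical features become either the new value or a no-action token. `centroid_actions`, the `"-"` token, and the dictionary-based rows are hypothetical. Raising `ValueError` only when every centroid lacks counterfactuals is one reading of the Raises clause above, not confirmed behavior.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def centroid_actions(
    cluster_centroids: Dict[int, Row],
    counterfactuals: Dict[int, List[Row]],
    numerical_features_names: List[str],
    categorical_features_names: List[str],
    no_action_token: Any = "-",
) -> Dict[int, List[Row]]:
    """Turn each counterfactual into an action relative to its centroid."""
    actions: Dict[int, List[Row]] = {}
    for cluster_id, centroid in cluster_centroids.items():
        cluster_cfs = counterfactuals.get(cluster_id, [])
        if not cluster_cfs:
            continue  # this centroid has no counterfactuals; skip it
        extracted = []
        for cf in cluster_cfs:
            action: Row = {}
            for name in numerical_features_names:
                action[name] = cf[name] - centroid[name]  # numeric delta
            for name in categorical_features_names:
                changed = cf[name] != centroid[name]
                action[name] = cf[name] if changed else no_action_token
            extracted.append(action)
        actions[cluster_id] = extracted
    if not actions:
        raise ValueError("No counterfactuals were found for any centroid.")
    return actions


centroids = {0: {"age": 30.0, "job": "clerk"}}
cfs = {0: [{"age": 35.0, "job": "clerk"}, {"age": 30.0, "job": "manager"}]}
actions = centroid_actions(centroids, cfs, ["age"], ["job"])
```

Expressing explanations as actions rather than target points is what lets an action derived from one centroid be applied to every instance in its cluster.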