A Theory of Explanations

A Theory of Explainability

According to theories in philosophy, social science, and psychology, a common definition of explainability or interpretability is the degree to which a human can understand the reasons behind a decision or an action [Mil19]. The explainability of AI/ML algorithms can be achieved by (1) making the entire decision-making process transparent and comprehensible and (2) explicitly providing an explanation for each decision [Lip18] (since a single explanation is unlikely to apply to all decisions [Lea14]). Hence, research has emerged to explore how to explain decisions made by complex, black-box models and how to present explanations in a form that is easily understood (and hence, accepted) by humans.

Explainability Goals

Explanations of AI/ML systems can be presented in various forms to serve the goals of different stakeholders. For example, the goal of software managers is to understand the general characteristics of software projects that are associated with defect-proneness, so they can chart appropriate quality improvement plans. Thus, the explanations needed by software managers are generic explanations that describe the overall reasons and logic behind the decisions of AI/ML systems. On the other hand, the goal of software developers is to understand why a particular file is predicted as defective by AI/ML systems, so they can mitigate the risk of that file being defective. In this case, the explanations needed by software developers are specific to that file and describe the reasons and logic behind the defective prediction made by the AI/ML systems.

What are Explanations?

According to a philosophical and psychological theory of explanations, Salmon [Sal84] argues that explanations can be presented as a causal chain of causes that lead to a decision. Causal chains can be classified into five categories [HMS05]: temporal, coincidental, unfolding, opportunity chains, and pre-emptive. Each type of causal chain is thus associated with an explanation type. However, identifying the complete causal chain of causes is challenging, since most AI/ML techniques produce only correlations, not causations. In contrast, Miller [Mil19] argues that explanations can be presented as answers to why-questions. Similarly, Lipton [Lip90] shares this view of explanations as being contrastive.

Types of Intelligibility Questions

Lim et al. [LYAW19] categorized questions towards the inference mechanism of AI/ML systems into: What, Why, Why Not, What If, and How To. Below, we provide an example question for each type of intelligibility question.

  • What: What is the logic behind the AI/ML models?

  • Why: Why is an instance predicted as TRUE?

  • Why Not: Why is an instance not predicted as FALSE?

  • How To: How can we reverse the prediction of an instance (e.g., from TRUE to FALSE) generated by the system?

  • What If: What would the system predict if the values of an instance are changed?

Scopes of Explanations

The explainability of software analytics can be achieved at two levels:

Global Explainability: Using interpretable machine learning techniques (e.g., decision trees, decision rules, or logistic regression) or intrinsic model-specific techniques (e.g., ANOVA, variable importance) so that the entire prediction and recommendation process is transparent and comprehensible. Such intrinsic model-specific techniques aim to provide global explainability: users can understand how the model works as a whole (e.g., by inspecting a branch of a decision tree), but often cannot understand why the model makes a particular prediction.
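To make this concrete, below is a minimal sketch of global explainability using scikit-learn: a shallow decision tree is fitted on a small, hypothetical file-level defect dataset (the metric names LOC, Complexity, and NumAuthors and all values are illustrative assumptions, not real project data), and its decision rules and feature importances are inspected to describe how the model behaves as a whole.

```python
# A minimal sketch of global explainability (illustrative, assumed data).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical file-level metrics; column names and values are made up.
data = pd.DataFrame({
    "LOC":        [120, 45, 300, 80, 500, 60, 250, 30],
    "Complexity": [15, 3, 40, 7, 55, 4, 30, 2],
    "NumAuthors": [5, 1, 8, 2, 10, 1, 6, 1],
    "Defective":  [1, 0, 1, 0, 1, 0, 1, 0],
})
X, y = data.drop(columns="Defective"), data["Defective"]

# An intrinsically interpretable model: a shallow decision tree.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Global explanation 1: the decision rules of the entire model.
print(export_text(tree, feature_names=list(X.columns)))

# Global explanation 2: which features the model relies on overall.
for name, importance in zip(X.columns, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```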

Local Explainability: Using model-agnostic techniques (e.g., LIME [RSG16b]) to explain the predictions of software analytics models (e.g., neural networks, random forests). Such post-hoc model-agnostic techniques can provide an explanation for each individual prediction (i.e., an instance to be explained). Users can then understand why that particular prediction is made by the software analytics model.
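As a companion sketch for local explainability, the snippet below reuses the hypothetical X and y from the previous sketch, trains a black-box random forest, and applies the LIME package to explain why one specific instance is predicted as defective. It is only an illustration of the general workflow under the stated assumptions, not a prescribed setup.

```python
# A minimal sketch of local, model-agnostic explainability with LIME,
# reusing the hypothetical X and y from the sketch above.
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer  # pip install lime

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    training_data=X.values,
    feature_names=list(X.columns),
    class_names=["clean", "defective"],
    mode="classification",
)

# Explain why one specific instance (e.g., the metrics of one file)
# is predicted as defective by the black-box model.
explanation = explainer.explain_instance(
    X.values[0], black_box.predict_proba, num_features=3
)

# Each (condition, weight) pair is a local explanation: positive weights
# support the "defective" prediction, negative weights contradict it.
for condition, weight in explanation.as_list():
    print(f"{condition}: {weight:+.3f}")
```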

Types of Explanations

To better explain types of explanations, we first provide definitions of Fact and Foil with examples in the context of defect prediction.

Fact refers to a cause that actually happened (what happened).
Example: A.java is predicted as defective.

Foil refers to a cause that was expected or plausible to happen as an alternative to the Fact.
Examples: (1) A.java is predicted as clean and (2) B.java is predicted as clean.

With respect to the fact and foil, Lim et al. [LYAW19] discussed four types of explanations as follows:

  • Causal Explanation refers to an explanation that focuses on selected causes that are relevant to interpreting the observation with respect to existing knowledge. An example of a causal explanation, presented as an answer to a why-question about the fact above, is provided below.
    Example: [Why] Why is A.java predicted as defective?

  • Contrastive Explanation refers to an explanation that focuses on the contrast between a fact and a foil. Van Bouwel and Weber [VBW02] further distinguished contrastive explanations into three types as follows:

    • Property-contrast (P-contrast) refers to the differences in properties within an object.
      Example: [Why not] Why is A.java predicted as defective, rather than clean?

    • Object-contrast (O-contrast) refers to the differences between two objects.
      Example: [Why not] Why is A.java predicted as defective, while B.java is predicted as clean?

    • Time-contrast refers to the differences within an object over time.
      Example: [Why not] Why is A.java predicted as defective in the current release, but was predicted as clean in the previous release?

  • Counterfactual Explanation refers to an explanation of what needs to change for an alternative outcome (the foil) to happen.
    Example: [How to] How can we reverse the prediction of A.java generated by the model (from defective to clean)?

  • Transfactual Explanation refers to an explanation of a hypothetical scenario: if the factors were different, what would the effect be? (A minimal sketch illustrating this question follows this list.) Example: [What if] What would be the prediction of A.java if its size and complexity were changed?
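To illustrate the what-if (transfactual) style of explanation, the sketch below manually perturbs the hypothetical instance used in the earlier sketches and re-queries the black-box model. Dedicated techniques such as LORE and PyExplainer automate this kind of search; the hand-rolled perturbation here, with arbitrarily assumed new values, only illustrates the idea.

```python
# A minimal what-if (transfactual) sketch, reusing X and black_box from
# the earlier sketches. The perturbed values are arbitrary assumptions.
labels = {0: "clean", 1: "defective"}

original = X.iloc[[0]].copy()
print("Original prediction:", labels[black_box.predict(original)[0]])

what_if = original.copy()
what_if["LOC"] = 40          # what if the file were split into smaller files?
what_if["Complexity"] = 3    # what if the logic were simplified?
print("What-if prediction:", labels[black_box.predict(what_if)[0]])
```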

Finally, we present a summary table of the scope of explanations, type of explanation, description, example, and techniques for generating answers to each of the five types of questions.

| Questions | Scope of Explanations | Types of Explanation | Description | Examples | Techniques |
|-----------|-----------------------|----------------------|-------------|----------|------------|
| What | Global | Plain-Fact | Describe the weights of features used by the model | What is the logic behind the AI/ML models? | ANOVA, VarImp, Decision Tree, and Decision Rules |
| Why | Local | Causal | Show how features of the instance contribute to the model's prediction | Why is an instance predicted as TRUE? | Supporting Scores of LIME and SHAP |
| Why not | Local | Contrastive | Show how features of the instance contradict an alternative prediction (the foil) | Why is an instance not predicted as FALSE? | Contradicting Scores of LIME and SHAP |
| How To | Local | Counterfactual | Describe which features need to change to reverse the prediction | How can we reverse the prediction of an instance generated by the system? | LORE |
| What If | Local | Transfactual | Show how the prediction changes corresponding to changes of a feature | What would the system predict if the values of an instance are changed? | PyExplainer |

Note

Parts of this chapter have been published in: Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Hoa Khanh Dam, John Grundy: An Empirical Study of Model-Agnostic Techniques for Defect Prediction Models. IEEE Transactions on Software Engineering (2020). https://doi.org/10.1109/TSE.2020.2982385

Suggested Readings

[1] Brian Y. Lim, Qian Yang, Ashraf M. Abdul, Danding Wang: Why these Explanations? Selecting Intelligibility Types for Explanation Goals. IUI Workshops 2019.

[2] David K. Lewis: Causal Explanation. Philosophical Papers 2: 214-240 (1986).

[3] Eric Barnes: Why P Rather than Q?: The Curiosities of Fact and Foil. Philosophical Studies 73: 35-53 (1995).

[4] Peter Lipton: Contrastive Explanation. Royal Institute of Philosophy Supplements 27:247-266 (1990).