# ASE2021 Hands-on Exercise

Below are interactive hands-on exercises for model-agnostic techniques for generating local explanations. First, we need to load the necessary libraries and prepare the datasets.

## Load Data and Prepare Datasets

# Imports for loading data
from os import listdir
from os.path import isfile, join
import pandas as pd

# Import for Split Data into Training and Testing Samples
from sklearn.model_selection import train_test_split

train_dataset = pd.read_csv('../../datasets/lucene-2.9.0.csv', index_col = 'File')
test_dataset = pd.read_csv('../../datasets/lucene-3.0.0.csv', index_col = 'File')

outcome = 'RealBug'
features = ['OWN_COMMIT', 'Added_lines', 'CountClassCoupled', 'AvgLine', 'RatioCommentToCode']
# OWN_COMMIT - the proportion of commits made by the file's highest contributor
# Added_lines - # of added lines of code
# CountClassCoupled - # of classes that interact or couple with the class of interest
# AvgLine - the average # of lines of code
# RatioCommentToCode - the ratio of lines of comments to lines of code

# process outcome to 0 and 1
train_dataset[outcome] = pd.Categorical(train_dataset[outcome])
train_dataset[outcome] = train_dataset[outcome].cat.codes

test_dataset[outcome] = pd.Categorical(test_dataset[outcome])
test_dataset[outcome] = test_dataset[outcome].cat.codes

X_train = train_dataset.loc[:, features]
X_test = test_dataset.loc[:, features]

y_train = train_dataset.loc[:, outcome]
y_test = test_dataset.loc[:, outcome]

class_labels = ['Clean', 'Defective']

X_train.columns = features
X_test.columns = features
training_data = pd.concat([X_train, y_train], axis=1)
testing_data = pd.concat([X_test, y_test], axis=1)
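
As an optional sanity check, you can confirm the shapes and the class balance of the prepared data before modelling:

# Optional sanity check of the prepared data
print('Training:', X_train.shape, '| Testing:', X_test.shape)
print('Defective files in training data:', int(y_train.sum()), 'of', len(y_train))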

Then, we construct a Random Forests model as a predictive model to be explained.

(1) Please construct a Random Forests model using the code cell below.

Tips


from sklearn.ensemble import RandomForestClassifier

our_rf_model = RandomForestClassifier(random_state=0)
our_rf_model.fit(X_train, y_train)

from sklearn.ensemble import RandomForestClassifier

# Please fit your Random Forests model here!

## LIME

LIME (i.e., Local Interpretable Model-agnostic Explanations) [RSG16b] is a model-agnostic technique that mimics the behaviour of the black-box model to generate explanations of its predictions. Given a black-box model and an instance to explain, LIME performs four key steps to generate an instance explanation, as follows (a runnable sketch of these steps appears after the list):

  • First, LIME randomly generates instances surrounding the instance of interest.

  • Second, LIME uses the black-box model to generate predictions of the generated random instances.

  • Third, LIME constructs a local regression model using the generated random instances and their generated predictions from the black-box model.

  • Finally, the coefficients of the regression model indicate the contribution of each metric on the prediction of the instance of interest according to the black-box model.
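
To make these four steps concrete, below is a minimal, self-contained sketch of the same procedure. It is illustrative only, not the lime library (which additionally discretises continuous features and performs feature selection); lime_sketch is a hypothetical helper written for this exercise.

# A minimal sketch of LIME's four steps (illustrative only - not the lime library)
import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(predict_proba, instance, X_train, n_samples=1000, random_state=0):
    rng = np.random.RandomState(random_state)
    scale = X_train.std(axis=0).values
    # Step 1 - randomly generate instances surrounding the instance of interest
    neighbours = instance + rng.normal(size=(n_samples, len(instance))) * scale
    # Step 2 - use the black-box model to label the generated instances
    labels = predict_proba(neighbours)[:, 1]
    # Step 3 - fit a local regression model, weighting neighbours by proximity
    distances = np.sqrt((((neighbours - instance) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(distances ** 2) / 2)
    local_model = Ridge(alpha=1.0).fit(neighbours, labels, sample_weight=weights)
    # Step 4 - the coefficients indicate each metric's local contribution
    return dict(zip(X_train.columns, local_model.coef_))

For example, lime_sketch(our_rf_model.predict_proba, X_test.iloc[0].values, X_train) returns a per-metric estimate of the local contributions for the first test file.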

(2) Please use LIME to explain the prediction of DocumentsWriter.java that is generated from your Random Forests model.

Tips


# Import for LIME
import lime
import lime.lime_tabular

# LIME Step 1 - Construct an explainer
our_lime_explainer = lime.lime_tabular.LimeTabularExplainer(
                            training_data = X_train.values,  
                            mode = 'classification',
                            training_labels = y_train,
                            feature_names = features,
                            class_names = class_labels,
                            discretize_continuous = True)
                            
# LIME Step 2 - Use the constructed explainer with the predict function 
# of your predictive model to explain any instance
lime_local_explanation_of_an_instance = our_lime_explainer.explain_instance(
                            data_row = X_test.loc[file_to_be_explained, :], 
                            predict_fn = our_rf_model.predict_proba, 
                            num_features = 5,
                            top_labels = 1)
                            
# Please use the code below to visualise the generated LIME explanation.
lime_local_explanation_of_an_instance.show_in_notebook()
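
Besides show_in_notebook(), the returned lime Explanation object also exposes the raw contributions programmatically (useful again in the object-contrast exercise at the end of this section); a short optional snippet:

# Optional: inspect the raw (condition, contribution) pairs behind the plot
top_label = lime_local_explanation_of_an_instance.available_labels()[0]
for condition, weight in lime_local_explanation_of_an_instance.as_list(label=top_label):
    print(f'{condition}: {weight:.4f}')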

# Import for LIME
import lime
import lime.lime_tabular

file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'

print(f'Explaining {file_to_be_explained} with LIME')

# LIME Step 1 - Construct an explainer


# LIME Step 2 - Use the constructed explainer with the predict function of your predictive model to explain any instance


# visualise the generated LIME explanation

## SHAP

SHAP (SHapley Additive exPlanations) [LEL18] is a model-agnostic technique that generates explanations of the predictions of the black-box model based on game theory (i.e., Shapley values).
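
To make the game-theoretic idea concrete, below is a minimal, self-contained sketch of the exact Shapley-value computation that SHAP approximates. It is illustrative only, not the shap library; exact_shapley_values is a hypothetical helper, and filling "absent" metrics with a background value (here, the training mean) mirrors the approximation strategy used by shap.KernelExplainer.

# A minimal sketch of exact Shapley values (illustrative only - not the shap
# library). The Shapley value of a metric is its average marginal contribution
# to the prediction over all coalitions of the remaining metrics.
import itertools
import math
import numpy as np

def exact_shapley_values(predict_fn, instance, background):
    n = len(instance)
    def value(coalition):
        # Metrics in the coalition take the instance's values;
        # the rest are replaced by the background values
        x = background.copy()
        for i in coalition:
            x[i] = instance[i]
        return predict_fn(x.reshape(1, -1))[0]
    shapley = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in itertools.combinations(others, size):
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                shapley[i] += weight * (value(coalition + (i,)) - value(coalition))
    return shapley

# e.g. (after fitting our_rf_model):
# exact_shapley_values(lambda X: our_rf_model.predict_proba(X)[:, 1],
#                      X_test.iloc[0].values.astype(float),
#                      X_train.mean().values)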

(3) Please use SHAP to explain the prediction of DocumentsWriter.java that is generated from your Random Forests model.

Tips


# Import for SHAP
import shap

# SHAP Step 1 - Construct an explainer with the predict function
# of your predictive model
our_shap_explainer = shap.KernelExplainer(our_rf_model.predict, X_test)
                            
# SHAP Step 2 - Generate the SHAP explanation of an instance to be explained
shap_explanations_of_an_instance = our_shap_explainer.shap_values(X_test.iloc[file_to_be_explained_idx, :])
                            
# Please use the code below to visualise the generated SHAP explanation (Force plot).
shap.initjs()
shap.force_plot(our_shap_explainer.expected_value, 
                shap_explanations_of_an_instance, 
                X_test.iloc[file_to_be_explained_idx,:])

# Import for SHAP
import shap

file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'
file_to_be_explained_idx = list(X_test.index).index(file_to_be_explained)


# SHAP Step 1 - Construct an explainer with the predict function


# SHAP Step 2 - Generate the SHAP explanation of an instance to be explained


# visualise the generated SHAP explanation

## PyExplainer

PyExplainer [PTJ+21] is a rule-based model-agnostic technique that utilises a local rule-based regression model to learn the associations between the characteristics of synthetic instances and the predictions from the black-box model. Given a black-box model and an instance to explain, PyExplainer performs four key steps to generate an instance explanation, as follows (a simplified sketch of the neighbour-generation step appears after the list):

  • First, PyExplainer generates synthetic neighbours around the instance to be explained using crossover and mutation techniques.

  • Second, PyExplainer obtains the predictions of the synthetic neighbours from the black-box model.

  • Third, PyExplainer builds a local rule-based regression model.

  • Finally, PyExplainer generates an explanation from the local model for the instance to be explained.
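
To illustrate the first step, below is a simplified, self-contained sketch of crossover- and mutation-style neighbour generation. It is illustrative only; PyExplainer's actual implementation differs in detail, and generate_synthetic_neighbours is a hypothetical helper written for this exercise.

# A simplified sketch of crossover/mutation neighbour generation
# (illustrative only - not PyExplainer's actual implementation)
import numpy as np
import pandas as pd

def generate_synthetic_neighbours(X_train, instance, n_samples=100,
                                  mutation_rate=0.1, random_state=0):
    rng = np.random.RandomState(random_state)
    scale = X_train.std(axis=0).values
    rows = []
    for _ in range(n_samples):
        # Crossover: mix the instance's metric values with a random training instance
        donor = X_train.iloc[rng.randint(len(X_train))].values
        mask = rng.rand(len(instance)) < 0.5
        child = np.where(mask, instance.values, donor).astype(float)
        # Mutation: randomly perturb a small fraction of the metrics
        mutate = rng.rand(len(instance)) < mutation_rate
        child[mutate] += rng.normal(0.0, scale[mutate])
        rows.append(child)
    return pd.DataFrame(rows, columns=X_train.columns)

For example, generate_synthetic_neighbours(X_train, X_test.iloc[0]) produces 100 synthetic neighbours around the first test file, which a local rule-based model could then be fitted on.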

(4) Please use PyExplainer to explain the prediction of DocumentsWriter.java that is generated from your Random Forests model.

Tips

# Import for PyExplainer
from pyexplainer.pyexplainer_pyexplainer import PyExplainer
import numpy as np
import pandas as pd

np.random.seed(0)

# PyExplainer Step 1 - Construct a PyExplainer 
our_pyexplainer = PyExplainer(X_train = X_train,
                           y_train = y_train,
                           indep = X_train.columns,
                           dep = outcome,
                           blackbox_model = our_rf_model)
                            
# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained
pyexplainer_explanation_of_an_instance = our_pyexplainer.explain(
                                X_explain = X_test.loc[file_to_be_explained,:].to_frame().transpose(),
                                y_explain = pd.Series(bool(y_test.loc[file_to_be_explained]), 
                                                      index = [file_to_be_explained],
                                                      name = outcome),
                                search_function = 'crossoverinterpolation',
                                max_iter=1000,
                                max_rules=20,
                                random_state=0,
                                reuse_local_model=True)
                            
# Please use the code below to visualise the generated PyExplainer explanation (What-If interactive visualisation).
our_pyexplainer.visualise(pyexplainer_explanation_of_an_instance, title="Why this file is defect-introducing ?")
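
Besides the interactive visualisation, you can also inspect the explanation object directly. Depending on your installed PyExplainer version, it behaves like a dictionary of intermediate results and rules; treat the snippet below as an assumption and check the PyExplainer documentation for your version:

# Inspect what the explanation object contains (the available keys depend on
# the installed PyExplainer version - an assumption, not a guaranteed API)
print(pyexplainer_explanation_of_an_instance.keys())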

# Import for PyExplainer
from pyexplainer.pyexplainer_pyexplainer import PyExplainer

file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'

# PyExplainer Step 1 - Construct a PyExplainer 


# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained


# visualise the generated rule-based PyExplainer explanation

All of the above explanations are property-contrast explanations within a file (https://xai4se.github.io/xai/theory-of-explanations.html). In fact, model-agnostic techniques can also be used to generate other types of explanations, e.g., object-contrast explanations (i.e., the differences between the explanations of two objects).
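
For instance, an object-contrast explanation can be built by explaining each file with the same LIME explainer and tabulating the per-metric contributions side by side. The sketch below assumes the our_lime_explainer and our_rf_model objects from the earlier exercises; contribution_series is a hypothetical helper:

# Sketch of an object-contrast explanation: explain two files with the same
# LIME explainer and compare their metric contributions side by side
import pandas as pd

def contribution_series(explanation):
    # Map each condition (e.g. 'OWN_COMMIT > 0.5') back to its metric name
    label = explanation.available_labels()[0]
    return pd.Series({next(f for f in features if f in condition): weight
                      for condition, weight in explanation.as_list(label=label)})

file_a = 'src/java/org/apache/lucene/index/DocumentsWriter.java'
file_b = 'src/test/org/apache/lucene/util/TestStringIntern.java'

explanations = {f: our_lime_explainer.explain_instance(
                       data_row=X_test.loc[f, :],
                       predict_fn=our_rf_model.predict_proba,
                       num_features=5,
                       top_labels=1)
                for f in (file_a, file_b)}

# Each column shows contributions towards that file's top predicted class
contrast = pd.DataFrame({f.split('/')[-1]: contribution_series(e)
                         for f, e in explanations.items()})
print(contrast)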

(5) Please use LIME to generate the object-contrast explanations between DocumentsWriter.java and TestStringIntern.java.

# Import for LIME
import lime
import lime.lime_tabular

file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'
another_file_to_be_explained = 'src/test/org/apache/lucene/util/TestStringIntern.java'

print(f'Generating the object-contrast explanations between {file_to_be_explained} and {another_file_to_be_explained} with LIME')

# LIME Step 1 - Construct an explainer


# LIME Step 2 - Use the constructed explainer with the predict function of your predictive model to explain the two instances


# visualise the generated LIME explanation - (DocumentsWriter.java)
# visualise the generated LIME explanation - (TestStringIntern.java)