ASE2021 PyExplainer Live-Demo
Explaining instances with PyExplainer is simple. First, we need to load the necessary libraries and prepare the datasets.
## Load data and prepare datasets
# Import for Load Data
from os import listdir
from os.path import isfile, join
import pandas as pd
# Import for Split Data into Training and Testing Samples
from sklearn.model_selection import train_test_split
# Train on Lucene 2.9.0 and test on a later release, Lucene 3.0.0
train_dataset = pd.read_csv("../../datasets/lucene-2.9.0.csv", index_col='File')
test_dataset = pd.read_csv("../../datasets/lucene-3.0.0.csv", index_col='File')
outcome = 'RealBug'
features = ['OWN_COMMIT', 'CountClassCoupled', 'AvgLine', 'RatioCommentToCode']
# OWN_COMMIT - code ownership
# CountClassCoupled - number of classes that interact or couple with the class of interest
# AvgLine - average number of lines per method or function
# RatioCommentToCode - ratio of lines of comments to lines of code
# Encode the outcome as 0/1 category codes (0 = clean, 1 = defective)
train_dataset[outcome] = pd.Categorical(train_dataset[outcome])
train_dataset[outcome] = train_dataset[outcome].cat.codes
test_dataset[outcome] = pd.Categorical(test_dataset[outcome])
test_dataset[outcome] = test_dataset[outcome].cat.codes
X_train = train_dataset.loc[:, features]
X_test = test_dataset.loc[:, features]
y_train = train_dataset.loc[:, outcome]
y_test = test_dataset.loc[:, outcome]
class_labels = ['Clean', 'Defective']
X_train.columns = features
X_test.columns = features
training_data = pd.concat([X_train, y_train], axis=1)
testing_data = pd.concat([X_test, y_test], axis=1)
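The outcome encoding above relies on `pd.Categorical(...).cat.codes`, which numbers categories in sorted order. The sketch below uses a hypothetical toy frame (file names and values are made up, not taken from the Lucene datasets) to show that a boolean `RealBug` column maps `False` to 0 and `True` to 1, which is what lets `class_labels = ['Clean', 'Defective']` line up with the codes.

```python
import pandas as pd

# Hypothetical toy data standing in for a defect dataset; NOT real Lucene values.
toy = pd.DataFrame(
    {
        "OWN_COMMIT": [0.9, 0.4, 0.7],
        "RealBug": [False, True, False],
    },
    index=pd.Index(["A.java", "B.java", "C.java"], name="File"),
)

# Same two-step encoding as in the notebook: categorical first, then integer codes.
toy["RealBug"] = pd.Categorical(toy["RealBug"])
categories = toy["RealBug"].cat.categories.tolist()  # sorted: [False, True]
toy["RealBug"] = toy["RealBug"].cat.codes

print(categories)                # [False, True]
print(toy["RealBug"].tolist())   # [0, 1, 0]
```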
Then, we construct a Random Forests model as a predictive model to be explained.
(1) Please construct a Random Forests model using the code cell below.
Tips (one possible solution for the cell below)
rf_model = RandomForestClassifier(random_state=0)
rf_model.fit(X_train, y_train)
from sklearn.ensemble import RandomForestClassifier
# Please fit your Random Forests model here!
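The tip above can be tried without the Lucene CSV files. The following sketch fits the same `RandomForestClassifier(random_state=0)` on hypothetical synthetic data that only reuses the four feature names, then maps a 0/1 prediction back to the `class_labels` used in this notebook. The toy labelling rule is an assumption made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

features = ["OWN_COMMIT", "CountClassCoupled", "AvgLine", "RatioCommentToCode"]
class_labels = ["Clean", "Defective"]

# Hypothetical synthetic data (NOT the Lucene datasets), seeded for reproducibility.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.random((200, len(features))), columns=features)
# Toy labelling rule (an assumption): high coupling and low ownership -> defective (1).
y_train = ((X_train["CountClassCoupled"] > 0.6) & (X_train["OWN_COMMIT"] < 0.5)).astype(int)

# Same construction as in the tip above.
rf_model = RandomForestClassifier(random_state=0)
rf_model.fit(X_train, y_train)

# Probability of class 1 ("Defective") and the readable label for the first row.
p_defective = rf_model.predict_proba(X_train.iloc[[0]])[0, 1]
label = class_labels[rf_model.predict(X_train.iloc[[0]])[0]]
print(rf_model.classes_, p_defective, label)
```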
PyExplainer
PyExplainer [PTJ+21] is a rule-based, model-agnostic technique that utilises a local rule-based regression model to learn the associations between the characteristics of the synthetic instances and the predictions from the black-box model. Given a black-box model and an instance to explain, PyExplainer performs four key steps to generate an instance explanation:
First, PyExplainer generates synthetic neighbors around the instance to be explained using the crossover and mutation techniques
Second, PyExplainer obtains the predictions of the synthetic neighbors from the black-box model
Third, PyExplainer builds a local rule-based regression model
Finally, PyExplainer generates an explanation from the local model for the instance to be explained
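The four steps can be sketched in a few dozen lines. This is not PyExplainer's implementation, only a simplified illustration under stated assumptions: the black-box model is trained on hypothetical synthetic data, crossover is approximated by interpolating between the instance and randomly drawn training rows, mutation is small Gaussian noise, and a shallow scikit-learn `DecisionTreeRegressor` is swapped in for PyExplainer's local rule-based regression model so that the covering rule can be read directly off the decision path. The helper names `generate_neighbors` and `explain` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
feature_names = ["OWN_COMMIT", "CountClassCoupled", "AvgLine", "RatioCommentToCode"]

# Hypothetical black-box model on synthetic data (stands in for rf_model).
X = rng.random((300, 4))
y = ((X[:, 1] > 0.6) & (X[:, 0] < 0.5)).astype(int)
blackbox = RandomForestClassifier(random_state=0).fit(X, y)


def generate_neighbors(x, X_pool, n=500, mutation_scale=0.05):
    """Step 1 (simplified): crossover by interpolation, then Gaussian mutation."""
    parents = X_pool[rng.integers(0, len(X_pool), size=n)]
    alpha = rng.random((n, 1))                # mixing weight per child
    children = x + alpha * (parents - x)      # points on the segment x -> parent
    children += rng.normal(0.0, mutation_scale, size=children.shape)
    return children


def explain(x, X_pool, max_depth=3):
    # Step 1: synthetic neighbours around the instance.
    neighbors = generate_neighbors(x, X_pool)
    # Step 2: black-box probability of the defective class for each neighbour.
    scores = blackbox.predict_proba(neighbors)[:, 1]
    # Step 3: local regressor (a shallow tree substitutes for a rule-based regression model).
    local = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(neighbors, scores)
    # Step 4: the root-to-leaf path that covers the instance, read as a rule.
    tree = local.tree_
    x_row = x.reshape(1, -1)
    conditions = []
    for node in local.decision_path(x_row).indices:
        if tree.children_left[node] == -1:    # leaf: no split condition
            continue
        f, t = tree.feature[node], tree.threshold[node]
        op = "<=" if x[f] <= t else ">"
        conditions.append(f"{feature_names[f]} {op} {t:.3f}")
    return " & ".join(conditions), local.predict(x_row)[0]


instance = np.array([0.2, 0.9, 0.5, 0.3])  # hypothetical file to be explained
rule, local_score = explain(instance, X)
print(rule)
print(round(local_score, 2))
```

Fitting a regressor on the black-box probability, rather than a classifier on hard labels, mirrors the description above of a local regression model; a tree is used here only because its root-to-leaf path is already a conjunctive rule.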
Tips (one possible solution for the cell below)
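import numpy as np
# Seed NumPy so that the generated synthetic neighbours are reproducible
np.random.seed(0)
# PyExplainer Step 1 - Construct a PyExplainer
pyexp = PyExplainer(X_train=X_train,
                    y_train=y_train,
                    indep=X_train.columns,
                    dep=outcome,
                    blackbox_model=rf_model)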
# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained
exp_obj = pyexp.explain(X_explain=X_test.loc[file_to_be_explained, :].to_frame().transpose(),
                        y_explain=pd.Series(bool(y_test.loc[file_to_be_explained]),
                                            index=[file_to_be_explained],
                                            name=outcome),
                        search_function='crossoverinterpolation',
                        max_iter=1000,
                        max_rules=20,
                        random_state=0,
                        reuse_local_model=True)
# Print rule
exp_obj['top_k_positive_rules'][:1]
# Please use the code below to visualise the generated PyExplainer explanation (What-If interactive visualisation).
pyexp.visualise(exp_obj, title="Why this file is predicted as defect-introducing?")
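The `X_explain` and `y_explain` arguments above turn one labelled row into a one-row DataFrame and a one-element boolean Series. The sketch below reproduces that pandas pattern on hypothetical stand-ins for `X_test` and `y_test` (file names and values are made up). It also shows a list-based `.loc` selection, which gives the same one-row frame but keeps each column's original dtype, whereas going through a row Series upcasts mixed int/float columns to float.

```python
import pandas as pd

# Hypothetical stand-ins for X_test / y_test; values are made up.
X_test = pd.DataFrame(
    {"OWN_COMMIT": [0.8, 0.3], "CountClassCoupled": [5, 42],
     "AvgLine": [12.0, 30.5], "RatioCommentToCode": [0.4, 0.1]},
    index=pd.Index(["A.java", "B.java"], name="File"),
)
y_test = pd.Series([0, 1], index=X_test.index, name="RealBug")

file_to_be_explained = "B.java"

# Notebook pattern: row -> Series -> one-column frame -> one-row frame.
X_explain = X_test.loc[file_to_be_explained, :].to_frame().transpose()
# Alternative: selecting with a list of labels keeps the per-column dtypes.
X_explain_alt = X_test.loc[[file_to_be_explained], :]

# Outcome for the same file as a one-element boolean Series.
y_explain = pd.Series(bool(y_test.loc[file_to_be_explained]),
                      index=[file_to_be_explained], name="RealBug")

print(X_explain.shape)  # (1, 4)
print(X_explain.dtypes)
print(X_explain_alt.dtypes)
print(y_explain)
```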
# Import for PyExplainer
from pyexplainer.pyexplainer_pyexplainer import PyExplainer
file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'
# PyExplainer Step 1 - Construct a PyExplainer
# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained
# Print rule
# Please use the code below to visualise the generated PyExplainer explanation (What-If interactive visualisation).