ASE2021 PyExplainer Live-Demo
Explaining instances with PyExplainer is simple. First, we load the necessary libraries and prepare the datasets.
```python
## Load Data and prepare datasets

# Imports for loading data
from os import listdir
from os.path import isfile, join
import pandas as pd

# Import for splitting data into training and testing samples
from sklearn.model_selection import train_test_split

train_dataset = pd.read_csv("../../datasets/lucene-2.9.0.csv", index_col='File')
test_dataset = pd.read_csv("../../datasets/lucene-3.0.0.csv", index_col='File')

outcome = 'RealBug'
features = ['OWN_COMMIT', 'CountClassCoupled', 'AvgLine', 'RatioCommentToCode']
# OWN_COMMIT - code ownership
# CountClassCoupled - # of classes that interact or couple with the class of interest
# AvgLine - average # of lines of code
# RatioCommentToCode - the ratio of lines of comments to lines of code

# Encode the outcome as 0 (Clean) and 1 (Defective)
train_dataset[outcome] = pd.Categorical(train_dataset[outcome])
train_dataset[outcome] = train_dataset[outcome].cat.codes
test_dataset[outcome] = pd.Categorical(test_dataset[outcome])
test_dataset[outcome] = test_dataset[outcome].cat.codes

X_train = train_dataset.loc[:, features]
X_test = test_dataset.loc[:, features]
y_train = train_dataset.loc[:, outcome]
y_test = test_dataset.loc[:, outcome]

class_labels = ['Clean', 'Defective']

training_data = pd.concat([X_train, y_train], axis=1)
testing_data = pd.concat([X_test, y_test], axis=1)
```
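The outcome encoding above relies on `pd.Categorical(...).cat.codes`, which maps the two class labels to integer codes in sorted category order (so `False` becomes 0 and `True` becomes 1). A tiny self-contained sketch of that behaviour, using a toy series rather than the real `RealBug` column:

```python
import pandas as pd

# pd.Categorical sorts the categories ([False, True]) and .codes maps each
# value to its category index, so False -> 0 and True -> 1
s = pd.Series([True, False, True, True])
codes = pd.Categorical(s).codes
print(list(codes))  # [1, 0, 1, 1]
```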
Then, we construct a Random Forests model as a predictive model to be explained.
(1) Please construct a Random Forests model using the code cell below.
```python
# Import for the Random Forests model
from sklearn.ensemble import RandomForestClassifier

# Please fit your Random Forests model here!
rf_model = RandomForestClassifier(random_state=0)
rf_model.fit(X_train, y_train)
```
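Before explaining individual predictions, it is worth sanity-checking that the black-box model has reasonable predictive power, e.g. with AUC on the test set. The snippet below is a minimal, self-contained sketch of that check; it generates synthetic data as a stand-in for the Lucene datasets, but with the real `X_test`/`y_test` from above the `roc_auc_score` call is identical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Lucene metrics (4 features, binary outcome)
rng = np.random.RandomState(0)
X = rng.rand(500, 4)
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.randn(500) > 0.8).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# AUC from the predicted probability of the positive (defective) class
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"Test AUC: {auc:.3f}")
```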
PyExplainer [PTJ+21] is a rule-based model-agnostic technique that utilises a local rule-based regression model to learn the associations between the characteristics of the synthetic instances and the predictions from the black-box model. Given a black-box model and an instance to explain, PyExplainer performs four key steps to generate an instance explanation as follows:
First, PyExplainer generates synthetic neighbors around the instance to be explained using the crossover and mutation techniques.
Second, PyExplainer obtains the predictions of the synthetic neighbors from the black-box model.
Third, PyExplainer builds a local rule-based regression model.
Finally, PyExplainer generates an explanation from the local model for the instance to be explained.
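The four steps above can be sketched with a self-contained toy example. Note this illustrates the general local-surrogate idea only, not PyExplainer's actual implementation: it substitutes plain Gaussian perturbation for crossover/mutation, a shallow decision tree for the rule-based regression model, and synthetic data with generic feature names for the Lucene metrics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy black-box model on synthetic data (stand-in for rf_model)
rng = np.random.RandomState(0)
X = rng.rand(300, 4)
y = (X[:, 0] > 0.5).astype(int)
blackbox = RandomForestClassifier(random_state=0).fit(X, y)

x_explain = X[0]  # the instance to be explained

# Step 1: generate synthetic neighbours around the instance
# (PyExplainer uses crossover and mutation; Gaussian noise is used here for brevity)
neighbours = x_explain + 0.1 * rng.randn(1000, 4)

# Step 2: obtain the black-box model's predictions for the neighbours
preds = blackbox.predict_proba(neighbours)[:, 1]

# Step 3: fit a local interpretable model on (neighbours, predictions)
# (PyExplainer fits a rule-based regression model; a shallow tree stands in here)
local_model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(neighbours, preds)

# Step 4: read an explanation off the local model's learned structure
print(export_text(local_model, feature_names=['f0', 'f1', 'f2', 'f3']))
```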
```python
# Import for PyExplainer
from pyexplainer.pyexplainer_pyexplainer import PyExplainer
import numpy as np

np.random.seed(0)

file_to_be_explained = 'src/java/org/apache/lucene/index/DocumentsWriter.java'

# PyExplainer Step 1 - Construct a PyExplainer
pyexp = PyExplainer(X_train=X_train,
                    y_train=y_train,
                    indep=X_train.columns,
                    dep=outcome,
                    blackbox_model=rf_model)

# PyExplainer Step 2 - Generate the rule-based explanation of an instance to be explained
exp_obj = pyexp.explain(X_explain=X_test.loc[file_to_be_explained, :].to_frame().transpose(),
                        y_explain=pd.Series(bool(y_test.loc[file_to_be_explained]),
                                            index=[file_to_be_explained],
                                            name=outcome),
                        search_function='crossoverinterpolation',
                        max_iter=1000,
                        max_rules=20,
                        random_state=0,
                        reuse_local_model=True)

# Print the top positive rule
exp_obj['top_k_positive_rules'][:1]

# Please use the code below to visualise the generated PyExplainer explanation
# (What-If interactive visualisation)
pyexp.visualise(exp_obj, title="Why this file is predicted as defect-introducing?")
```