{ "cells": [ { "cell_type": "markdown", "id": "a3e84c7f", "metadata": {}, "source": [ "# (Step 5) Model Ranking\n", "\n", "## Using a Non-Parametric Scott-Knott ESD Test For Multiple Comparison" ] }, { "cell_type": "markdown", "id": "164f6eff", "metadata": {}, "source": [ "The Non-Parametric ScottKnott ESD (NPSK) test is a multiple comparison approach that leverages a hierarchical clustering to partition the set of median values of techniques (e.g., medians of variable importance scores, medians of model performance) into statistically distinct groups with non-negligible difference. The Non-Parametric ScottKnott ESD (NPSK) does not require the assumptions of normal distributions, homogeneous distributions, and the minimum sample size. The mechanism of the Non-Parametric Scott-Knott ESD test is made up of 2 steps:\n", "\n", "*(Step 1) Find a partition that maximizes treatment medians between groups.* We begin by sorting the median value of the distributions. Then, we compute the Kruskal Chisq statistics to identify a partition that maximizes the median values between groups. The Kruskal Chisq test is a non-parametric test, which does not require data normality and data heterogeneity assumptions.\n", "\n", "*(Step 2) Splitting into two groups or merging into one group.* We analyze the magnitude of the difference for each pair for all of the treatment medians of the two groups. If there is any one pair of treatment medians of two groups are non-negligible, we split into two groups. Otherwise, we merge into one group. We use the Cliff $|\\delta|$ effect size to estimate the effect size of the difference between the two medians.\n", "\n", "To illustrate how the Non-Parametric ScottKnott ESD works in practice, we first prepare a training dataset; construct defect models using 6 classification techniques (i.e., Logistic Regression, Random Forests, C5.0 (Decision Tree), Neural Network, Gradient Boosting Machine, and eXtreme Gradient Boosting Tree); and evaluate them using the AUC measure with the 10-fold cross validation approach.\n", "We provide a code snippet below." ] }, { "cell_type": "code", "execution_count": 2, "id": "cf55a3f6", "metadata": { "tags": [ "remove-output" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n", "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n", "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[16:47:01] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n", "[16:47:01] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n", "[16:47:01] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n", "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[16:47:01] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n", "[16:47:02] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n", "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[16:47:02] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n", "[16:47:02] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n", "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[16:47:02] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n", "[16:47:02] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n", "[16:47:02] WARNING: /Users/runner/miniforge3/conda-bld/xgboost-split_1635105055642/work/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/klainfo/miniforge3/envs/xaitools/lib/python3.9/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].\n", " warnings.warn(label_encoder_deprecation_msg, UserWarning)\n" ] }, { "data": { "text/html": [ "
\n", " | LR | \n", "RF | \n", "DT | \n", "NN | \n", "GBM | \n", "XGB | \n", "
---|---|---|---|---|---|---|
0 | \n", "0.805701 | \n", "0.575852 | \n", "0.488532 | \n", "0.746396 | \n", "0.662025 | \n", "0.566186 | \n", "
1 | \n", "0.830603 | \n", "0.817824 | \n", "0.578145 | \n", "0.677261 | \n", "0.861402 | \n", "0.753277 | \n", "
2 | \n", "0.848296 | \n", "0.797837 | \n", "0.596003 | \n", "0.859764 | \n", "0.800131 | \n", "0.776212 | \n", "
3 | \n", "0.677778 | \n", "0.710943 | \n", "0.617677 | \n", "0.594949 | \n", "0.699327 | \n", "0.728283 | \n", "
4 | \n", "0.856902 | \n", "0.864983 | \n", "0.672222 | \n", "0.619529 | \n", "0.895960 | \n", "0.817508 | \n", "
5 | \n", "0.732323 | \n", "0.761953 | \n", "0.510774 | \n", "0.622559 | \n", "0.784848 | \n", "0.711785 | \n", "
6 | \n", "0.724916 | \n", "0.809091 | \n", "0.525253 | \n", "0.772391 | \n", "0.794949 | \n", "0.745455 | \n", "
7 | \n", "0.712795 | \n", "0.628620 | \n", "0.511111 | \n", "0.622559 | \n", "0.606902 | \n", "0.605051 | \n", "
8 | \n", "0.878016 | \n", "0.752973 | \n", "0.569317 | \n", "0.868841 | \n", "0.741930 | \n", "0.629630 | \n", "
9 | \n", "0.646619 | \n", "0.614849 | \n", "0.534149 | \n", "0.671424 | \n", "0.635406 | \n", "0.603126 | \n", "