Performance Estimation of Classification

Consider downloading this Jupyter Notebook and running it locally, or testing it with Colab.


  • In this notebook, we show how to estimate the performance of classification models.

  • We provide the model's predicted classification results (network logits) for this tutorial, which will be downloaded automatically. The model training code is available at https://github.com/ZerojumpLine/Robust-Skin-Lesion-Classification.

  • More specifically, we show an example of estimating the performance of a ResNet on CIFAR10-LT under domain shift. We use the logits calculated on a test set with synthesized motion blur.

  • We will calculate model confidence with different confidence scores and various calibration methods.
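For orientation, the confidence scores that appear in MOVAL's option names (`max_class_probability-conf`, `energy-conf`, `entropy-conf`, `doctor-conf`) are commonly defined from the softmax of the logits as below. This is a minimal NumPy sketch of the textbook versions, not MOVAL's implementation: the function names are hypothetical, and MOVAL may use different signs, scaling, or normalization (in particular, we assume the DOCTOR-style score is built on the sum of squared class probabilities).

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax with temperature T (numerically stable)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max to avoid overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def confidence_scores(logits, T=1.0):
    """Textbook per-sample confidence scores; higher means more confident."""
    p = softmax(logits, T)
    mcp = p.max(axis=1)                                    # maximum class probability
    neg_entropy = np.sum(p * np.log(p + 1e-12), axis=1)    # negative predictive entropy
    z = logits / T
    m = z.max(axis=1)
    energy = T * (m + np.log(np.exp(z - m[:, None]).sum(axis=1)))  # T * logsumexp(logits / T)
    gini = np.sum(p ** 2, axis=1)                          # sum of squared probabilities (DOCTOR-style, assumed)
    return {"mcp": mcp, "neg_entropy": neg_entropy, "energy": energy, "gini": gini}

# Two toy samples: one confident, one maximally uncertain (uniform logits)
toy_logits = np.array([[4.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0]])
scores = confidence_scores(toy_logits)
for name, s in scores.items():
    print(name, np.round(s, 3))
```

Every score ranks the confident sample above the uniform one; they differ in how they spread the remaining probability mass, which is why the choice of score can change the quality of the performance estimate.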

[1]:
!pip install moval
!pip install statannotations
!pip install pandas
!pip install tqdm
!pip install matplotlib
!pip install seaborn==0.12  # pin seaborn: statannotations does not support the latest release
Requirement already satisfied: moval in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (0.3.16)
Requirement already satisfied: scikit-learn>=1.3.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from moval) (1.3.0)
Requirement already satisfied: scipy>=1.8.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from moval) (1.10.1)
Requirement already satisfied: pytest in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from moval) (7.4.3)
Requirement already satisfied: gdown in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from moval) (4.7.1)
Requirement already satisfied: pandas in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from moval) (1.5.3)
Requirement already satisfied: nibabel in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from moval) (5.1.0)
Requirement already satisfied: numpy>=1.17.3 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from scikit-learn>=1.3.0->moval) (1.24.4)
Requirement already satisfied: joblib>=1.1.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from scikit-learn>=1.3.0->moval) (1.3.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from scikit-learn>=1.3.0->moval) (3.1.0)
Requirement already satisfied: filelock in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from gdown->moval) (3.13.1)
Requirement already satisfied: requests[socks] in /Users/zejuli/.local/lib/python3.8/site-packages (from gdown->moval) (2.31.0)
Requirement already satisfied: six in /Users/zejuli/.local/lib/python3.8/site-packages (from gdown->moval) (1.16.0)
Requirement already satisfied: tqdm in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from gdown->moval) (4.65.0)
Requirement already satisfied: beautifulsoup4 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from gdown->moval) (4.12.2)
Requirement already satisfied: importlib-resources>=1.3 in /Users/zejuli/.local/lib/python3.8/site-packages (from nibabel->moval) (5.12.0)
Requirement already satisfied: packaging>=17 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from nibabel->moval) (23.1)
Requirement already satisfied: python-dateutil>=2.8.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pandas->moval) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pandas->moval) (2023.3.post1)
Requirement already satisfied: iniconfig in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pytest->moval) (2.0.0)
Requirement already satisfied: pluggy<2.0,>=0.12 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pytest->moval) (1.3.0)
Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pytest->moval) (1.1.3)
Requirement already satisfied: tomli>=1.0.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pytest->moval) (2.0.1)
Requirement already satisfied: zipp>=3.1.0 in /Users/zejuli/.local/lib/python3.8/site-packages (from importlib-resources>=1.3->nibabel->moval) (3.15.0)
Requirement already satisfied: soupsieve>1.2 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from beautifulsoup4->gdown->moval) (2.5)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/zejuli/.local/lib/python3.8/site-packages (from requests[socks]->gdown->moval) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in /Users/zejuli/.local/lib/python3.8/site-packages (from requests[socks]->gdown->moval) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/zejuli/.local/lib/python3.8/site-packages (from requests[socks]->gdown->moval) (2.0.3)
Requirement already satisfied: certifi>=2017.4.17 in /Users/zejuli/.local/lib/python3.8/site-packages (from requests[socks]->gdown->moval) (2023.5.7)
Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from requests[socks]->gdown->moval) (1.7.1)
Requirement already satisfied: statannotations in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (0.6.0)
Requirement already satisfied: numpy>=1.12.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from statannotations) (1.24.4)
Collecting seaborn<0.12,>=0.9.0 (from statannotations)
  Using cached seaborn-0.11.2-py3-none-any.whl (292 kB)
Requirement already satisfied: matplotlib>=2.2.2 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from statannotations) (3.7.4)
Requirement already satisfied: pandas<2.0.0,>=0.23.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from statannotations) (1.5.3)
Requirement already satisfied: scipy>=1.1.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from statannotations) (1.10.1)
Requirement already satisfied: contourpy>=1.0.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (1.1.1)
Requirement already satisfied: cycler>=0.10 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (4.46.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (10.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (2.8.2)
Requirement already satisfied: importlib-resources>=3.2.0 in /Users/zejuli/.local/lib/python3.8/site-packages (from matplotlib>=2.2.2->statannotations) (5.12.0)
Requirement already satisfied: pytz>=2020.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pandas<2.0.0,>=0.23.0->statannotations) (2023.3.post1)
Requirement already satisfied: zipp>=3.1.0 in /Users/zejuli/.local/lib/python3.8/site-packages (from importlib-resources>=3.2.0->matplotlib>=2.2.2->statannotations) (3.15.0)
Requirement already satisfied: six>=1.5 in /Users/zejuli/.local/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2.2->statannotations) (1.16.0)
Installing collected packages: seaborn
  Attempting uninstall: seaborn
    Found existing installation: seaborn 0.12.0
    Uninstalling seaborn-0.12.0:
      Successfully uninstalled seaborn-0.12.0
Successfully installed seaborn-0.11.2
Requirement already satisfied: pandas in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pandas) (2023.3.post1)
Requirement already satisfied: numpy>=1.20.3 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pandas) (1.24.4)
Requirement already satisfied: six>=1.5 in /Users/zejuli/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Requirement already satisfied: tqdm in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (4.65.0)
Requirement already satisfied: matplotlib in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (3.7.4)
Requirement already satisfied: contourpy>=1.0.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (1.1.1)
Requirement already satisfied: cycler>=0.10 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (4.46.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (1.4.5)
Requirement already satisfied: numpy<2,>=1.20 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (1.24.4)
Requirement already satisfied: packaging>=20.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (10.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: importlib-resources>=3.2.0 in /Users/zejuli/.local/lib/python3.8/site-packages (from matplotlib) (5.12.0)
Requirement already satisfied: zipp>=3.1.0 in /Users/zejuli/.local/lib/python3.8/site-packages (from importlib-resources>=3.2.0->matplotlib) (3.15.0)
Requirement already satisfied: six>=1.5 in /Users/zejuli/.local/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Collecting seaborn==0.12
  Using cached seaborn-0.12.0-py3-none-any.whl (285 kB)
Requirement already satisfied: numpy>=1.17 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from seaborn==0.12) (1.24.4)
Requirement already satisfied: pandas>=0.25 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from seaborn==0.12) (1.5.3)
Requirement already satisfied: matplotlib>=3.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from seaborn==0.12) (3.7.4)
Requirement already satisfied: contourpy>=1.0.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (1.1.1)
Requirement already satisfied: cycler>=0.10 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (4.46.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (10.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (2.8.2)
Requirement already satisfied: importlib-resources>=3.2.0 in /Users/zejuli/.local/lib/python3.8/site-packages (from matplotlib>=3.1->seaborn==0.12) (5.12.0)
Requirement already satisfied: pytz>=2020.1 in /Users/zejuli/opt/anaconda3/envs/moval/lib/python3.8/site-packages (from pandas>=0.25->seaborn==0.12) (2023.3.post1)
Requirement already satisfied: zipp>=3.1.0 in /Users/zejuli/.local/lib/python3.8/site-packages (from importlib-resources>=3.2.0->matplotlib>=3.1->seaborn==0.12) (3.15.0)
Requirement already satisfied: six>=1.5 in /Users/zejuli/.local/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=3.1->seaborn==0.12) (1.16.0)
Installing collected packages: seaborn
  Attempting uninstall: seaborn
    Found existing installation: seaborn 0.11.2
    Uninstalling seaborn-0.11.2:
      Successfully uninstalled seaborn-0.11.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
statannotations 0.6.0 requires seaborn<0.12,>=0.9.0, but you have seaborn 0.12.0 which is incompatible.
Successfully installed seaborn-0.12.0
[2]:
import os
import gdown
import itertools
import zipfile
import pandas as pd
import numpy as np
import moval
from tqdm import tqdm
import seaborn as sns
import matplotlib.pyplot as plt
[3]:
print(f"The installed MOVAL version is {moval.__version__}")
print(f"The installed seaborn version is {sns.__version__}")
The installed MOVAL version is 0.3.16
The installed seaborn version is 0.12.0

Load the data

[4]:
# download the data, which we used for MICCAI 2022

output = "data_moval.zip"
if not os.path.exists(output):
    url = "https://drive.google.com/u/0/uc?id=139pqxkG2ccIFq6qNArnFJWQ2by2Spbxt&export=download"
    gdown.download(url, output, quiet=False)

directory_data = "data_moval"
if not os.path.exists(directory_data):
    with zipfile.ZipFile(output, 'r') as zip_ref:
        zip_ref.extractall(directory_data)
[5]:
ls
analysis_cls.ipynb    data_moval_supp.zip   img_cifar/
analysis_seg2d.ipynb  estim_cls.ipynb       img_cifar.zip
analysis_seg3d.ipynb  estim_seg2d.ipynb     img_prostate/
data_moval/           estim_seg3d.ipynb     img_prostate.zip
data_moval.zip        img_cardiac/
data_moval_supp/      img_cardiac.zip
[6]:
# Load the CIFAR-10 classification results: clean validation predictions and motion-blurred test predictions
val_data =  "data_moval/cifar10results/predictions_val.csv"
test_data = "data_moval/cifar10results/predictions_val_motion_blur.csv"
# validation data
cnn_pred = pd.read_csv(val_data)
targets_all = np.array(cnn_pred[['target_0', 'target_1', 'target_2', 'target_3', 'target_4',
                                 'target_5', 'target_6', 'target_7', 'target_8', 'target_9']])
logits = np.array(cnn_pred[['logit_0', 'logit_1', 'logit_2', 'logit_3', 'logit_4',
                               'logit_5', 'logit_6', 'logit_7', 'logit_8', 'logit_9']])
gt = np.argmax(targets_all, axis = 1)
# logits is of shape ``(n, d)``
# gt is of shape ``(n, )``

# test data
cnn_pred_test = pd.read_csv(test_data)
targets_all_test = np.array(cnn_pred_test[['target_0', 'target_1', 'target_2', 'target_3', 'target_4',
                                           'target_5', 'target_6', 'target_7', 'target_8', 'target_9']])
logits_test = np.array(cnn_pred_test[['logit_0', 'logit_1', 'logit_2', 'logit_3', 'logit_4',
                                      'logit_5', 'logit_6', 'logit_7', 'logit_8', 'logit_9']])
gt_test = np.argmax(targets_all_test, axis = 1)
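The two loading blocks above repeat the same column lists. A more compact variant builds the column names with a comprehension; the sketch below is a hypothetical helper (`load_predictions` is not part of MOVAL), assuming the CSV layout used here: one-hot `target_k` columns and `logit_k` columns for each class `k`.

```python
import io
import numpy as np
import pandas as pd

def load_predictions(csv_source, n_classes=10):
    """Return (logits, gt) from a CSV with one-hot `target_k` and `logit_k` columns."""
    df = pd.read_csv(csv_source)
    target_cols = [f"target_{k}" for k in range(n_classes)]
    logit_cols = [f"logit_{k}" for k in range(n_classes)]
    logits = df[logit_cols].to_numpy()               # shape (n, d)
    gt = df[target_cols].to_numpy().argmax(axis=1)   # one-hot -> class index, shape (n,)
    return logits, gt

# Tiny in-memory example with 3 classes instead of 10
csv_text = """target_0,target_1,target_2,logit_0,logit_1,logit_2
0,1,0,0.1,2.0,-1.0
1,0,0,3.0,0.5,0.2
"""
logits_demo, gt_demo = load_predictions(io.StringIO(csv_text), n_classes=3)
print(logits_demo.shape, gt_demo)  # (2, 3) [1 0]
```

With the notebook's files this would read, for example, `logits, gt = load_predictions(val_data)`.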
[7]:
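# Split the indices so that the validation and test images do not overlap.
# The motion-blur file stacks the 5 test conditions as consecutive blocks of 10,000 predictions.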
import random
random.seed(79)
test_ind = list(range(10000))
random.shuffle(test_ind)
#
val_ind = test_ind[:3000]
testc_indx_1 = test_ind[3000:]
testc_indx_2 = [x+10000 for x in test_ind[3000:]]
testc_indx_3 = [x+20000 for x in test_ind[3000:]]
testc_indx_4 = [x+30000 for x in test_ind[3000:]]
testc_indx_5 = [x+40000 for x in test_ind[3000:]]
testc_indxs = [testc_indx_1, testc_indx_2, testc_indx_3, testc_indx_4, testc_indx_5]
#
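The same split can be written with NumPy and checked explicitly. This is a sketch assuming, as the index offsets in the cell above imply, that the motion-blur file stacks five blocks of 10,000 rows in the same image order as the clean file. Note that `np.random.default_rng(79)` produces a different permutation than `random.seed(79)`, so the indices will not match the notebook's.

```python
import numpy as np

n_per_block, n_val, n_blocks = 10000, 3000, 5

rng = np.random.default_rng(79)  # different generator than `random`, so indices differ from the cell above
perm = rng.permutation(n_per_block)

val_idx = perm[:n_val]           # rows of the clean validation file
held_out = perm[n_val:]          # images reserved for testing
# the same held-out images, offset into each stacked test block
test_idx_per_block = [held_out + b * n_per_block for b in range(n_blocks)]

# Sanity checks: no validation image appears in any test block
assert np.intersect1d(val_idx, held_out).size == 0
assert all(len(idx) == n_per_block - n_val for idx in test_idx_per_block)
print(len(val_idx), [int(idx.min() // n_per_block) for idx in test_idx_per_block])
```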
[8]:
logits_val = logits[val_ind, :]
gt_val = gt[val_ind]
#
logits_tests = []
gt_tests = []
#
for testc_indx in testc_indxs:
    #
    logits_tests.append(logits_test[testc_indx, :])
    gt_tests.append(gt_test[testc_indx])
[9]:
print(f"The validation predictions, ``logits`` are of shape (n, d), which are now {logits_val.shape}")
print(f"The validation labels, ``gt`` are of shape (n, ), which are now {gt_val.shape}\n")
print(f"The number of test conditions is {len(logits_tests)}")
print(f"The test predictions, ``logits_test`` are of shape (n', d), which are now {logits_tests[0].shape}")
print(f"The test labels, ``gt_test`` are of shape (n', ), which are now {gt_tests[0].shape}")
The validation predictions, ``logits`` are of shape (n, d), which are now (3000, 10)
The validation labels, ``gt`` are of shape (n, ), which are now (3000,)

The number of test conditions is 5
The test predictions, ``logits_test`` are of shape (n', d), which are now (7000, 10)
The test labels, ``gt_test`` are of shape (n', ), which are now (7000,)

MOVAL estimation on accuracy

[10]:
moval_options = list(itertools.product(moval.models.get_estim_options(),
                               ["classification"],
                               moval.models.get_conf_options(),
                               [False, True]))
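The introduction mentions calibration methods, and the `ts-model` option presumably refers to temperature scaling: a single scalar T that divides the logits before the softmax. The sketch below fits T by minimizing the validation negative log-likelihood, which is the textbook objective; it is an assumption that MOVAL's `ts-model` uses this objective (it may instead fit T so that confidence matches accuracy). `fit_temperature` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax

def fit_temperature(logits, gt):
    """Fit one temperature T > 0 by minimizing the mean negative log-likelihood on labeled data."""
    def nll(log_t):
        T = np.exp(log_t)                     # optimize log(T) so that T stays positive
        logp = log_softmax(logits / T, axis=1)
        return -logp[np.arange(len(gt)), gt].mean()
    res = minimize_scalar(nll, bounds=(-3.0, 3.0), method="bounded")
    return float(np.exp(res.x))

# Toy check: labels are drawn from softmax(true_logits), but the "model" reports
# 3x sharper logits, so the fitted temperature should come out close to 3.
rng = np.random.default_rng(0)
true_logits = rng.normal(scale=2.0, size=(5000, 3))
p_true = np.exp(log_softmax(true_logits, axis=1))
u = rng.random(len(p_true))[:, None]
gt_ts = np.minimum((p_true.cumsum(axis=1) < u).sum(axis=1), 2)  # inverse-CDF categorical sampling
T_hat = fit_temperature(3.0 * true_logits, gt_ts)
print(f"fitted temperature: {T_hat:.2f}")
```

Calibrated probabilities are then `softmax(logits / T_hat)`; since dividing by T does not change the argmax, temperature scaling alters the confidence but never the predicted class.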
[11]:
# ac-model does not need class-specific variants
moval_options = [moval_option for moval_option in moval_options
                 if not (moval_option[0] == 'ac-model' and moval_option[-1])]
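Removing elements from a list while iterating over it is fragile: each removal shifts the later elements one slot to the left, so the element right after a removed one is never visited. A toy example shows the effect and the safer alternative of building a filtered copy.

```python
# Removing while iterating: the element after each removed one is skipped
items_mutated = ["a", "b", "b", "c"]
for x in items_mutated:
    if x == "b":
        items_mutated.remove(x)
print(items_mutated)   # ['a', 'b', 'c']  (one "b" survives)

# A comprehension builds a new list and visits every element exactly once
items_filtered = [x for x in ["a", "b", "b", "c"] if x != "b"]
print(items_filtered)  # ['a', 'c']
```

For `moval_options`, a remove-while-iterating loop happens to yield the intended 36 options, because `[False, True]` varies fastest in `itertools.product` and every skipped element is a `False` variant that would have been kept anyway; the comprehension does not rely on that ordering.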
[12]:
print(f"The number of moval options is {len(moval_options)}")
The number of moval options is 36
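To make the estimators less of a black box, here is a sketch of two classic label-free accuracy estimators, presumably related to the `ac-model` and `atc-model` options. Average confidence (AC) predicts accuracy as the mean maximum softmax probability. Average thresholded confidence (ATC) learns a threshold on the labeled validation set such that the share of confidences above it equals the validation accuracy, then reports that share on the test set. These are the textbook definitions under our own hypothetical function names; MOVAL's models may differ in detail (choice of confidence score, calibration, class-specific variants).

```python
import numpy as np

def softmax(logits):
    """Row-wise, numerically stable softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def estimate_ac(logits_test):
    """Average confidence (AC): estimated accuracy = mean maximum softmax probability."""
    return float(softmax(logits_test).max(axis=1).mean())

def fit_atc_threshold(logits_val, gt_val):
    """ATC: pick t so that the share of validation confidences above t equals validation accuracy."""
    conf = softmax(logits_val).max(axis=1)
    acc = np.mean(logits_val.argmax(axis=1) == gt_val)
    # the (1 - acc) quantile leaves a fraction `acc` of the confidences above it
    return float(np.quantile(conf, 1.0 - acc))

def estimate_atc(logits_test, threshold):
    """ATC estimate: share of test samples whose confidence exceeds the threshold."""
    return float(np.mean(softmax(logits_test).max(axis=1) > threshold))

# Toy validation set: the argmax prediction is correct roughly 70% of the time
rng = np.random.default_rng(0)
logits_v = rng.normal(scale=3.0, size=(2000, 10))
pred_v = logits_v.argmax(axis=1)
gt_v = np.where(rng.random(2000) < 0.7, pred_v, rng.integers(0, 10, size=2000))
acc_v = float(np.mean(pred_v == gt_v))

atc_threshold = fit_atc_threshold(logits_v, gt_v)
ac_estimate = estimate_ac(logits_v)
print(f"accuracy {acc_v:.3f}, AC {ac_estimate:.3f}, ATC {estimate_atc(logits_v, atc_threshold):.3f}")
print(f"AC absolute error: {abs(acc_v - ac_estimate):.3f}")
```

On the notebook's data, the threshold would be fitted with `logits_val`/`gt_val` and `estimate_atc` applied to each entry of `logits_tests`; the absolute difference to the true accuracy is the same error that `test_cls` reports.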
[13]:
from moval.solvers.utils import ComputMetric, ComputAUC
from moval.models.utils import cal_softmax

def test_cls(estim_algorithm, mode, metric, confidence_scores, class_specific, logits, gt, logits_tests, gt_tests):
    """Test MOVAL with different conditions for classification tasks.

    Args:
        estim_algorithm (str):
            The algorithm to estimate model performance. We also provide a list of estimation algorithms which can be displayed by
            running :py:func:`moval.models.get_estim_options`.
        mode (str): The given task to estimate model performance.
        metric (str): The metric to be estimated.
        confidence_scores (str):
            The method to calculate the confidence scores. We provide a list of confidence score calculation methods which
            can be displayed by running :py:func:`moval.models.get_conf_options`.
        class_specific (bool):
            If ``True``, the calculation will match class-wise confidence to class-wise accuracy.
        logits: The network output (logits) of shape ``(n, d)`` for classification.
        gt: The corresponding annotation of shape ``(n, )`` for classification.
        logits_tests: A list of m test conditions, each of shape ``(n', d)``.
        gt_tests: A list of the m corresponding annotations, each of shape ``(n', )``.

    Returns:
        err_tests: A list of m test errors (mean absolute difference between the real and estimated metric).
        moval_model: The optimized MOVAL model.

    """

    moval_model = moval.MOVAL(
                mode = mode,
                metric = metric,
                confidence_scores = confidence_scores,
                estim_algorithm = estim_algorithm,
                class_specific = class_specific
                )

    #
    moval_model.fit(logits, gt)

    # save the test err in the result files.

    err_tests = []
    for k_test in range(len(logits_tests)):

        _logits_test = logits_tests[k_test]
        _gt_test = gt_tests[k_test]

        estim_acc_test = moval_model.estimate(_logits_test)

        pred_test = np.argmax(_logits_test, axis = 1)
        if metric == "accuracy":

            real_metric = np.sum(_gt_test == pred_test) / len(_gt_test)
        elif metric == "sensitivity":
            real_sensitivities = []
            for kcls in range(_logits_test.shape[1]):
                _, real_sensitivity, _ = ComputMetric(_gt_test == kcls, pred_test == kcls)
                real_sensitivities.append(real_sensitivity)
            real_metric = real_sensitivities
        elif metric == "precision":
            real_precisions = []
            for kcls in range(_logits_test.shape[1]):
                _, _, real_precision = ComputMetric(_gt_test == kcls, pred_test == kcls)
                real_precisions.append(real_precision)
            real_metric = real_precisions
        elif metric == "f1score":
            real_F1scores = []
            for kcls in range(_logits_test.shape[1]):
                real_F1score, _, _ = ComputMetric(_gt_test == kcls, pred_test == kcls)
                real_F1scores.append(real_F1score)
            real_metric = real_F1scores
        else:
            real_auc = ComputAUC(_gt_test, cal_softmax(_logits_test))
            real_metric = real_auc

        err_test = np.mean(np.abs(np.asarray(real_metric) - np.asarray(estim_acc_test)))
        err_tests.append(err_test)

    return err_tests, moval_model
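The sensitivity, precision, and F1 branches of `test_cls` compute one-vs-rest scores per class with `ComputMetric`, whose return order, judging by the unpacking above, is (F1, sensitivity, precision). The sketch below is a hypothetical stand-in (`per_class_metrics`) that mirrors that order; returning 0 when a denominator is zero is our assumption and may not match `ComputMetric`.

```python
import numpy as np

def per_class_metrics(gt, pred, n_classes):
    """Per-class one-vs-rest F1, sensitivity (recall) and precision."""
    f1_list, sens_list, prec_list = [], [], []
    for k in range(n_classes):
        y, yhat = (gt == k), (pred == k)             # binary masks for class k
        tp = np.sum(y & yhat)
        fn = np.sum(y & ~yhat)
        fp = np.sum(~y & yhat)
        s = tp / (tp + fn) if tp + fn > 0 else 0.0   # sensitivity
        p = tp / (tp + fp) if tp + fp > 0 else 0.0   # precision
        f = 2 * p * s / (p + s) if p + s > 0 else 0.0  # F1: harmonic mean of p and s
        f1_list.append(f)
        sens_list.append(s)
        prec_list.append(p)
    return np.array(f1_list), np.array(sens_list), np.array(prec_list)

gt_toy = np.array([0, 0, 1, 1, 2, 2])
pred_toy = np.array([0, 1, 1, 1, 2, 0])
f1, sens, prec = per_class_metrics(gt_toy, pred_toy, 3)
print(f1, sens, prec)

# As in `test_cls`, the error is the mean absolute difference across classes
estimated_sens = np.array([0.6, 0.9, 0.5])  # made-up estimates for illustration
mae = np.mean(np.abs(sens - estimated_sens))
print(mae)
```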
[14]:
err_test_list = []
moval_parameters = []
moval_parameters_ = []
[15]:
for k_cond in tqdm(range(len(moval_options))):

    err_test, moval_model = test_cls(
        estim_algorithm = moval_options[k_cond][0],
        mode = moval_options[k_cond][1],
        metric = "accuracy",
        confidence_scores = moval_options[k_cond][2],
        class_specific = moval_options[k_cond][3],
        logits = logits_val,
        gt = gt_val,
        logits_tests = logits_tests,
        gt_tests = gt_tests
    )
    err_test_list.append(err_test)
    moval_parameters.append(moval_model.model_.param)
    if moval_model.model_.extend_param:
        moval_parameters_.append(moval_model.model_.param_ext)
    else:
        moval_parameters_.append(0.)
  0%|                                                                                                                                                                                | 0/36 [00:00<?, ?it/s]
Starting optimizing for model ac-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...
  3%|████▋                                                                                                                                                                   | 1/36 [00:00<00:08,  4.05it/s]
Starting optimizing for model ac-model with confidence energy-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...
  6%|█████████▎                                                                                                                                                              | 2/36 [00:00<00:08,  4.11it/s]
Starting optimizing for model ac-model with confidence entropy-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...
  8%|██████████████                                                                                                                                                          | 3/36 [00:00<00:11,  2.92it/s]
Starting optimizing for model ac-model with confidence doctor-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...
 11%|██████████████████▋                                                                                                                                                     | 4/36 [00:01<00:10,  2.91it/s]
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 14%|███████████████████████▎                                                                                                                                                | 5/36 [00:01<00:10,  2.97it/s]
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 17%|████████████████████████████                                                                                                                                            | 6/36 [00:03<00:24,  1.21it/s]
Starting optimizing for model ts-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 19%|████████████████████████████████▋                                                                                                                                       | 7/36 [00:03<00:18,  1.56it/s]
Starting optimizing for model ts-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 22%|█████████████████████████████████████▎                                                                                                                                  | 8/36 [00:04<00:22,  1.26it/s]
Starting optimizing for model ts-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 25%|██████████████████████████████████████████                                                                                                                              | 9/36 [00:05<00:17,  1.51it/s]
Starting optimizing for model ts-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 28%|██████████████████████████████████████████████▍                                                                                                                        | 10/36 [00:07<00:34,  1.31s/it]
Starting optimizing for model ts-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 31%|███████████████████████████████████████████████████                                                                                                                    | 11/36 [00:08<00:25,  1.01s/it]
Starting optimizing for model ts-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 33%|███████████████████████████████████████████████████████▋                                                                                                               | 12/36 [00:10<00:30,  1.27s/it]
Starting optimizing for model doc-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 36%|████████████████████████████████████████████████████████████▎                                                                                                          | 13/36 [00:10<00:22,  1.02it/s]
Starting optimizing for model doc-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 39%|████████████████████████████████████████████████████████████████▉                                                                                                      | 14/36 [00:11<00:21,  1.01it/s]
Starting optimizing for model doc-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 42%|█████████████████████████████████████████████████████████████████████▌                                                                                                 | 15/36 [00:11<00:16,  1.28it/s]
Starting optimizing for model doc-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 44%|██████████████████████████████████████████████████████████████████████████▏                                                                                            | 16/36 [00:12<00:15,  1.28it/s]
Starting optimizing for model doc-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 47%|██████████████████████████████████████████████████████████████████████████████▊                                                                                        | 17/36 [00:12<00:12,  1.48it/s]
Starting optimizing for model doc-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 50%|███████████████████████████████████████████████████████████████████████████████████▌                                                                                   | 18/36 [00:14<00:14,  1.21it/s]
Starting optimizing for model doc-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 53%|████████████████████████████████████████████████████████████████████████████████████████▏                                                                              | 19/36 [00:14<00:11,  1.44it/s]
Starting optimizing for model doc-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 56%|████████████████████████████████████████████████████████████████████████████████████████████▊                                                                          | 20/36 [00:15<00:12,  1.26it/s]
Starting optimizing for model atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 58%|█████████████████████████████████████████████████████████████████████████████████████████████████▍                                                                     | 21/36 [00:15<00:09,  1.55it/s]
Starting optimizing for model atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 61%|██████████████████████████████████████████████████████████████████████████████████████████████████████                                                                 | 22/36 [00:16<00:11,  1.25it/s]
Starting optimizing for model atc-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 64%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                                            | 23/36 [00:17<00:08,  1.55it/s]
Starting optimizing for model atc-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param for class 0, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18728231]), array([0.26218443]), array([0.30525356]), array([0.34318131]), array([0.43485032])]
Optimization results are [(0.5186440669175524, 1.0), (8.790578265305271e-10, 0.2984811888595933), (8.790578265305271e-10, 0.2982347839920787), (8.790578265305271e-10, 0.29762222354380885), (8.790578265305271e-10, 0.29813876174077925), (8.790578265305271e-10, 0.29895959724557186)]
Not satisfied with initial optimization results of param for class 2, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.16370562]), array([0.24210566]), array([0.29369108]), array([0.33880298]), array([0.45093615])]
Optimization results are [(0.7068493131319197, 1.0), (1.936573457150814e-09, 0.20258569909358948), (1.936573457150814e-09, 0.20276349137935112), (1.936573457150814e-09, 0.2028304000812713), (1.936573457150814e-09, 0.2027524073036559), (1.936573457150814e-09, 0.20292126767235374)]
Not satisfied with initial optimization results of param for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18169108]), array([0.2639885]), array([0.31238295]), array([0.35001277]), array([0.45603095])]
Optimization results are [(0.8181818153210426, 1.0), (2.8607756163978593e-09, 0.1726065218441504), (2.8607756163978593e-09, 0.17159252717803666), (2.8607756163978593e-09, 0.17181062241447223), (2.8607756163978593e-09, 0.1706312240510795), (2.8607756163978593e-09, 0.17101160626313566)]
Not satisfied with initial optimization results of param for class 5, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.17036821]), array([0.25540432]), array([0.298793]), array([0.33816787]), array([0.4466694])]
Optimization results are [(0.7468354398867703, 1.0), (0.00421941243390489, 0.195923437958823), (0.00421941243390489, 0.19793835114336558), (3.151204408524677e-09, 0.20168527213894694), (0.00421941243390489, 0.1986736263788906), (3.151204408524677e-09, 0.20169915150554396)]
Not satisfied with initial optimization results of param for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18241121]), array([0.26778415]), array([0.29895517]), array([0.354317]), array([0.48995152])]
Optimization results are [(0.9244444403358025, 1.0), (4.1086418711699935e-09, 0.13680840954458334), (4.1086418711699935e-09, 0.13723937873654624), (4.1086418711699935e-09, 0.13639829535065073), (4.1086418711699935e-09, 0.13729783903544526), (0.004444440335802491, 0.14086106295146145)]
Not satisfied with initial optimization results of param for class 7, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.14946888]), array([0.24291015]), array([0.26496527]), array([0.3045747]), array([0.4565208])]
Optimization results are [(0.944444439197531, 1.0), (5.2469134592669775e-09, 0.09715477415120669), (5.2469134592669775e-09, 0.09716406187674936), (5.2469134592669775e-09, 0.0960499096150394), (5.2469134592669775e-09, 0.09898677614608604), (0.05555556080246904, 1e-06)]
Not satisfied with initial optimization results of param for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.22009101]), array([0.31972133]), array([0.35201452]), array([0.40703034]), array([0.50911339])]
Optimization results are [(0.9843749923095704, 1.0), (7.690429626450168e-09, 0.06602730240951293), (7.690429626450168e-09, 0.0639442662183366), (7.690429626450168e-09, 0.06600272232427074), (0.015625007690429626, 1e-06), (0.015625007690429626, 1e-06)]
Not satisfied with initial optimization results of param for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18890638]), array([0.30161541]), array([0.33930996]), array([0.39866314]), array([0.51144122])]
Optimization results are [(0.9827586122175982, 1.0), (0.008620698127229431, 0.07556255148086044), (8.47205694487485e-09, 0.0885995252774123), (0.017241387782401807, 1e-06), (0.017241387782401807, 1e-06), (0.017241387782401807, 1e-06)]
Calculating and saving the fitted case-wise performance...
 67%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                       | 24/36 [00:19<00:14,  1.22s/it]
Starting optimizing for model atc-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 69%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                                   | 25/36 [00:20<00:10,  1.03it/s]
Starting optimizing for model atc-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                              | 26/36 [00:21<00:10,  1.09s/it]
Starting optimizing for model atc-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                         | 27/36 [00:21<00:07,  1.15it/s]
Starting optimizing for model atc-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 78%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                     | 28/36 [00:23<00:07,  1.04it/s]
Starting optimizing for model ts-atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 81%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                | 29/36 [00:23<00:05,  1.28it/s]
Starting optimizing for model ts-atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param_ext for class 0, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.33765525]), array([0.44007736]), array([0.50198903]), array([0.57228649]), array([0.7085548])]
Optimization results are [(0.5186440669175524, 1.0), (0.001694914375179546, 0.4832690701953513), (0.001694914375179546, 0.48408509061454946), (0.001694914375179546, 0.48316444533111014), (0.001694914375179546, 0.48286672341553727), (0.001694914375179546, 0.48270295899398286)]
Calculating and saving the fitted case-wise performance...
 83%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                           | 30/36 [00:26<00:08,  1.49s/it]
Starting optimizing for model ts-atc-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param_ext, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.6543765]), array([0.67402398]), array([0.68690314]), array([0.7001747]), array([0.74142047])]
Optimization results are [(0.29633333333333334, array([0.5])), (0.0, array([0.66373843])), (0.0, array([0.66372274])), (0.0, array([0.66372687])), (0.0, array([0.66373006])), (0.0, array([0.66373061]))]
Calculating and saving the fitted case-wise performance...
 86%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                       | 31/36 [00:27<00:05,  1.18s/it]
Starting optimizing for model ts-atc-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param_ext for class 0, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.46778749]), array([0.49245172]), array([0.50598468]), array([0.5236424]), array([0.56292242])]
Optimization results are [(0.5186440669175524, 1.0), (8.790578265305271e-10, 0.5038765633948785), (8.790578265305271e-10, 0.5038011896738515), (8.790578265305271e-10, 0.5038105319039202), (8.790578265305271e-10, 0.5038012617738086), (8.790578265305271e-10, 0.5038815359533244)]
Not satisfied with initial optimization results of param_ext for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.78302131]), array([0.79694241]), array([0.80847999]), array([0.81915746]), array([0.84634618])]
Optimization results are [(0.18181818467895738, 0.6999999999999997), (2.8607756163978593e-09, 0.7810331667840638), (0.0034965006357279282, 0.781377130643169), (2.8607756163978593e-09, 0.781004302656523), (0.0034965006357279282, 0.7813994188641729), (0.0034965006357279282, 0.7812171928544589)]
Not satisfied with initial optimization results of param_ext for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.89461435]), array([0.90913531]), array([0.91746629]), array([0.92473949]), array([0.95260243])]
Optimization results are [(0.07555555966419747, 0.7999999999999998), (0.07555555966419747, 0.8498836336971195), (4.1086418711699935e-09, 0.887117190835369), (4.1086418711699935e-09, 0.8871827383358843), (4.1086418711699935e-09, 0.887171946148615), (4.1086418711699935e-09, 0.8871110145350087)]
Not satisfied with initial optimization results of param_ext for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.9615313]), array([0.98736813]), array([0.99534012]), array([1.]), array([1.])]
Optimization results are [(0.015625007690429626, 0.8999999999999999), (7.690429626450168e-09, 0.9374930193806559), (7.690429626450168e-09, 0.9379997191513918), (0.015625007690429626, 0.8958061037400822), (0.015625007690429626, 0.8999999999999999), (0.015625007690429626, 0.8999999999999999)]
Not satisfied with initial optimization results of param_ext for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.96108867]), array([0.98062035]), array([0.9891486]), array([0.99834737]), array([1.])]
Optimization results are [(0.017241387782401807, 0.8999999999999999), (0.017241387782401807, 0.91303423627121), (8.47205694487485e-09, 0.9453793070235122), (8.47205694487485e-09, 0.9451005767327609), (0.017241387782401807, 0.898512635896374), (0.017241387782401807, 0.8999999999999999)]
Calculating and saving the fitted case-wise performance...
 89%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                  | 32/36 [00:30<00:07,  1.80s/it]
Starting optimizing for model ts-atc-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████              | 33/36 [00:30<00:04,  1.40s/it]
Starting optimizing for model ts-atc-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋         | 34/36 [00:34<00:04,  2.17s/it]
Starting optimizing for model ts-atc-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎    | 35/36 [00:35<00:01,  1.64s/it]
Starting optimizing for model ts-atc-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36/36 [00:38<00:00,  1.06s/it]

Compare estimation results

[16]:
# Map MOVAL's internal identifiers to the short labels used in the plots
estim_names = {'ac-model': 'AC', 'ts-model': 'TS', 'doc-model': 'DoC',
               'atc-model': 'ATC', 'ts-atc-model': 'TS-ATC'}
conf_names = {'max_class_probability-conf': 'MCP', 'energy-conf': 'Energy',
              'entropy-conf': 'Entropy', 'doctor-conf': 'Doctor'}

estim = []     # estimation algorithm label of each row
conf = []      # confidence score label of each row
err = []       # estimation error on one test condition
err_mean = []  # mean estimation error of the configuration over all test conditions
novel = []     # 'Existing Methods' or 'Provided by MOVAL'
for k_option, moval_option in enumerate(moval_options):
    # moval_option = (estim_algorithm, mode, confidence_scores, class_specific)
    estim_cs = 'CS ' if moval_option[3] else ''  # prefix for class-specific variants
    # Only the non-class-specific MCP configurations correspond to existing methods
    is_existing = moval_option[2] == 'max_class_probability-conf' and not moval_option[3]
    mean_err = np.mean(err_test_list[k_option])
    for cond_err in err_test_list[k_option]:
        estim.append(estim_cs + estim_names[moval_option[0]])
        conf.append(conf_names[moval_option[2]])
        novel.append('Existing Methods' if is_existing else 'Provided by MOVAL')
        err.append(cond_err)
        err_mean.append(mean_err)
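The labels above name the estimation algorithms being compared. The sketch below shows the textbook forms of three of them, Average Confidence (AC), Difference of Confidences (DoC) and Average Thresholded Confidence (ATC), computed from max-class-probability confidences. It is an assumption about the underlying ideas, not MOVAL's implementation: its `ac-model`, `doc-model` and `atc-model` may calibrate and fit differently. All function names and toy values here are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def max_class_probability(logits):
    # MCP confidence: probability assigned to the predicted class
    return softmax(logits).max(axis=1)

def estimate_ac(conf_test):
    # Average Confidence: estimated accuracy is the mean test confidence
    return float(np.mean(conf_test))

def estimate_doc(conf_val, acc_val, conf_test):
    # Difference of Confidences: shift the validation accuracy by the
    # change in mean confidence between validation and test data
    return float(acc_val - (np.mean(conf_val) - np.mean(conf_test)))

def estimate_atc(conf_val, acc_val, conf_test):
    # Average Thresholded Confidence: choose t so that the fraction of
    # validation samples with confidence above t matches the validation
    # accuracy, then report that fraction on the test data
    t = np.quantile(conf_val, 1.0 - acc_val)
    return float(np.mean(conf_test > t))

# Hypothetical toy values
conf_val = np.array([0.9, 0.8, 0.7, 0.6, 0.5])   # validation confidences
acc_val = 0.6                                    # validation accuracy
conf_test = np.array([0.95, 0.65, 0.6, 0.55])    # confidences under a domain shift
print(estimate_ac(conf_test),
      estimate_doc(conf_val, acc_val, conf_test),
      estimate_atc(conf_val, acc_val, conf_test))
```

If the true test accuracy is known, the absolute difference between it and each estimate gives a per-condition estimation error. We assume this is the kind of value collected in `err_test_list`.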
[17]:
# 'MAE' repeats each configuration's mean error on every row, while 'MAE '
# (trailing space) holds the error on each individual test condition
d = {'Estimation Algorithm': estim, 'Confidence Score': conf, 'MAE': err_mean, 'MAE ': err, 'Category': novel}
df = pd.DataFrame(data=d)
# Order the estimation algorithms as they should appear in the plots
custom_order = ['AC', 'TS', 'DoC', 'ATC', 'TS-ATC', 'CS TS', 'CS DoC', 'CS ATC', 'CS TS-ATC']
df['Estimation Algorithm'] = pd.Categorical(df['Estimation Algorithm'], categories=custom_order, ordered=True)
df = df.sort_values(by='Estimation Algorithm')
# Order the confidence scores, with the existing MCP first
custom_order = ['MCP', 'Doctor', 'Entropy', 'Energy']
df['Confidence Score'] = pd.Categorical(df['Confidence Score'], categories=custom_order, ordered=True)
df = df.sort_values(by='Confidence Score')
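The two consecutive `sort_values` calls above do not combine into a two-level ordering. The second call sorts by `Confidence Score` alone, using pandas' default quicksort, which is not stable. The algorithm order from the first call is therefore not guaranteed to survive: within MCP, the `df.head()` output lists AC, then CS TS, then TS-ATC rows. If an ordering by both columns is wanted, a single call with a list of keys produces it. Here is a minimal sketch on a hypothetical toy table:

```python
import pandas as pd

# Hypothetical toy table with the same column names as `df`
toy = pd.DataFrame({
    'Estimation Algorithm': ['TS', 'AC', 'CS TS', 'AC', 'TS'],
    'Confidence Score': ['Energy', 'MCP', 'MCP', 'Doctor', 'MCP'],
})
toy['Estimation Algorithm'] = pd.Categorical(
    toy['Estimation Algorithm'], categories=['AC', 'TS', 'CS TS'], ordered=True)
toy['Confidence Score'] = pd.Categorical(
    toy['Confidence Score'], categories=['MCP', 'Doctor', 'Entropy', 'Energy'], ordered=True)

# One call sorts by confidence score first and breaks ties by algorithm,
# both following the category order rather than alphabetical order
toy = toy.sort_values(by=['Confidence Score', 'Estimation Algorithm'])
print(toy.to_string(index=False))
```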
[18]:
df.head()
[18]:
     Estimation Algorithm  Confidence Score  MAE       MAE       Category
0    AC                    MCP               0.401485  0.310355  Existing Methods
25   CS TS                 MCP               0.082242  0.047667  Provided by MOVAL
144  TS-ATC                MCP               0.102657  0.129429  Existing Methods
143  TS-ATC                MCP               0.102657  0.119571  Existing Methods
142  TS-ATC                MCP               0.102657  0.107000  Existing Methods
[19]:
sns.set(rc={'figure.figsize':(6,3)})
sns.set_style("darkgrid")
category_palette = {'Existing Methods': 'grey', 'Provided by MOVAL': '#1f77b4'}
ax = sns.scatterplot(
    data=df, x="Estimation Algorithm", y="Confidence Score", hue="Category", size="MAE",
    sizes=(40, 1000), palette=category_palette
)
ax.set(ylim=(3.5, -0.5))  # reversed limits put the first category (MCP) at the top
ax.tick_params(axis='x', rotation=15)
#
# Get the handles and labels from the legend
handles, labels = ax.get_legend_handles_labels()

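# Create a custom legend with only the desired entries; '0.08' and '0.16' are
# size-legend labels that seaborn generated for MAE and may change with the data
desired_labels = ['Category', 'Existing Methods', 'Provided by MOVAL', 'MAE', '0.08', '0.16']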
desired_handles = [h for h, l in zip(handles, labels) if l in desired_labels]

legend = plt.legend(handles=desired_handles, labels=desired_labels, bbox_to_anchor=(1.2, 1), labelspacing=1)
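[Figure: estimation algorithm (x) vs. confidence score (y); marker size encodes MAE, color separates existing methods from those provided by MOVAL]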
[20]:
from statannotations.Annotator import Annotator
sns.set(rc={'figure.figsize':(6,2)})
sns.set_style("white")
ax = sns.barplot(df, x="Estimation Algorithm", y="MAE", color = '#1f77b4')
ax.tick_params(axis='x', rotation=15)
#
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
#
pairs=[("TS", "CS TS"), ("DoC", "CS DoC"), ("ATC", "CS ATC"), ("TS-ATC", "CS TS-ATC")]

annotator = Annotator(ax, pairs, data=df, x="Estimation Algorithm", y="MAE")
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside', text_offset=-4)
annotator.apply_and_annotate()
p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

TS vs. CS TS: Mann-Whitney-Wilcoxon test two-sided, P_val:1.974e-06 U_stat=3.750e+02
DoC vs. CS DoC: Mann-Whitney-Wilcoxon test two-sided, P_val:5.405e-08 U_stat=4.000e+02
ATC vs. CS ATC: Mann-Whitney-Wilcoxon test two-sided, P_val:5.405e-08 U_stat=4.000e+02
TS-ATC vs. CS TS-ATC: Mann-Whitney-Wilcoxon test two-sided, P_val:5.405e-08 U_stat=4.000e+02
[20]:
(<Axes: xlabel='Estimation Algorithm', ylabel='MAE'>,
 [<statannotations.Annotation.Annotation at 0x7f9e010abf70>,
  <statannotations.Annotation.Annotation at 0x7f9de0a4e640>,
  <statannotations.Annotation.Annotation at 0x7f9de0a4e670>,
  <statannotations.Annotation.Annotation at 0x7f9de0a4e6d0>])
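[Figure: bar plot of MAE per estimation algorithm, annotated with Mann-Whitney significance for each method vs. its class-specific (CS) variant]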
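The annotations come from a two-sided Mann-Whitney U test, as the "Mann-Whitney-Wilcoxon test two-sided" lines above show. The same kind of test can be run directly with `scipy.stats.mannwhitneyu`. The sketch below uses hypothetical error values.

When every value of one group exceeds every value of the other, U equals the product of the two group sizes. The reported `U_stat=4.000e+02` results are consistent with this complete separation if each group holds 20 rows. Exact p-values may differ from statannotations' output depending on the scipy version and on whether an exact or asymptotic method is used.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-condition errors of two estimators (five test conditions each)
errors_a = [0.31, 0.28, 0.35, 0.40, 0.33]
errors_b = [0.05, 0.07, 0.09, 0.06, 0.08]

# Two-sided test, as in the annotations above; in recent scipy versions
# the statistic is U for the first sample (errors_a)
res = mannwhitneyu(errors_a, errors_b, alternative='two-sided')
print(res.statistic, res.pvalue)
# Every value in errors_a exceeds every value in errors_b, so U = 5 * 5
```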
[21]:
sns.set(rc={'figure.figsize':(3,2)})
sns.set_style("white")
ax = sns.barplot(df, x="Confidence Score", y="MAE", color = '#1f77b4')
ax.tick_params(axis='x', rotation=15)
#
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
#
pairs=[("MCP", "Doctor"), ("MCP", "Entropy"), ("MCP", "Energy")]

annotator = Annotator(ax, pairs, data=df, x="Confidence Score", y="MAE")
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside')
annotator.apply_and_annotate()
p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

MCP vs. Doctor: Mann-Whitney-Wilcoxon test two-sided, P_val:6.163e-01 U_stat=1.075e+03
MCP vs. Entropy: Mann-Whitney-Wilcoxon test two-sided, P_val:2.682e-01 U_stat=1.150e+03
MCP vs. Energy: Mann-Whitney-Wilcoxon test two-sided, P_val:1.905e-01 U_stat=1.175e+03
[21]:
(<Axes: xlabel='Confidence Score', ylabel='MAE'>,
 [<statannotations.Annotation.Annotation at 0x7f9e010abdf0>,
  <statannotations.Annotation.Annotation at 0x7f9e010abee0>,
  <statannotations.Annotation.Annotation at 0x7f9e0109c490>])
[Figure: bar plot of MAE per confidence score, annotated with Mann-Whitney significance against MCP]
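The star notation follows the "p-value annotation legend" printed above. The small helper below is our own sketch, not part of statannotations. It writes that mapping out and applies it to p-values reported in this notebook: all three MCP comparisons fall in the 'ns' band.

```python
def p_to_stars(p):
    # Thresholds taken from the printed p-value annotation legend
    if p <= 1e-4:
        return '****'
    if p <= 1e-3:
        return '***'
    if p <= 1e-2:
        return '**'
    if p <= 5e-2:
        return '*'
    return 'ns'

# p-values reported above (MCP vs. Doctor, Entropy, Energy; TS vs. CS TS)
for p in [6.163e-01, 2.682e-01, 1.905e-01, 1.974e-06]:
    print(f'{p:.3e}: {p_to_stars(p)}')
```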
[22]:
sns.set(rc={'figure.figsize':(12,3)})
category_palette = {'MCP': '#e5f0f8',
                    'Doctor': '#99c6e4',
                    'Entropy': '#4c9cd0',
                    'Energy': '#0072bd'
                   }
ax = sns.boxplot(df, x="Estimation Algorithm", y="MAE ", hue="Confidence Score", palette=category_palette)
ax.set(ylim=(-0.02, 0.5))
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))

pairs=[
    [("ATC", "MCP"), ("CS TS-ATC", "Energy")]
]

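# Note: the test uses y="MAE" (each configuration's mean error, constant within a box),
# whereas the boxes show y="MAE " (per-condition errors)
annotator = Annotator(ax, pairs, data=df, x="Estimation Algorithm", y="MAE", hue="Confidence Score")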
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside')
annotator.apply_and_annotate()
p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

ATC_MCP vs. CS TS-ATC_Energy: Mann-Whitney-Wilcoxon test two-sided, P_val:3.977e-03 U_stat=2.500e+01
[22]:
(<Axes: xlabel='Estimation Algorithm', ylabel='MAE '>,
 [<statannotations.Annotation.Annotation at 0x7f9e010f8d90>])
[Figure: box plots of per-condition MAE for each estimation algorithm, grouped by confidence score]
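In the box plot, each side of a pair is an (x, hue) combination, so `("ATC", "MCP")` refers to the ATC box drawn with the MCP confidence score. Selecting those rows explicitly shows which values enter the test. The sketch below uses a hypothetical toy table. The helper `select_group` is ours, and it takes `y` as an argument so that the choice of column is explicit.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical long-format results: one row per (algorithm, confidence score, test condition)
df_toy = pd.DataFrame({
    'Estimation Algorithm': ['ATC'] * 3 + ['CS TS-ATC'] * 3 + ['ATC'] * 3,
    'Confidence Score': ['MCP'] * 3 + ['Energy'] * 3 + ['Energy'] * 3,
    'MAE ': [0.20, 0.25, 0.30, 0.05, 0.06, 0.04, 0.10, 0.12, 0.11],
})

def select_group(data, x, hue, y='MAE '):
    # Values of column y for one box, i.e. one (x, hue) combination
    mask = (data['Estimation Algorithm'] == x) & (data['Confidence Score'] == hue)
    return data.loc[mask, y].to_numpy()

group_1 = select_group(df_toy, 'ATC', 'MCP')
group_2 = select_group(df_toy, 'CS TS-ATC', 'Energy')
res = mannwhitneyu(group_1, group_2, alternative='two-sided')
print(group_1, group_2, res.statistic, res.pvalue)
```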

Estimation of other metrics

[23]:
test_conditions = moval_options[4:6]  # ts-model with MCP confidence, class specific False and True (see the log below)
[24]:
estimatation_metrics = ["accuracy", "sensitivity", "precision", "f1score", "auc"]
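The five metric names are MOVAL's own identifiers, and the notebook does not show how MOVAL defines or averages them. As an assumption, the sketch below computes scikit-learn counterparts with macro averaging over classes: sensitivity as recall, and AUC as one-vs-rest. It uses a hypothetical 3-class toy example.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical 3-class example: ground truth and per-class probabilities
y_true = np.array([0, 1, 2, 2, 1, 0])
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
])
y_pred = probs.argmax(axis=1)  # predicted class = highest probability

metrics = {
    'accuracy': accuracy_score(y_true, y_pred),
    'sensitivity': recall_score(y_true, y_pred, average='macro'),
    'precision': precision_score(y_true, y_pred, average='macro'),
    'f1score': f1_score(y_true, y_pred, average='macro'),
    'auc': roc_auc_score(y_true, probs, multi_class='ovr', average='macro'),
}
for name, value in metrics.items():
    print(f'{name}: {value:.3f}')
```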
[25]:
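err_test_list = []      # per-condition estimation errors, one entry per (configuration, metric)
moval_parameters = []   # fitted parameters (param) of each MOVAL model
moval_parameters_ = []  # extended parameters (param_ext), or 0. if the model has none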
[26]:
for k_cond in tqdm(range(len(test_conditions))):
    for estimatation_metric in estimatation_metrics:
        err_test, moval_model = test_cls(
            estim_algorithm = test_conditions[k_cond][0],
            mode = test_conditions[k_cond][1],
            metric = estimatation_metric,
            confidence_scores = test_conditions[k_cond][2],
            class_specific = test_conditions[k_cond][3],
            logits = logits_val,
            gt = gt_val,
            logits_tests = logits_tests,
            gt_tests = gt_tests
        )
        err_test_list.append(err_test)
        moval_parameters.append(moval_model.model_.param)
        if moval_model.model_.extend_param:
            moval_parameters_.append(moval_model.model_.param_ext)
        else:
            moval_parameters_.append(0.)
  0%|                                                                                                                                                                                 | 0/2 [00:00<?, ?it/s]
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric sensitivity, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric precision, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric f1score, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric auc, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
 50%|████████████████████████████████████████████████████████████████████████████████████▌                                                                                    | 1/2 [00:11<00:11, 11.68s/it]
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric sensitivity, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric precision, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06)]
Not satisfied with initial optimization results of param for class 5, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06)]
Not satisfied with initial optimization results of param for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06)]
Not satisfied with initial optimization results of param for class 7, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06)]
Not satisfied with initial optimization results of param for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06)]
Not satisfied with initial optimization results of param for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06)]
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric f1score, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric precision, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06)]
Not satisfied with initial optimization results of param for class 5, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06)]
Not satisfied with initial optimization results of param for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06)]
Not satisfied with initial optimization results of param for class 7, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06)]
Not satisfied with initial optimization results of param for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06)]
Not satisfied with initial optimization results of param for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06)]
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric auc, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:31<00:00, 15.83s/it]
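The runs above fit `ts-model`, which the "TS" label suggests is based on temperature scaling. With class specific True, the log shows parameters being fitted per class. The sketch below shows only the plain, non-class-specific idea. It fits a single temperature T on labelled logits by minimizing the negative log-likelihood, and assumes the estimated accuracy is the mean max-class probability after scaling. The functions and the synthetic data are hypothetical and are not MOVAL's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax, softmax

def fit_temperature(logits, labels):
    # Find the temperature T that minimizes the NLL of softmax(logits / T)
    def nll(t):
        log_p = log_softmax(logits / t, axis=1)
        return -log_p[np.arange(len(labels)), labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 20.0), method='bounded').x

def ts_accuracy_estimate(logits, temperature):
    # Estimated accuracy: mean max-class probability after temperature scaling
    return softmax(logits / temperature, axis=1).max(axis=1).mean()

# Synthetic, deliberately overconfident logits (calibrated logits multiplied by 5)
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=2000)
logits = rng.normal(size=(2000, 3))
logits[np.arange(2000), labels] += 1.0
logits *= 5.0

temperature = fit_temperature(logits, labels)
accuracy = (logits.argmax(axis=1) == labels).mean()
print(temperature, accuracy,
      ts_accuracy_estimate(logits, 1.0),          # uncalibrated: overestimates
      ts_accuracy_estimate(logits, temperature))  # calibrated: close to accuracy
```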
[27]:
# Collect one row per (estimator option, metric, test condition) for the summary table
estim = []     # estimation algorithm label, e.g. 'TS' or 'CS TS'
conf = []      # confidence score label
metric = []    # performance metric being estimated
err = []       # error for each individual test condition
err_mean = []  # mean error over all test conditions of the same option and metric
novel = []     # 'Existing Methods' or 'Provided by MOVAL'
# Short display names for the MOVAL model and confidence identifiers
model_names = {'ac-model': 'AC', 'ts-model': 'TS', 'doc-model': 'DoC', 'atc-model': 'ATC'}
conf_names = {'max_class_probability-conf': 'MCP', 'energy-conf': 'Energy', 'entropy-conf': 'Entropy'}
k_option = 0  # index into err_test_list, one entry per (option, metric) pair
for moval_option in test_conditions:
    for estimation_metric in estimatation_metrics:
        for k_cond in range(len(err_test_list[k_option])):
            # Class-specific estimators get a 'CS ' prefix
            estim_cs = 'CS ' if moval_option[3] else ''
            # Model identifiers not listed in model_names map to 'TS-ATC'
            estim.append(estim_cs + model_names.get(moval_option[0], 'TS-ATC'))
            metric.append(estimation_metric)
            # Confidence identifiers not listed in conf_names map to 'Doctor'
            conf.append(conf_names.get(moval_option[2], 'Doctor'))
            # Non-class-specific MCP corresponds to existing methods
            if moval_option[2] == 'max_class_probability-conf' and not moval_option[3]:
                novel.append('Existing Methods')
            else:
                novel.append('Provided by MOVAL')
            err.append(err_test_list[k_option][k_cond])
            err_mean.append(np.mean(err_test_list[k_option]))
        k_option += 1
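In the cell above, `err` collects one error value per test condition, and `err_mean` stores the mean of those values for each estimator and metric. This mean becomes the MAE column. The sketch below assumes that each entry of `err_test_list` is the absolute difference between the estimated and the true metric value for one test condition. That is an assumption, since the list is built earlier in the notebook. Under it, the two quantities relate as shown here, using made-up numbers:

```python
import numpy as np

# Hypothetical estimated and true accuracies for five test conditions
# (illustrative numbers, not results from this notebook)
estimated = np.array([0.80, 0.74, 0.69, 0.65, 0.60])
true = np.array([0.78, 0.70, 0.61, 0.55, 0.47])

# Absolute error per test condition (what `err` would collect under the assumption above)
abs_err = np.abs(estimated - true)
# Mean absolute error over the conditions (what `err_mean` would store)
mae = float(abs_err.mean())
print(abs_err.round(2), round(mae, 3))
```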
[28]:
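# 'MAE' holds the mean error over all test conditions of an option and metric;
# 'MAE ' (note the trailing space) holds the error for each individual condition
d = {'Estimation Algorithm': estim, 'Confidence Score': conf, 'MAE': err_mean, 'MAE ': err, 'Category': novel, 'Metric': metric}
df = pd.DataFrame(data=d)
# Order the metrics explicitly instead of alphabetically
custom_order = ['accuracy', 'sensitivity', 'precision', 'f1score', 'auc']
df['Metric'] = pd.Categorical(df['Metric'], categories=custom_order, ordered=True)
df = df.sort_values(by='Metric')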
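The `pd.Categorical(..., ordered=True)` call makes `sort_values` follow `custom_order` rather than alphabetical order. The default `kind='quicksort'` is not guaranteed to be stable, which is likely why the row indices within each metric appear in mixed order in the table below. Passing `kind='stable'` keeps the original order among rows that share the same metric. Here is a toy illustration with made-up values:

```python
import pandas as pd

# Toy frame; 'row' records the original position of each row
toy = pd.DataFrame({
    'Metric': ['auc', 'accuracy', 'f1score', 'accuracy', 'auc'],
    'row': [0, 1, 2, 3, 4],
})
custom_order = ['accuracy', 'sensitivity', 'precision', 'f1score', 'auc']
toy['Metric'] = pd.Categorical(toy['Metric'], categories=custom_order, ordered=True)

# Sort by the category order; kind='stable' keeps ties in their original order
toy_sorted = toy.sort_values(by='Metric', kind='stable')
print(toy_sorted['row'].tolist())  # [1, 3, 2, 0, 4]
```

With a plain string column, the same call would place `auc` before `f1score`.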
[29]:
df
[29]:
    Estimation Algorithm  Confidence Score  MAE       MAE       Category           Metric
0   TS                    MCP               0.141721  0.080761  Existing Methods   accuracy
1   TS                    MCP               0.141721  0.125595  Existing Methods   accuracy
2   TS                    MCP               0.141721  0.156014  Existing Methods   accuracy
3   TS                    MCP               0.141721  0.163744  Existing Methods   accuracy
4   TS                    MCP               0.141721  0.182490  Existing Methods   accuracy
28  CS TS                 MCP               0.082242  0.095681  Provided by MOVAL  accuracy
27  CS TS                 MCP               0.082242  0.089596  Provided by MOVAL  accuracy
26  CS TS                 MCP               0.082242  0.072217  Provided by MOVAL  accuracy
25  CS TS                 MCP               0.082242  0.047667  Provided by MOVAL  accuracy
29  CS TS                 MCP               0.082242  0.106050  Provided by MOVAL  accuracy
31  CS TS                 MCP               0.152080  0.135770  Provided by MOVAL  sensitivity
32  CS TS                 MCP               0.152080  0.165340  Provided by MOVAL  sensitivity
33  CS TS                 MCP               0.152080  0.173940  Provided by MOVAL  sensitivity
34  CS TS                 MCP               0.152080  0.191324  Provided by MOVAL  sensitivity
30  CS TS                 MCP               0.152080  0.094026  Provided by MOVAL  sensitivity
9   TS                    MCP               0.254894  0.270503  Existing Methods   sensitivity
8   TS                    MCP               0.254894  0.268069  Existing Methods   sensitivity
7   TS                    MCP               0.254894  0.260403  Existing Methods   sensitivity
6   TS                    MCP               0.254894  0.253091  Existing Methods   sensitivity
5   TS                    MCP               0.254894  0.222405  Existing Methods   sensitivity
13  TS                    MCP               0.496530  0.524955  Existing Methods   precision
39  CS TS                 MCP               0.433401  0.488963  Provided by MOVAL  precision
38  CS TS                 MCP               0.433401  0.463441  Provided by MOVAL  precision
37  CS TS                 MCP               0.433401  0.462986  Provided by MOVAL  precision
36  CS TS                 MCP               0.433401  0.413138  Provided by MOVAL  precision
35  CS TS                 MCP               0.433401  0.338476  Provided by MOVAL  precision
12  TS                    MCP               0.496530  0.523681  Existing Methods   precision
11  TS                    MCP               0.496530  0.470540  Existing Methods   precision
10  TS                    MCP               0.496530  0.400400  Existing Methods   precision
14  TS                    MCP               0.496530  0.563074  Existing Methods   precision
40  CS TS                 MCP               0.185486  0.104676  Provided by MOVAL  f1score
42  CS TS                 MCP               0.185486  0.205134  Provided by MOVAL  f1score
44  CS TS                 MCP               0.185486  0.238782  Provided by MOVAL  f1score
41  CS TS                 MCP               0.185486  0.166118  Provided by MOVAL  f1score
19  TS                    MCP               0.291944  0.337603  Existing Methods   f1score
18  TS                    MCP               0.291944  0.318459  Existing Methods   f1score
17  TS                    MCP               0.291944  0.304015  Existing Methods   f1score
16  TS                    MCP               0.291944  0.277608  Existing Methods   f1score
15  TS                    MCP               0.291944  0.222035  Existing Methods   f1score
43  CS TS                 MCP               0.185486  0.212721  Provided by MOVAL  f1score
45  CS TS                 MCP               0.039400  0.017939  Provided by MOVAL  auc
47  CS TS                 MCP               0.039400  0.043302  Provided by MOVAL  auc
46  CS TS                 MCP               0.039400  0.026579  Provided by MOVAL  auc
24  TS                    MCP               0.041647  0.063592  Existing Methods   auc
23  TS                    MCP               0.041647  0.040151  Existing Methods   auc
22  TS                    MCP               0.041647  0.058271  Existing Methods   auc
21  TS                    MCP               0.041647  0.028687  Existing Methods   auc
20  TS                    MCP               0.041647  0.017533  Existing Methods   auc
48  CS TS                 MCP               0.039400  0.048245  Provided by MOVAL  auc
49  CS TS                 MCP               0.039400  0.060935  Provided by MOVAL  auc
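In the table above, the `MAE` column equals the mean of the per-condition `MAE ` values within each estimation algorithm and metric. For example, the five TS accuracy errors average to 0.141721. A `groupby` can confirm this. The sketch below uses only the accuracy rows copied from the table. On the full `df`, the same call with `observed=True` would summarize every metric while skipping unused categories of the categorical `Metric` column.

```python
import pandas as pd

# Per-condition errors for the accuracy rows, copied from the table above
acc = pd.DataFrame({
    'Estimation Algorithm': ['TS'] * 5 + ['CS TS'] * 5,
    'Metric': ['accuracy'] * 10,
    'MAE ': [0.080761, 0.125595, 0.156014, 0.163744, 0.182490,
             0.095681, 0.089596, 0.072217, 0.047667, 0.106050],
})

# Mean of the per-condition errors for each metric and estimation algorithm
summary = acc.groupby(['Metric', 'Estimation Algorithm'])['MAE '].mean()
print(summary.round(6))
```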
[30]:
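# Distribution of the per-condition error ('MAE ', with trailing space) per metric and estimator
ax = sns.boxplot(data=df, x="Metric", y="MAE ", hue="Estimation Algorithm")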
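[Figure: box plots of the per-condition MAE (y-axis) for each metric (x-axis: accuracy, sensitivity, precision, f1score, auc), grouped by estimation algorithm (TS and CS TS).]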