Performance Estimation of Classification

Consider downloading this Jupyter Notebook and running it locally, or try it with Colab.

In this notebook, we show how to evaluate the performance of classification tasks.
We provide the model's predicted classification results (network logits) for this tutorial, which will be downloaded automatically. We also provide the model training code at https://github.com/ZerojumpLine/Robust-Skin-Lesion-Classification.
More specifically, we show an example of estimating performance under domain shift on CIFAR10-LT with a ResNet. We use the logits computed on the test dataset under a synthesized motion blur condition.
We calculate model confidence with different confidence scores and various calibration methods.
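
As a rough illustration of what a confidence score is, the sketch below computes three common scores from a logit matrix with NumPy: the maximum class probability, the negative entropy, and a log-sum-exp energy score. The function names are hypothetical, the logits are made up, and the exact definitions and normalisations used inside MOVAL may differ.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def max_class_probability(logits):
    """Highest softmax probability per sample; values lie in [1/d, 1]."""
    return softmax(logits).max(axis=1)

def negative_entropy(logits):
    """Negative Shannon entropy of the softmax output; higher means more confident."""
    probs = softmax(logits)
    return np.sum(probs * np.log(probs + 1e-12), axis=1)

def energy_score(logits):
    """Log-sum-exp of the logits (illustrative definition, not necessarily MOVAL's)."""
    m = logits.max(axis=1)
    return m + np.log(np.exp(logits - m[:, None]).sum(axis=1))

# Made-up logits for 3 samples and 4 classes.
toy_logits = np.array([[4.0, 0.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0, 1.0],
                       [2.0, 1.5, 0.0, -1.0]])
mcp = max_class_probability(toy_logits)
ent = negative_entropy(toy_logits)
eng = energy_score(toy_logits)
```

The second row has identical logits, so its maximum class probability is the uniform value 1/d and its entropy is the largest of the three samples.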

[1]:

!pip install moval
!pip install statannotations
!pip install pandas
!pip install tqdm
!pip install matplotlib
!pip install seaborn==0.12 # pinned because statannotations does not support the latest seaborn

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
statannotations 0.6.0 requires seaborn<0.12,>=0.9.0, but you have seaborn 0.12.0 which is incompatible.
Successfully installed seaborn-0.12.0

[2]:

import os
import gdown
import itertools
import zipfile
import pandas as pd
import numpy as np
import moval
from tqdm import tqdm
import seaborn as sns
import matplotlib.pyplot as plt

[3]:

print(f"The installed MOVAL version is {moval.__version__}")
print(f"The installed seaborn version is {sns.__version__}")

The installed MOVAL version is 0.3.16
The installed seaborn version is 0.12.0

Load the data

[4]:

# Download the data, which we used for MICCAI 2022

output = "data_moval.zip"
if not os.path.exists(output):
    url = "https://drive.google.com/u/0/uc?id=139pqxkG2ccIFq6qNArnFJWQ2by2Spbxt&export=download"
    gdown.download(url, output, quiet=False)

directory_data = "data_moval"
if not os.path.exists(directory_data):
    with zipfile.ZipFile(output, 'r') as zip_ref:
        zip_ref.extractall(directory_data)
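
The download cell skips work that has already been done. A minimal sketch of the same "extract only once" pattern, wrapped in a hypothetical helper `extract_once` and exercised on a throwaway archive (the file names are made up for illustration):

```python
import os
import tempfile
import zipfile

def extract_once(zip_path, target_dir):
    """Extract zip_path into target_dir unless target_dir already exists.

    Returns True if extraction happened, False if it was skipped.
    """
    if os.path.exists(target_dir):
        return False
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
    return True

# Demonstration on a temporary archive, so nothing is downloaded.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("results/predictions.csv", "logit_0,logit_1\n0.1,0.9\n")
    target = os.path.join(tmp, "demo")
    first = extract_once(archive, target)    # extracts the archive
    second = extract_once(archive, target)   # skipped: the directory exists
    extracted_ok = os.path.exists(os.path.join(target, "results", "predictions.csv"))
```

Note that checking only for the directory means a partially extracted folder is treated as complete; deleting the folder forces a fresh extraction.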

[5]:

ls

analysis_cls.ipynb    data_moval_supp.zip   img_cifar/
analysis_seg2d.ipynb  estim_cls.ipynb       img_cifar.zip
analysis_seg3d.ipynb  estim_seg2d.ipynb     img_prostate/
data_moval/           estim_seg3d.ipynb     img_prostate.zip
data_moval.zip        img_cardiac/
data_moval_supp/      img_cardiac.zip

[6]:

# Load the CIFAR-10 classification predictions
val_data =  "data_moval/cifar10results/predictions_val.csv"
test_data = "data_moval/cifar10results/predictions_val_motion_blur.csv"
# validation data
cnn_pred = pd.read_csv(val_data)
targets_all = np.array(cnn_pred[['target_0', 'target_1', 'target_2', 'target_3', 'target_4',
                                 'target_5', 'target_6', 'target_7', 'target_8', 'target_9']])
logits = np.array(cnn_pred[['logit_0', 'logit_1', 'logit_2', 'logit_3', 'logit_4',
                               'logit_5', 'logit_6', 'logit_7', 'logit_8', 'logit_9']])
gt = np.argmax(targets_all, axis = 1)
# logits is of shape ``(n, d)``
# gt is of shape ``(n, )``

# test data
cnn_pred_test = pd.read_csv(test_data)
targets_all_test = np.array(cnn_pred_test[['target_0', 'target_1', 'target_2', 'target_3', 'target_4',
                                           'target_5', 'target_6', 'target_7', 'target_8', 'target_9']])
logits_test = np.array(cnn_pred_test[['logit_0', 'logit_1', 'logit_2', 'logit_3', 'logit_4',
                                      'logit_5', 'logit_6', 'logit_7', 'logit_8', 'logit_9']])
gt_test = np.argmax(targets_all_test, axis = 1)
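
The loading cell turns one-hot target columns into integer labels with `np.argmax`; the predicted class is obtained the same way from the logits. A small sketch on made-up arrays (the `toy_*` names are hypothetical) shows the resulting shapes and how the real accuracy follows from them:

```python
import numpy as np

# Made-up one-hot targets and logits for 4 samples and 3 classes.
toy_targets = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 1, 0]])
toy_logits = np.array([[2.0, 0.1, -1.0],
                       [0.3, 1.2, 0.0],
                       [1.5, 0.2, 0.9],
                       [0.0, 3.0, 1.0]])

toy_gt = np.argmax(toy_targets, axis=1)    # shape (n,), integer class labels
toy_pred = np.argmax(toy_logits, axis=1)   # shape (n,), predicted classes
toy_accuracy = float(np.mean(toy_gt == toy_pred))
```

Here the third sample is misclassified, so three of the four predictions match their labels.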

[7]:

# Split the indices so that the validation and test subsets do not overlap.
import random
random.seed(79)
test_ind = list(range(10000))
random.shuffle(test_ind)
#
val_ind = test_ind[:3000]
testc_indx_1 = test_ind[3000:]
testc_indx_2 = [x+10000 for x in test_ind[3000:]]
testc_indx_3 = [x+20000 for x in test_ind[3000:]]
testc_indx_4 = [x+30000 for x in test_ind[3000:]]
testc_indx_5 = [x+40000 for x in test_ind[3000:]]
testc_indxs = [testc_indx_1, testc_indx_2, testc_indx_3, testc_indx_4, testc_indx_5]
#
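
The sketch below checks the property this split relies on: the 3,000 validation indices and the 7,000 held-out indices are disjoint, and each offset of 10,000 selects the same held-out images within one block. It assumes, as the offsets in the cell suggest but as is not verified here, that the motion-blur file stacks five blocks of 10,000 predictions in the same image order.

```python
import random

random.seed(79)
base_ind = list(range(10000))
random.shuffle(base_ind)

val_ind = base_ind[:3000]
held_out = base_ind[3000:]

# One index list per block; block k covers rows [k * 10000, (k + 1) * 10000).
test_blocks = [[i + 10000 * k for i in held_out] for k in range(5)]

# The validation and held-out image indices never coincide.
assert set(val_ind).isdisjoint(held_out)
# Mapping each test index back to its image index never hits a validation image.
for block in test_blocks:
    assert set(i % 10000 for i in block).isdisjoint(val_ind)
```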

[8]:

logits_val = logits[val_ind, :]
gt_val = gt[val_ind]
#
logits_tests = []
gt_tests = []
#
for testc_indx in testc_indxs:
    #
    logits_tests.append(logits_test[testc_indx, :])
    gt_tests.append(gt_test[testc_indx])

[9]:

print(f"The validation predictions, ``logits`` are of shape (n, d), which are now {logits_val.shape}")
print(f"The validation labels, ``gt`` are of shape (n, ), which are now {gt_val.shape}\n")
print(f"The number of test conditions is {len(logits_tests)}")
print(f"The test predictions, ``logits_test`` are of shape (n', d), which are now {logits_tests[0].shape}")
print(f"The test labels, ``gt_test`` are of shape (n', ), which are now {gt_tests[0].shape}")

The validation predictions, ``logits`` are of shape (n, d), which are now (3000, 10)
The validation labels, ``gt`` are of shape (n, ), which are now (3000,)

The number of test conditions is 5
The test predictions, ``logits_test`` are of shape (n', d), which are now (7000, 10)
The test labels, ``gt_test`` are of shape (n', ), which are now (7000,)

MOVAL estimation on accuracy

[10]:

moval_options = list(itertools.product(moval.models.get_estim_options(),
                               ["classification"],
                               moval.models.get_conf_options(),
                               [False, True]))

[11]:

# ac-model does not need class-specific variants
moval_options = [
    moval_option for moval_option in moval_options
    if not (moval_option[0] == 'ac-model' and moval_option[-1])
]
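
Filtering option tuples is safest when it builds a new list rather than mutating the list being iterated, since in-place removal during iteration can skip elements. A minimal sketch with placeholder option names (the real lists come from `moval.models.get_estim_options()` and `moval.models.get_conf_options()`):

```python
import itertools

# Placeholder option names, chosen only for illustration.
estim_options = ["ac-model", "ts-model"]
conf_options = ["max_class_probability-conf", "energy-conf"]

options = list(itertools.product(estim_options,
                                 ["classification"],
                                 conf_options,
                                 [False, True]))

# Keep everything except the class-specific variants of ac-model.
filtered = [opt for opt in options if not (opt[0] == "ac-model" and opt[-1])]
```

With two estimators, two confidence scores, and two class-specific flags there are eight combinations, and the filter drops the two class-specific `ac-model` entries.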

[12]:

print(f"The number of moval options is {len(moval_options)}")

The number of moval options is 36

[13]:

from moval.solvers.utils import ComputMetric, ComputAUC
from moval.models.utils import cal_softmax

def test_cls(estim_algorithm, mode, metric, confidence_scores, class_specific, logits, gt, logits_tests, gt_tests):
    """Test MOVAL under different conditions for classification tasks.

    Args:
        estim_algorithm (str):
            The algorithm used to estimate model performance. The available estimation algorithms can be listed by
            running :py:func:`moval.models.get_estim_options`.
        mode (str): The given task to estimate model performance.
        metric (str): The metric to be estimated.
        confidence_scores (str):
            The method used to calculate the confidence scores. The available confidence score calculation methods
            can be listed by running :py:func:`moval.models.get_conf_options`.
        class_specific (bool):
            If ``True``, the calculation will match class-wise confidence to class-wise accuracy.
        logits: The network output (logits) of shape ``(n, d)`` for classification.
        gt: The corresponding annotation of shape ``(n, )`` for classification.
        logits_tests: A list of m test conditions, each of shape ``(n', d)``.
        gt_tests: The corresponding annotations, a list of m arrays of shape ``(n', )``.

    Returns:
        err_tests: A list of m test errors.
        moval_model: The optimized MOVAL model.

    """

    moval_model = moval.MOVAL(
                mode = mode,
                metric = metric,
                confidence_scores = confidence_scores,
                estim_algorithm = estim_algorithm,
                class_specific = class_specific
                )

    #
    moval_model.fit(logits, gt)

    # Collect the absolute estimation error for each test condition.

    err_tests = []
    for k_test in range(len(logits_tests)):

        _logits_test = logits_tests[k_test]
        _gt_test = gt_tests[k_test]

        estim_acc_test = moval_model.estimate(_logits_test)

        pred_test = np.argmax(_logits_test, axis = 1)
        if metric == "accuracy":

            real_metric = np.sum(_gt_test == pred_test) / len(_gt_test)
        elif metric == "sensitivity":
            real_sensitivities = []
            for kcls in range(_logits_test.shape[1]):
                _, real_sensitivity, _ = ComputMetric(_gt_test == kcls, pred_test == kcls)
                real_sensitivities.append(real_sensitivity)
            real_metric = real_sensitivities
        elif metric == "precision":
            real_precisions = []
            for kcls in range(_logits_test.shape[1]):
                _, _, real_precision = ComputMetric(_gt_test == kcls, pred_test == kcls)
                real_precisions.append(real_precision)
            real_metric = real_precisions
        elif metric == "f1score":
            real_F1scores = []
            for kcls in range(_logits_test.shape[1]):
                real_F1score, _, _ = ComputMetric(_gt_test == kcls, pred_test == kcls)
                real_F1scores.append(real_F1score)
            real_metric = real_F1scores
        else:
            real_auc = ComputAUC(_gt_test, cal_softmax(_logits_test))
            real_metric = real_auc

        err_test = np.mean(np.abs(np.asarray(real_metric) - estim_acc_test))
        err_tests.append(err_test)

    return err_tests, moval_model
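
To make the idea behind this evaluation concrete, the sketch below estimates accuracy on unlabeled data as the mean confidence after a simple calibration step. It fits a single temperature by grid search so that the mean max-softmax confidence matches the accuracy on labelled data, then reports the mean calibrated confidence as the estimate. This is a simplified stand-in and not MOVAL's implementation; the function names, the grid, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature whose mean confidence best matches the accuracy."""
    if grid is None:
        grid = np.linspace(0.25, 10.0, 400)
    accuracy = np.mean(np.argmax(logits, axis=1) == labels)
    gaps = [abs(softmax(logits, t).max(axis=1).mean() - accuracy) for t in grid]
    return float(grid[int(np.argmin(gaps))])

def estimate_accuracy(logits, temperature):
    """Estimated accuracy: mean calibrated max-class probability."""
    return float(softmax(logits, temperature).max(axis=1).mean())

# Synthetic, deliberately over-confident logits (made-up data).
rng = np.random.default_rng(0)
n, d = 2000, 10
labels = rng.integers(0, d, size=n)
logits = rng.normal(size=(n, d))
logits[np.arange(n), labels] += 1.5   # signal toward the true class
logits *= 4.0                         # inflate confidence

temperature = fit_temperature(logits, labels)
val_accuracy = float(np.mean(np.argmax(logits, axis=1) == labels))
estimated = estimate_accuracy(logits, temperature)
uncalibrated = estimate_accuracy(logits, 1.0)
```

In practice the temperature would be fitted on `logits_val` and `gt_val` and then applied to each entry of `logits_tests`; the sketch fits and evaluates on the same synthetic set only to keep it short.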

[14]:

err_test_list = []
moval_parameters = []
moval_parameters_ = []

[15]:

for k_cond in tqdm(range(len(moval_options))):

    err_test, moval_model = test_cls(
        estim_algorithm = moval_options[k_cond][0],
        mode = moval_options[k_cond][1],
        metric = "accuracy",
        confidence_scores = moval_options[k_cond][2],
        class_specific = moval_options[k_cond][3],
        logits = logits_val,
        gt = gt_val,
        logits_tests = logits_tests,
        gt_tests = gt_tests
    )
    err_test_list.append(err_test)
    moval_parameters.append(moval_model.model_.param)
    if moval_model.model_.extend_param:
        moval_parameters_.append(moval_model.model_.param_ext)
    else:
        moval_parameters_.append(0.)

Starting optimizing for model ac-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ac-model with confidence energy-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ac-model with confidence entropy-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ac-model with confidence doctor-conf based on metric accuracy, class specific is False.
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model ts-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model doc-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

Starting optimizing for model atc-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 64%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                                            | 23/36 [00:17<00:08,  1.55it/s]

Starting optimizing for model atc-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param for class 0, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18728231]), array([0.26218443]), array([0.30525356]), array([0.34318131]), array([0.43485032])]
Optimization results are [(0.5186440669175524, 1.0), (8.790578265305271e-10, 0.2984811888595933), (8.790578265305271e-10, 0.2982347839920787), (8.790578265305271e-10, 0.29762222354380885), (8.790578265305271e-10, 0.29813876174077925), (8.790578265305271e-10, 0.29895959724557186)]
Not satisfied with initial optimization results of param for class 2, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.16370562]), array([0.24210566]), array([0.29369108]), array([0.33880298]), array([0.45093615])]
Optimization results are [(0.7068493131319197, 1.0), (1.936573457150814e-09, 0.20258569909358948), (1.936573457150814e-09, 0.20276349137935112), (1.936573457150814e-09, 0.2028304000812713), (1.936573457150814e-09, 0.2027524073036559), (1.936573457150814e-09, 0.20292126767235374)]
Not satisfied with initial optimization results of param for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18169108]), array([0.2639885]), array([0.31238295]), array([0.35001277]), array([0.45603095])]
Optimization results are [(0.8181818153210426, 1.0), (2.8607756163978593e-09, 0.1726065218441504), (2.8607756163978593e-09, 0.17159252717803666), (2.8607756163978593e-09, 0.17181062241447223), (2.8607756163978593e-09, 0.1706312240510795), (2.8607756163978593e-09, 0.17101160626313566)]
Not satisfied with initial optimization results of param for class 5, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.17036821]), array([0.25540432]), array([0.298793]), array([0.33816787]), array([0.4466694])]
Optimization results are [(0.7468354398867703, 1.0), (0.00421941243390489, 0.195923437958823), (0.00421941243390489, 0.19793835114336558), (3.151204408524677e-09, 0.20168527213894694), (0.00421941243390489, 0.1986736263788906), (3.151204408524677e-09, 0.20169915150554396)]
Not satisfied with initial optimization results of param for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18241121]), array([0.26778415]), array([0.29895517]), array([0.354317]), array([0.48995152])]
Optimization results are [(0.9244444403358025, 1.0), (4.1086418711699935e-09, 0.13680840954458334), (4.1086418711699935e-09, 0.13723937873654624), (4.1086418711699935e-09, 0.13639829535065073), (4.1086418711699935e-09, 0.13729783903544526), (0.004444440335802491, 0.14086106295146145)]
Not satisfied with initial optimization results of param for class 7, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.14946888]), array([0.24291015]), array([0.26496527]), array([0.3045747]), array([0.4565208])]
Optimization results are [(0.944444439197531, 1.0), (5.2469134592669775e-09, 0.09715477415120669), (5.2469134592669775e-09, 0.09716406187674936), (5.2469134592669775e-09, 0.0960499096150394), (5.2469134592669775e-09, 0.09898677614608604), (0.05555556080246904, 1e-06)]
Not satisfied with initial optimization results of param for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.22009101]), array([0.31972133]), array([0.35201452]), array([0.40703034]), array([0.50911339])]
Optimization results are [(0.9843749923095704, 1.0), (7.690429626450168e-09, 0.06602730240951293), (7.690429626450168e-09, 0.0639442662183366), (7.690429626450168e-09, 0.06600272232427074), (0.015625007690429626, 1e-06), (0.015625007690429626, 1e-06)]
Not satisfied with initial optimization results of param for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.18890638]), array([0.30161541]), array([0.33930996]), array([0.39866314]), array([0.51144122])]
Optimization results are [(0.9827586122175982, 1.0), (0.008620698127229431, 0.07556255148086044), (8.47205694487485e-09, 0.0885995252774123), (0.017241387782401807, 1e-06), (0.017241387782401807, 1e-06), (0.017241387782401807, 1e-06)]
Calculating and saving the fitted case-wise performance...

 67%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                       | 24/36 [00:19<00:14,  1.22s/it]

Starting optimizing for model atc-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 69%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                                   | 25/36 [00:20<00:10,  1.03it/s]

Starting optimizing for model atc-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                              | 26/36 [00:21<00:10,  1.09s/it]

Starting optimizing for model atc-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                         | 27/36 [00:21<00:07,  1.15it/s]

Starting optimizing for model atc-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 78%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                     | 28/36 [00:23<00:07,  1.04it/s]

Starting optimizing for model ts-atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 81%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                | 29/36 [00:23<00:05,  1.28it/s]

Starting optimizing for model ts-atc-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param_ext for class 0, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.33765525]), array([0.44007736]), array([0.50198903]), array([0.57228649]), array([0.7085548])]
Optimization results are [(0.5186440669175524, 1.0), (0.001694914375179546, 0.4832690701953513), (0.001694914375179546, 0.48408509061454946), (0.001694914375179546, 0.48316444533111014), (0.001694914375179546, 0.48286672341553727), (0.001694914375179546, 0.48270295899398286)]
Calculating and saving the fitted case-wise performance...

 83%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                           | 30/36 [00:26<00:08,  1.49s/it]

Starting optimizing for model ts-atc-model with confidence energy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param_ext, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.6543765]), array([0.67402398]), array([0.68690314]), array([0.7001747]), array([0.74142047])]
Optimization results are [(0.29633333333333334, array([0.5])), (0.0, array([0.66373843])), (0.0, array([0.66372274])), (0.0, array([0.66372687])), (0.0, array([0.66373006])), (0.0, array([0.66373061]))]
Calculating and saving the fitted case-wise performance...

 86%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                       | 31/36 [00:27<00:05,  1.18s/it]

Starting optimizing for model ts-atc-model with confidence energy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param_ext for class 0, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.46778749]), array([0.49245172]), array([0.50598468]), array([0.5236424]), array([0.56292242])]
Optimization results are [(0.5186440669175524, 1.0), (8.790578265305271e-10, 0.5038765633948785), (8.790578265305271e-10, 0.5038011896738515), (8.790578265305271e-10, 0.5038105319039202), (8.790578265305271e-10, 0.5038012617738086), (8.790578265305271e-10, 0.5038815359533244)]
Not satisfied with initial optimization results of param_ext for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.78302131]), array([0.79694241]), array([0.80847999]), array([0.81915746]), array([0.84634618])]
Optimization results are [(0.18181818467895738, 0.6999999999999997), (2.8607756163978593e-09, 0.7810331667840638), (0.0034965006357279282, 0.781377130643169), (2.8607756163978593e-09, 0.781004302656523), (0.0034965006357279282, 0.7813994188641729), (0.0034965006357279282, 0.7812171928544589)]
Not satisfied with initial optimization results of param_ext for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.89461435]), array([0.90913531]), array([0.91746629]), array([0.92473949]), array([0.95260243])]
Optimization results are [(0.07555555966419747, 0.7999999999999998), (0.07555555966419747, 0.8498836336971195), (4.1086418711699935e-09, 0.887117190835369), (4.1086418711699935e-09, 0.8871827383358843), (4.1086418711699935e-09, 0.887171946148615), (4.1086418711699935e-09, 0.8871110145350087)]
Not satisfied with initial optimization results of param_ext for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.9615313]), array([0.98736813]), array([0.99534012]), array([1.]), array([1.])]
Optimization results are [(0.015625007690429626, 0.8999999999999999), (7.690429626450168e-09, 0.9374930193806559), (7.690429626450168e-09, 0.9379997191513918), (0.015625007690429626, 0.8958061037400822), (0.015625007690429626, 0.8999999999999999), (0.015625007690429626, 0.8999999999999999)]
Not satisfied with initial optimization results of param_ext for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.96108867]), array([0.98062035]), array([0.9891486]), array([0.99834737]), array([1.])]
Optimization results are [(0.017241387782401807, 0.8999999999999999), (0.017241387782401807, 0.91303423627121), (8.47205694487485e-09, 0.9453793070235122), (8.47205694487485e-09, 0.9451005767327609), (0.017241387782401807, 0.898512635896374), (0.017241387782401807, 0.8999999999999999)]
Calculating and saving the fitted case-wise performance...

 89%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                  | 32/36 [00:30<00:07,  1.80s/it]

Starting optimizing for model ts-atc-model with confidence entropy-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████              | 33/36 [00:30<00:04,  1.40s/it]

Starting optimizing for model ts-atc-model with confidence entropy-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋         | 34/36 [00:34<00:04,  2.17s/it]

Starting optimizing for model ts-atc-model with confidence doctor-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎    | 35/36 [00:35<00:01,  1.64s/it]

Starting optimizing for model ts-atc-model with confidence doctor-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36/36 [00:38<00:00,  1.06s/it]
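
The log above cycles through four confidence scores (`max_class_probability-conf`, `energy-conf`, `entropy-conf`, `doctor-conf`). As a rough sketch of what such scores usually look like when computed from logits, the snippet below uses common textbook definitions. These definitions are assumptions made for illustration: MOVAL's own implementation may normalise or rescale them differently, and the helper names `softmax` and `confidence_scores` are hypothetical, not part of the MOVAL API.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def confidence_scores(logits):
    """Illustrative confidence scores for an (n_samples, n_classes) logit array."""
    p = softmax(logits)
    # Maximum class probability: the largest softmax output.
    mcp = p.max(axis=1)
    # Negative Shannon entropy: higher means a more peaked distribution.
    neg_entropy = (p * np.log(p + 1e-12)).sum(axis=1)
    # Energy-style score: log-sum-exp of the raw logits.
    m = logits.max(axis=1)
    energy = m + np.log(np.exp(logits - m[:, None]).sum(axis=1))
    # Sum of squared probabilities, the quantity Doctor-style scores build on
    # (assumed form; the exact Doctor score may be a transform of this).
    sum_sq = (p ** 2).sum(axis=1)
    return {"mcp": mcp, "neg_entropy": neg_entropy, "energy": energy, "sum_sq": sum_sq}

# One confident sample and one maximally uncertain sample (made-up logits).
toy_logits = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
toy_scores = confidence_scores(toy_logits)
```

In this toy input every score ranks the first sample as more confident than the second, which is the behaviour the estimation algorithms rely on.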

Compare estimation results

[16]:

# Short display names for the estimation algorithms and confidence scores.
algorithm_names = {'ac-model': 'AC', 'ts-model': 'TS', 'doc-model': 'DoC', 'atc-model': 'ATC'}
confidence_names = {'max_class_probability-conf': 'MCP', 'energy-conf': 'Energy', 'entropy-conf': 'Entropy'}

estim = []
conf = []
err = []
err_mean = []
novel = []
# moval_options[k] and err_test_list[k] describe the same configuration.
for k_option, moval_option in enumerate(moval_options):
    # Class-specific variants get a 'CS ' prefix.
    estim_cs = 'CS ' if moval_option[3] else ''
    for k_cond in range(len(err_test_list[k_option])):
        # Any algorithm not listed above is 'TS-ATC'; any other score is 'Doctor'.
        estim.append(estim_cs + algorithm_names.get(moval_option[0], 'TS-ATC'))
        conf.append(confidence_names.get(moval_option[2], 'Doctor'))
        # Only MCP without class-specific parameters counts as an existing method.
        if moval_option[2] == 'max_class_probability-conf' and not moval_option[3]:
            novel.append('Existing Methods')
        else:
            novel.append('Provided by MOVAL')
        # Per-condition error, plus the mean error over all test conditions.
        err.append(err_test_list[k_option][k_cond])
        err_mean.append(np.mean(err_test_list[k_option]))

[17]:

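# 'MAE' holds the mean error of each configuration (repeated on every row);
# 'MAE ' (with a trailing space) holds the error of each individual test condition.
d = {'Estimation Algorithm': estim, 'Confidence Score': conf, 'MAE': err_mean, 'MAE ': err, 'Category': novel}
df = pd.DataFrame(data=d)
# Order the algorithms as they should appear on the plot axes.
custom_order = ['AC', 'TS', 'DoC', 'ATC', 'TS-ATC', 'CS TS', 'CS DoC', 'CS ATC', 'CS TS-ATC']
df['Estimation Algorithm'] = pd.Categorical(df['Estimation Algorithm'], categories=custom_order, ordered=True)
df = df.sort_values(by='Estimation Algorithm')
# Order the confidence scores as well; this second sort determines the final row order.
custom_order = ['MCP', 'Doctor', 'Entropy', 'Energy']
df['Confidence Score'] = pd.Categorical(df['Confidence Score'], categories=custom_order, ordered=True)
df = df.sort_values(by='Confidence Score')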
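
A side note on the two consecutive sorts above: `sort_values` uses a sorting algorithm that is not guaranteed to be stable by default, so the algorithm ordering from the first sort may not survive the second one. If a fully deterministic row order is wanted, one option is to sort on both ordered categoricals in a single call. The sketch below illustrates this on a small made-up frame (`toy` and its values are hypothetical, not the notebook's data).

```python
import pandas as pd

# Toy frame standing in for the notebook's `df` (values are made up).
toy = pd.DataFrame({
    "Estimation Algorithm": ["CS TS", "AC", "TS", "AC"],
    "Confidence Score": ["Energy", "MCP", "MCP", "Doctor"],
})

algo_order = ["AC", "TS", "CS TS"]
conf_order = ["MCP", "Doctor", "Entropy", "Energy"]
toy["Estimation Algorithm"] = pd.Categorical(
    toy["Estimation Algorithm"], categories=algo_order, ordered=True
)
toy["Confidence Score"] = pd.Categorical(
    toy["Confidence Score"], categories=conf_order, ordered=True
)

# Sort by confidence score first, then by algorithm within each score.
toy_sorted = toy.sort_values(by=["Confidence Score", "Estimation Algorithm"])
sorted_pairs = list(zip(
    toy_sorted["Confidence Score"].astype(str),
    toy_sorted["Estimation Algorithm"].astype(str),
))
```

Because both columns are ordered categoricals, the rows follow the custom orders rather than alphabetical order.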

[18]:

df.head()

[18]:

	Estimation Algorithm	Confidence Score	MAE	MAE	Category
0	AC	MCP	0.401485	0.310355	Existing Methods
25	CS TS	MCP	0.082242	0.047667	Provided by MOVAL
144	TS-ATC	MCP	0.102657	0.129429	Existing Methods
143	TS-ATC	MCP	0.102657	0.119571	Existing Methods
142	TS-ATC	MCP	0.102657	0.107000	Existing Methods

[19]:

sns.set(rc={'figure.figsize':(6,3)})
sns.set_style("darkgrid")
category_palette = {'Existing Methods': 'grey', 'Provided by MOVAL': '#1f77b4'}
ax = sns.scatterplot(
    data=df, x="Estimation Algorithm", y="Confidence Score", hue="Category", size="MAE",
    sizes=(40, 1000), palette=category_palette
)
ax.set(ylim=(3.5, -0.5))
ax.tick_params(axis='x', rotation=15)
#
# Get the handles and labels from the legend
handles, labels = ax.get_legend_handles_labels()

# Create a custom legend with only desired categories
desired_labels = ['Category', 'Existing Methods', 'Provided by MOVAL', 'MAE', '0.08', '0.16']
desired_handles = [h for h, l in zip(handles, labels) if l in desired_labels]

legend = plt.legend(handles=desired_handles, labels=desired_labels, bbox_to_anchor=(1.2, 1), labelspacing=1)

[20]:

from statannotations.Annotator import Annotator
sns.set(rc={'figure.figsize':(6,2)})
sns.set_style("white")
ax = sns.barplot(df, x="Estimation Algorithm", y="MAE", color = '#1f77b4')
ax.tick_params(axis='x', rotation=15)
#
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
#
pairs=[("TS", "CS TS"), ("DoC", "CS DoC"), ("ATC", "CS ATC"), ("TS-ATC", "CS TS-ATC")]

annotator = Annotator(ax, pairs, data=df, x="Estimation Algorithm", y="MAE")
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside', text_offset=-4)
annotator.apply_and_annotate()

p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

TS vs. CS TS: Mann-Whitney-Wilcoxon test two-sided, P_val:1.974e-06 U_stat=3.750e+02
DoC vs. CS DoC: Mann-Whitney-Wilcoxon test two-sided, P_val:5.405e-08 U_stat=4.000e+02
ATC vs. CS ATC: Mann-Whitney-Wilcoxon test two-sided, P_val:5.405e-08 U_stat=4.000e+02
TS-ATC vs. CS TS-ATC: Mann-Whitney-Wilcoxon test two-sided, P_val:5.405e-08 U_stat=4.000e+02

[20]:

(<Axes: xlabel='Estimation Algorithm', ylabel='MAE'>,
 [<statannotations.Annotation.Annotation at 0x7f9e010abf70>,
  <statannotations.Annotation.Annotation at 0x7f9de0a4e640>,
  <statannotations.Annotation.Annotation at 0x7f9de0a4e670>,
  <statannotations.Annotation.Annotation at 0x7f9de0a4e6d0>])
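
The significance stars above come from two-sided Mann-Whitney U tests. For readers who want to reproduce a single comparison without the plotting layer, the following minimal sketch calls `scipy.stats.mannwhitneyu` directly. The two error lists are made-up illustrative numbers, not results from this notebook.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-condition errors for two estimators (illustrative values only).
errors_global = [0.14, 0.16, 0.18, 0.20, 0.22]
errors_class_specific = [0.05, 0.07, 0.08, 0.09, 0.11]

# Two-sided test of whether the two error distributions differ.
result = mannwhitneyu(errors_global, errors_class_specific, alternative="two-sided")
u_stat, p_value = result.statistic, result.pvalue
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

When every value in one group exceeds every value in the other, the U statistic reaches its extreme value (the product of the two group sizes, or zero, depending on which group is counted), which is consistent with the `U_stat=4.000e+02` lines reported above.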

[21]:

sns.set(rc={'figure.figsize':(3,2)})
sns.set_style("white")
ax = sns.barplot(df, x="Confidence Score", y="MAE", color = '#1f77b4')
ax.tick_params(axis='x', rotation=15)
#
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
#
pairs=[("MCP", "Doctor"), ("MCP", "Entropy"), ("MCP", "Energy")]

annotator = Annotator(ax, pairs, data=df, x="Confidence Score", y="MAE")
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside')
annotator.apply_and_annotate()

p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

MCP vs. Doctor: Mann-Whitney-Wilcoxon test two-sided, P_val:6.163e-01 U_stat=1.075e+03
MCP vs. Entropy: Mann-Whitney-Wilcoxon test two-sided, P_val:2.682e-01 U_stat=1.150e+03
MCP vs. Energy: Mann-Whitney-Wilcoxon test two-sided, P_val:1.905e-01 U_stat=1.175e+03

[21]:

(<Axes: xlabel='Confidence Score', ylabel='MAE'>,
 [<statannotations.Annotation.Annotation at 0x7f9e010abdf0>,
  <statannotations.Annotation.Annotation at 0x7f9e010abee0>,
  <statannotations.Annotation.Annotation at 0x7f9e0109c490>])

[22]:

sns.set(rc={'figure.figsize':(12,3)})
category_palette = {'MCP': '#e5f0f8',
                    'Doctor': '#99c6e4',
                    'Entropy': '#4c9cd0',
                    'Energy': '#0072bd'
                   }
ax = sns.boxplot(df, x="Estimation Algorithm", y="MAE ", hue="Confidence Score", palette=category_palette)
ax.set(ylim=(-0.02, 0.5))
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))

pairs=[
    [("ATC", "MCP"), ("CS TS-ATC", "Energy")]
]

annotator = Annotator(ax, pairs, data=df, x="Estimation Algorithm", y="MAE", hue="Confidence Score")
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside')
annotator.apply_and_annotate()

p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

ATC_MCP vs. CS TS-ATC_Energy: Mann-Whitney-Wilcoxon test two-sided, P_val:3.977e-03 U_stat=2.500e+01

[22]:

(<Axes: xlabel='Estimation Algorithm', ylabel='MAE '>,
 [<statannotations.Annotation.Annotation at 0x7f9e010f8d90>])

Estimation of other metrics

[23]:

# Reuse two configurations from above: TS with MCP, without and with class-specific parameters.
test_conditions = moval_options[4:6]

[24]:

estimatation_metrics = ["accuracy", "sensitivity", "precision", "f1score", "auc"]
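
To make the metric names concrete, here is a small sketch that computes accuracy and macro-averaged sensitivity, precision, and F1 score from hard predictions via a confusion matrix. It assumes that "sensitivity" means per-class recall and that class-wise values are averaged with equal weight; MOVAL's exact definitions may differ, and `macro_metrics` is a hypothetical helper rather than a MOVAL function. AUC is omitted because it needs probabilities rather than hard predictions.

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy and macro-averaged sensitivity/precision/F1 from hard labels."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)      # number of true samples per class
    predicted = cm.sum(axis=0)    # number of predictions per class
    # Classes with no samples or no predictions get a score of 0.
    sens = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    prec = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = sens + prec
    f1 = np.divide(2 * sens * prec, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "sensitivity": sens.mean(),
        "precision": prec.mean(),
        "f1score": f1.mean(),
    }

# Made-up labels for a three-class toy problem.
toy_metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```

Note that precision depends on how many samples are predicted as each class, so it can be unstable for classes that are rarely predicted.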

[25]:

err_test_list = []
moval_parameters = []
moval_parameters_ = []

[26]:

for k_cond in tqdm(range(len(test_conditions))):
    for estimatation_metric in estimatation_metrics:
        err_test, moval_model = test_cls(
            estim_algorithm = test_conditions[k_cond][0],
            mode = test_conditions[k_cond][1],
            metric = estimatation_metric,
            confidence_scores = test_conditions[k_cond][2],
            class_specific = test_conditions[k_cond][3],
            logits = logits_val,
            gt = gt_val,
            logits_tests = logits_tests,
            gt_tests = gt_tests
        )
        err_test_list.append(err_test)
        moval_parameters.append(moval_model.model_.param)
        if moval_model.model_.extend_param:
            moval_parameters_.append(moval_model.model_.param_ext)
        else:
            moval_parameters_.append(0.)

  0%|                                                                                                                                                                                 | 0/2 [00:00<?, ?it/s]

Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric sensitivity, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric precision, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric f1score, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric auc, class specific is False.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

 50%|████████████████████████████████████████████████████████████████████████████████████▌                                                                                    | 1/2 [00:11<00:11, 11.68s/it]

Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric accuracy, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric sensitivity, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric precision, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06)]
Not satisfied with initial optimization results of param for class 5, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06)]
Not satisfied with initial optimization results of param for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06)]
Not satisfied with initial optimization results of param for class 7, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06)]
Not satisfied with initial optimization results of param for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06)]
Not satisfied with initial optimization results of param for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06)]
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric f1score, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric precision, class specific is True.
Opitimizing with 3000 samples...
Not satisfied with initial optimization results of param for class 4, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06), (0.14782850502042233, 1e-06)]
Not satisfied with initial optimization results of param for class 5, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06), (0.1134772948903291, 1e-06)]
Not satisfied with initial optimization results of param for class 6, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06), (0.29976566175420816, 1e-06)]
Not satisfied with initial optimization results of param for class 7, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06), (0.3700501052526505, 1e-06)]
Not satisfied with initial optimization results of param for class 9, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06), (0.492084781380318, 1e-06)]
Not satisfied with initial optimization results of param for class 8, trying more initial states...
Tried 1/5 times.
Tried 2/5 times.
Tried 3/5 times.
Tried 4/5 times.
Tried 5/5 times.
Starting from [array([0.1]), array([0.5]), array([3.]), array([5.]), array([10.])]
Optimization results are [(0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06), (0.512844725524447, 1e-06)]
Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric auc, class specific is True.
Opitimizing with 3000 samples...
Calculating and saving the fitted case-wise performance...

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:31<00:00, 15.83s/it]

[27]:

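estim = []      # estimation algorithm label, e.g. 'TS' or 'CS TS'
conf = []       # confidence score label, e.g. 'MCP'
metric = []     # name of the estimated metric
err = []        # absolute error for each test condition
err_mean = []   # mean absolute error over all test conditions
novel = []      # 'Existing Methods' or 'Provided by MOVAL'
k_option = 0    # index into err_test_list, one entry per (option, metric) pair
for moval_option in test_conditions:
    for estimatation_metric in estimatation_metrics:
        for k_cond in range(len(err_test_list[k_option])):
            # Class-specific variants get a 'CS ' prefix.
            if moval_option[3]:
                estim_cs = 'CS '
            else:
                estim_cs = ''
            # Short label for the estimation algorithm.
            if moval_option[0] == 'ac-model':
                estim.append(estim_cs + 'AC')
            elif moval_option[0] == 'ts-model':
                estim.append(estim_cs + 'TS')
            elif moval_option[0] == 'doc-model':
                estim.append(estim_cs + 'DoC')
            elif moval_option[0] == 'atc-model':
                estim.append(estim_cs + 'ATC')
            else:
                estim.append(estim_cs + 'TS-ATC')
            # Short label for the confidence score.
            metric.append(estimatation_metric)
            if moval_option[2] == 'max_class_probability-conf':
                conf.append('MCP')
            elif moval_option[2] == 'energy-conf':
                conf.append('Energy')
            elif moval_option[2] == 'entropy-conf':
                conf.append('Entropy')
            else:
                conf.append('Doctor')
            # Only MCP without class-specific parameters counts as an existing method.
            if moval_option[2] == 'max_class_probability-conf' and not moval_option[3]:
                novel.append('Existing Methods')
            else:
                novel.append('Provided by MOVAL')
            # Per-condition error and the mean over all conditions of this setup.
            err.append(err_test_list[k_option][k_cond])
            err_mean.append(np.mean(err_test_list[k_option]))
        k_option += 1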
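
The if/elif chains above map MOVAL option strings to short display labels. A minimal sketch of the same mapping with dictionary lookups follows. The helper name `make_labels` and the example tuples are hypothetical and not part of MOVAL. The tuple layout (model name at index 0, confidence score at index 2, class-specific flag at index 3) is inferred from the cell above; index 1 is not used in that cell, so a placeholder is passed for it.

```python
# Hypothetical helper, not part of MOVAL: the same label mapping as the
# if/elif chains, written as dictionary lookups with the same fallbacks.
ESTIM_LABELS = {
    'ac-model': 'AC',
    'ts-model': 'TS',
    'doc-model': 'DoC',
    'atc-model': 'ATC',
}
CONF_LABELS = {
    'max_class_probability-conf': 'MCP',
    'energy-conf': 'Energy',
    'entropy-conf': 'Entropy',
}


def make_labels(moval_option):
    """Return (estimation label, confidence label, category) for one option tuple."""
    model = moval_option[0]
    conf_name = moval_option[2]
    class_specific = moval_option[3]
    prefix = 'CS ' if class_specific else ''
    # Unknown model names fall back to 'TS-ATC', unknown scores to 'Doctor'.
    estim_label = prefix + ESTIM_LABELS.get(model, 'TS-ATC')
    conf_label = CONF_LABELS.get(conf_name, 'Doctor')
    is_existing = conf_name == 'max_class_probability-conf' and not class_specific
    category = 'Existing Methods' if is_existing else 'Provided by MOVAL'
    return estim_label, conf_label, category


# Assumed example tuples; None stands in for the unused index 1.
labels_plain = make_labels(('ts-model', None, 'max_class_probability-conf', False))
labels_cs = make_labels(('ts-model', None, 'max_class_probability-conf', True))
print(labels_plain)
print(labels_cs)
```

Keeping the mappings in dictionaries puts every label in one place, which makes it easier to add a new model or confidence score without touching the loop.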

[28]:

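# 'MAE' holds the mean error over all test conditions of a setup, while
# 'MAE ' (with a trailing space) holds the per-condition error used for plotting.
d = {'Estimation Algorithm': estim, 'Confidence Score': conf, 'MAE': err_mean, 'MAE ': err, 'Category': novel, 'Metric': metric}
df = pd.DataFrame(data=d)
# Sort rows by metric in a fixed display order rather than alphabetically.
custom_order = ['accuracy', 'sensitivity', 'precision', 'f1score', 'auc']
df['Metric'] = pd.Categorical(df['Metric'], categories=custom_order, ordered=True)
df = df.sort_values(by='Metric')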

[29]:

df

[29]:

	Estimation Algorithm	Confidence Score	MAE	MAE	Category	Metric
0	TS	MCP	0.141721	0.080761	Existing Methods	accuracy
1	TS	MCP	0.141721	0.125595	Existing Methods	accuracy
2	TS	MCP	0.141721	0.156014	Existing Methods	accuracy
3	TS	MCP	0.141721	0.163744	Existing Methods	accuracy
4	TS	MCP	0.141721	0.182490	Existing Methods	accuracy
28	CS TS	MCP	0.082242	0.095681	Provided by MOVAL	accuracy
27	CS TS	MCP	0.082242	0.089596	Provided by MOVAL	accuracy
26	CS TS	MCP	0.082242	0.072217	Provided by MOVAL	accuracy
25	CS TS	MCP	0.082242	0.047667	Provided by MOVAL	accuracy
29	CS TS	MCP	0.082242	0.106050	Provided by MOVAL	accuracy
31	CS TS	MCP	0.152080	0.135770	Provided by MOVAL	sensitivity
32	CS TS	MCP	0.152080	0.165340	Provided by MOVAL	sensitivity
33	CS TS	MCP	0.152080	0.173940	Provided by MOVAL	sensitivity
34	CS TS	MCP	0.152080	0.191324	Provided by MOVAL	sensitivity
30	CS TS	MCP	0.152080	0.094026	Provided by MOVAL	sensitivity
9	TS	MCP	0.254894	0.270503	Existing Methods	sensitivity
8	TS	MCP	0.254894	0.268069	Existing Methods	sensitivity
7	TS	MCP	0.254894	0.260403	Existing Methods	sensitivity
6	TS	MCP	0.254894	0.253091	Existing Methods	sensitivity
5	TS	MCP	0.254894	0.222405	Existing Methods	sensitivity
13	TS	MCP	0.496530	0.524955	Existing Methods	precision
39	CS TS	MCP	0.433401	0.488963	Provided by MOVAL	precision
38	CS TS	MCP	0.433401	0.463441	Provided by MOVAL	precision
37	CS TS	MCP	0.433401	0.462986	Provided by MOVAL	precision
36	CS TS	MCP	0.433401	0.413138	Provided by MOVAL	precision
35	CS TS	MCP	0.433401	0.338476	Provided by MOVAL	precision
12	TS	MCP	0.496530	0.523681	Existing Methods	precision
11	TS	MCP	0.496530	0.470540	Existing Methods	precision
10	TS	MCP	0.496530	0.400400	Existing Methods	precision
14	TS	MCP	0.496530	0.563074	Existing Methods	precision
40	CS TS	MCP	0.185486	0.104676	Provided by MOVAL	f1score
42	CS TS	MCP	0.185486	0.205134	Provided by MOVAL	f1score
44	CS TS	MCP	0.185486	0.238782	Provided by MOVAL	f1score
41	CS TS	MCP	0.185486	0.166118	Provided by MOVAL	f1score
19	TS	MCP	0.291944	0.337603	Existing Methods	f1score
18	TS	MCP	0.291944	0.318459	Existing Methods	f1score
17	TS	MCP	0.291944	0.304015	Existing Methods	f1score
16	TS	MCP	0.291944	0.277608	Existing Methods	f1score
15	TS	MCP	0.291944	0.222035	Existing Methods	f1score
43	CS TS	MCP	0.185486	0.212721	Provided by MOVAL	f1score
45	CS TS	MCP	0.039400	0.017939	Provided by MOVAL	auc
47	CS TS	MCP	0.039400	0.043302	Provided by MOVAL	auc
46	CS TS	MCP	0.039400	0.026579	Provided by MOVAL	auc
24	TS	MCP	0.041647	0.063592	Existing Methods	auc
23	TS	MCP	0.041647	0.040151	Existing Methods	auc
22	TS	MCP	0.041647	0.058271	Existing Methods	auc
21	TS	MCP	0.041647	0.028687	Existing Methods	auc
20	TS	MCP	0.041647	0.017533	Existing Methods	auc
48	CS TS	MCP	0.039400	0.048245	Provided by MOVAL	auc
49	CS TS	MCP	0.039400	0.060935	Provided by MOVAL	auc
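
The first MAE column of the table (the mean over test conditions) can be condensed into one number per metric. As a rough sketch, the snippet below copies the mean MAE values for TS and CS TS from the table and computes the relative reduction obtained with the class-specific variant. The dictionary and variable names are illustrative only; in the notebook the same values could be read from `df` directly.

```python
# Mean MAE per metric, copied from the table above.
mean_mae = {
    'accuracy':    {'TS': 0.141721, 'CS TS': 0.082242},
    'sensitivity': {'TS': 0.254894, 'CS TS': 0.152080},
    'precision':   {'TS': 0.496530, 'CS TS': 0.433401},
    'f1score':     {'TS': 0.291944, 'CS TS': 0.185486},
    'auc':         {'TS': 0.041647, 'CS TS': 0.039400},
}

# Relative reduction of the mean MAE when switching from TS to CS TS.
relative_reduction = {
    name: (values['TS'] - values['CS TS']) / values['TS']
    for name, values in mean_mae.items()
}

for name, reduction in relative_reduction.items():
    print(f'{name:12s} {reduction:6.1%}')
```

In this run the class-specific variant lowers the mean MAE for every metric, with the smallest relative change for auc.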

[30]:

ax = sns.boxplot(data=df, x="Metric", y="MAE ", hue="Estimation Algorithm")
