Performance Estimation of 3D Segmentation

Consider downloading this Jupyter Notebook and running it locally, or try it in Colab.

In this notebook, we show how to estimate performance on 3D segmentation tasks.
We provide the model's predicted 3D segmentation results (network logits) for this tutorial, which will be downloaded automatically. The model training code is available at https://github.com/ZerojumpLine/Robust-Medical-Segmentation.
More specifically, we show an example of estimating performance under domain shift for prostate MRI segmentation (2 classes: background and prostate) based on a 3D U-Net. We use the logits calculated on a test dataset acquired with a different scanner.
We calculate model confidence with different confidence scores and calibration methods.
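
As a rough illustration of what a confidence score is, the sketch below computes two common voxel-wise scores from logits with plain NumPy: the maximum class probability and a normalized negative entropy. The optional temperature is one simple form of calibration. The function names here are illustrative assumptions and this is not MOVAL's internal implementation.

```python
import numpy as np

def softmax(logits, axis=0, temperature=1.0):
    """Numerically stable softmax along the class axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def max_class_probability(logits, axis=0, temperature=1.0):
    """Confidence = probability of the predicted class, per voxel."""
    return np.max(softmax(logits, axis=axis, temperature=temperature), axis=axis)

def entropy_confidence(logits, axis=0, temperature=1.0):
    """Confidence = 1 - normalized entropy, so 1 is certain and 0 is uniform."""
    p = softmax(logits, axis=axis, temperature=temperature)
    n_classes = p.shape[axis]
    entropy = -np.sum(p * np.log(p + 1e-12), axis=axis)
    return 1.0 - entropy / np.log(n_classes)

# toy logits with the same layout as in this notebook: (d, H, W, D)
rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(2, 4, 4, 3))

conf_mcp = max_class_probability(toy_logits)
conf_mcp_scaled = max_class_probability(toy_logits, temperature=2.0)
conf_ent = entropy_confidence(toy_logits)
```

A temperature above 1 flattens the softmax, so `conf_mcp_scaled` is never larger than `conf_mcp`. Calibration methods tune such parameters so that confidence tracks the observed performance.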

[1]:

!pip install moval
!pip install statannotations
!pip install pandas
!pip install tqdm
!pip install matplotlib
!pip install nibabel
!pip install seaborn==0.12 # because statannotations does not support the latest seaborn

[2]:

import os
import gdown
import itertools
import zipfile
import pandas as pd
import numpy as np
import nibabel as nib
import moval
from moval.solvers.utils import ComputMetric
from tqdm import tqdm
import seaborn as sns
import matplotlib.pyplot as plt

[3]:

print(f"The installed MOVAL version is {moval.__version__}")
print(f"The installed seaborn version is {sns.__version__}")

The installed MOVAL version is 0.3.16
The installed seaborn version is 0.12.0

Load the data

[4]:

# download the data, which we used for MICCAI 2022

output = "data_moval.zip"
if not os.path.exists(output):
    url = "https://drive.google.com/u/0/uc?id=139pqxkG2ccIFq6qNArnFJWQ2by2Spbxt&export=download"
    gdown.download(url, output, quiet=False)

directory_data = "data_moval"
if not os.path.exists(directory_data):
    with zipfile.ZipFile(output, 'r') as zip_ref:
        zip_ref.extractall(directory_data)

[5]:

ls

analysis_cls.ipynb    data_moval_supp.zip   img_cifar/
analysis_seg2d.ipynb  estim_cls.ipynb       img_cifar.zip
analysis_seg3d.ipynb  estim_seg2d.ipynb     img_prostate/
data_moval/           estim_seg3d.ipynb     img_prostate.zip
data_moval.zip        img_cardiac/
data_moval_supp/      img_cardiac.zip

[6]:

# load the validation results for prostate segmentation
Datafile_eval = "data_moval/Prostateresults/seg-eval.txt"
with open(Datafile_eval) as Imglist_eval:
    Imglist_eval_read = Imglist_eval.read().splitlines()

logits = []
gt = []
# to speed up debugging, crop the middle 60 x 60 x 30 cube for training/inference.
for Imgname_eval in Imglist_eval_read:
    GT_file = Imgname_eval.replace("data", "data_moval")
    caseID = Imgname_eval.split("/")[-1][:6]
    logit_cls0_file = "data_moval/Prostateresults/prostateval/results/pred_" + caseID + "cls0_prob.nii.gz"
    logit_cls1_file = "data_moval/Prostateresults/prostateval/results/pred_" + caseID + "cls1_prob.nii.gz"
    logit_cls0_read = nib.load(logit_cls0_file)
    logit_cls1_read = nib.load(logit_cls1_file)
    logit_cls0      = logit_cls0_read.get_fdata()   # ``(H, W, D)``
    logit_cls1      = logit_cls1_read.get_fdata()
    GT_read         = nib.load(GT_file)
    GTimg           = GT_read.get_fdata()           # ``(H, W, D)``
    logit_cls0      = logit_cls0[logit_cls0.shape[0] //2 - 30: logit_cls0.shape[0] //2 + 30,
                                 logit_cls0.shape[1] //2 - 30: logit_cls0.shape[1] //2 + 30,
                                 logit_cls0.shape[2] //2 - 15: logit_cls0.shape[2] //2 + 15]
    logit_cls1      = logit_cls1[logit_cls1.shape[0] //2 - 30: logit_cls1.shape[0] //2 + 30,
                                 logit_cls1.shape[1] //2 - 30: logit_cls1.shape[1] //2 + 30,
                                 logit_cls1.shape[2] //2 - 15: logit_cls1.shape[2] //2 + 15]
    GTimg           = GTimg[GTimg.shape[0] //2 - 30: GTimg.shape[0] //2 + 30,
                            GTimg.shape[1] //2 - 30: GTimg.shape[1] //2 + 30,
                            GTimg.shape[2] //2 - 15: GTimg.shape[2] //2 + 15]
    logit_cls = np.stack((logit_cls0, logit_cls1))  # ``(d, H, W, D)``
    logits.append(logit_cls)
    gt.append(GTimg)

# logits is a list of length ``n``,  each element has ``(d, H, W, D)``.
# gt is a list of length ``n``,  each element has ``(H, W, D)``.
# H, W and D could differ for different cases.

Datafile_test = "data_moval/Prostateresults/seg-testA.txt"
with open(Datafile_test) as Imglist_test:
    Imglist_test_read = Imglist_test.read().splitlines()

logits_test = []
gt_test = []
for Imgname_test in Imglist_test_read:
    GT_file = Imgname_test.replace("data", "data_moval")

    caseID = Imgname_test.split("/")[-1][:6]

    logit_cls0_file = "data_moval/Prostateresults/prostattestcondition_A/results/pred_" + caseID + "cls0_prob.nii.gz"
    logit_cls1_file = "data_moval/Prostateresults/prostattestcondition_A/results/pred_" + caseID + "cls1_prob.nii.gz"

    logit_cls0_read = nib.load(logit_cls0_file)
    logit_cls1_read = nib.load(logit_cls1_file)
    logit_cls0      = logit_cls0_read.get_fdata()
    logit_cls1      = logit_cls1_read.get_fdata()
    GT_read         = nib.load(GT_file)
    GTimg           = GT_read.get_fdata()           # ``(H', W', D')``

    logit_cls = np.stack((logit_cls0, logit_cls1))  # ``(d, H', W', D')``

    logits_test.append(logit_cls)
    gt_test.append(GTimg)


# logits_test is a list of length ``n'``,  each element has ``(d, H', W', D')``.
# gt_test is a list of length ``n'``,  each element has ``(H', W', D')``.
# H', W' and D' could differ for different cases.
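
The repeated center-crop slicing for the validation cases could be factored into a small helper. The sketch below is one way to do it. `center_crop` is a hypothetical name, not part of MOVAL, and it assumes every spatial axis is at least as large as the requested crop size.

```python
import numpy as np

def center_crop(volume, size):
    """Crop the central block of `size` from the trailing spatial axes of `volume`.

    Leading axes (e.g. the class axis of a ``(d, H, W, D)`` array) are kept whole.
    """
    size = tuple(size)
    spatial_shape = volume.shape[-len(size):]
    # keep all leading (non-spatial) axes untouched
    slices = [slice(None)] * (volume.ndim - len(size))
    for dim, s in zip(spatial_shape, size):
        if dim < s:
            raise ValueError(f"axis of length {dim} is smaller than crop size {s}")
        start = dim // 2 - s // 2
        slices.append(slice(start, start + s))
    return volume[tuple(slices)]

# toy volume with a class axis in front: (d, H, W, D)
toy_volume = np.arange(2 * 100 * 80 * 40).reshape(2, 100, 80, 40)
toy_cropped = center_crop(toy_volume, (60, 60, 30))
print(toy_cropped.shape)
```

For even crop sizes this selects the same voxels as the explicit `shape // 2 - 30 : shape // 2 + 30` slicing used for the validation cases, and it works on the stacked ``(d, H, W, D)`` logits as well as on the ``(H, W, D)`` labels.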

[7]:

print(f"The validation predictions, ``logits`` are a list of length {len(logits)}, each element has approximately {logits[0].shape}")
print(f"The validation labels, ``gt`` are a list of length {len(gt)}, each element has approximately {gt[0].shape}\n")
print(f"The test predictions, ``logits_test`` are a list of length {len(logits_test)}, each element has approximately {logits_test[0].shape}")
print(f"The test labels, ``gt_test`` are a list of length {len(gt_test)}, each element has approximately {gt_test[0].shape}")

The validation predictions, ``logits`` are a list of length 10, each element has approximately (2, 60, 60, 30)
The validation labels, ``gt`` are a list of length 10, each element has approximately (60, 60, 30)

The test predictions, ``logits_test`` are a list of length 2, each element has approximately (2, 256, 256, 40)
The test labels, ``gt_test`` are a list of length 2, each element has approximately (256, 256, 40)

MOVAL estimation

[8]:

moval_options = list(itertools.product(moval.models.get_estim_options(),
                               ["segmentation"],
                               moval.models.get_conf_options(),
                               [False, True]))

[9]:

# ac-model does not need class-specific variants
# (build a new list instead of removing items from the list being iterated over)
moval_options = [moval_option for moval_option in moval_options
                 if not (moval_option[0] == 'ac-model' and moval_option[-1])]

[10]:

print(f"The number of moval options is {len(moval_options)}")

The number of moval options is 36
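
The count of 36 can be checked by hand. The sketch below assumes five estimation algorithms and four confidence scores, which is consistent with that count. The option strings are written out here for illustration only, and `"other-model"` is a placeholder, not a real MOVAL option name.

```python
import itertools

# illustrative option lists; "other-model" is a placeholder for the fifth algorithm
estim_options = ["ac-model", "ts-model", "doc-model", "atc-model", "other-model"]
conf_options = ["max_class_probability-conf", "energy-conf", "entropy-conf", "doctor-conf"]

# 5 algorithms x 1 mode x 4 confidence scores x 2 class-specific flags = 40
all_options = list(itertools.product(estim_options, ["segmentation"], conf_options, [False, True]))

# drop the class-specific variants of ac-model: 40 - 4 = 36
kept_options = [opt for opt in all_options if not (opt[0] == "ac-model" and opt[-1])]

print(len(all_options), len(kept_options))  # prints: 40 36
```

Filtering into a new list avoids mutating a list while iterating over it, which can silently skip elements.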

[11]:

def test_cls(estim_algorithm, mode, confidence_scores, class_specific, logits, gt, logits_test, gt_test):
    """Test MOVAL with different conditions for segmentation tasks

    Args:
        mode (str): The given task to estimate model performance.
        confidence_scores (str):
            The method to calculate the confidence scores. We provide a list of confidence score calculation methods which
            can be displayed by running :py:func:`moval.models.get_conf_options`.
        estim_algorithm (str):
            The algorithm to estimate model performance. We also provide a list of estimation algorithm which can be displayed by
            running :py:func:`moval.models.get_estim_options`.
        class_specific (bool):
            If ``True``, the calculation will match class-wise confidence to class-wise accuracy.
        logits: The network output (logits), a list of n arrays of shape ``(d, H, W, (D))`` for segmentation.
        gt: The corresponding annotations, a list of n arrays of shape ``(H, W, (D))`` for segmentation.
        logits_test: The network testing output (logits), a list of n' arrays of shape ``(d, H', W', (D'))`` for segmentation.
        gt_test: The corresponding testing annotations, a list of n' arrays of shape ``(H', W', (D'))`` for segmentation.

    Returns:
        err_test: The absolute difference between the real and the estimated foreground DSC on the test data.
        moval_model: The optimized MOVAL model.

    """

    moval_model = moval.MOVAL(
                mode = mode,
                metric = "f1score",
                confidence_scores = confidence_scores,
                estim_algorithm = estim_algorithm,
                class_specific = class_specific
                )

    # fit the MOVAL model on the validation data
    moval_model.fit(logits, gt)

    # estimate the DSC on the test data and compare it with the real DSC

    estim_dsc_test = moval_model.estimate(logits_test)

    DSC_list_test = []
    for n_case in range(len(logits_test)):
        pred_case   = np.argmax(logits_test[n_case], axis = 0) # ``(H', W', (D'))``
        gt_case     = gt_test[n_case] # ``(H', W', (D'))``

        DSC, _, _ = ComputMetric(pred_case == 1, gt_case == 1)
        DSC_list_test.append(DSC)
    m_DSC_test = np.mean(np.array(DSC_list_test))

    err_test = np.abs( m_DSC_test - estim_dsc_test[1:] )

    return err_test, moval_model
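
For readers unfamiliar with the metric, the sketch below shows how a Dice similarity coefficient (DSC) and the resulting estimation error can be computed with plain NumPy. `dice_score` is a hypothetical helper and not the implementation of `ComputMetric`. Returning 1.0 when both masks are empty is an assumption made here, and the estimated value 0.75 is made up for the example.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    denominator = pred.sum() + gt.sum()
    if denominator == 0:
        # assumption: two empty masks count as a perfect match
        return 1.0
    return 2.0 * intersection / denominator

# toy masks: the prediction covers 8 voxels, the annotation 12, and they share 8
toy_pred = np.zeros((4, 4, 4), dtype=int)
toy_gt = np.zeros((4, 4, 4), dtype=int)
toy_pred[1:3, 1:3, 1:3] = 1
toy_gt[1:3, 1:3, 1:4] = 1

real_dsc = dice_score(toy_pred == 1, toy_gt == 1)   # 2 * 8 / (8 + 12) = 0.8
estimated_dsc = 0.75                                # made-up estimate for illustration
toy_err = np.abs(real_dsc - estimated_dsc)          # absolute estimation error
```

The smaller this absolute error, the closer the estimated DSC is to the real DSC computed from the test annotations.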

[12]:

err_test_list = []
moval_parameters = []
moval_parameters_ = []

[13]:

for k_cond in tqdm(range(len(moval_options))):

    err_test, moval_model = test_cls(
        estim_algorithm = moval_options[k_cond][0],
        mode = moval_options[k_cond][1],
        confidence_scores = moval_options[k_cond][2],
        class_specific = moval_options[k_cond][3],
        logits = logits,
        gt = gt,
        logits_test = logits_test,
        gt_test = gt_test
    )
    err_test_list.append(err_test)
    moval_parameters.append(moval_model.model_.param)
    if moval_model.model_.extend_param:
        moval_parameters_.append(moval_model.model_.param_ext)
    else:
        moval_parameters_.append(0.)

  0%|                                                                                                                                                                                | 0/36 [00:00<?, ?it/s]

Starting optimizing for model ac-model with confidence max_class_probability-conf based on metric f1score, class specific is False.
Calculating and saving the fitted case-wise performance...

  3%|████▋                                                                                                                                                                   | 1/36 [00:01<01:03,  1.81s/it]

Starting optimizing for model ac-model with confidence energy-conf based on metric f1score, class specific is False.
Calculating and saving the fitted case-wise performance...

  6%|█████████▎                                                                                                                                                              | 2/36 [00:03<00:51,  1.52s/it]

Starting optimizing for model ac-model with confidence entropy-conf based on metric f1score, class specific is False.
Calculating and saving the fitted case-wise performance...

  8%|██████████████                                                                                                                                                          | 3/36 [00:04<00:51,  1.55s/it]

Starting optimizing for model ac-model with confidence doctor-conf based on metric f1score, class specific is False.
Calculating and saving the fitted case-wise performance...

 11%|██████████████████▋                                                                                                                                                     | 4/36 [00:06<00:50,  1.59s/it]

Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 14%|███████████████████████▎                                                                                                                                                | 5/36 [00:15<02:09,  4.17s/it]

Starting optimizing for model ts-model with confidence max_class_probability-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 17%|████████████████████████████                                                                                                                                            | 6/36 [00:28<03:43,  7.44s/it]

Starting optimizing for model ts-model with confidence energy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 19%|████████████████████████████████▋                                                                                                                                       | 7/36 [00:36<03:35,  7.41s/it]

Starting optimizing for model ts-model with confidence energy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 22%|█████████████████████████████████████▎                                                                                                                                  | 8/36 [00:50<04:30,  9.66s/it]

Starting optimizing for model ts-model with confidence entropy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 25%|██████████████████████████████████████████                                                                                                                              | 9/36 [00:58<04:05,  9.09s/it]

Starting optimizing for model ts-model with confidence entropy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 28%|██████████████████████████████████████████████▍                                                                                                                        | 10/36 [01:09<04:09,  9.58s/it]

Starting optimizing for model ts-model with confidence doctor-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 31%|███████████████████████████████████████████████████                                                                                                                    | 11/36 [01:16<03:39,  8.77s/it]

Starting optimizing for model ts-model with confidence doctor-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 33%|███████████████████████████████████████████████████████▋                                                                                                               | 12/36 [01:28<03:54,  9.79s/it]

Starting optimizing for model doc-model with confidence max_class_probability-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 36%|████████████████████████████████████████████████████████████▎                                                                                                          | 13/36 [01:35<03:26,  8.97s/it]

Starting optimizing for model doc-model with confidence max_class_probability-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 39%|████████████████████████████████████████████████████████████████▉                                                                                                      | 14/36 [01:46<03:33,  9.69s/it]

Starting optimizing for model doc-model with confidence energy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 42%|█████████████████████████████████████████████████████████████████████▌                                                                                                 | 15/36 [01:52<02:58,  8.51s/it]

Starting optimizing for model doc-model with confidence energy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 44%|██████████████████████████████████████████████████████████████████████████▏                                                                                            | 16/36 [02:01<02:55,  8.80s/it]

Starting optimizing for model doc-model with confidence entropy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 47%|██████████████████████████████████████████████████████████████████████████████▊                                                                                        | 17/36 [02:09<02:38,  8.33s/it]

Starting optimizing for model doc-model with confidence entropy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 50%|███████████████████████████████████████████████████████████████████████████████████▌                                                                                   | 18/36 [02:19<02:40,  8.93s/it]

Starting optimizing for model doc-model with confidence doctor-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 53%|████████████████████████████████████████████████████████████████████████████████████████▏                                                                              | 19/36 [02:25<02:18,  8.17s/it]

Starting optimizing for model doc-model with confidence doctor-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 56%|████████████████████████████████████████████████████████████████████████████████████████████▊                                                                          | 20/36 [02:36<02:20,  8.78s/it]

Starting optimizing for model atc-model with confidence max_class_probability-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 58%|█████████████████████████████████████████████████████████████████████████████████████████████████▍                                                                     | 21/36 [02:44<02:08,  8.53s/it]

Starting optimizing for model atc-model with confidence max_class_probability-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 61%|██████████████████████████████████████████████████████████████████████████████████████████████████████                                                                 | 22/36 [02:55<02:11,  9.38s/it]

Starting optimizing for model atc-model with confidence energy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Not satisfied with initial optimization results of param, trying more initial states...
Tried 1/2 times.
Tried 2/2 times.
Starting from [array([0.03003891]), array([0.05651496])]
Optimization results are [(0.9471907407407407, array([0.95])), (9.259259259319919e-07, array([0.01233018])), (9.259259259319919e-07, array([0.01233022]))]
Calculating and saving the fitted case-wise performance...

 64%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                                            | 23/36 [03:07<02:13, 10.28s/it]

Starting optimizing for model atc-model with confidence energy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Not satisfied with initial optimization results of param for class 1, trying more initial states...
Tried 1/2 times.
Tried 2/2 times.
Starting from [array([0.02437192]), array([0.03410565])]
Optimization results are [(0.9304117706049815, 1.0), (7.563572235191884e-08, 0.021166040334292445), (7.563572235191884e-08, 0.021166047484866432)]
Calculating and saving the fitted case-wise performance...

 67%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                       | 24/36 [03:32<02:53, 14.46s/it]

Starting optimizing for model atc-model with confidence entropy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 69%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                                   | 25/36 [03:41<02:21, 12.85s/it]

Starting optimizing for model atc-model with confidence entropy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                              | 26/36 [03:53<02:08, 12.83s/it]

Starting optimizing for model atc-model with confidence doctor-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                         | 27/36 [04:01<01:41, 11.31s/it]

Starting optimizing for model atc-model with confidence doctor-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 78%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                     | 28/36 [04:13<01:32, 11.50s/it]

Starting optimizing for model ts-atc-model with confidence max_class_probability-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 81%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                | 29/36 [04:28<01:27, 12.53s/it]

Starting optimizing for model ts-atc-model with confidence max_class_probability-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 83%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                           | 30/36 [04:53<01:37, 16.31s/it]

Starting optimizing for model ts-atc-model with confidence energy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Not satisfied with initial optimization results of param_ext, trying more initial states...
Tried 1/2 times.
Tried 2/2 times.
Starting from [array([0.94486934]), array([0.94580229])]
Optimization results are [(0.052807407407407414, array([0.9])), (0.052807407407407414, array([0.89762587])), (0.052807407407407414, array([0.89851218]))]
Calculating and saving the fitted case-wise performance...

 86%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                       | 31/36 [05:14<01:28, 17.71s/it]

Starting optimizing for model ts-atc-model with confidence energy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Not satisfied with initial optimization results of param_ext for class 1, trying more initial states...
Tried 1/2 times.
Tried 2/2 times.
Starting from [array([0.86933913]), array([0.86974927])]
Optimization results are [(0.9304117706049815, 1.0), (1.5898101413114318e-07, 0.8692173531786218), (0.069588229123139, 0.8262618025285865)]
Calculating and saving the fitted case-wise performance...

 89%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                  | 32/36 [06:00<01:44, 26.17s/it]

Starting optimizing for model ts-atc-model with confidence entropy-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████              | 33/36 [06:15<01:08, 22.86s/it]

Starting optimizing for model ts-atc-model with confidence entropy-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋         | 34/36 [06:38<00:45, 22.89s/it]

Starting optimizing for model ts-atc-model with confidence doctor-conf based on metric f1score, class specific is False.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎    | 35/36 [06:52<00:20, 20.02s/it]

Starting optimizing for model ts-atc-model with confidence doctor-conf based on metric f1score, class specific is True.
Opitimizing with 10 samples...
Be patient, it should take a while...
Calculating and saving the fitted case-wise performance...

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36/36 [07:15<00:00, 12.11s/it]
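
Some runs in the log above report "trying more initial states" and then list several results, which appear to be (loss, parameter) pairs. The following is a minimal sketch of that general multi-start idea, not MOVAL's actual implementation: the optimizer, the toy loss, and all names (`local_search`, `multi_start`, `toy_loss`) are assumptions made for illustration.

```python
def local_search(f, x0, step=0.1, tol=1e-6):
    """Greedy 1D descent: move while the loss decreases, otherwise halve the step."""
    x = x0
    while step > tol:
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            step /= 2
    return f(x), x


def multi_start(f, starts):
    """Run the local search from every start; return all results and the best one."""
    results = [local_search(f, x0) for x0 in starts]
    # Tuples compare by their first element, so min() picks the lowest loss.
    return results, min(results)


def toy_loss(x):
    # Toy loss (hypothetical) with a local minimum near x = 0.96
    # and a lower minimum near x = -1.04.
    return (x * x - 1) ** 2 + 0.3 * x


results, (best_loss, best_x) = multi_start(toy_loss, starts=[1.5, -0.5])
print(results)            # one (loss, parameter) pair per initial state
print(best_loss, best_x)  # the pair with the lowest loss
```

A single start at 1.5 gets stuck in the shallower minimum; adding a second start lets the search reach the lower one, which is the motivation for retrying from extra initial states.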

Compare estimation results

[14]:

estim = []      # estimation algorithm label of each row
conf = []       # confidence score label of each row
err = []        # case-wise estimation error
err_mean = []   # mean error of the option, repeated for each of its rows
novel = []      # whether the combination is an existing method or provided by MOVAL
for k_option, moval_option in enumerate(moval_options):
    for k_cond in range(len(err_test_list[k_option])):
        # Prefix class-specific (CS) variants.
        estim_cs = 'CS ' if moval_option[3] else ''
        # Map the model name to a short algorithm label.
        if moval_option[0] == 'ac-model':
            estim.append(estim_cs + 'AC')
        elif moval_option[0] == 'ts-model':
            estim.append(estim_cs + 'TS')
        elif moval_option[0] == 'doc-model':
            estim.append(estim_cs + 'DoC')
        elif moval_option[0] == 'atc-model':
            estim.append(estim_cs + 'ATC')
        else:
            estim.append(estim_cs + 'TS-ATC')
        # Map the confidence score name to a short label.
        if moval_option[2] == 'max_class_probability-conf':
            conf.append('MCP')
        elif moval_option[2] == 'energy-conf':
            conf.append('Energy')
        elif moval_option[2] == 'entropy-conf':
            conf.append('Entropy')
        else:
            conf.append('Doctor')
        # Only MCP without class-specific parameters counts as an existing method.
        if moval_option[2] == 'max_class_probability-conf' and not moval_option[3]:
            novel.append('Existing Methods')
        else:
            novel.append('Provided by MOVAL')
        # Store the case-wise error and the mean error of this option.
        err.append(err_test_list[k_option][k_cond])
        err_mean.append(np.mean(err_test_list[k_option]))
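
The label mapping in the cell above can also be written with lookup tables. This is only a sketch of an alternative: `MODEL_LABELS`, `CONF_LABELS`, and `describe_option` are hypothetical names, and the option tuple is assumed to have four fields, of which the second is not used here.

```python
# Lookup tables replacing the if/elif chains. Unknown names fall back to the
# same defaults as the else-branches of the original cell ('TS-ATC', 'Doctor').
MODEL_LABELS = {'ac-model': 'AC', 'ts-model': 'TS', 'doc-model': 'DoC', 'atc-model': 'ATC'}
CONF_LABELS = {'max_class_probability-conf': 'MCP', 'energy-conf': 'Energy', 'entropy-conf': 'Entropy'}


def describe_option(moval_option):
    """Return (algorithm label, confidence label, category) for one option tuple."""
    model, _, confidence, class_specific = moval_option
    algorithm = ('CS ' if class_specific else '') + MODEL_LABELS.get(model, 'TS-ATC')
    score = CONF_LABELS.get(confidence, 'Doctor')
    is_existing = confidence == 'max_class_probability-conf' and not class_specific
    category = 'Existing Methods' if is_existing else 'Provided by MOVAL'
    return algorithm, score, category


# Hypothetical option tuple; the second field is a placeholder.
example = ('atc-model', 'placeholder', 'energy-conf', True)
print(describe_option(example))  # ('CS ATC', 'Energy', 'Provided by MOVAL')
```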

[15]:

# 'MAE' holds the mean error per option; 'MAE ' (with a trailing space) holds the case-wise error
d = {'Estimation Algorithm': estim, 'Confidence Score': conf, 'MAE': err_mean, 'MAE ': err, 'Category': novel}
df = pd.DataFrame(data=d)
# Fix the display order of the estimation algorithms
custom_order = ['AC', 'TS', 'DoC', 'ATC', 'TS-ATC', 'CS TS', 'CS DoC', 'CS ATC', 'CS TS-ATC']
df['Estimation Algorithm'] = pd.Categorical(df['Estimation Algorithm'], categories=custom_order, ordered=True)
df = df.sort_values(by='Estimation Algorithm')
# Fix the display order of the confidence scores
custom_order = ['MCP', 'Doctor', 'Entropy', 'Energy']
df['Confidence Score'] = pd.Categorical(df['Confidence Score'], categories=custom_order, ordered=True)
df = df.sort_values(by='Confidence Score')

[16]:

df.head()

[16]:

	Estimation Algorithm	Confidence Score	MAE	MAE	Category
0	AC	MCP	0.169580	0.169580	Existing Methods
29	CS TS-ATC	MCP	0.003049	0.003049	Provided by MOVAL
21	CS ATC	MCP	0.003049	0.003049	Provided by MOVAL
4	TS	MCP	0.126593	0.126593	Existing Methods
13	CS DoC	MCP	0.110844	0.110844	Provided by MOVAL

[17]:

sns.set(rc={'figure.figsize':(6,3)})
sns.set_style("darkgrid")
category_palette = {'Existing Methods': 'grey', 'Provided by MOVAL': '#1f77b4'}
ax = sns.scatterplot(
    data=df, x="Estimation Algorithm", y="Confidence Score", hue="Category", size="MAE",
    sizes=(40, 1000), palette=category_palette
)
ax.set(ylim=(3.5, -0.5))
ax.tick_params(axis='x', rotation=15)
#
# Get the handles and labels from the legend
handles, labels = ax.get_legend_handles_labels()

# Keep only the desired legend entries, pairing each handle with its own label
# so that the two lists stay aligned even if an entry is missing
desired_labels = ['Category', 'Existing Methods', 'Provided by MOVAL', 'MAE', '0.15', '0.30']
selected = [(h, l) for h, l in zip(handles, labels) if l in desired_labels]

legend = plt.legend(handles=[h for h, _ in selected], labels=[l for _, l in selected], bbox_to_anchor=(1.2, 1), labelspacing=1)

[18]:

from statannotations.Annotator import Annotator
sns.set(rc={'figure.figsize':(6,2)})
sns.set_style("white")
ax = sns.barplot(df, x="Estimation Algorithm", y="MAE", color='#1f77b4')
ax.tick_params(axis='x', rotation=15)
# Hide all four axes spines
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
# Compare each algorithm with its class-specific (CS) variant
pairs=[("TS", "CS TS"), ("DoC", "CS DoC"), ("ATC", "CS ATC"), ("TS-ATC", "CS TS-ATC")]

annotator = Annotator(ax, pairs, data=df, x="Estimation Algorithm", y="MAE")
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside')
annotator.apply_and_annotate()

p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

TS vs. CS TS: Mann-Whitney-Wilcoxon test two-sided, P_val:4.857e-01 U_stat=1.100e+01
DoC vs. CS DoC: Mann-Whitney-Wilcoxon test two-sided, P_val:3.429e-01 U_stat=1.200e+01
ATC vs. CS ATC: Mann-Whitney-Wilcoxon test two-sided, P_val:2.558e-02 U_stat=1.600e+01
TS-ATC vs. CS TS-ATC: Mann-Whitney-Wilcoxon test two-sided, P_val:2.558e-02 U_stat=1.600e+01
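
As a rough illustration of what the U statistic in these annotations measures, here is a pure-Python sketch. The MAE values are made up, the helper names are hypothetical, and the exact enumeration assumes small samples without ties. The p-values printed above come from the library and may be computed with a different method (for example a normal approximation), so they need not match this enumeration.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic of x versus y: pairs with xi > yj count 1, ties count 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating every split of the pooled sample."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    observed = abs(mann_whitney_u(x, y) - mean_u)
    count = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as extreme as the observed one.
        if abs(mann_whitney_u(gx, gy) - mean_u) >= observed - 1e-12:
            count += 1
    return count / total


# Toy MAE values (made up for illustration, not taken from the results above).
mae_global = [0.12, 0.15, 0.11, 0.13]
mae_class_specific = [0.01, 0.02, 0.03, 0.04]

u = mann_whitney_u(mae_global, mae_class_specific)
p = exact_two_sided_p(mae_global, mae_class_specific)
print(u, round(p, 4))  # 16.0 0.0286
```

If each group holds four values, U = 16 is the largest possible value (4 x 4 pairs), meaning every error in the first group exceeds every error in the second.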

[18]:

(<Axes: xlabel='Estimation Algorithm', ylabel='MAE'>,
 [<statannotations.Annotation.Annotation at 0x7feda8e251c0>,
  <statannotations.Annotation.Annotation at 0x7feda88c76d0>,
  <statannotations.Annotation.Annotation at 0x7feda88c7700>,
  <statannotations.Annotation.Annotation at 0x7feda88c7880>])

[19]:

sns.set(rc={'figure.figsize':(3,2)})
sns.set_style("white")
ax = sns.barplot(df, x="Confidence Score", y="MAE", color='#1f77b4')
ax.tick_params(axis='x', rotation=15)
# Hide all four axes spines
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
# Compare each alternative confidence score with the MCP baseline
pairs=[("MCP", "Doctor"), ("MCP", "Entropy"), ("MCP", "Energy")]

annotator = Annotator(ax, pairs, data=df, x="Confidence Score", y="MAE")
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside')
annotator.apply_and_annotate()

p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

MCP vs. Doctor: Mann-Whitney-Wilcoxon test two-sided, P_val:1.826e-01 U_stat=5.600e+01
MCP vs. Entropy: Mann-Whitney-Wilcoxon test two-sided, P_val:9.103e-02 U_stat=6.000e+01
MCP vs. Energy: Mann-Whitney-Wilcoxon test two-sided, P_val:2.159e-01 U_stat=2.600e+01

[19]:

(<Axes: xlabel='Confidence Score', ylabel='MAE'>,
 [<statannotations.Annotation.Annotation at 0x7feda88bdac0>,
  <statannotations.Annotation.Annotation at 0x7feda8e15d00>,
  <statannotations.Annotation.Annotation at 0x7feda8e15cd0>])

[20]:

sns.set(rc={'figure.figsize':(12,3)})
category_palette = {'MCP': '#e5f0f8',
                    'Doctor': '#99c6e4',
                    'Entropy': '#4c9cd0',
                    'Energy': '#0072bd'
                   }
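# Plot the case-wise error column 'MAE ' (note the trailing space)
ax = sns.barplot(df, x="Estimation Algorithm", y="MAE ", hue="Confidence Score", palette=category_palette, edgecolor="0")
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))

# Compare ATC with class-specific TS-ATC, both using the MCP confidence score
pairs=[
    [("ATC", "MCP"), ("CS TS-ATC", "MCP")]
]

# Annotate the same column that is plotted
annotator = Annotator(ax, pairs, data=df, x="Estimation Algorithm", y="MAE ", hue="Confidence Score")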
annotator.configure(test='Mann-Whitney', text_format='star', loc='inside')
annotator.apply_and_annotate()

ax.set(ylim=(-0.02, 0.3))

p-value annotation legend:
      ns: 5.00e-02 < p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

ATC_MCP vs. CS TS-ATC_MCP: Mann-Whitney-Wilcoxon test two-sided, P_val:1.000e+00 U_stat=1.000e+00

[20]:

[(-0.02, 0.3)]
