Histopathology

This tutorial provides a comprehensive, step-by-step guide to using the bioMONAI platform for nuclei segmentation in histopathology images.

Setup imports

from bioMONAI.data import *
from bioMONAI.transforms import *
from bioMONAI.core import *
from bioMONAI.core import Path
from bioMONAI.data import get_images, get_target, RandomSplitter
from bioMONAI.losses import *
from bioMONAI.losses import SSIMLoss
from bioMONAI.metrics import *
from bioMONAI.datasets import download_file
import warnings
warnings.filterwarnings("ignore")
device = get_device()
print(device)
cuda

Download Data

In the next cell, we will download the dataset required for this tutorial. The dataset is hosted online, and we will use the download_file function from the bioMONAI library to download and extract the files.

  • You can change the output_directory variable to specify a different directory where you want to save the downloaded files.
  • The url variable contains the link to the dataset. If you have a different dataset, you can replace this URL with the link to your dataset.
  • By default, the full archive is downloaded and extracted. The hash argument lets download_file verify the integrity of the downloaded file against a SHA-256 checksum.

Make sure you have enough storage space in the specified directory before downloading the dataset.

# Specify the directory where you want to save the downloaded files
output_directory = "../_data/TNBC_NucleiSegmentation"
# Define the base URL for the dataset
url = 'https://zenodo.org/records/2579118/files/TNBC_NucleiSegmentation.zip'

# Download and extract the full dataset archive, verifying its checksum
download_file(url, output_directory, extract=True, extract_dir='.', hash='014bc1e08d6459be5508620ad219063a45179a1767b7caf84d64245d7f6cc5a3')
The file has been downloaded and saved to: /home/bm/Documents/bioMONAI/nbs/_data/TNBC_NucleiSegmentation
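The checksum passed to download_file above is a SHA-256 digest. Whether download_file uses exactly this scheme internally is an assumption, but the verification step itself can be sketched with the standard library:

```python
import hashlib

def sha256sum(path):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

# compare the digest of the downloaded archive against the expected checksum
# before trusting its contents
```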

Prepare Data for Training

In the next cell, we will prepare the data for training. We will specify the path to the training images and define the batch size and patch size. Additionally, we will apply several transformations to the images to augment the dataset and improve the model’s robustness.

  • X_path: The path to the directory containing the training images.
  • bs: The batch size, which determines the number of images processed together in one iteration.
  • patch_size: The size of the patches to be extracted from the images.
  • item_tfms: A list of item-level transformations applied to each image, including random rotation and flipping.
  • batch_tfms: A list of batch-level transformations applied to each batch of images (empty in this tutorial).
  • get_target_fn: A function to get the corresponding ground-truth segmentation masks for the training images.

You can customize the following parameters to suit your needs:

  • Change the X_path variable to point to a different dataset.
  • Adjust the bs and patch_size variables to match your hardware capabilities and model requirements.
  • Modify the transformations in item_tfms and batch_tfms to include other augmentations or preprocessing steps.

After defining these parameters and transformations, we will create a BioDataLoaders object to load the training and validation datasets.

X_path = Path(output_directory)/'TNBC_dataset'
img_paths = get_images(X_path,'Slide*')
# create a function to get the target path from the image path
get_target_fn = get_target('GT', same_filename=True, relative_path=True, map_foldername=True, target_folder_prefix="GT", signal_folder_prefix="Slide")

print('input:', img_paths[4], '\ntarget:', get_target_fn(img_paths[4]))
input: ../_data/TNBC_NucleiSegmentation/TNBC_dataset/Slide_01/01_2.png 
target: ../_data/TNBC_NucleiSegmentation/TNBC_dataset/GT_01/01_2.png
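The mapping get_target builds here swaps the Slide_XX folder prefix for GT_XX while keeping the filename. An illustrative plain-pathlib equivalent (not bioMONAI's actual implementation):

```python
from pathlib import Path

def to_target(p):
    """Map an image path under Slide_XX to its mask under GT_XX."""
    p = Path(p)
    gt_folder = p.parent.name.replace('Slide', 'GT')  # Slide_01 -> GT_01
    return p.parent.parent / gt_folder / p.name       # same filename

print(to_target('TNBC_dataset/Slide_01/01_2.png'))
```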
gt_paths = [get_target_fn(img_paths[i]) for i in range(len(img_paths))]
patch_size = (1, 128, 128)
overlap = 0.75
save_patches_grid(img_paths, gt_paths, output_directory, patch_size, overlap,
                  tfms_before=[TargetedTransform(CropND(slices=[(0,3)], dims=[0]), targets=('X',)),  # keep only the first 3 channels
                               TargetedTransform(RGB2HED(), targets=('X',)),                         # convert RGB to HED color space
                               TargetedTransform(CropND(slices=[(0,1)], dims=[0]), targets=('X',)),  # keep only the H (hematoxylin) channel
                               TargetedTransform(ScaleImagePercentiles(), targets=('X',)),           # scale image intensities
                               TargetedTransform(ScaleIntensity(), targets=('y',)),                  # scale mask intensities
                  ],
)
Processing files: 100%|██████████| 50/50 [00:04<00:00, 10.71it/s]
Train set saved to '../_data/TNBC_NucleiSegmentation/patches_train.csv'.
Test set saved to '../_data/TNBC_NucleiSegmentation/patches_test.csv'.
'is_valid' column added to '../_data/TNBC_NucleiSegmentation/patches_train.csv' for validation samples.
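The RGB2HED transform above performs stain deconvolution, separating the hematoxylin, eosin, and DAB contributions; keeping only the H channel isolates the hematoxylin signal, which stains nuclei. A minimal numpy sketch of the underlying math, using the standard Ruifrok–Johnston stain matrix (illustrative, not bioMONAI's actual implementation):

```python
import numpy as np

# rows: optical-density vectors of Hematoxylin, Eosin, DAB (Ruifrok & Johnston)
rgb_from_hed = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11],
                         [0.27, 0.57, 0.78]])
hed_from_rgb = np.linalg.inv(rgb_from_hed)

def separate_stains(rgb):
    """rgb in [0, 1], shape (..., 3) -> per-stain concentrations (..., 3)."""
    od = -np.log(np.clip(rgb, 1e-6, 1.0))  # Beer-Lambert: optical density
    return od @ hed_from_rgb               # project onto the stain basis

h_channel = separate_stains(np.random.rand(64, 64, 3))[..., 0]  # hematoxylin
```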
n_channels = 1
data_ops = {
    'fn_col': ['path_signal'],
    'target_col': ['path_target'],
    'valid_col': ['is_valid'],
    'seed': 42, 
    'bs': 16,
    'img_cls': BioImage,
    'target_img_cls': BioImage,   
    'item_tfms': [RandRot90(prob=.75), 
                RandFlip(prob=0.75)],
    'batch_tfms': [],   # batch transformations 
}

data = BioDataLoaders.from_csv(
    '',
    output_directory + '/patches_train.csv',
    show_summary=False,
    **data_ops,
    )

# print length of training and validation datasets
print('train images:', len(data.train_ds.items), '\nvalidation images:', len(data.valid_ds.items))
train images: 5914 
validation images: 845
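These dataset sizes follow from the patching arithmetic: with a 128-px patch and 0.75 overlap the grid stride is 128 × (1 − 0.75) = 32 px, and the 512 × 512 TNBC slides each yield a 13 × 13 grid of patches. The stride formula is an assumption about save_patches_grid's grid sampler, but the totals check out:

```python
def patches_per_side(img_size, patch_size, overlap):
    # assumed grid: stride = patch_size * (1 - overlap), no partial patches
    stride = int(patch_size * (1 - overlap))
    return (img_size - patch_size) // stride + 1

per_side = patches_per_side(512, 128, 0.75)  # 13
total = per_side ** 2 * 50                   # 169 patches/slide * 50 slides
print(total)                                 # 8450 = 5914 train + 845 valid + 1691 test
```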
a,b = data.one_batch()
print('image shape:', a.shape, '\ntarget shape:', b.shape)
image shape: torch.Size([16, 1, 128, 128]) 
target shape: torch.Size([16, 1, 128, 128])
print(b.unique())
metatensor([0., 1.], device='cuda:0')
from bioMONAI.visualize import plot_intensity_histogram

plot_intensity_histogram(a[0].squeeze().cpu(), bins=50)

Visualize a Batch of Training Data

In the next cell, we will visualize a batch of training data to get an idea of what the images look like after applying the transformations. This step is crucial to ensure that the data augmentation and preprocessing steps are working as expected.

  • data.show_batch(cmap='magma'): This function will display a batch of images from the training dataset using the ‘magma’ colormap.

Change the cmap parameter to use a different colormap (e.g., ‘gray’, ‘viridis’, ‘plasma’) based on your preference.

Visualizing the data helps in understanding the dataset better and ensures that the transformations are applied correctly.

data.show_batch(cmap='magma')

Define and Train the Model

# from monai.networks.nets import UNETR
from bioMONAI.nets import create_unet_model, resnet34
from fastai.vision.all import xresnet50

# model = UNETR(in_channels=n_channels, out_channels=1, img_size=patch_size[1:], feature_size=32, norm_name='batch', spatial_dims=2)

model = create_unet_model(resnet34, 1, patch_size[1:], True, n_in=n_channels, cut=None, blur_final=True, self_attention=False)
from fastai.vision.all import BCEWithLogitsLossFlat
loss = BCEWithLogitsLossFlat()

metrics = [DiceMetric(include_background=False)]
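BCEWithLogitsLossFlat applies the sigmoid internally, so the network emits raw logits; at inference the binary mask is recovered by thresholding the per-pixel probabilities, and the Dice score compares it against the ground truth. A plain-PyTorch sketch of both steps (not the DiceMetric implementation):

```python
import torch

logits = torch.randn(1, 1, 128, 128)                   # raw network output
target = torch.randint(0, 2, (1, 1, 128, 128)).float()  # binary ground truth

pred = (torch.sigmoid(logits) > 0.5).float()           # probability -> mask
inter = (pred * target).sum()
dice = 2 * inter / (pred.sum() + target.sum() + 1e-8)  # Dice coefficient in [0, 1]
```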

trainer = fastTrainer(data, model, loss_fn=loss, metrics=metrics, show_summary=False)
# trainer.fit_one_cycle(50, 1e-3)
trainer.fine_tune(50, freeze_epochs=2)
epoch train_loss valid_loss Dice time
0 0.113974 0.105411 0.805709 00:18
1 0.102899 0.096048 0.797650 00:18

epoch train_loss valid_loss Dice time
0 0.070051 0.069395 0.851596 00:18
1 0.060884 0.060859 0.871055 00:18
2 0.052648 0.054197 0.879664 00:18
3 0.047828 0.051695 0.880583 00:18
4 0.041352 0.044954 0.900431 00:18
5 0.039364 0.043834 0.906231 00:18
6 0.036252 0.039149 0.910417 00:18
7 0.034468 0.039960 0.912614 00:18
8 0.033840 0.039754 0.909334 00:18
9 0.033057 0.035105 0.919484 00:18
10 0.029403 0.035377 0.917679 00:18
11 0.031835 0.036668 0.919616 00:18
12 0.030111 0.034590 0.921471 00:18
13 0.029058 0.032257 0.927056 00:18
14 0.025163 0.029925 0.930045 00:18
15 0.028124 0.029094 0.931588 00:18
16 0.021818 0.026987 0.939882 00:18
17 0.020511 0.026252 0.942420 00:18
18 0.018345 0.023014 0.947853 00:19
19 0.016837 0.024367 0.945535 00:19
20 0.018045 0.020225 0.950313 00:18
21 0.014044 0.017826 0.959083 00:18
22 0.012973 0.015666 0.961283 00:18
23 0.013525 0.017656 0.961090 00:18
24 0.010478 0.015660 0.967928 00:18
25 0.010090 0.014393 0.965216 00:18
26 0.008829 0.013923 0.965532 00:18
27 0.007769 0.012869 0.970934 00:18
28 0.007881 0.013659 0.967699 00:18
29 0.006484 0.011805 0.974513 00:18
30 0.005677 0.010727 0.978264 00:18
31 0.004760 0.009598 0.979435 00:18
32 0.004146 0.009916 0.979894 00:18
33 0.003761 0.009179 0.980314 00:18
34 0.003409 0.009136 0.980893 00:18
35 0.002844 0.009236 0.981670 00:18
36 0.002660 0.008594 0.983469 00:18
37 0.002274 0.008587 0.985615 00:19
38 0.002038 0.008165 0.986075 00:20
39 0.001869 0.007914 0.985183 00:18
40 0.001471 0.008690 0.986295 00:18
41 0.001459 0.008674 0.986782 00:18
42 0.001330 0.008350 0.986351 00:18
43 0.001206 0.008831 0.985990 00:18
44 0.001135 0.008782 0.986378 00:18
45 0.001079 0.008584 0.987193 00:18
46 0.000989 0.009088 0.986672 00:18
47 0.001004 0.008847 0.986917 00:18
48 0.000991 0.009367 0.986972 00:18
49 0.000979 0.009399 0.987093 00:18

Show Results

In the next cell, we will visualize the results of the trained model on a batch of validation data. This step helps in understanding how well the model has learned to segment nuclei in the images.

  • trainer.show_results(cmap='magma'): This function will display a batch of images from the validation dataset along with their corresponding predicted segmentation masks using the ‘magma’ colormap.

Visualizing the results helps in assessing the performance of the model and identifying any areas that may need further improvement.

trainer.show_results(cmap='magma')

Save the Trained Model

In the next cell, we will save the trained model to a file. This step is crucial to preserve the model’s weights and architecture, allowing you to load and use the model later without retraining it.

  • trainer.save('tmp-model'): This function saves the model to a file named ‘tmp-model’. You can change the filename to something more descriptive based on your project.

Suggestions for customization:

  • Change the filename to include details like the model architecture, dataset, or date (e.g., ‘unet_resnet34_TNBC_2023’).
  • Save the model in a specific directory by providing the full path (e.g., ‘models/unet_resnet34_TNBC_2023’).
  • Save additional information like training history, metrics, or configuration settings in a separate file for better reproducibility.

Saving the model ensures that you can easily share it with others or deploy it in a production environment without needing to retrain it.

trainer.save('905-single-channel-model')
Path('models/905-single-channel-model.pth')
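Under the hood this follows the usual PyTorch pattern of serializing a state_dict to a .pth file. A generic sketch of that pattern, using a stand-in Conv2d rather than the trained U-Net (trainer.save's exact internals are an assumption):

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)            # stand-in for the U-Net
torch.save(model.state_dict(), 'tmp-model.pth')  # serialize the weights

restored = nn.Conv2d(1, 1, 3, padding=1)
restored.load_state_dict(torch.load('tmp-model.pth'))  # reload without retraining
```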

Evaluate the Model on Test Data

In the next cell, we will evaluate the performance of the trained model on unseen test data. This step is crucial to get an unbiased evaluation of the model’s performance and understand how well it generalizes to new data.

  • test_data: A DataLoader built by test_biodataloader from the patches_test.csv file generated earlier.
  • evaluate_model(trainer, test_data, metrics=SSIMMetric(2)): This function evaluates the model on the test dataset using the specified metrics (in this case, SSIM).

Suggestions for customization:

  • Point test_biodataloader to a different test CSV to evaluate on another dataset.
  • Add more metrics to the metrics parameter to get a comprehensive evaluation (e.g., MSEMetric(), MAEMetric()).
  • Save the evaluation results to a file for further analysis or reporting.

Evaluating the model on test data helps in understanding its performance in real-world scenarios and identifying any areas that may need further improvement.

# import pandas as pd
# test_csv_path = output_directory + '/patches_test.csv'
# df = pd.read_csv(test_csv_path, header='infer', delimiter=None, quoting=0)
# test_data = data.test_dl(df, with_labels=True)

test_data = test_biodataloader(data, output_directory + '/patches_test.csv')
# print length of test dataset
print('test images:', len(test_data.items))

evaluate_model(trainer, test_data, metrics=SSIMMetric(2));
test images: 1691

                       Value
BCEWithLogitsLossFlat
Mean                0.646455
Median              0.657559
Standard Deviation  0.042168
Min                 0.502513
Max                 0.694533
Q1                  0.616751
Q3                  0.682825

                       Value
SSIM
Mean                0.994384
Median              0.998468
Standard Deviation  0.012012
Min                 0.839270
Max                 1.000000
Q1                  0.994292
Q3                  0.999855
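The tables above report standard descriptive statistics over the per-patch metric values. A numpy sketch of how such a summary is computed (illustrative, not evaluate_model's code; the scores array is a random stand-in for the 1691 per-patch SSIM values):

```python
import numpy as np

scores = np.random.rand(1691)  # stand-in for per-patch SSIM values
summary = {
    'Mean': scores.mean(),
    'Median': np.median(scores),
    'Standard Deviation': scores.std(),
    'Min': scores.min(),
    'Max': scores.max(),
    'Q1': np.quantile(scores, 0.25),
    'Q3': np.quantile(scores, 0.75),
}
```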