How Magnification Affects Virchow (ViT)

Reviewing the Architecture -> "Zoom" matters

We examined the architecture of leading foundation models, specifically Virchow, alongside NatImg, PLIP, CTransPath, and Phikon. A critical constraint emerged around image resolution: these models were predominantly trained on 20× patches (approximately 0.5 µm/pixel).

This establishes a strict preference for native resolution. While humans adapt easily between magnifications, these models are highly sensitive to pixel density: feeding them inputs that are too coarse (5×) or too fine (40×) without adjustment risks a significant drop in accuracy.
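As a rough guide, objective magnification maps inversely onto pixel density (anchored at 40× ≈ 0.25 µm/pixel, so 20× ≈ 0.5, 10× ≈ 1.0, 5× ≈ 2.0). A minimal sketch, under that assumption, of the resize factor needed to reach the model's native 0.5 µm/pixel (the helper names are ours, not from the paper):

```python
def approx_mpp(magnification: float) -> float:
    """Approximate microns-per-pixel for an objective magnification.

    Assumes the common 40x ~= 0.25 mpp anchor; real scanners vary slightly.
    """
    return 0.25 * (40.0 / magnification)

def resize_factor(source_mag: float, target_mpp: float = 0.5) -> float:
    """Scale factor (fx/fy for cv2.resize) to reach the target pixel density."""
    return approx_mpp(source_mag) / target_mpp

print(approx_mpp(20))     # 0.5 -> already at native density
print(resize_factor(40))  # 0.5 -> halve each axis
print(resize_factor(10))  # 2.0 -> would need 2x upsampling
```

A factor below 1 is a safe down-sample; a factor above 1 means the source lacks detail and upsampling cannot recover it.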

Analysing the Performance Data

To quantify this risk, we reviewed performance metrics across multiple datasets and magnification strategies. We specifically tracked Accuracy, Balanced Accuracy, and Weighted F1 scores to observe how the models handled deviations from their training resolution.

(Note: Bold rows indicate Virchow, the current focus model.)

Source: Virchow 1 paper

(Note: This table reproduces the data from the Virchow paper, reorganized for our use case.)

Full Performance Table (Detailed)
| Dataset | Tissue | Mag | Magnification Strategy | Image Size | Model | Acc | Bal Acc | Wt F1 |
|---|---|---|---|---|---|---|---|---|
| PanMSK | 17 cancer types | 20× | 0.5 mpp native resolution | 224 × 224 | NatImg | 0.883 | 0.883 | 0.883 |
| | | | | | PLIP | 0.862 | 0.862 | 0.862 |
| | | | | | CTransPath | 0.897 | 0.897 | 0.897 |
| | | | | | DINOp=8 | 0.903 | 0.903 | 0.903 |
| | | | | | Phikon | 0.924 | 0.924 | 0.923 |
| | | | | | **Virchow** | **0.950** | **0.950** | **0.950** |
| CRC | Colon | 20× | Native 20× magnification | 224 × 224 | NatImg | 0.952 | 0.926 | 0.952 |
| | | | | | PLIP | 0.946 | 0.918 | 0.944 |
| | | | | | CTransPath | 0.962 | 0.947 | 0.962 |
| | | | | | DINOp=8 | 0.959 | 0.945 | 0.959 |
| | | | | | Phikon | 0.958 | 0.944 | 0.959 |
| | | | | | **Virchow** | **0.973** | **0.962** | **0.973** |
| CRC (no norm) | Colon | 20× | Native 20× (no stain normalization) | 224 × 224 | NatImg | 0.927 | 0.894 | 0.927 |
| | | | | | PLIP | 0.794 | 0.742 | 0.806 |
| | | | | | CTransPath | 0.840 | 0.825 | 0.844 |
| | | | | | DINOp=8 | 0.949 | 0.919 | 0.949 |
| | | | | | Phikon | 0.883 | 0.872 | 0.888 |
| | | | | | **Virchow** | **0.968** | **0.960** | **0.968** |
| WILDS | Lymph node | 10× | Downsampled from 40× | 96 × 96 | NatImg | 0.934 | 0.934 | 0.934 |
| | | | | | PLIP | 0.869 | 0.869 | 0.867 |
| | | | | | CTransPath | 0.947 | 0.947 | 0.947 |
| | | | | | DINOp=8 | 0.957 | 0.957 | 0.957 |
| | | | | | Phikon | 0.971 | 0.971 | 0.971 |
| | | | | | **Virchow** | **0.970** | **0.970** | **0.970** |
| PCam | Lymph node | 10× | Downsampled from 40×, then upsampled to 224 × 224 for Virchow | 96 × 96 → 224 × 224 | NatImg | 0.886 | 0.886 | 0.886 |
| | | | | | PLIP | 0.874 | 0.874 | 0.873 |
| | | | | | CTransPath | 0.872 | 0.872 | 0.872 |
| | | | | | DINOp=8 | 0.918 | 0.918 | 0.918 |
| | | | | | Phikon | 0.906 | 0.906 | 0.905 |
| | | | | | **Virchow** | **0.933** | **0.933** | **0.933** |
| MHIST | Colon | | Downsampled from 40× to increase field of view | 224 × 224 | NatImg | 0.826 | 0.821 | 0.827 |
| | | | | | PLIP | 0.801 | 0.786 | 0.801 |
| | | | | | CTransPath | 0.817 | 0.801 | 0.816 |
| | | | | | DINOp=8 | 0.771 | 0.746 | 0.769 |
| | | | | | Phikon | 0.795 | 0.782 | 0.796 |
| | | | | | **Virchow** | **0.834** | **0.830** | **0.835** |
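For context on the metrics tracked above: balanced accuracy is the mean of per-class recalls, which is why it diverges from plain accuracy on imbalanced datasets such as CRC. A toy illustration (the helper and labels are hypothetical, not from the paper):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

y_true = [0, 0, 0, 0, 1]  # imbalanced: four class-0 samples, one class-1
y_pred = [0, 0, 0, 0, 0]  # always predicting the majority class
print(balanced_accuracy(y_true, y_pred))  # 0.5, while plain accuracy is 0.8
```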

Standardization Strategy

Our dataset currently contains slides at varying magnifications. To ensure consistency and optimal performance, we will standardize all imagery to 20×.

The chart below is based on a statistically representative approximation of the TCGA dataset structure, not a live query of the 30,000+ files.

[File: 8ea2d9ca-e3f8-49f6-b806-b55bd1ab8150]

We selected 20× as our baseline for three reasons:

  • Native Compatibility: It aligns with the model’s native resolution where accuracy is highest.
  • Safe Down-sampling: We can reduce 40× slides using area-averaging (cv2.INTER_AREA) to avoid aliasing artifacts.
  • Quality Control: It avoids the significant accuracy drop seen at 5× (≈83%) while retaining the robustness found at 10× (≈95%).
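In practice, scanner magnification labels are approximate, so it is safer to derive the resize factor from the slide's reported µm/pixel. A sketch assuming OpenSlide's standard `openslide.mpp-x` property (a plain dict stands in for `slide.properties` here; the function name is ours):

```python
TARGET_MPP = 0.5  # the model's native 20x pixel density

def scale_for_target(properties: dict, target_mpp: float = TARGET_MPP) -> float:
    """fx/fy factor that brings level-0 pixels to `target_mpp`.

    Assumes the slide exposes the standard `openslide.mpp-x` property.
    """
    source_mpp = float(properties["openslide.mpp-x"])
    return source_mpp / target_mpp

# e.g. a 40x scanner reporting ~0.252 um/px needs roughly a 0.5x resize:
print(scale_for_target({"openslide.mpp-x": "0.2520"}))  # 0.504
```

With a real slide this would be `scale_for_target(slide.properties)`, and the result feeds straight into `cv2.resize` as `fx`/`fy`.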

The Efficient Downsampling Strategy

Consequently, we searched for the most effective approach to handling high-magnification slides. The optimal strategy is to downsample 40× images to the model's native 20× resolution.

However, the specific method matters. Standard bilinear interpolation often introduces artifacts that look like noise. The more robust solution is area-based resampling (cv2.INTER_AREA), similar to the approach used by TIAToolbox. This method resamples using pixel-area relations, preserving tissue texture and ensuring the model receives a clean, representative input.

Implementation

Below is the code for the optimised workflow discussed above.

```python
import cv2
import numpy as np
import openslide

# 1. Open the whole-slide image (WSI)
slide = openslide.OpenSlide("my_slide.svs")

# 2. Read a region at full resolution (level 0).
#    Note: in a real loop you'd specify tile coordinates; reading the whole
#    slide at level 0 can exhaust memory on large WSIs.
region = slide.read_region((0, 0), 0, slide.dimensions)
img = np.array(region)[:, :, :3]  # RGBA -> RGB (drop the alpha channel)

# 3. The key step: down-sample with INTER_AREA.
#    Example: going from 40x to 20x (0.5 scaling factor on each axis).
downsampled = cv2.resize(img, (0, 0), fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)

# 4. Tile `downsampled` into 224 x 224 patches and feed them to the model.
slide.close()
```
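To avoid loading the whole slide at once, a common pattern is to read fixed-size level-0 tiles that shrink to 224 × 224 after the 0.5× resize, i.e. 448 × 448 tiles at 40×. A hypothetical helper for the tile grid (edge remainders are simply dropped in this sketch):

```python
def tile_origins(width: int, height: int, tile: int = 448):
    """Level-0 (x, y) origins for non-overlapping tile x tile patches.

    tile=448 at 40x yields 224x224 patches after a 0.5x INTER_AREA resize.
    Partial tiles at the right/bottom edges are skipped.
    """
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]

# Toy dimensions: a 1344 x 896 region gives a 3 x 2 grid of tiles.
print(len(tile_origins(1344, 896)))  # 6
```

Each origin would then be passed to `slide.read_region(origin, 0, (448, 448))` in place of the whole-slide read above.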

