How Magnification Affects Virchow (ViT)
January 23, 2026•1,385 words
Reviewing the Architecture → "Zoom" Matters
We examined the architecture of leading foundation models, specifically Virchow, alongside NatImg, PLIP, CTransPath, and Phikon. A critical constraint emerged regarding image resolution: these models were predominantly trained on 20× patches (approximately 0.5 µm/pixel).
This establishes a strict preference for native resolution. While humans adapt easily between magnifications, these models are highly sensitive to pixel density. Feeding them inputs that are too coarse (5×) or overly granular (40×) without adjustment risks a significant degradation in accuracy.
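To make the pixel-density constraint concrete, the sketch below computes the physical field of view of a 224 × 224 patch at several common magnifications, using the usual approximate conversions (40× ≈ 0.25 µm/px, 20× ≈ 0.5, 10× ≈ 1.0, 5× ≈ 2.0). `field_of_view_um` is an illustrative helper, not part of any library:

```python
# Physical field of view of a fixed-size patch at different pixel densities.
# The same 224x224 input "sees" very different amounts of tissue depending
# on magnification, which is why models are sensitive to it.
def field_of_view_um(patch_px: int, mpp: float) -> float:
    """Width of a patch in micrometers, given microns-per-pixel (mpp)."""
    return patch_px * mpp

for mag, mpp in [("40x", 0.25), ("20x", 0.5), ("10x", 1.0), ("5x", 2.0)]:
    print(f"{mag}: 224 px covers {field_of_view_um(224, mpp):.0f} um")
# 40x: 56 um, 20x: 112 um, 10x: 224 um, 5x: 448 um
```

At 5× the patch covers four times the tissue width it does at 20×, so structures appear at a scale the model never saw during training.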
Analysing the Performance Data
To quantify this risk, we reviewed performance metrics across multiple datasets and magnification strategies. We specifically tracked Accuracy, Balanced Accuracy, and Weighted F1 scores to observe how the models handled deviations from their training resolution.
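As a quick illustration of why all three metrics are tracked, the toy example below (invented labels, not data from the paper) shows plain accuracy looking strong on an imbalanced label set while balanced accuracy, the mean of per-class recalls, exposes a minority-class failure; weighted F1 similarly reweights per-class scores by class support:

```python
# Toy illustration: accuracy vs. balanced accuracy on imbalanced labels.
y_true = [0] * 90 + [1] * 10          # 90/10 class imbalance
y_pred = [0] * 90 + [0] * 8 + [1] * 2  # classifier mostly predicts class 0

# Plain accuracy: fraction of correct predictions.
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Balanced accuracy: mean of per-class recalls.
recalls = []
for cls in set(y_true):
    idx = [i for i, t in enumerate(y_true) if t == cls]
    recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
bal_acc = sum(recalls) / len(recalls)

print(f"Accuracy: {acc:.2f}")           # 0.92 -- looks strong
print(f"Balanced accuracy: {bal_acc:.2f}")  # 0.60 -- exposes the minority class
```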
(Note: Bold rows indicate Virchow, the current focus model.)
Source: Virchow 1 | Paper
(Note: this table reproduces the data from the Virchow paper, reorganized for our use case.)
| Dataset | Tissue | Mag | Magnification Strategy | Image Size | Model | Acc | Bal Acc | Wt F1 |
|---|---|---|---|---|---|---|---|---|
| PanMSK | 17 cancer types | 20× | 0.5 mpp native resolution | 224 × 224 | NatImg | 0.883 | 0.883 | 0.883 |
| | | | | | PLIP | 0.862 | 0.862 | 0.862 |
| | | | | | CTransPath | 0.897 | 0.897 | 0.897 |
| | | | | | DINOp=8 | 0.903 | 0.903 | 0.903 |
| | | | | | Phikon | 0.924 | 0.924 | 0.923 |
| | | | | | **Virchow** | **0.950** | **0.950** | **0.950** |
| CRC | Colon | 20× | Native 20× magnification | 224 × 224 | NatImg | 0.952 | 0.926 | 0.952 |
| | | | | | PLIP | 0.946 | 0.918 | 0.944 |
| | | | | | CTransPath | 0.962 | 0.947 | 0.962 |
| | | | | | DINOp=8 | 0.959 | 0.945 | 0.959 |
| | | | | | Phikon | 0.958 | 0.944 | 0.959 |
| | | | | | **Virchow** | **0.973** | **0.962** | **0.973** |
| CRC (no norm) | Colon | 20× | Native 20× (no stain normalization) | 224 × 224 | NatImg | 0.927 | 0.894 | 0.927 |
| | | | | | PLIP | 0.794 | 0.742 | 0.806 |
| | | | | | CTransPath | 0.840 | 0.825 | 0.844 |
| | | | | | DINOp=8 | 0.949 | 0.919 | 0.949 |
| | | | | | Phikon | 0.883 | 0.872 | 0.888 |
| | | | | | **Virchow** | **0.968** | **0.960** | **0.968** |
| WILDS | Lymph node | 10× | Downsampled from 40× | 96 × 96 | NatImg | 0.934 | 0.934 | 0.934 |
| | | | | | PLIP | 0.869 | 0.869 | 0.867 |
| | | | | | CTransPath | 0.947 | 0.947 | 0.947 |
| | | | | | DINOp=8 | 0.957 | 0.957 | 0.957 |
| | | | | | Phikon | 0.971 | 0.971 | 0.971 |
| | | | | | **Virchow** | **0.970** | **0.970** | **0.970** |
| PCam | Lymph node | 10× | Downsampled from 40×, then upsampled to 224 × 224 for Virchow | 96 × 96 → 224 × 224 | NatImg | 0.886 | 0.886 | 0.886 |
| | | | | | PLIP | 0.874 | 0.874 | 0.873 |
| | | | | | CTransPath | 0.872 | 0.872 | 0.872 |
| | | | | | DINOp=8 | 0.918 | 0.918 | 0.918 |
| | | | | | Phikon | 0.906 | 0.906 | 0.905 |
| | | | | | **Virchow** | **0.933** | **0.933** | **0.933** |
| MHIST | Colon | 5× | Downsampled from 40× to increase field of view | 224 × 224 | NatImg | 0.826 | 0.821 | 0.827 |
| | | | | | PLIP | 0.801 | 0.786 | 0.801 |
| | | | | | CTransPath | 0.817 | 0.801 | 0.816 |
| | | | | | DINOp=8 | 0.771 | 0.746 | 0.769 |
| | | | | | Phikon | 0.795 | 0.782 | 0.796 |
| | | | | | **Virchow** | **0.834** | **0.830** | **0.835** |
Standardization Strategy
Our dataset currently presents a distribution of varying magnifications. To ensure consistency and optimal performance, we will standardize all imagery to 20×.
The chart below is based on a statistically representative approximation of the TCGA dataset structure, not a live query of the 30,000+ files.
[File: 8ea2d9ca-e3f8-49f6-b806-b55bd1ab8150]
We selected 20× as our baseline for three reasons:
- Native Compatibility: It aligns with the model’s native resolution where accuracy is highest.
- Safe Down-sampling: We can reduce 40× slides using area-averaging (`cv2.INTER_AREA`) to avoid aliasing artifacts.
- Quality Control: It avoids the significant accuracy drop seen at 5× (≈83%) while retaining the robustness found at 10× (≈95%).
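Assuming the slide's native pixel density is available (e.g. from WSI metadata), the standardization rule can be sketched as a small helper. `plan_resample` is a hypothetical name, and the OpenCV interpolation modes are referenced by name only so the rule itself needs no OpenCV install:

```python
# Decide how to bring a slide to the 20x / 0.5 um-per-pixel baseline:
# shrinking uses area averaging to avoid aliasing, enlarging uses
# bilinear interpolation, and exact matches need no resize at all.
TARGET_MPP = 0.5  # ~20x baseline

def plan_resample(source_mpp: float):
    """Return (scale_factor, interpolation_name) for reaching the baseline."""
    factor = source_mpp / TARGET_MPP
    if factor < 1.0:
        method = "INTER_AREA"    # e.g. 40x (~0.25 mpp) -> downsample by 2
    elif factor > 1.0:
        method = "INTER_LINEAR"  # e.g. 10x (~1.0 mpp) -> upsample by 2
    else:
        method = None            # already at 20x
    return factor, method

print(plan_resample(0.25))  # (0.5, 'INTER_AREA')
print(plan_resample(1.0))   # (2.0, 'INTER_LINEAR')
print(plan_resample(0.5))   # (1.0, None)
```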
The Efficient Downsampling Strategy
Consequently, we searched for the most effective way to handle high-magnification slides. The optimal strategy is to downsample 40× images to the model's native 20× resolution.
However, the specific method matters. Standard bilinear interpolation can introduce aliasing artifacts that read as noise when shrinking images. The more robust choice is area-based resampling (`cv2.INTER_AREA`), which resamples using pixel-area relations, preserving tissue texture and giving the model a clean, representative input.
(This is similar to TIAToolbox's approach.)
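To see why area averaging is safer, the pure-Python sketch below halves a worst-case one-pixel checkerboard two ways: naive decimation collapses the texture to a single phase, while 2 × 2 block averaging (what `cv2.INTER_AREA` computes for an exact 2× downscale) preserves its average intensity:

```python
# An 8x8 one-pixel checkerboard: the worst case for aliasing when halving.
pattern = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]

# Naive decimation: keep every other pixel -> only one phase of the
# pattern survives, so the tile comes out uniformly black.
decimated = [row[::2] for row in pattern[::2]]

# Area averaging: mean of each 2x2 block -> uniform mid-gray, i.e. the
# texture's average intensity is preserved instead of aliased away.
averaged = [
    [sum(pattern[2*y + dy][2*x + dx] for dy in (0, 1) for dx in (0, 1)) // 4
     for x in range(4)]
    for y in range(4)
]

print(decimated[0])  # [0, 0, 0, 0] -- pattern collapsed to one phase
print(averaged[0])   # [127, 127, 127, 127] -- intensity preserved as gray
```

Real tissue is not a checkerboard, but fine chromatin and stromal textures sit near the pixel scale at 40×, which is where decimation-style resampling does the most damage.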
Implementation
Below is the code for the optimised workflow discussed above.

```python
import cv2
import numpy as np
import openslide

# 1. Open the WSI
slide = openslide.OpenSlide('my_slide.svs')

# 2. Read a region at full resolution (level 0).
# Note: reading the whole slide at level 0 can exhaust memory; in a real
# pipeline you would iterate over tile coordinates instead.
region = slide.read_region((0, 0), 0, slide.dimensions)
img = np.array(region)[:, :, :3]  # RGBA -> RGB (drop the alpha channel)

# 3. The key step: down-sample with INTER_AREA
# Example: going from 40x to 20x (scaling factor 0.5)
downsampled = cv2.resize(img, (0, 0), fx=0.5, fy=0.5,
                         interpolation=cv2.INTER_AREA)

# 4. Feed `downsampled` (tiled to 224 x 224) into the model
```
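The coordinate bookkeeping for the tile loop mentioned in the comments can be sketched without OpenSlide: to end up with 224 × 224 patches at 20×, read 448 × 448 tiles at level 0 of a 40× slide and shrink each by 0.5. `tile_coords` is an illustrative helper, not from any library:

```python
# Generate level-0 top-left coordinates for non-overlapping full tiles.
def tile_coords(width: int, height: int, tile: int):
    """(x, y) of each complete tile in a width x height level-0 slide."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

# 224 px at 20x corresponds to 448 px at 40x, so tile at 448 and
# pass each tile through the INTER_AREA downsample shown above.
coords = tile_coords(width=1344, height=896, tile=448)
print(len(coords))            # 6 tiles (3 across x 2 down)
print(coords[0], coords[-1])  # (0, 0) (896, 448)
```

Partial edge tiles are simply dropped here; padding or overlapping strides are common alternatives depending on the downstream aggregation.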