How Magnification Affects Virchow (ViT)

Reviewing the Architecture -> "Zoom" matters

We examined the architecture of leading foundation models, specifically Virchow, alongside NatImg, PLIP, CTransPath, and Phikon. A critical constraint emerged around image resolution: these models were predominantly trained on 20× patches (approximately 0.5 µm/pixel).

This establishes a strict preference for native resolution. While humans adapt easily between magnifications, these models are highly sensitive to pixel density: feeding them inputs that are too coarse (5×) or too fine (40×) without adjustment risks a significant drop in accuracy.
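As a rough guide, objective magnification maps inversely onto pixel density (anchored at 40× ≈ 0.25 µm/pixel, so 20× ≈ 0.5, 10× ≈ 1.0, 5× ≈ 2.0). A minimal sketch, under that assumption, of the resize factor needed to reach the model's native 0.5 µm/pixel (the helper names are ours, not from the paper):

```python
def approx_mpp(magnification: float) -> float:
    """Approximate microns-per-pixel for an objective magnification.

    Assumes the common 40x ~= 0.25 mpp anchor; real scanners vary slightly.
    """
    return 0.25 * (40.0 / magnification)

def resize_factor(source_mag: float, target_mpp: float = 0.5) -> float:
    """Scale factor (fx/fy for cv2.resize) to reach the target pixel density."""
    return approx_mpp(source_mag) / target_mpp

print(approx_mpp(20))     # 0.5 -> already at native density
print(resize_factor(40))  # 0.5 -> halve each axis
print(resize_factor(10))  # 2.0 -> would need 2x upsampling
```

A factor below 1 is a safe down-sample; a factor above 1 means the source lacks detail and upsampling cannot recover it.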

Analysing the Performance Data

To quantify this risk, we reviewed performance metrics across multiple datasets and magnification strategies. We specifically tracked Accuracy, Balanced Accuracy, and Weighted F1 scores to observe how the models handled deviations from their training resolution.

(Note: Bold rows indicate Virchow, the current focus model.)

Source: Virchow 1 paper

(Note: This table reproduces the data from the Virchow paper, reorganized for our use case.)

Full Performance Table (Detailed)
| Dataset | Tissue | Mag | Magnification Strategy | Image Size | Model | Acc | Bal Acc | Wt F1 |
|---|---|---|---|---|---|---|---|---|
| PanMSK | 17 cancer types | 20× | 0.5 mpp native resolution | 224 × 224 | NatImg | 0.883 | 0.883 | 0.883 |
| | | | | | PLIP | 0.862 | 0.862 | 0.862 |
| | | | | | CTransPath | 0.897 | 0.897 | 0.897 |
| | | | | | DINOp=8 | 0.903 | 0.903 | 0.903 |
| | | | | | Phikon | 0.924 | 0.924 | 0.923 |
| | | | | | **Virchow** | **0.950** | **0.950** | **0.950** |
| CRC | Colon | 20× | Native 20× magnification | 224 × 224 | NatImg | 0.952 | 0.926 | 0.952 |
| | | | | | PLIP | 0.946 | 0.918 | 0.944 |
| | | | | | CTransPath | 0.962 | 0.947 | 0.962 |
| | | | | | DINOp=8 | 0.959 | 0.945 | 0.959 |
| | | | | | Phikon | 0.958 | 0.944 | 0.959 |
| | | | | | **Virchow** | **0.973** | **0.962** | **0.973** |
| CRC (no norm) | Colon | 20× | Native 20× (no stain normalization) | 224 × 224 | NatImg | 0.927 | 0.894 | 0.927 |
| | | | | | PLIP | 0.794 | 0.742 | 0.806 |
| | | | | | CTransPath | 0.840 | 0.825 | 0.844 |
| | | | | | DINOp=8 | 0.949 | 0.919 | 0.949 |
| | | | | | Phikon | 0.883 | 0.872 | 0.888 |
| | | | | | **Virchow** | **0.968** | **0.960** | **0.968** |
| WILDS | Lymph node | 10× | Downsampled from 40× | 96 × 96 | NatImg | 0.934 | 0.934 | 0.934 |
| | | | | | PLIP | 0.869 | 0.869 | 0.867 |
| | | | | | CTransPath | 0.947 | 0.947 | 0.947 |
| | | | | | DINOp=8 | 0.957 | 0.957 | 0.957 |
| | | | | | Phikon | 0.971 | 0.971 | 0.971 |
| | | | | | **Virchow** | **0.970** | **0.970** | **0.970** |
| PCam | Lymph node | 10× | Downsampled from 40×, then upsampled to 224 × 224 for Virchow | 96 × 96 → 224 × 224 | NatImg | 0.886 | 0.886 | 0.886 |
| | | | | | PLIP | 0.874 | 0.874 | 0.873 |
| | | | | | CTransPath | 0.872 | 0.872 | 0.872 |
| | | | | | DINOp=8 | 0.918 | 0.918 | 0.918 |
| | | | | | Phikon | 0.906 | 0.906 | 0.905 |
| | | | | | **Virchow** | **0.933** | **0.933** | **0.933** |
| MHIST | Colon | | Downsampled from 40× to increase field of view | 224 × 224 | NatImg | 0.826 | 0.821 | 0.827 |
| | | | | | PLIP | 0.801 | 0.786 | 0.801 |
| | | | | | CTransPath | 0.817 | 0.801 | 0.816 |
| | | | | | DINOp=8 | 0.771 | 0.746 | 0.769 |
| | | | | | Phikon | 0.795 | 0.782 | 0.796 |
| | | | | | **Virchow** | **0.834** | **0.830** | **0.835** |
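For context on the metrics tracked above: balanced accuracy is the mean of per-class recalls, which is why it diverges from plain accuracy on imbalanced datasets such as CRC. A toy illustration (the helper and labels are hypothetical, not from the paper):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

y_true = [0, 0, 0, 0, 1]  # imbalanced: four class-0 samples, one class-1
y_pred = [0, 0, 0, 0, 0]  # always predicting the majority class
print(balanced_accuracy(y_true, y_pred))  # 0.5, while plain accuracy is 0.8
```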

Standardization Strategy

Our dataset currently contains slides at varying magnifications. To ensure consistency and optimal performance, we will standardize all imagery to 20×.

The chart below is based on a statistically representative approximation of the TCGA dataset structure, not a live query of the 30,000+ files.

[File: 8ea2d9ca-e3f8-49f6-b806-b55bd1ab8150]

We selected 20× as our baseline for three reasons:

  • Native Compatibility: It aligns with the model’s native resolution where accuracy is highest.
  • Safe Down-sampling: We can reduce 40× slides using area-averaging (cv2.INTER_AREA) to avoid aliasing artifacts.
  • Quality Control: It avoids the significant accuracy drop seen at 5× (≈83%) while retaining the robustness found at 10× (≈95%).
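In practice, scanner magnification labels are approximate, so it is safer to derive the resize factor from the slide's reported µm/pixel. A sketch assuming OpenSlide's standard `openslide.mpp-x` property (a plain dict stands in for `slide.properties` here; the function name is ours):

```python
TARGET_MPP = 0.5  # the model's native 20x pixel density

def scale_for_target(properties: dict, target_mpp: float = TARGET_MPP) -> float:
    """fx/fy factor that brings level-0 pixels to `target_mpp`.

    Assumes the slide exposes the standard `openslide.mpp-x` property.
    """
    source_mpp = float(properties["openslide.mpp-x"])
    return source_mpp / target_mpp

# e.g. a 40x scanner reporting ~0.252 um/px needs roughly a 0.5x resize:
print(scale_for_target({"openslide.mpp-x": "0.2520"}))  # 0.504
```

With a real slide this would be `scale_for_target(slide.properties)`, and the result feeds straight into `cv2.resize` as `fx`/`fy`.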

The Efficient Downsampling Strategy

Consequently, we searched for the most effective approach to handling high-magnification slides. The optimal strategy is to downsample 40× images to the model's native 20× resolution.

However, the specific method matters. Standard bilinear interpolation often introduces artifacts that look like noise. The more robust solution is area-based resampling (cv2.INTER_AREA), similar to the approach used by TIAToolbox. This method resamples using pixel-area relations, preserving tissue texture and ensuring the model receives a clean, representative input.

Implementation

Below is the code for the optimised workflow discussed above.

```python
import cv2
import numpy as np
import openslide

# 1. Open the whole-slide image (WSI)
slide = openslide.OpenSlide("my_slide.svs")

# 2. Read a region at full resolution (level 0).
#    Note: in a real loop you'd specify tile coordinates; reading the whole
#    slide at level 0 can exhaust memory on large WSIs.
region = slide.read_region((0, 0), 0, slide.dimensions)
img = np.array(region)[:, :, :3]  # RGBA -> RGB (drop the alpha channel)

# 3. The key step: down-sample with INTER_AREA.
#    Example: going from 40x to 20x (0.5 scaling factor on each axis).
downsampled = cv2.resize(img, (0, 0), fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)

# 4. Tile `downsampled` into 224 x 224 patches and feed them to the model.
slide.close()
```
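To avoid loading the whole slide at once, a common pattern is to read fixed-size level-0 tiles that shrink to 224 × 224 after the 0.5× resize, i.e. 448 × 448 tiles at 40×. A hypothetical helper for the tile grid (edge remainders are simply dropped in this sketch):

```python
def tile_origins(width: int, height: int, tile: int = 448):
    """Level-0 (x, y) origins for non-overlapping tile x tile patches.

    tile=448 at 40x yields 224x224 patches after a 0.5x INTER_AREA resize.
    Partial tiles at the right/bottom edges are skipped.
    """
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]

# Toy dimensions: a 1344 x 896 region gives a 3 x 2 grid of tiles.
print(len(tile_origins(1344, 896)))  # 6
```

Each origin would then be passed to `slide.read_region(origin, 0, (448, 448))` in place of the whole-slide read above.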

