Digital pathology

Digital pathology is a sub-field of pathology that focuses on managing and analyzing information generated from digitized specimen slides. It utilizes computer-based technology and virtual microscopy to view, manage, share, and analyze digital slides on computer monitors.^[1] This field has applications in diagnostic medicine and aims to achieve more efficient and cost-effective diagnoses, prognoses, and disease predictions through advancements in machine learning and artificial intelligence in healthcare.^[2]^[3]

History

The roots of digital pathology trace back to the 1960s with early telepathology experiments. The concept of virtual microscopy emerged in the 1990s across various areas of life science research.^[4] By the turn of the century, the scientific community increasingly agreed on the term "digital pathology" to denote digitization efforts in pathology. However, in 2000 the technical requirements (scanners, storage, networks) were still a limiting factor for broad dissemination of digital pathology concepts. This changed as powerful, affordable scanners and mass and cloud storage technologies came to market. The field of radiology underwent its digital transformation almost 15 years ago. This is not because radiology is more advanced, but because digital images in radiology and in digital pathology differ fundamentally. In radiology, the image source is the living patient, and today the image is in most cases captured digitally from the outset. In pathology, scanning is done from preserved and processed specimens, and for retrospective studies even from slides stored in a biobank. Besides this difference in pre-analytics and metadata content, digital pathology requires two to three orders of magnitude more storage than radiology. However, the advantages anticipated from digital pathology are similar to those in radiology:

Capability to transmit digital slides over distances quickly, which enables telepathology scenarios.
Capability to access past specimens from the same patients and similar cases for comparison and review, with much less effort than retrieving slides from the archive shelves.
Capability to compare different areas of multiple slides simultaneously (side-by-side mode) with the help of a virtual microscope.
Capability to annotate areas directly in the slide and share this for teaching and research.

Digital pathology is today widely used for educational purposes,^[5] in telepathology and teleconsultation, and in research projects. Because it makes slides much easier to share and annotate, and allows annotated lecture sets to be downloaded, digital pathology creates new opportunities for e-learning and knowledge sharing in pathology. Digital pathology in diagnostics is an emerging field.

Environment

Scan

Digital slides are created from glass slides using specialized scanning machines. High-quality scans require slides that are free of dust, scratches, and other obstructions. There are two common methods for digital slide scanning: tile-based scanning and line-based scanning.^[6] Both technologies use an integrated camera and a motorized stage that moves the slide while parts of the tissue are imaged. Tile scanners capture square field-of-view images covering the entire tissue area on the slide, while line scanners capture the tissue in long, uninterrupted stripes. In both cases, the scanner's software stitches the tiles or stripes together into a single, seamless image.
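The tile-based approach can be sketched as two steps: plan a grid of overlapping fields of view that covers the scan region, then paste the captured tiles back into one image. This is a minimal illustration under stated assumptions. It uses a rectangular region, square tiles, and a fixed overlap, and the function names are hypothetical. Real scanners register neighboring tiles and blend the seams, whereas this sketch simply overwrites overlapping pixels.

```python
def tile_origins(width, height, tile, overlap=0):
    """Top-left (x, y) origins of square tiles covering a width x height region.

    Neighboring tiles share `overlap` pixels; the last tile in each row and
    column is shifted back so that it ends exactly at the region edge.
    """
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")

    def axis(length):
        if length <= tile:
            return [0]
        starts = list(range(0, length - tile, step))
        starts.append(length - tile)  # final tile flush with the edge
        return starts

    return [(x, y) for y in axis(height) for x in axis(width)]


def stitch(tiles, width, height):
    """Paste tiles ({(x, y): 2D list of pixels}) into a single image.

    Overlapping pixels are simply overwritten by later tiles (no blending).
    """
    canvas = [[0] * width for _ in range(height)]
    for (x0, y0), pixels in tiles.items():
        for dy, row in enumerate(pixels):
            for dx, value in enumerate(row):
                canvas[y0 + dy][x0 + dx] = value
    return canvas
```

For example, cutting a synthetic 10 × 7 "slide" into 4-pixel tiles with 1 pixel of overlap and stitching them back reproduces the original image exactly.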

Z-stacking is the scanning of a slide at multiple focal planes along the vertical z-axis.^[7]
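Software that works with a z-stack often has to decide which focal plane is sharpest. The sketch below makes some assumptions. Each plane is a small grayscale image held as a 2D list. Sharpness is measured with a simple gradient-energy score, which is one of many possible focus measures and is not attributed to any particular scanner. The function names are illustrative.

```python
def sharpness(image):
    """Gradient-energy focus score: sum of squared differences between
    horizontally and vertically adjacent pixels (higher means sharper)."""
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                total += (image[y][x + 1] - image[y][x]) ** 2
            if y + 1 < rows:
                total += (image[y + 1][x] - image[y][x]) ** 2
    return total


def best_focus_plane(z_stack):
    """Index of the sharpest plane in a list of equally sized 2D images."""
    return max(range(len(z_stack)), key=lambda z: sharpness(z_stack[z]))
```

A blurred plane has small differences between neighboring pixels, while an in-focus plane with crisp edges scores much higher, so `best_focus_plane` returns the index of the latter.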

View

Digital slides can be viewed on a computer monitor with viewing software, either locally or remotely via the Internet. OpenSeadragon^[8] is an example of an open-source, web-based viewer for this purpose, implemented in pure JavaScript for desktop and mobile. QuPath^[9] is another open-source program, often used in digital pathology because it offers a powerful set of tools for working with whole-slide images. MIKAIA lite^[10] is a free viewer that is frequently used to batch-convert or crop slides, convert annotation formats, create and edit annotations (clip, fuse, subtract, add margins, shrink/grow), or export tiles with masks for creating AI training datasets. OpenSlide,^[11] on the other hand, is a C library (with Python and Java bindings) that provides a simple interface to read and view whole-slide images.
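Web viewers such as OpenSeadragon commonly display slides through the Deep Zoom (DZI) tiling convention. In this convention, level 0 is a 1 × 1-pixel image, and each subsequent level doubles the resolution until the full-size image is reached. The sketch below lists the levels and tile counts for a given slide size. The function name is hypothetical, the 254-pixel tile size is only an example, and tile overlap is ignored.

```python
def deep_zoom_levels(width, height, tile_size=254):
    """List (level, width, height, tile_columns, tile_rows) for a Deep Zoom pyramid.

    The top level is ceil(log2(max side)); level L is the full image
    downscaled by 2 ** (max_level - L), rounding sizes up.
    """
    max_level = (max(width, height) - 1).bit_length()  # integer ceil(log2(n))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = -(-width // scale)          # ceiling division
        h = -(-height // scale)
        cols = -(-w // tile_size)
        rows = -(-h // tile_size)
        levels.append((level, w, h, cols, rows))
    return levels
```

Because each level halves the previous one, even a very large slide needs only a few dozen levels. A viewer loads coarse levels first and fetches finer tiles as the user zooms in.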

Manage

Digital slides are maintained in an information management system (IMS)^[12] that allows for archival and intelligent retrieval.
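The archival burden of whole-slide images can be estimated with simple arithmetic. The calculation below is a back-of-the-envelope sketch under assumed values: a 15 × 15 mm tissue area, about 0.25 µm per pixel (a sampling commonly quoted for high-magnification scans), 8-bit RGB pixels, and an illustrative 20:1 compression ratio. None of these numbers comes from this article. Real file sizes depend heavily on the tissue area actually scanned and on the codec.

```python
def wsi_storage_bytes(width_mm, height_mm, um_per_pixel,
                      bytes_per_pixel=3, compression_ratio=1.0, pyramid=True):
    """Rough storage estimate for one whole-slide image (illustrative only)."""
    width_px = round(width_mm * 1000 / um_per_pixel)
    height_px = round(height_mm * 1000 / um_per_pixel)
    base = width_px * height_px * bytes_per_pixel
    # Each pyramid level halves both sides, adding 1/4 of the previous level;
    # the series 1 + 1/4 + 1/16 + ... converges to 4/3 of the base level.
    total = base * 4 / 3 if pyramid else base
    return total / compression_ratio


if __name__ == "__main__":
    raw = wsi_storage_bytes(15, 15, 0.25, pyramid=False)
    print(f"base level, uncompressed: {raw / 1e9:.1f} GB")  # 10.8 GB
    packed = wsi_storage_bytes(15, 15, 0.25, compression_ratio=20)  # assumed ratio
    print(f"pyramid, 20:1 compression: {packed / 1e9:.2f} GB")  # 0.72 GB
```

Multiplied across thousands of slides per year, figures of this order explain why storage is a central design concern for an IMS.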

Network

Digital slides are often stored and delivered over the Internet or private networks, for viewing and consultation.
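When slides are delivered over a network, viewers typically request only the tiles that intersect the current viewport rather than the whole file. The sketch below shows that tile selection for a single pyramid level. It assumes square tiles in a regular grid, and the function and parameter names are illustrative.

```python
def visible_tiles(view_x, view_y, view_w, view_h, tile_size, level_w, level_h):
    """(column, row) indices of the tiles overlapping a viewport at one level.

    The viewport is given in pixel coordinates of that level and is clamped
    to the level's bounds so no nonexistent tiles are requested.
    """
    first_col = max(view_x // tile_size, 0)
    first_row = max(view_y // tile_size, 0)
    last_col = min((view_x + view_w - 1) // tile_size, (level_w - 1) // tile_size)
    last_row = min((view_y + view_h - 1) // tile_size, (level_h - 1) // tile_size)
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

A 400 × 200-pixel window onto a large slide at 256-pixel tiles touches only four tiles, so the amount of data transferred stays roughly constant no matter how large the slide is.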

Analyze

Image analysis tools are used to derive objective quantitative measures from digital slides. Image segmentation and classification algorithms, often implemented using deep neural networks, identify medically significant regions and objects on digital slides. GPU-accelerated software has been developed for pathology image analysis that cross-compares the spatial boundaries of very large numbers of segmented micro-anatomic objects.^[13] Its core algorithm, PixelBox, has been adopted in Fixstars' Geometric Performance Primitives (GPP) library,^[14] part of NVIDIA Developer. GPP is a production geometry engine for advanced geographic information systems, electronic design automation, computer vision and motion planning solutions.^[15]
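Cross-comparing segmentations usually means measuring how much two outlines of the same object overlap, for example with the Jaccard index (intersection area divided by union area). The sketch below is a simplified, serial CPU illustration of measuring that overlap by counting pixels. It is not the PixelBox algorithm, which is GPU-parallel. It assumes simple (non-self-intersecting) polygons given as vertex lists, and the function names are hypothetical.

```python
import math


def point_in_polygon(px, py, polygon):
    """Even-odd ray-casting test: is (px, py) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def pixel_jaccard(poly_a, poly_b):
    """Jaccard index of two polygons, estimated by sampling pixel centers."""
    xs = [x for x, _ in poly_a + poly_b]
    ys = [y for _, y in poly_a + poly_b]
    intersection = union = 0
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            in_a = point_in_polygon(x + 0.5, y + 0.5, poly_a)
            in_b = point_in_polygon(x + 0.5, y + 0.5, poly_b)
            if in_a and in_b:
                intersection += 1
            if in_a or in_b:
                union += 1
    return intersection / union if union else 0.0
```

Two 4 × 4 squares offset by half their width share 8 of 24 covered pixels, giving a Jaccard index of 1/3. Each pixel test is independent of the others, which is what makes this style of computation attractive to parallelize on a GPU.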

Ki67 stain calculation by QuPath in a pure seminoma, which gives a measure of the proliferation rate of the tumor. The colors represent the intensity of expression: blue-no expression, yellow-low, orange-moderate, and red-high expression.^[16]
Tissue segmentation for digital calculation of bone marrow cellularity in QuPath: The system is trained on the appearance of immune cells versus other tissue, and uses this to give an overall percentage of each type.
Breast cancer prediction by AI.^[17]
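Intensity-coded Ki67 results like the one described above are typically derived by thresholding a per-nucleus stain measurement. The sketch below assumes that each detected nucleus already has a mean DAB optical density value. Its cut-offs are hypothetical, not QuPath's defaults, and would need calibration for a given stain and scanner. It reports the percentage of positive nuclei and an H-score, a common semi-quantitative summary computed as 1 × %weak + 2 × %moderate + 3 × %strong (range 0 to 300).

```python
def score_nuclei(dab_od, thresholds=(0.2, 0.4, 0.6)):
    """Bin nuclei by mean DAB optical density and summarize the staining.

    thresholds: hypothetical cut-offs between negative / weak (1+) /
    moderate (2+) / strong (3+) nuclei.
    """
    if not dab_od:
        raise ValueError("no nuclei to score")
    t1, t2, t3 = thresholds
    counts = [0, 0, 0, 0]  # negative, 1+, 2+, 3+
    for od in dab_od:
        if od >= t3:
            counts[3] += 1
        elif od >= t2:
            counts[2] += 1
        elif od >= t1:
            counts[1] += 1
        else:
            counts[0] += 1
    pct = [100 * c / len(dab_od) for c in counts]
    return {
        "counts": counts,
        "positive_pct": pct[1] + pct[2] + pct[3],  # proliferation (labeling) index
        "h_score": 1 * pct[1] + 2 * pct[2] + 3 * pct[3],
    }
```

The four bins correspond to the blue, yellow, orange and red classes in the Ki67 figure. For five nuclei with values 0.05, 0.1, 0.25, 0.45 and 0.7, the result is 60% positive and an H-score of 120.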

Simplified example of training a neural network in cytologic object detection: The network is trained by multiple images that are known to depict benign cells (upper left) and cancer cells (lower left), which are correlated with "nodes" that represent visual aspects, in this case nuclear size and chromatin pattern. The benign cells match with small nuclei and finely granular chromatin, whereas most cancer cells match with large nuclei and coarsely granular chromatin. However, the single instance of a cancer cell with fine chromatin creates a weakly weighted association between fine chromatin and the cancer cell output.
Subsequent run of the network on an input image (left): The network correctly detects the benign cell. However, the weakly weighted association between fine chromatin and cancer cells also confers a weak signal to the latter from one of two intermediate nodes. In addition, a blood vessel (bottom left) that was not included in the training partially conforms to the patterns of large nuclei and coarse chromatin, and therefore results in weak signals for the cancer cell output. These weak signals may result in a false positive result for a cancer cell.
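The behavior in the two captions above can be reproduced with a toy model. In this sketch the intermediate "feature" nodes (nuclear size, chromatin pattern) are assumed to be computed already, and the output weights are hand-set rather than learned. The weights, threshold, and input activations are all hypothetical. The model shows how a weak learned link gives a benign cell a small cancer signal, and how a structure absent from training (a blood vessel) can partially match the cancer features and cross the detection threshold.

```python
# Hand-set weights from feature nodes to the two outputs (hypothetical values).
WEIGHTS = {
    "benign": {"small_nuclei": 1.0, "fine_chromatin": 1.0},
    "cancer": {"large_nuclei": 1.0, "coarse_chromatin": 1.0,
               "fine_chromatin": 0.2},  # weak link from one atypical cancer cell
}


def scores(features):
    """Weighted sum of feature-node activations (0 to 1) for each output."""
    return {label: sum(w * features.get(name, 0.0) for name, w in links.items())
            for label, links in WEIGHTS.items()}


def detect(features, threshold=0.5):
    """Outputs whose score reaches the (hypothetical) detection threshold."""
    return sorted(label for label, s in scores(features).items() if s >= threshold)


benign_cell = {"small_nuclei": 1.0, "fine_chromatin": 1.0}
blood_vessel = {"large_nuclei": 0.4, "coarse_chromatin": 0.3}  # partial match only
```

Here `detect(benign_cell)` correctly returns only `"benign"`, although its cancer score is small but nonzero. `detect(blood_vessel)` returns `"cancer"`, a false positive built entirely from weak partial signals.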

Integrate

The digital pathology workflow is integrated into the institution's overall operational environment. Slide digitization is expected to reduce the number of routine slides that must be reviewed manually, maximizing workload efficiency.

Digital pathology also allows information to be shared over the Internet for education, diagnostics, publication and research. This may take the form of publicly available datasets or open-source access to machine learning algorithms.

Digital Slide Files

Digital pathology relies fundamentally on digital slide files, also known as whole-slide images (WSIs), which encapsulate high-resolution representations of entire microscope slides. These files enable remote diagnosis, computational analysis, education, and archiving at a scale and flexibility impossible with traditional glass slides. The technical design of such file formats has implications for interoperability, performance, long-term data stewardship, and downstream analytical workflows. Broadly, digital slide file formats fall into two major categories: proprietary formats developed by hardware vendors for their scanners, and interoperable formats engineered to facilitate cross-platform compatibility and open data exchange.^[18]

Proprietary Formats

Proprietary digital slide formats are developed by hardware vendors to optimize performance and functionality for their specific scanning systems. These formats typically extend standard image containers with custom metadata structures, compression schemes, and organizational paradigms tailored to each manufacturer's technological approach. Although vendor formats are designed to make the most of their native ecosystems, they present challenges for long-term data preservation, cross-platform compatibility, and vendor-neutral analysis workflows.^[19] These challenges include vendor lock-in, in which institutions become dependent on specific hardware and software ecosystems, difficulty migrating data between platforms, and the increased cost of maintaining multiple proprietary toolchains.^[20]

SVS (Aperio)

The SVS format (Slide and Viewable Storage), developed by Aperio (now part of Leica Biosystems), is one of the most widely used digital slide formats in clinical and research pathology. SVS files are based on the TIFF image standard, extended to store a multi-resolution image pyramid within a single file, with each level stored as a tiled image. The base (first) image is always the full-resolution capture. Subsidiary images represent downsampled overviews, a thumbnail, and optionally a macro image or a scanned label of the glass slide.^[21]
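Because SVS builds on TIFF, the images it contains can be discovered by walking the chain of Image File Directories (IFDs) and reading a few tags from each. The sketch below has several limits. It handles only classic little- or big-endian TIFF (not BigTIFF), and it reads only single-valued SHORT/LONG tags and ASCII text. It also includes a hypothetical `build_tiff` helper that fabricates a pixel-less test file. Production code should rely on a maintained reader such as OpenSlide or tifffile.

```python
import struct

# Tags read by this sketch (numbers from the TIFF 6.0 specification).
TAGS = {256: "ImageWidth", 257: "ImageLength", 270: "ImageDescription",
        322: "TileWidth", 323: "TileLength"}


def read_ifds(data: bytes):
    """Return one {tag name: value} dict per IFD of a classic TIFF in memory."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses version 43)")
    ifds, seen = [], set()
    while offset and offset not in seen:  # a next-IFD offset of 0 ends the chain
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        fields = {}
        for i in range(count):
            tag, typ, n, raw = struct.unpack_from(order + "HHI4s", data, offset + 2 + 12 * i)
            if tag not in TAGS:
                continue
            if typ == 3:    # SHORT: value left-justified in the 4-byte field
                fields[TAGS[tag]] = struct.unpack_from(order + "H", raw)[0]
            elif typ == 4:  # LONG
                fields[TAGS[tag]] = struct.unpack_from(order + "I", raw)[0]
            elif typ == 2:  # ASCII: inline if it fits in 4 bytes, else an offset
                start = struct.unpack(order + "I", raw)[0]
                text = raw[:n] if n <= 4 else data[start:start + n]
                fields[TAGS[tag]] = text.rstrip(b"\0").decode("latin-1")
        ifds.append(fields)
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return ifds


def build_tiff(sizes):
    """Hypothetical test helper: a little-endian TIFF whose IFDs hold only
    ImageWidth/ImageLength (no pixel data), one IFD per (width, height)."""
    out = bytearray(b"II" + struct.pack("<HI", 42, 8))  # first IFD at byte 8
    for i, (w, h) in enumerate(sizes):
        next_ifd = len(out) + 2 + 2 * 12 + 4 if i + 1 < len(sizes) else 0
        out += struct.pack("<H", 2)                # two directory entries
        out += struct.pack("<HHII", 256, 4, 1, w)  # ImageWidth, LONG
        out += struct.pack("<HHII", 257, 4, 1, h)  # ImageLength, LONG
        out += struct.pack("<I", next_ifd)
    return bytes(out)
```

In a real SVS file the first IFD is the full-resolution level. Later IFDs mix downsampled levels with the thumbnail, label and macro images, so callers typically tell them apart by their dimensions and tags rather than by position alone.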

NDPI (Hamamatsu)

NDPI is Hamamatsu’s proprietary TIFF-based whole-slide imaging format, combining standard multi-directory TIFF pyramids with custom extensions for random-access viewing and metadata handling. The format embeds JPEG-compressed strips within TIFF IFDs, uses private tag ranges for offset catalogs and restart markers, and places the macro overview in the final directory, all without separate index files. Its main structural elements are:

Multi-resolution TIFF pyramid: Separate IFDs represent each zoom level; the lowest-resolution (macro) overview resides in the last directory.
JPEG-compressed strips with restart markers: Image data is stored as JPEG-compressed strips. Restart markers enable robust, random-access decoding of individual strips.
Private TIFF tags (65420–65449+): Hamamatsu reserves custom tags to record strip offsets, high-order offset bits, restart-marker catalogs, and slide-specific metadata such as scan parameters.^[22]^[23]^[24]^[25]
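Restart markers make random access possible because decoding can resume immediately after any of them. Inside JPEG entropy-coded data, every literal 0xFF byte is followed by a stuffed 0x00. A 0xFF byte followed by 0xD0 to 0xD7 can therefore only be a restart marker. The sketch below shows how such offsets could be located by scanning. It does not read Hamamatsu's private tags, whose layout is not described here. It assumes the caller passes the offset where the entropy-coded scan data begins, since header segments such as quantization tables may contain arbitrary bytes. The function name is hypothetical.

```python
def restart_marker_offsets(jpeg: bytes, scan_start: int = 0):
    """Byte offsets of JPEG restart markers (0xFFD0 to 0xFFD7) from scan_start on.

    Only valid inside entropy-coded data, where byte stuffing guarantees that
    0xFF followed by 0xD0-0xD7 is a restart marker and not image data.
    """
    offsets = []
    i = jpeg.find(b"\xff", scan_start)
    while i != -1 and i + 1 < len(jpeg):
        if 0xD0 <= jpeg[i + 1] <= 0xD7:
            offsets.append(i)
        i = jpeg.find(b"\xff", i + 1)
    return offsets
```

Storing such offsets in a catalog, as NDPI's private tags do, spares a viewer from scanning the strip at all: it can seek straight to the restart interval covering the region it needs.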

Philips iSyntax

The iSyntax format, developed by Philips for its IntelliSite Pathology Solution and Ultra Fast Scanner systems, is a proprietary whole-slide imaging (WSI) format designed to combine the medical-grade image quality of JPEG 2000 with the speed and responsiveness of JPEG. Unlike traditional pyramid-based TIFF formats, iSyntax uses a wavelet-based compression scheme that is inherently multi-resolution. The format is optimized for real-time encoding and decoding, with a simplified entropy coding stage (based on local correlation and an arithmetic coder) that is faster than JPEG 2000’s EBCOT, at the cost of a roughly 10% increase in file size.^[26]
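A wavelet transform is "inherently multi-resolution" because its low-pass band is itself a half-resolution image. Such a format therefore does not need to store a separate downsampled copy of each level. The sketch below illustrates this with one level of a simple 2D Haar-style transform. The article does not say which wavelet iSyntax uses, and this is not Philips' codec. It assumes an even-sized grayscale image held as a 2D list.

```python
def haar2d_level(image):
    """One level of a 2D Haar-style transform on an even-sized 2D list.

    Returns (LL, LH, HL, HH). LL holds 2x2 block averages, i.e. a
    half-resolution copy of the image; the other bands hold the detail
    needed to rebuild the full-resolution image exactly.
    """
    h, w = len(image), len(image[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            LL[y // 2][x // 2] = (a + b + c + d) / 4  # average (low-pass)
            LH[y // 2][x // 2] = (a - b + c - d) / 4  # horizontal detail
            HL[y // 2][x // 2] = (a + b - c - d) / 4  # vertical detail
            HH[y // 2][x // 2] = (a - b - c + d) / 4  # diagonal detail
    return LL, LH, HL, HH


def inverse_haar2d_level(LL, LH, HL, HH):
    """Rebuild the full-resolution image from the four bands."""
    out = [[0.0] * (2 * len(LL[0])) for _ in range(2 * len(LL))]
    for y in range(len(LL)):
        for x in range(len(LL[0])):
            s, p, q, r = LL[y][x], LH[y][x], HL[y][x], HH[y][x]
            out[2 * y][2 * x] = s + p + q + r
            out[2 * y][2 * x + 1] = s - p + q - r
            out[2 * y + 1][2 * x] = s + p - q - r
            out[2 * y + 1][2 * x + 1] = s - p - q + r
    return out
```

Applying `haar2d_level` again to LL yields the next coarser level. A viewer can then display a zoomed-out view from LL bands alone and decode detail bands only when the user zooms in.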

Metadata in iSyntax is stored in a proprietary structure accessible via the Philips Pathology SDK. Due to its closed specification, interoperability with vendor-neutral libraries such as OpenSlide is not natively supported, and third-party access typically requires the official SDK or conversion tools.

Other Proprietary Formats

Additional examples include BIF (BioImagene Image File, Roche), MRXS (3DHistech), SCN (older Leica), VMS/VMU (other Hamamatsu types), and more, most of which follow variants of the TIFF or BigTIFF structural paradigms, add proprietary tags, and embed unique metadata content. The diversity of proprietary formats, lack of public documentation, and evolving vendor SDKs all contribute to challenges in universal accessibility and comprehensive tool compatibility.

Interoperable Formats

In response to the proliferation of proprietary formats and the growing demand for large-scale, multi-center, and AI-driven digital pathology solutions, the community has advanced a set of interoperable image formats engineered for both long-term preservation and high-performance analysis. Chief among these are DICOM for whole-slide imaging, OME-TIFF, and Iris File Extension, each offering key advantages for cross-platform data sharing, metadata standardization, and toolchain integration.^[27]^[28]

Digital Imaging and Communications in Medicine (DICOM)

The DICOM standard, originally developed for radiology, has been extended to support whole-slide imaging (WSI), with comprehensive definitions for tiled image pyramids, cross-reference metadata, and complex imaging workflows. DICOM Supplement 145 introduced the VL Whole Slide Microscopy Image IOD (Information Object Definition),^[29] enabling the storage of large, multi-resolution pathology images as collections of frames (tiles) within a single DICOM series. Key features include:

Multi-resolution pyramids comprise separate image object files representing each resolution level within a single directory for each slide
Compression support for JPEG and JPEG 2000 (J2K) formats
Z-planes and multi-channel imaging capabilities for depth imaging and associated metadata
Structured metadata encoding using standardized DICOM attribute tags for reproducible specimen information and acquisition parameters
Coordinate referencing system with slide-based (X, Y, Z) spatial positioning and Frame of Reference tags
Full and sparse tiling support for flexible data organization
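With full tiling, tile positions are not listed one by one. Instead, they are implied by the frame order, so a reader must compute which frame holds a given tile. The sketch below assumes the TILED_FULL ordering in which tiles run left to right, then top to bottom, then through successive focal planes, with a single optical path. Sparse tiling instead records an explicit position for each frame. The parameters correspond to the Total Pixel Matrix Columns/Rows of the level and the Columns/Rows of each frame; the function names are hypothetical.

```python
def _grid(total_cols, total_rows, tile_cols, tile_rows):
    """Tiles across and down, rounding up for partial edge tiles."""
    return -(-total_cols // tile_cols), -(-total_rows // tile_rows)


def tile_to_frame(col, row, z, total_cols, total_rows, tile_cols, tile_rows):
    """1-based frame number of tile (col, row) on focal plane z (all 0-based)."""
    across, down = _grid(total_cols, total_rows, tile_cols, tile_rows)
    if not (0 <= col < across and 0 <= row < down):
        raise IndexError("tile outside the total pixel matrix")
    return 1 + z * across * down + row * across + col


def frame_to_tile(frame, total_cols, total_rows, tile_cols, tile_rows):
    """Inverse of tile_to_frame: (col, row, z) for a 1-based frame number."""
    across, down = _grid(total_cols, total_rows, tile_cols, tile_rows)
    z, rest = divmod(frame - 1, across * down)
    row, col = divmod(rest, across)
    return col, row, z
```

For a 1000 × 600-pixel level stored as 256 × 256 frames (a 4 × 3 grid), the tile in column 3, row 2 of the second focal plane is frame 24.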

Many clinical workflows are adopting DICOM WSI as the long-term reference format, in part due to regulatory requirements and the need for standardized interoperability across institutions and platforms.^[30]^[31]

Nevertheless, several challenges remain. DICOM WSI encoding can introduce computational overhead, due in part to its multipart structure, making real-time display in viewers more challenging than with proprietary or performance-focused formats.^[32] Furthermore, DICOM has been criticized as a monolithic file specification that imposes architectural constraints and restricts technological choices.^[33]^[34] Conversion tools such as wsidicomizer, the Orthanc WSI server, and PixelMed’s TIFFToDicom facilitate migration from legacy formats to DICOM-compliant archives.^[35]^[36]

OME-TIFF (Open Microscopy Environment TIFF)

OME-TIFF is an open, extensible format developed by the Open Microscopy Environment (OME) consortium to address both the data and metadata needs of modern bioimaging.^[37] It extends the classic TIFF structure, with its widespread library and tool support, by embedding structured OME-XML metadata within TIFF tags, particularly the ImageDescription field of the first Image File Directory (IFD). The file specification is published by the OME consortium.^[38] Key features include:

Pyramidal multi-resolution support: OME-TIFF uses TIFF’s SubIFD mechanism (Tag 330) to represent image pyramids, supporting rapid image navigation. Each level may use its own compression (JPEG, JPEG 2000, etc.), and BigTIFF extensions are supported for large file sizes.
Metadata extensibility: OME-XML is a schema-centered metadata language capable of representing imaging provenance (e.g., microscope, lens, detector), acquisition modality, coordinate mapping, and experiment details. Attributes such as channel wavelengths, Z index, timepoint, and objective are standardized for interoperability.
Multi-dimensionality: Supports Z-stacks, time series, multichannel imaging, and 3D/4D data organization.
OME ecosystem: The Bio-Formats Java library and OMERO server provide read/write and management capabilities for OME-TIFF images, with desktop analysis supported by QuPath, Fiji/ImageJ, and others.
Validation and archiving: The format is openly specified, making it suitable for long-term research data stewardship, regulatory submission, and reproducible AI workflows.
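Because the metadata is ordinary XML, the basic image dimensions can be read with a standard XML parser once the ImageDescription text has been extracted from the first IFD. The sketch below makes several assumptions. The XML string is already available, and the sample document is abbreviated (real OME-XML also carries Channel and TiffData elements, among others). The function name is illustrative. The `{*}` namespace wildcard requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SIZE_KEYS = ("SizeX", "SizeY", "SizeZ", "SizeC", "SizeT")


def pixels_dimensions(ome_xml: str):
    """Return the Pixels dimensions of every Image in an OME-XML document."""
    root = ET.fromstring(ome_xml)
    images = []
    # {*} matches the element in any namespace (OME schema versions differ).
    for pixels in root.iterfind("{*}Image/{*}Pixels"):
        dims = {key: int(pixels.get(key, "1")) for key in SIZE_KEYS}
        dims["DimensionOrder"] = pixels.get("DimensionOrder")
        dims["Type"] = pixels.get("Type")
        images.append(dims)
    return images


SAMPLE = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="slide">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint8"
            SizeX="40000" SizeY="30000" SizeZ="1" SizeC="3" SizeT="1"/>
  </Image>
</OME>"""
```

Reading dimensions from standardized attributes, rather than from vendor-specific tags, is what lets the same few lines work for any OME-TIFF producer.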

OME-TIFF is widely adopted in research and academic pathology, where comprehensive metadata and integration with analysis pipelines are priorities, and its ecosystem spans Bio-Formats, OMERO, QuPath and Fiji/ImageJ.^[39] Open-source code facilitates widespread access to OME-TIFF files across platforms.^[40]

Iris File Extension (IFE)

The Iris File Extension (IFE) is a modern binary container format for whole-slide images developed at the University of Michigan. Built on contemporary high-performance serialization technology and incorporating familiar TIFF concepts, IFE addresses performance limitations of existing formats with an architecture designed specifically for high-speed file operations and efficient local slide rendering. The format enables rapid random-access reads and massively multithreaded encoding during writes while maintaining compatibility through validation routines.^[41] The specification is openly available under a Creative Commons Attribution-NoDerivatives 4.0 license.^[42]

Key features include:

Memory-mapped binary tile offset tables enabling direct random access without additional indexing
Modern compression support for legacy JPEG and contemporary AVIF image formats
Binary-encoded metadata segments storing slide descriptors, acquisition parameters, and spatial coordinates
File-level and section-level integrity validation with early corruption detection
Embedded annotation blocks for native serialization of regions of interest
Multi-threaded parallel write architecture allowing simultaneous tile encoding and flushing
Open-source ecosystem: Iris Codec (with a WebAssembly module) and the Iris RESTful server provide high-performance networked deployments and tools for format conversion and file updates.^[43]
Versioned headers with feature flags ensuring backward and forward compatibility

IFE embeds structured metadata and annotations alongside pixel data, enabling integrated validation and region-of-interest handling within a single file. The format's architecture decouples metadata parsing from pixel retrieval and uses memory-mapped regions to achieve optimized tile access times compared to traditional WSI formats. Cross-platform implementations are available in C++,^[44] Python,^[45] and JavaScript, with specification validation tools supporting widespread adoption.^[46]
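The article does not describe IFE's on-disk layout, so the following is only a generic sketch of why a fixed-stride, memory-mappable tile offset table allows direct random access. Because every record has the same size, the location of tile i is computed rather than searched for. The 12-byte record layout (a 64-bit offset plus a 32-bit length) and all function names are hypothetical and are not taken from the IFE specification.

```python
import struct

# Hypothetical record layout (NOT the published IFE layout):
# little-endian u64 byte offset followed by u32 byte length, 12 bytes per tile.
RECORD = struct.Struct("<QI")


def pack_offset_table(entries):
    """Serialize [(offset, length), ...] into a flat, fixed-stride table."""
    return b"".join(RECORD.pack(offset, length) for offset, length in entries)


def tile_location(table, index):
    """Constant-time (offset, length) lookup for tile `index`.

    `table` may be bytes, a memoryview, or an mmap.mmap of the file region:
    unpack_from reads only the 12 bytes it needs, so with mmap the operating
    system pages in just that part of the table.
    """
    if not 0 <= index < len(table) // RECORD.size:
        raise IndexError(index)
    return RECORD.unpack_from(table, index * RECORD.size)


def read_tile(file_bytes, table, index):
    """Return the still-compressed bytes of one tile."""
    offset, length = tile_location(table, index)
    return file_bytes[offset:offset + length]
```

The trade-off of a fixed stride is a little wasted space for small tiles, in exchange for lookups that need no parsing and no auxiliary index.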

Challenges

Digital pathology has been approved by the FDA for primary diagnosis.^[47] The approval was based on a multi-center study of 1,992 cases in which whole-slide imaging (WSI) was shown to be non-inferior to microscopy across a wide range of surgical pathology specimens, sample types and stains.^[48] As of mid-2025, approximately 50 digital pathology AI tools have been cleared for primary diagnostic use (CE-IVD / CE-IVDR) in the EU.^[49] While WSI has advantages for creating digital data from glass slides, it is not a strong choice for real-time telepathology applications involving discussion and collaboration between multiple remote pathologists.^[50] Furthermore, unlike digital radiology, where the elimination of film made the return on investment (ROI) clear, the ROI on digital pathology equipment is less obvious. The strongest ROI justifications include improved quality of healthcare, increased efficiency for pathologists, and reduced costs of handling glass slides.^[51]
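Non-inferiority claims like the one above are judged against a pre-specified margin: the new method may be slightly worse, but not by more than the margin. The sketch below is simplified. It assumes an unpaired comparison of major discordance rates using a normal-approximation (Wald) confidence interval, and the cited study's actual design and statistics may differ. The function name, counts and margins in the usage note are invented for illustration.

```python
import math


def noninferior(disc_new, n_new, disc_ref, n_ref, margin, z=1.96):
    """Unpaired Wald check of a discordance-rate difference against a margin.

    Returns (difference, upper_bound, verdict). Non-inferiority is concluded
    when the upper confidence bound of (p_new - p_ref) is below `margin`.
    """
    p_new, p_ref = disc_new / n_new, disc_ref / n_ref
    diff = p_new - p_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    upper = diff + z * se
    return diff, upper, upper < margin
```

With hypothetical counts of 100/2000 discordant WSI reads and 90/2000 discordant microscope reads, the difference is 0.005 and the upper bound is about 0.018. The check therefore passes with a margin of 0.04 but fails with a margin of 0.01. This shows that the conclusion depends as much on the chosen margin as on the observed rates.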

Validation

Validation of a digital microscopy workflow in a specific environment (see above) is important to ensure high diagnostic performance of pathologists when evaluating digital whole-slide images. There are different methods that can be used for this validation process.^[52] The College of American Pathologists has published a guideline with minimal requirements for validation of whole slide imaging systems for diagnostic purposes in human pathology.^[53]
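One common way to express a validation result is intraobserver concordance: the same pathologist diagnoses each case on glass and, after a washout period, on the digital slide, and the agreement rate is reported. The sketch below computes that rate and lists the discordant cases for review. The article does not state the CAP guideline's specific requirements. The case set, the diagnosis labels, and any acceptance threshold a laboratory applies are therefore assumptions, and the function name is hypothetical.

```python
def intraobserver_concordance(glass_dx, digital_dx):
    """Percent agreement between glass and digital reads of the same cases,
    plus the 0-based indices of discordant cases for adjudication."""
    if len(glass_dx) != len(digital_dx) or not glass_dx:
        raise ValueError("need two equally long, non-empty lists of diagnoses")
    pairs = list(zip(glass_dx, digital_dx))
    discordant = [i for i, (g, d) in enumerate(pairs) if g != d]
    agreement = 100 * (len(pairs) - len(discordant)) / len(pairs)
    return agreement, discordant
```

In practice the discordant cases matter as much as the percentage. Reviewing them shows whether disagreements stem from the digital system, such as focus or color problems, or from ordinary diagnostic variability.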

Potential

Trained pathologists traditionally view tissue slides under a microscope. These tissue slides may be stained to highlight cellular structures. Once digitized, slides can be shared through telepathology and analyzed numerically using computer algorithms. Algorithms can automate the manual counting of structures or classify the condition of tissue, as in tumor grading. They can also be used to detect features such as mitotic figures, epithelial cells, or tissue-specific structures such as lung cancer nodules, glomeruli, or vessels, or to estimate molecular biomarkers such as mutated genes, tumor mutational burden, or transcriptional changes.^[54]^[55]^[56] This has the potential to reduce human error and improve diagnostic accuracy. Digital slides can be easily shared, increasing the potential for data use in education as well as in consultations between expert pathologists. Multiplexed imaging (staining multiple markers on the same slide) allows pathologists to understand the finer distribution of cell types and their relative locations.^[57] Understanding the spatial distribution of cell types, or of the markers and pathways they express, can inform the prescription of targeted drugs or the design of personalized combination therapies.
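Spatial questions of this kind can be posed directly on cell coordinates exported from a multiplexed image. For example, one might ask what fraction of tumor cells have an immune cell nearby. The sketch below assumes that cell centroids have already been detected and typed. The function name and radius are illustrative, and the brute-force search would be replaced by a spatial index for whole slides with millions of cells.

```python
def fraction_with_neighbor(cells_a, cells_b, radius):
    """Fraction of type-A cells with at least one type-B cell within `radius`.

    cells_a, cells_b: lists of (x, y) centroids in the same units as radius.
    Brute force, O(len(cells_a) * len(cells_b)).
    """
    if not cells_a:
        return 0.0
    r2 = radius * radius
    hits = sum(
        any((ax - bx) ** 2 + (ay - by) ** 2 <= r2 for bx, by in cells_b)
        for ax, ay in cells_a
    )
    return hits / len(cells_a)
```

Evaluating the function over a range of radii gives a simple profile of how closely two cell populations interact. More formal spatial statistics build on the same distance computations.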

References
