MultiverSeg enables users to rapidly segment new biomedical imaging datasets. The MultiverSeg network takes as input an image to segment and user interactions, along with a context set of previously segmented image-segmentation pairs (left).
As the user completes more segmentations, those images and segmentations become additional inputs to the model, populating the context set. As the context set of labeled images grows, the number of interactions required to achieve an accurate segmentation decreases (right).
Medical researchers and clinicians often need to perform novel segmentation tasks on a set of related images. Existing methods for segmenting a new dataset are either interactive, demanding substantial human effort for each image, or rely on an existing set of manually labeled images.
We introduce a system, MultiverSeg, that enables practitioners to rapidly segment an entire new dataset without requiring access to any existing labeled data from that task or domain. Along with the image to segment, the model takes user interactions such as clicks, bounding boxes, or scribbles as input, and predicts a segmentation. As the user segments more images, those images and segmentations become additional inputs to the model, providing context. As the context set of labeled images grows, the number of interactions required to segment each new image decreases.
We demonstrate that MultiverSeg enables users to interactively segment new datasets efficiently, amortizing the interaction cost across the dataset. Compared to a state-of-the-art interactive segmentation method, MultiverSeg reduced the total number of scribble steps by 53% and clicks by 36% to achieve 90% Dice on sets of images from unseen tasks.
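The workflow described above can be sketched as a simple loop. This is a minimal, hypothetical sketch, not the released API: `segment` stands in for the MultiverSeg network (mapping an image, the interactions so far, and the context set to a predicted mask), and `get_corrections` / `is_accepted` stand in for the user reviewing and correcting each prediction.

```python
def segment_dataset(images, segment, get_corrections, is_accepted):
    """Interactively segment a dataset while growing an in-context set.

    `segment`, `get_corrections`, and `is_accepted` are hypothetical
    callables standing in for the model and the user in the loop.
    """
    context = []   # accepted (image, segmentation) pairs; grows over time
    results = []
    for image in images:
        interactions = []   # clicks / scribbles / boxes for this image
        prediction = segment(image, interactions, context)
        while not is_accepted(image, prediction):
            # The user adds a correction; the model refines its mask.
            interactions.append(get_corrections(image, prediction))
            prediction = segment(image, interactions, context)
        # The accepted pair becomes an in-context example for later images,
        # so subsequent images tend to need fewer interactions.
        context.append((image, prediction))
        results.append(prediction)
    return results
```

The key design point is that the context set is populated for free as a by-product of the user's normal work: no separate labeling phase is required before the model can start exploiting context.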
The MultiverSeg network (left) takes as input a stack of target image inputs and a context set of image-segmentation pairs. The target image inputs include the target image to segment, optional user interactions, and a previous predicted segmentation if available.
The architecture uses an encoder-decoder structure similar to a UNet. We use a CrossBlock mechanism (right), with additional normalization layers, to fuse the features of the target image inputs with the features of the context set inputs throughout the network.
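The core idea of a CrossBlock-style interaction can be illustrated with a small NumPy sketch. This is a simplified, hypothetical version: the weight matrices below stand in for learned convolutions, and the real model also includes normalization layers and operates inside a full encoder-decoder. The essential property shown is that the target features are paired with every context entry and the results are averaged, so the block is permutation-invariant in the context set and works for any context size.

```python
import numpy as np

def cross_block(target_feats, context_feats, w_cross, w_target):
    """Hedged sketch of one CrossBlock-style interaction step.

    target_feats:  (C, H, W) features of the target image inputs
    context_feats: (N, C, H, W) features of the N context pairs
    w_cross, w_target: (C, 2C) matrices standing in for learned
    1x1 convolutions (hypothetical simplification).
    """
    interactions = []
    for v in context_feats:
        # Pair the target features with each context entry and mix them.
        pair = np.concatenate([target_feats, v], axis=0)        # (2C, H, W)
        mixed = np.einsum('oc,chw->ohw', w_cross, pair)         # (C, H, W)
        interactions.append(np.maximum(mixed, 0.0))             # ReLU
    interactions = np.stack(interactions)                       # (N, C, H, W)
    # Averaging over the context set makes the output independent of
    # the ordering of context entries and of the context size N.
    pooled = interactions.mean(axis=0)
    # Update the target features from the pooled interaction map.
    pair_t = np.concatenate([target_feats, pooled], axis=0)
    new_target = np.maximum(np.einsum('oc,chw->ohw', w_target, pair_t), 0.0)
    # Per-entry interactions serve as the updated context features.
    return new_target, interactions
```

Because the pooling is a mean over context entries, the same trained weights handle a context set of 1 image or 50 images, which is what lets performance improve as the user's completed segmentations accumulate.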
MultiverSeg's interactive segmentation performance improves as more images are segmented and added to the context set, demonstrating that the model uses information from the context set to improve its predictions.
We show predictions from MultiverSeg and baselines for random examples given a context set of 10 previously segmented examples from the same unseen task and one correction click or scribble.
If you find our work or any of our materials useful, please cite our paper: