MultiverSeg enables users to rapidly segment new biomedical imaging datasets. The MultiverSeg network takes as input an image to segment and user interactions, along with a context set of previously segmented image-segmentation pairs (left).
As the user completes more segmentations, those images and segmentations become additional inputs to the model, populating the context set. As the context set of labeled images grows, the number of interactions required to achieve an accurate segmentation decreases (right).
Medical researchers and clinicians often need to perform novel segmentation tasks on a set of related images. Existing methods for segmenting a new dataset are either interactive, demanding substantial human effort for each image, or rely on an existing set of manually labeled images.
We introduce a system, MultiverSeg, that enables practitioners to rapidly segment an entire new dataset without requiring access to any existing labeled data from that task or domain. Along with the image to segment, the model takes user interactions such as clicks, bounding boxes, or scribbles as input, and predicts a segmentation. As the user segments more images, those images and segmentations become additional inputs to the model, providing context. As the context set of labeled images grows, the number of interactions required to segment each new image decreases.
We demonstrate that MultiverSeg enables users to interactively segment new datasets efficiently, amortizing the interaction cost across the dataset. Compared to a state-of-the-art interactive segmentation method, MultiverSeg reduced the total number of scribble steps by 53% and clicks by 36% to achieve 90% Dice on sets of images from unseen tasks.
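The workflow described above can be sketched as a simple loop. This is a minimal, hypothetical sketch, not the released API: `segment` stands in for the MultiverSeg network (mapping an image, the interactions so far, and the context set to a predicted mask), and `get_corrections` / `is_accepted` stand in for the user reviewing and correcting each prediction.

```python
def segment_dataset(images, segment, get_corrections, is_accepted):
    """Interactively segment a dataset while growing an in-context set.

    `segment`, `get_corrections`, and `is_accepted` are hypothetical
    callables standing in for the model and the user in the loop.
    """
    context = []   # accepted (image, segmentation) pairs; grows over time
    results = []
    for image in images:
        interactions = []   # clicks / scribbles / boxes for this image
        prediction = segment(image, interactions, context)
        while not is_accepted(image, prediction):
            # The user adds a correction; the model refines its mask.
            interactions.append(get_corrections(image, prediction))
            prediction = segment(image, interactions, context)
        # The accepted pair becomes an in-context example for later images,
        # so subsequent images tend to need fewer interactions.
        context.append((image, prediction))
        results.append(prediction)
    return results
```

The key design point is that the context set is populated for free as a by-product of the user's normal work: no separate labeling phase is required before the model can start exploiting context.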
The MultiverSeg network (left) takes as input a stack of target image inputs and a context set of image-segmentation pairs. The target image inputs include the target image to segment, optional user interactions, and a previous predicted segmentation if available.
The architecture uses an encoder-decoder structure similar to a UNet. We use a CrossBlock mechanism (right), with additional normalization layers, to fuse the features of the target image inputs with the features of the context set inputs throughout the network.
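The core idea of a CrossBlock-style interaction can be illustrated with a small NumPy sketch. This is a simplified, hypothetical version: the weight matrices below stand in for learned convolutions, and the real model also includes normalization layers and operates inside a full encoder-decoder. The essential property shown is that the target features are paired with every context entry and the results are averaged, so the block is permutation-invariant in the context set and works for any context size.

```python
import numpy as np

def cross_block(target_feats, context_feats, w_cross, w_target):
    """Hedged sketch of one CrossBlock-style interaction step.

    target_feats:  (C, H, W) features of the target image inputs
    context_feats: (N, C, H, W) features of the N context pairs
    w_cross, w_target: (C, 2C) matrices standing in for learned
    1x1 convolutions (hypothetical simplification).
    """
    interactions = []
    for v in context_feats:
        # Pair the target features with each context entry and mix them.
        pair = np.concatenate([target_feats, v], axis=0)        # (2C, H, W)
        mixed = np.einsum('oc,chw->ohw', w_cross, pair)         # (C, H, W)
        interactions.append(np.maximum(mixed, 0.0))             # ReLU
    interactions = np.stack(interactions)                       # (N, C, H, W)
    # Averaging over the context set makes the output independent of
    # the ordering of context entries and of the context size N.
    pooled = interactions.mean(axis=0)
    # Update the target features from the pooled interaction map.
    pair_t = np.concatenate([target_feats, pooled], axis=0)
    new_target = np.maximum(np.einsum('oc,chw->ohw', w_target, pair_t), 0.0)
    # Per-entry interactions serve as the updated context features.
    return new_target, interactions
```

Because the pooling is a mean over context entries, the same trained weights handle a context set of 1 image or 50 images, which is what lets performance improve as the user's completed segmentations accumulate.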
MultiverSeg's interactive segmentation performance improves as more images are segmented and added to the context set, demonstrating that the model uses information from the context set to improve its predictions.
We show predictions from MultiverSeg and baselines for random examples given a context set of 10 previously segmented examples from the same unseen task and one correction click or scribble.
If you find our work or any of our materials useful, please cite our paper: