U-SAM: Repurposing SAM for User-Defined Semantics Aware Segmentation

¹University of California, Riverside, ²Samsung Research America
*Work done while at UCR

Problem Overview: We propose a novel and flexible pipeline, U-SAM, to tackle the challenging problem of generating pixel-level semantic annotations for user-specified object classes without any manual supervision. Given only a set of object class names, U-SAM produces accurate semantic segmentation masks on any user-provided image data. By enhancing SAM with semantic region recognition capabilities and leveraging synthetic images generated by Stable Diffusion or web-crawled images for the desired object classes, our approach addresses this challenging segmentation problem while generalizing robustly to unseen domains.
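As a rough illustration of the data-collection step, the sketch below generates single-object images for user-specified class names with Stable Diffusion via the Hugging Face diffusers library. The prompt template, checkpoint, and image counts are illustrative assumptions, not necessarily the exact setup used by U-SAM.

    # Minimal sketch of synthetic data generation for user-defined classes.
    # Assumptions: diffusers library, the runwayml/stable-diffusion-v1-5
    # checkpoint, and a simple single-object prompt template.
    import torch
    from diffusers import StableDiffusionPipeline

    user_classes = ["dog", "cat", "sheep", "person"]  # user-defined target categories C

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    images_per_class = 4  # illustrative; a real training set would be much larger
    for cls in user_classes:
        prompt = f"a photo of a single {cls}"  # assumed single-object prompt template
        for i in range(images_per_class):
            image = pipe(prompt).images[0]
            image.save(f"synthetic_{cls}_{i}.png")  # image-level label = cls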

Abstract

The Segment Anything Model (SAM) excels at generating precise object masks from input prompts but lacks semantic awareness, failing to associate its generated masks with specific object categories. To address this limitation, we propose U-SAM, a novel framework that instills semantic awareness into SAM, enabling it to generate targeted masks for user-specified object categories. Given only object class names as input from the user, U-SAM provides pixel-level semantic annotations for images without requiring any labeled or unlabeled samples from the test data distribution. Our approach leverages synthetically generated or web-crawled images to accumulate semantic information about the desired object classes. We then learn a mapping function between SAM's mask embeddings and object class labels, effectively enhancing SAM with granularity-specific semantic recognition capabilities. As a result, users can obtain meaningful and targeted segmentation masks for specific objects they request, rather than generic and unlabeled masks. We evaluate U-SAM on PASCAL VOC 2012 and MSCOCO-80, achieving significant mIoU improvements of +17.95% and +5.20%, respectively, over state-of-the-art methods. By transforming SAM into a semantically aware segmentation model, U-SAM offers a practical and flexible solution for pixel-level annotation across diverse and unseen domains in a resource-constrained environment.

Method


Given the list of user-defined target categories \(\mathcal{C}\), we use Stable Diffusion to generate a synthetic single-object image dataset. Each image is encoded by SAM's image encoder, and a uniformly spaced grid of \(d\) points is generated across the image to prompt SAM. The image and point embeddings are passed into a transformer decoder, whose output mask embeddings \(\mathbf{m}_i\) (here, \(\mathbf{m}_i \in \mathbb{R}^{d \times 1024}\), corresponding to the \(d\) masks predicted by SAM) are used to train a classifier head \(\theta\) to predict the object classes using a Multiple Instance Learning (MIL) setup and uncertainty losses.
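Below is a minimal sketch of the classifier head and a max-pooling MIL objective over the per-image bag of SAM mask embeddings. The head architecture, the pooling choice, the 1024-dimensional embedding size, and the number of point prompts are assumptions for illustration; the uncertainty losses from the paper are not reproduced here.

    # Minimal sketch: classifier head over SAM mask embeddings with a
    # max-pooling MIL loss (assumed pooling; not necessarily U-SAM's exact head).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ClassifierHead(nn.Module):
        """Maps SAM mask embeddings (d x 1024 per image) to per-mask class logits."""
        def __init__(self, embed_dim: int = 1024, num_classes: int = 20):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(embed_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_classes),
            )

        def forward(self, mask_embeddings: torch.Tensor) -> torch.Tensor:
            # mask_embeddings: (batch, d, embed_dim) -> (batch, d, num_classes)
            return self.mlp(mask_embeddings)

    def mil_loss(per_mask_logits: torch.Tensor, image_labels: torch.Tensor) -> torch.Tensor:
        """Max-pooling MIL: the image-level logit for a class is the max over its d masks."""
        bag_logits = per_mask_logits.max(dim=1).values  # (batch, num_classes)
        return F.cross_entropy(bag_logits, image_labels)

    # Usage with random stand-ins for SAM mask embeddings (d = 64 point prompts assumed).
    head = ClassifierHead(num_classes=20)
    embeddings = torch.randn(4, 64, 1024)   # would come from SAM's mask decoder
    labels = torch.randint(0, 20, (4,))     # image-level label from the generation prompt
    loss = mil_loss(head(embeddings), labels)
    loss.backward()

Because each synthetic image contains a single object, the image-level label supervises the whole bag of masks, and the MIL pooling lets the head discover which individual masks correspond to that class.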

 

Results


Qualitative Results: The four columns show, respectively, the original image, the ground-truth (GT) mask, all of the SAM-generated masks overlaid on top of each other, and the U-SAM predicted masks. For the GT and U-SAM masks, the colors indicate the class labels, while random colors are used for the SAM masks since SAM does not provide class labels.


Qualitative results with changed granularity: The third column shows U-SAM predictions on the PASCAL classes, and the fourth column shows the predictions when the granularity level is changed to merge the "dog", "cat", and "sheep" classes into a single "animals" class. The colors represent the different class labels: Brown: "sheep", Violet: "dog", Dark Brown: "cat", Red: "animals", Pink: "person". GC: Granularity Changed.


Quantitative Results obtained by U-SAM.

Authors

Rohit Kundu

Sudipta Paul

Arindam Dutta

Amit K. Roy-Chowdhury

BibTeX

@inproceedings{kundu2025repurposing,
    title={Repurposing SAM for User-Defined Semantics Aware Segmentation},
    author={Kundu, Rohit and Paul, Sudipta and Dutta, Arindam and Roy-Chowdhury, Amit K.},
    booktitle={CVPR workshops},
    year={2025}
}

Copyright: CC BY-NC-SA 4.0 © Rohit Kundu | Last updated: 05 April 2025 | Website credits to Nerfies