Query Learning of Both Thing and Stuff for Panoptic Segmentation

Abstract

Starting from DETR, query-based detection and segmentation methods achieve results comparable to previous works with a simplified and elegant pipeline. In this work, we propose QueryPanSeg, a novel, simple, and unified baseline for panoptic segmentation. QueryPanSeg represents things and stuff as separate sets of learnable queries. For thing queries, we propose encoding each instance mask into a compact mask vector and performing classification, box regression, and mask-encoding regression simultaneously. For stuff queries, we propose residual interactive learning, where each stuff query is responsible for one semantic category and interacts with pixel features via a single multi-head attention layer. With this approach, the instance-wise properties of things and the semantically consistent properties of stuff are unified in one framework. Compared with the original DETR, our approach converges with a training schedule nearly 10 times shorter. Compared with previous box-based and box-free methods, our approach outperforms many state-of-the-art results with a much simpler pipeline and no handcrafted components.
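To make the stuff-query design concrete, below is a minimal PyTorch-style sketch of the residual interactive learning described in the abstract: one learnable query per stuff category attends to the flattened pixel features through a single multi-head attention layer and is updated residually. All module and parameter names here are our own assumptions for illustration, not the paper's actual code.

```python
import torch
import torch.nn as nn

class StuffQueryHead(nn.Module):
    """Hypothetical sketch of residual interactive learning for stuff:
    one learnable query per semantic category interacts with pixel
    features via a single multi-head attention layer (shapes and names
    are assumptions, not the paper's implementation)."""

    def __init__(self, num_stuff_classes: int, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # One learnable query per stuff (semantic) category.
        self.stuff_queries = nn.Embedding(num_stuff_classes, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, pixel_feats: torch.Tensor) -> torch.Tensor:
        # pixel_feats: (B, H*W, C) flattened encoder feature map.
        b = pixel_feats.size(0)
        q = self.stuff_queries.weight.unsqueeze(0).expand(b, -1, -1)  # (B, K, C)
        # Each stuff query attends to all pixel features of its image.
        attended, _ = self.attn(q, pixel_feats, pixel_feats)
        # Residual update of the queries, then normalization.
        q = self.norm(q + attended)
        # Per-category mask logits as a dot product with pixel features.
        return torch.einsum("bkc,bnc->bkn", q, pixel_feats)  # (B, K, H*W)
```

A forward pass over a (B, H*W, C) feature map yields (B, K, H*W) mask logits, one map per stuff category, which can then be reshaped to (B, K, H, W) and supervised with a semantic segmentation loss.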

Publication
2022 IEEE International Conference on Image Processing (ICIP)