Advanced by transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most …
We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled …
Graph Active Learning (GAL), which aims to find the most informative nodes in graphs for annotation to maximize the Graph Neural Networks (GNNs) performance, has attracted many research efforts but remains non-trivial challenges. One major challenge …
Session-based recommendation (SBR), which makes the next-item recommendation based on previous anonymous actions, has drawn increasing attention. The last decade has seen multiple deep learning-based modeling choices applied on SBR successfully, …
Previous multi-task dense prediction studies developed complex pipelines such as multi-modal distillations in multiple stages or searching for task relational contexts for each task. The core insight beyond these methods is to maximize the mutual …
In this work, we focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories. Previous approaches have relied on massive caption datasets and complex pipelines to establish …
Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and part segmentation into one task. Previous work mainly utilizes separated approaches to handle thing, stuff, and part predictions individually without performing any shared …
The Depth-aware Video Panoptic Segmentation (DVPS) is a new challenging vision problem that aims to predict panoptic segmentation and depth in a video simultaneously. The previous work solves this task by extending the existing panoptic segmentation …
Human fashion understanding is one crucial computer vision task since it has comprehensive information for real-world applications. This focus on joint human fashion segmentation and attribute recognition. Contrary to the previous works that …
Starting from DETR, query based detection and segmentation methods achieve comparable results as previous works with a simplified and elegant pipeline. In this work, a novel, simple and unified baseline, named QueryPanSeg, is proposed for panoptic …