Yunhai Tong

Latest

Rethinking Vector Field Learning for Generative Segmentation
RecTok: Reconstruction Distillation along Rectified Flow
Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation
Guiding Visual Autoregressive Models through Spectrum Weakening
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Direct Preference Optimization for LLM-Enhanced Recommendation Systems
VMoBA: Mixture-of-block attention for video diffusion models
CyberV: Cybernetics for test-time scaling in video understanding
Mixed-R1: Unified reward perspective for reasoning capability in multimodal large language models
Muddit: Liberating generation beyond text-to-image with a unified discrete diffusion model
Conditional panoramic image generation via masked autoregressive modeling
MMaDA: Multimodal large diffusion language models
Training-free heterogeneous graph condensation via data selection
Towards scalable and deep graph neural networks via noise masking
Training-free diffusion acceleration with bottleneck sampling
Decouple and track: Benchmarking and improving video diffusion transformers for motion transfer
Diffusion-sharpening: Fine-tuning diffusion models with denoising trajectory sharpening
Sa2VA: Marrying SAM2 with LLaVA for dense grounded understanding of images and videos
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
DreamRelation: Bridging Customization and Relation Generation
You Can't Ignore Either: Unifying Structure and Feature Denoising for Robust Graph Learning
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
RLRF4Rec: Reinforcement Learning from Recsys Feedback for Enhanced Recommendation Reranking
SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Customizing graph neural networks using path reweighting
Characteristic-Aware Time-Series Representation Learning for Unsupervised Anomaly Detection
Collaborative Multi-Task Representation for Natural Language Understanding
MotionBooth: Motion-aware customized text-to-video generation
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
HGAMLP: Heterogeneous graph attention MLP with de-redundancy mechanism
VG4D: Vision-Language Model Goes 4D Video Recognition
Explore In-Context Segmentation via Latent Diffusion Models
Towards robust referring image segmentation
SFNet: Faster and accurate semantic segmentation via semantic flow
Towards open vocabulary learning: A survey
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
Towards language-driven video inpainting via multimodal large language models
Mitigating Semantic Confusion from Hostile Neighborhood for Graph Active Learning
DST-Det: Simple dynamic self-training for open-vocabulary object detection
Multiple Connectivity Views for Session-based Recommendation
Betrayed by captions: Joint caption grounding and generation for open vocabulary instance segmentation
PanopticPartFormer++: A unified and decoupled view for panoptic part segmentation
Convolution-enhanced evolving attention networks
Label-efficient interactive time-series anomaly detection
TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers
Panoptic-PartFormer: Learning a unified model for panoptic part segmentation
PolyphonicFormer: Unified query learning for depth-aware video panoptic segmentation
Fashionformer: A simple, effective and unified baseline for human fashion segmentation and recognition
Query Learning of Both Thing and Stuff for Panoptic Segmentation
Improving Video Instance Segmentation via Temporal Pyramid Routing
Enhancing self-attention with knowledge-assisted attention maps
Heat-RL: Online Model Selection for Streaming Time-Series Anomaly Detection
Graph pointer neural networks
TS2Vec: Towards universal representation of time series
Video K-Net: A simple, strong, and unified baseline for video segmentation
BoundarySqueeze: Image Segmentation as Boundary Squeezing
Improving BERT with Self-Supervised Attention
End-to-end video object detection with spatial-temporal transformers
Dynamic Dual Sampling Module For Fine-Grained Semantic Segmentation
Fast and accurate scene parsing via bi-direction alignment networks
Global aggregation then local distribution for scene parsing
Competence-based Curriculum Learning for Multilingual Machine Translation
Towards efficient scene understanding via squeeze reasoning
Evolving attention with residual convolutions
Customizing Graph Neural Networks using Path Reweighting
PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
Enhanced boundary learning for glass-like object segmentation
Fast and Accurate Scene Parsing via Bi-Direction Alignment Networks
Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees
Multivariate time-series anomaly detection via graph attention network
AutoADR: Automatic model design for ad relevance
Spectral temporal graph neural network for multivariate time-series forecasting
Boundary content graph neural network for temporal action proposal generation
Improving semantic segmentation via decoupled body and edge supervision
Semantic flow for fast and accurate scene parsing
LadaBERT: Lightweight adaptation of BERT through hybrid model compression
Gated fully fusion for semantic segmentation
Spherical criteria for fast and accurate 360 object detection
TextNAS: A neural architecture search space tailored for text representation
Customized graph embedding: tailoring embedding vectors to different applications
Global aggregation then local distribution in fully convolutional networks
Dual graph convolutional network for semantic segmentation
Flow2Seg: Motion-aided semantic segmentation
Reprojection R-CNN: A Fast and Accurate Object Detector for 360° Images
A cooperative multi-agent reinforcement learning framework for resource balancing in complex logistics network
Fast parallel path concatenation for graph extraction
GVoS: A general system for near-duplicate video-related applications on Storm
A Structural Based Community Similarity Algorithm and Its Application in Scientific Event Detection
Netrating: Credit risk evaluation for loan guarantee chain in China
StroMAX: Partitioning-based scheduler for real-time stream processing system
Ranking scientific articles by exploiting citations, authors, journals, and time information