Multimedia Semantic Analytics Lab
Jianzong Wu
Latest
Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation
VMoBA: Mixture-of-block attention for video diffusion models
Muddit: Liberating generation beyond text-to-image with a unified discrete diffusion model
Decouple and track: Benchmarking and improving video diffusion transformers for motion transfer
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
DreamRelation: Bridging Customization and Relation Generation
MotionBooth: Motion-aware customized text-to-video generation
Towards robust referring image segmentation
Towards open vocabulary learning: A survey
Towards language-driven video inpainting via multimodal large language models
Betrayed by captions: Joint caption grounding and generation for open vocabulary instance segmentation