Multimedia Semantic Analytics Lab

We are the Multimedia Semantic Analytics Lab (MSALab) at Peking University, led by Prof. Yunhai Tong.

Our mission is to build intelligent systems that understand and generate meaning across language, vision, and video. We combine strong theoretical foundations with practical, open-world applications in large-scale AI.

Highlights

We conduct research in multimodal learning, visual perception, video understanding, and language semantics, working toward AI systems with deeper and more reliable semantic understanding.

Current Core Directions

Multimodal large language models & agentic reasoning.
Image & video generation and editing.
Unified models.
Efficient and trustworthy large language models.

Open Positions

We welcome self-motivated research interns and PhD applicants with strong mathematical and engineering backgrounds, especially those interested in multimodal AI and generative modeling.

Latest News

Research updates, awards, talks, and community activities.

Paper Accepted to CVPR 2026: RecTok

Our paper “RecTok: Reconstruction Distillation along Rectified Flow” has been accepted to CVPR 2026.

Dec 15, 2025 | Publications

Paper Accepted to ICLR 2026: MMaDA-Parallel

Our paper “MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation” has been accepted to ICLR 2026.

Nov 12, 2025 | Paper Publications

Paper Accepted to ICLR 2026: Vmoba

Our paper “Vmoba: Mixture-of-Block Attention for Video Diffusion Models” has been accepted to ICLR 2026.

Jun 30, 2025 | Publications

Paper Accepted to ICLR 2026: Muddit

Our paper “Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model” has been accepted to ICLR 2026.

May 29, 2025 | Paper Publications