Multimedia Semantic Analytics Lab

We are the Multimedia Semantic Analytics Lab (MSALab) at Peking University, led by Prof. Yunhai Tong.

Our mission is to build intelligent systems that understand and generate meaning across language, vision, and video. We combine strong theoretical foundations with practical, open-world applications in large-scale AI.

Highlights

  • We conduct research in multimodal learning, visual perception, video understanding, and language semantics, with the goal of making AI systems' semantic understanding deeper and more reliable.

  • Current core directions:

    • Multimodal large language models and agentic reasoning.
    • Text-to-image and text-to-video generation and editing.
    • Efficient and trustworthy large language models.

  • We welcome self-motivated research interns and PhD applicants with strong mathematical and engineering backgrounds, especially those interested in multimodal AI, LLMs, and generative modeling.

  • Explore our recent work on the Publications page.