Welcome to my website!
My name is Kai (Alan) Wang. I'm a Ph.D. candidate at The Edward S. Rogers Sr. Department of Electrical and Computer Engineering (ECE), University of Toronto, supervised by Prof. Dimitrios Hatzinakos. I also work closely with Prof. Yapeng Tian on Audio-Visual Learning and Generation.
My research interests lie in Multimodal Learning (Image/Video/Audio), Generative Models, and Multimodal LLMs.
If you want to collaborate with me on related research projects, please feel free to email me.
News
- 2024.09: One paper was accepted by NeurIPS 2024 (Spotlight).
- 2024.09: One paper was accepted by EMNLP 2024 (Main).
- 2024.06: One paper appeared on arXiv.
- 2024.05: One co-first-author paper was accepted by Pattern Recognition Letters.
- 2024.04: One first-author paper was accepted by CVPR 2024.
- 2023.12: One first-author paper was accepted by ICASSP 2024.
- 2023.07: One first-author paper was accepted by APSIPA 2023.
- 2022.09: Started my Ph.D. study at the University of Toronto.
Selected Publications
( * equal contribution)

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian.
Under Review
- We design an efficient audio-visual diffusion transformer that generates high-quality, realistic videos with both visual and audio tracks.

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen.
NeurIPS 2024 (Spotlight)

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen.
EMNLP 2024 (Main)

Kai Wang, Yapeng Tian, Dimitrios Hatzinakos.
CVPR 2024 Workshop
- We propose a Spatial-Temporal-Global Cross-Modal Adaptation (STG-CMA) to gradually equip frozen ViTs with the capability to learn audio-visual representations.

Alireza Esmaeilzehi*, Ensieh Khazaei*, Kai Wang*, Navjot Kaur Kalsi, Pai Chet Ng, Huan Liu, Yuanhao Yu, Dimitrios Hatzinakos, Konstantinos Plataniotis.
Pattern Recognition Letters
- We propose a novel dataset for the task of human activity recognition, in which the labels are specific to working environments.

Kai Wang, Dimitrios Hatzinakos.
ICASSP 2024 (Oral)
- We propose a novel parameter-efficient scheme called Mixture-of-Modality-Adaptations (MoMA) for audio-visual action recognition.

SEformer: Dual-Path Conformer Neural Network is a Good Speech Denoiser
Kai Wang, Dimitrios Hatzinakos.
APSIPA 2023 (Oral)
- We propose the SEformer, an efficient dual-path conformer neural network for speech enhancement.

CPTNN: Cross-Parallel Transformer Neural Network for Time-Domain Speech Enhancement
Kai Wang, Bengbeng He, Wei-Ping Zhu.
IWAENC 2022

CAUNet: Context-Aware U-Net for Speech Enhancement in Time Domain
Kai Wang, Bengbeng He, Wei-Ping Zhu.
ISCAS 2021

TSTNN: Two-Stage Transformer-Based Neural Network for Speech Enhancement in the Time Domain
Kai Wang, Bengbeng He, Wei-Ping Zhu.
ICASSP 2021
Honors and Awards
- 2024: School of Graduate Studies (SGS) Conference Grant, University of Toronto
- 2022 - Present: Edward S. Rogers Sr. Graduate Scholarships, University of Toronto
- 2022 - Present: Research Fellowship, University of Toronto
- 2021: Conference and Exposition Award, Concordia University
- 2015: Meritorious Award of International Mathematical Contest in Modeling
Services
- Conference Reviewer: ICLR, ACM MM, CVPRW, ICASSP, ICME, ISCAS
- Journal Reviewer: Circuits, Systems, and Signal Processing (CSSP), Speech Communication
Teaching
Winter (2023, 2024), Fall (2023, 2024): ECE421 Introduction to Machine Learning
© Kai Wang | Last updated: Oct. 5, 2024 | Theme by Yi Ren