Su Yuhang, a master's student in the 2022 cohort at the College of Information Science and Technology, Beijing University of Chemical Technology, has made another breakthrough in intelligent music retrieval. After publishing the AMG-Embedding audio retrieval paper as first author at ACM Multimedia 2024, a CCF-A conference, his latest work, MIDI-Zero, has been accepted by ACM SIGIR 2025, also a CCF-A conference. Publishing at two top international conferences within a single year demonstrates his research and innovation capabilities, as well as the College's research strength and talent cultivation in AI music retrieval. Both papers were supervised by Associate Professor Hu Wei and Professor Zhang Fan.
MIDI-Zero: A MIDI-driven Self-Supervised Learning Approach for Music Retrieval
MIDI-Zero is a new self-supervised learning framework for content-based music retrieval (CBMR), covering core subtasks such as audio identification, audio matching, and version identification. Unlike traditional methods that extract features from audio signals or spectrograms, MIDI-Zero operates entirely on MIDI representations. Its main highlight is that it needs no external training data: all training data is generated automatically according to predefined task rules, removing any reliance on labeled datasets or external music libraries. MIDI-Zero applies not only to symbolic music data but also to audio tasks, where a music transcription model first converts the audio to MIDI. Extensive experiments show that MIDI-Zero performs strongly across multiple CBMR subtasks. The method simplifies feature extraction, bridges the gap between audio and symbolic music representations, and provides a flexible and efficient solution for music retrieval.
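To illustrate what "training data generated by predefined rules" can look like, here is a minimal Python sketch. It is not the MIDI-Zero implementation: the article does not state which rules the paper uses, so the note format, the function names (`random_notes`, `transform`, `make_pair`), and the choice of transposition and tempo scaling as rules are all assumptions made for illustration.

```python
import random

# Assumed note format (hypothetical): (onset_seconds, midi_pitch, duration_seconds).

def random_notes(rng, n=16):
    """Generate a synthetic monophonic note sequence; no external data is used."""
    notes, t = [], 0.0
    for _ in range(n):
        dur = rng.choice([0.25, 0.5, 1.0])
        pitch = rng.randint(48, 84)
        notes.append((t, pitch, dur))
        t += dur
    return notes

def transform(notes, semitones, tempo_scale):
    """Assumed rule: transpose every pitch and rescale all timings."""
    return [(on * tempo_scale, p + semitones, d * tempo_scale)
            for on, p, d in notes]

def make_pair(rng):
    """Return (anchor, positive): two views that a model should embed closely."""
    anchor = random_notes(rng)
    positive = transform(anchor, rng.randint(-5, 6), rng.uniform(0.8, 1.25))
    return anchor, positive

def intervals(notes):
    """Pitch differences between consecutive notes."""
    return [b[1] - a[1] for a, b in zip(notes, notes[1:])]

if __name__ == "__main__":
    anchor, positive = make_pair(random.Random(0))
    # The two views differ in key and tempo but share the same melodic contour.
    print(intervals(anchor) == intervals(positive))
```

Pairs of this kind could feed a contrastive objective in which the anchor and its transformed copy are treated as the same piece. Whether MIDI-Zero uses such an objective is not stated in the article.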
AMG-Embedding: A Self-Supervised Embedding Approach for Audio Identification
AMG-Embedding targets audio identification: retrieving the exactly matching recording from a massive music library given only a short audio clip. Traditional audio fingerprinting methods rely on features from a large number of short, fixed-length, overlapping segments, which leads to high storage and computing costs. AMG-Embedding departs from this paradigm by converting variable-duration, non-overlapping segments into compact embeddings through self-supervised learning and a two-stage embedding process. Experimental results show that AMG-Embedding reduces storage requirements and retrieval time to less than 1/10 of those of traditional fingerprinting methods while maintaining comparable retrieval accuracy. This significantly improves the scalability and efficiency of audio retrieval systems.
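The storage argument can be made concrete with a small counting sketch in Python. The window, hop, and segment lengths below are assumed values chosen only for illustration; they are not taken from the paper and do not reproduce its reported results. The function names are likewise hypothetical.

```python
def overlapping_count(track_s, window_s, hop_s):
    """Number of fixed-length windows when sliding with a hop smaller than the window."""
    if track_s < window_s:
        return 0
    return int((track_s - window_s) // hop_s) + 1

def split_non_overlapping(track_s, lengths):
    """Cut a track into consecutive, non-overlapping segments of varying length.

    `lengths` is cycled through; the final segment is truncated at the track end.
    """
    segments, start, i = [], 0.0, 0
    while start < track_s:
        length = min(lengths[i % len(lengths)], track_s - start)
        segments.append((start, start + length))
        start += length
        i += 1
    return segments

if __name__ == "__main__":
    track = 240.0  # assumed: a 4-minute track
    fixed = overlapping_count(track, window_s=1.0, hop_s=0.5)       # assumed 1 s window, 0.5 s hop
    variable = len(split_non_overlapping(track, [4.0, 5.0, 6.0]))   # assumed segment lengths
    print(fixed, variable)
```

If each segment is stored as one index entry, fewer and longer non-overlapping segments mean fewer entries to store and search, which is the intuition behind the reported savings.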
ACM SIGIR and ACM Multimedia are both rated A-level conferences by the China Computer Federation (CCF) and are among the top international venues in information retrieval and multimedia, respectively. Having work accepted at both conferences within a single year reflects the research team's technical depth and academic standing in AI music retrieval.