Here at MUSAIC, we are primarily interested in investigating how we can build better systems for music generation and understanding.
Music Audio Generation and Synthesis
People Involved
Zachary Novack
Tornike Karchkhadze
Ke Chen
Hao-Wen (Herman) Dong
Our research investigates audio generation and synthesis with machine learning. We have explored a range of methods for generating high-quality audio, from early work on Generative Adversarial Networks (GANs) for raw-waveform audio synthesis to diffusion-based music generation systems.
Featured Publications
Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J Bryan. 2024. "Presto! Distilling Steps and Layers for Accelerating Music Generation" arXiv:2410.05167
Tornike Karchkhadze, Mohammad Rasool Izadi, Ke Chen, Gerard Assayag, Shlomo Dubnov. 2024. "Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model" arXiv:2409.02845
Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas Bryan. 2024. "DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation" International Society for Music Information Retrieval (ISMIR)
Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov. 2024. "MusicLDM: Enhancing novelty in text-to-music generation using beat-synchronous mixup strategies" International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J Bryan. 2024. "Ditto: Diffusion inference-time t-optimization for music generation" International Conference on Machine Learning (ICML)
Chris Donahue, Julian McAuley, Miller Puckette. 2019. "Adversarial Audio Synthesis" International Conference on Learning Representations (ICLR)
Symbolic Music Processing
People Involved
Jingyue Huang
Phillip Long
Hao-Wen (Herman) Dong
Ke Chen
We have explored various applications of machine learning for symbolic music processing. These include automatic composition and arrangement, as well as the construction of symbolic music datasets.
Featured Publications
Phillip Long, Zachary Novack, Taylor Berg-Kirkpatrick, Julian McAuley. 2024. "PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing" arXiv:2409.10831
Jingyue Huang, Ke Chen, Yi-Hsuan Yang. 2024. "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation" International Society for Music Information Retrieval (ISMIR)
Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, Taylor Berg-Kirkpatrick. 2023. "Multitrack music transformer" International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Hao-Wen Dong, Chris Donahue, Taylor Berg-Kirkpatrick, Julian McAuley. 2021. "Towards automatic instrumentation by learning to separate parts in symbolic multitrack music" International Society for Music Information Retrieval (ISMIR)
Natural Language Processing for Music
People Involved
Junda Wu
Xin Xu
Others among us are focused on natural language processing techniques for deeper musical understanding. These include audio-language representation learning, music captioning, and lyric generation.
Featured Publications
Nikita Srivatsan, Ke Chen, Shlomo Dubnov, Taylor Berg-Kirkpatrick. 2024. "Retrieval guided music captioning via multimodal prefixes" IJCAI 2024 (Special Track on AI, the Arts, and Creativity)
Junda Wu, Warren Li, Zachary Novack, Amit Namburi, Carol Chen, Julian McAuley. 2024. "CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation" arXiv:2410.02271
Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, Julian McAuley. 2024. "Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation" arXiv:2407.20445
Audiovisual Learning
People Involved
Haven Kim
Ross Greer
Some of us are interested in enhancing musical experiences through visual communication. Our related efforts include building a robotic camera for sharing scores and investigating soundtracks within the context of films, such as documentaries.
Featured Publications
Ross Greer, Laura Fleig, Shlomo Dubnov. 2024. "Creativity and Visual Communication from Machine to Musician: Sharing a Score through a Robotic Camera" arXiv:2409.05773
Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick. 2023. "ClipSep: Learning text-queried sound separation with noisy unlabeled videos" International Conference on Learning Representations (ICLR)
Others
We have also actively explored other applications of machine learning for music, such as Optical Music Recognition, source separation, and melody extraction. See our publications page for details!