Yuan Gong

Research Scientist
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
G442, 32 Vassar St, Cambridge MA 02139
Phone: (574) 401-0833
Email: yuangong@mit.edu
Bio
I am a research scientist at the MIT Computer Science and Artificial Intelligence Lab (CSAIL). Before joining MIT, I received my Ph.D. in computer science from the University of Notre Dame, supervised by Dr. Christian Poellabauer. In summer 2019, I was an applied scientist intern working on clinical text mining on the AWS Comprehend Medical team, supervised by Mohammed Khalilia and Parminder Bhatia. Before coming to Notre Dame, I received my B.Sc. degree in Electrical Engineering (Biomedical Engineering Major) from Fudan University in 2015, where my research advisors were Dr. Yuanyuan Wang (on ultrasound image denoising) and Dr. Yuedong Xu (on network science). My current research focuses on computational speech and audio signal analysis, including speech-based healthcare applications, audio-visual multi-modal learning, and general audio event recognition. [CV] [Google Scholar]
*Nearly all my papers are incorrectly linked to UESTC Professor Yuan Gong. That is not me; I have never worked or studied at UESTC.
Education
2020.7 Ph.D., Computer Science and Engineering, University of Notre Dame, IN, USA (GPA: 4.0/4.0)
2015.7 B.Sc., Electrical Engineering (Biomedical Engineering Major), Fudan University, Shanghai, China. (GPA Rank: 1/15, First Prize Scholarship)
Employment
2023.8 - Research Scientist, Massachusetts Institute of Technology, Cambridge, USA
2020.8 - 2023.7 Postdoc Research Associate, Massachusetts Institute of Technology, Cambridge, USA
2015.8 - 2020.7 Graduate Research Assistant, University of Notre Dame, Notre Dame, USA
2019.5 - 2019.8 Applied Scientist Intern, Amazon Web Services, Seattle, USA
2014.6 - 2015.7 Undergraduate Research Assistant, Fudan University, Shanghai, China
2012.7 - 2012.8 Intern, Philips Healthcare, Shanghai, China
Publications
- Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, and James Glass, "Joint Audio and Speech Understanding", Proceedings of the 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2023), Taipei, December 2023. [Paper][Interactive Demo]
- Yuan Gong, Sameer Khurana, Leonid Karlinsky, and James Glass, "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers", Proceedings of the 24th Conference of the International Speech Communication Association (Interspeech 2023), Dublin, Ireland, August 2023. [Paper][Code][Interactive Demo][Poster]
- Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, and James Glass, "Contrastive Audio-Visual Masked Autoencoder", Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, May 2023 (ICLR 2023, notable-top-25% paper). [Paper][Code][Video][Slides][Poster][MIT News]
- Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, and James Glass, "UAVM: Towards Unifying Audio and Visual Models", IEEE Signal Processing Letters, 2022. [Paper][Code]
- Nauman Dawalatabad, Yuan Gong, Sameer Khurana, Rhoda Au, and James Glass, "Detecting Dementia from Long Neuropsychological Interviews", Findings of the 2022 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP 2022), Abu Dhabi, December 2022. [Paper]
- Yuan Gong, Jin Yu, and James Glass, "Vocalsound: A Dataset For Improving Human Vocal Sounds Recognition", Proceedings of the 47th International Conference on Acoustics, Speech, & Signal Processing (ICASSP 2022), Singapore, May 2022. [Paper][Dataset&Code][Video][Slides]
- Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang, and James Glass, "Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment", Proceedings of the 47th International Conference on Acoustics, Speech, & Signal Processing (ICASSP 2022), Singapore, May 2022. [Paper][Code][Video][Slides][Blog in Chinese]
- Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, and James Glass, "SSAST: Self-Supervised Audio Spectrogram Transformer", Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI 2022), Vancouver, Canada, February-March 2022. [Paper][Code][Slides]
- Yuan Gong, Yu-An Chung, and James Glass, "AST: Audio Spectrogram Transformer", Proceedings of the 22nd Conference of the International Speech Communication Association (Interspeech 2021), Brno, Czech Republic, August-September 2021. [Paper][Code][Talk][Blog in Chinese]
- Yuan Gong, Yu-An Chung, and James Glass, "PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021. [Paper][Code][Video][Slides][Blog in Chinese]
- Yuan Gong, Jian Yang, and Christian Poellabauer, "Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method", IEEE Signal Processing Letters, 2020. [Paper][Code]
- Bryan Xia, Yuan Gong, Yizhe Zhang, and Christian Poellabauer, "Second-order Non-local Attention Networks for Person Re-identification", Proceedings of the 2019 International Conference on Computer Vision (ICCV), Seoul, Korea, October-November 2019. [Paper][Blog]
- Yuan Gong, Jian Yang, Jacob Huber, Mitchell MacKnight, and Christian Poellabauer, "ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems", Proceedings of the 20th Conference of the International Speech Communication Association (Interspeech 2019), Graz, Austria, September 2019 (best student paper award nomination). [Paper][Dataset]
- Yuan Gong, Boyang Li, Christian Poellabauer, and Yiyu Shi, "Real-time Adversarial Attacks", Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, August 2019. [Paper][Code][Media]
- Yuan Gong and Christian Poellabauer, "Deep Obfuscation: Precise Masking of Sensitive Information to Protect Against Machine Learning Adversaries (Poster)", Proceedings of the 2018 Annual Computer Security Applications Conference Poster Session, San Juan, Puerto Rico, December 2018.
- Yuan Gong and Christian Poellabauer, "Crafting Adversarial Examples For Speech Paralinguistics Applications", Proceedings of the DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security (DYNAMICS) Workshop, San Juan, Puerto Rico, December 2018. [Paper]
- Yuan Gong and Christian Poellabauer, "Impact of Aliasing on Deep CNN-Based End-to-End Acoustic Models", Proceedings of the 19th Conference of the International Speech Communication Association (Interspeech 2018), Hyderabad, India, September 2018. [Paper]
- Yuan Gong, Kevin Shin, and Christian Poellabauer, "Improving LIWC Using Soft Word Matching (Poster)", Proceedings of the 9th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB), Washington, DC, August-September 2018. [Paper]
- Yuan Gong, Hasini Yatawatte, Christian Poellabauer, Sandra Schneider, and Susan Latham, "Automatic Autism Spectrum Disorder Detection Using Everyday Vocalizations Captured by Smart Devices", Proceedings of the 9th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB), Washington, DC, August-September 2018. [Paper]
- Yuan Gong and Christian Poellabauer, "Protecting Voice Controlled Systems Using Sound Source Identification Based on Acoustic Cues", Proceedings of the 27th International Conference on Computer Communications and Networks (ICCCN), Hangzhou, China, July-August 2018. [Paper]
- Yuan Gong and Christian Poellabauer, "An Overview of Vulnerabilities of Voice Controlled Systems", Proceedings of the 1st International Workshop on Security and Privacy for the Internet-of-Things (IoTSec), Orlando, FL, April 2018. [Paper]
- Yuan Gong and Christian Poellabauer, "Topic Modeling Based Multi-modal Depression Detection", Proceedings of the 7th Audio/Visual Emotion Challenge and Workshop (AVEC) in conjunction with ACM Multimedia (ACM-MM), Mountain View, CA, October 2017 (depression challenge winner). [Paper]
- Yuan Gong and Christian Poellabauer, "Continuous Assessment of Children's Emotional States using Acoustic Analysis", Proceedings of the 5th IEEE International Conference on Healthcare Informatics (ICHI), Park City, UT, August 2017. [Paper]
- Yuan Gong, Jin Cao, Zehui Luo, and Guohui Zhou, "基于 MSP430F5529 及 CC2540 的智能型低功耗心电监测仪" (A Smart Low-Power-Consumption ECG Monitor Based on MSP430F5529 and CC2540), Chinese Journal of Medical Instrumentation 39.4, 2015 (project won the 2014 TI national biomedical device design contest). [Paper]
Preprints
- Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, and James Glass, "Listen, Think, and Understand". [Paper][Interactive Demo]
- Yuan Gong, Sameer Khurana, Andrew Rouditchenko, and James Glass, "CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification". [Paper]
Awards
- ICASSP 2023 Outstanding Reviewer
- INTERSPEECH 2019 Best Student Paper Award Nomination
- Depression Detection Challenge Winner, the 7th ACM Multimedia Audio/Visual Emotion Challenge and Workshop (AVEC 2017)
- IJCAI, ISCA, ICHI, and NSF Travel Grants
- Outstanding Graduate of Fudan University (2015), Fudan First Prize Scholarship (Top 3%, 2014), Outstanding Student of Dept. of Information Technology (2013), Outstanding Student of Fudan University (2012)
Invited Talks and Guest Lectures
- Contrastive Audio-Visual Masked Autoencoder. IBM, 7/28/2023.
- Large Language Models that Listen. Takeda, 5/30/2023; Signify, 7/21/2023.
- Introduction of Audio Spectrogram Transformer - Architecture, Training, and Pre-training. Mitsubishi Electric Research Laboratories, 6/8/2022; ByteDance, 6/14/2022; Adobe, 7/12/2022.
- Introduction of Audio Spectrogram Transformer - Architecture, Training, and Pre-training. AI Time. 5/26/2022. [video in Mandarin][slides]
- General Audio Processing. MIT 6.345/HST.728 Spoken Language Processing (Guest Lecture, Sole Instructor). 4/19/2022.
- Audio Spectrogram Transformer. MIT Embodied Intelligence Seminar. 10/14/2021. [video]
- Audio Spectrogram Transformer for Audio Scene Analysis. ISCA SIGML Seminar. 6/16/2021. [video][slides]
- Win the cat and mouse game: ensuring the security of the speech processing systems to real world threats. University of Notre Dame CSE60641 (Guest Lecture, Sole Instructor). 10/31/2019. [slides]
- Speech Processing: Machine Learning Approaches, Novel Applications, and New Security Concerns. University of Notre Dame CSE60641 (Guest Lecture, Sole Instructor). 9/20/2018. [slides]
Contact
Please feel free to reach out (yuangong@mit.edu) if you have any questions about my work. I do not use WeChat. As a research scientist, I am not involved in student/researcher recruitment at CSAIL; please contact a PI for such inquiries.