Zhaoheng Ni
2020 – today
- 2024
- [j3] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli:
Scaling Speech Technology to 1,000+ Languages. J. Mach. Learn. Res. 25: 97:1-97:52 (2024)
- [c20] Gaël Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest N. Iandola, Vikas Chandra:
Stack-and-Delay: A New Codebook Pattern for Music Generation. ICASSP 2024: 796-800
- [c19] Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li:
An Empirical Study on the Impact of Positional Encoding in Transformer-Based Monaural Speech Enhancement. ICASSP 2024: 1001-1005
- [c18] Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest N. Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra:
On the Open Prompt Challenge in Conditional Audio Generation. ICASSP 2024: 5315-5319
- [c17] Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. ICASSP 2024: 11831-11835
- [c16] Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra:
Folding Attention: Memory and Power Optimization for On-Device Transformer-Based Streaming Speech Recognition. ICASSP 2024: 11901-11905
- [i22] Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li:
An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement. CoRR abs/2401.09686 (2024)
- [i21] Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. CoRR abs/2406.02560 (2024)
- [i20] Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian:
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. CoRR abs/2406.04660 (2024)
- [i19] Gaël Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra:
High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching. CoRR abs/2407.03648 (2024)
- [i18] Hao Shi, Yuan Gao, Zhaoheng Ni, Tatsuya Kawahara:
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition. CoRR abs/2409.00815 (2024)
- 2023
- [j2] Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing. J. Open Source Softw. 8(91): 5403 (2023)
- [j1] Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, Haizhou Li:
A Time-Frequency Attention Module for Neural Speech Enhancement. IEEE ACM Trans. Audio Speech Lang. Process. 31: 462-475 (2023)
- [c15] Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe:
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. ACL (demo) 2023: 400-411
- [c14] Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch. ASRU 2023: 1-9
- [c13] Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu:
Torchaudio-Squim: Reference-Less Speech Quality and Intelligibility Measures in Torchaudio. ICASSP 2023: 1-5
- [c12] Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li:
Ripple Sparse Self-Attention for Monaural Speech Enhancement. ICASSP 2023: 1-5
- [c11] William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. INTERSPEECH 2023: 4404-4408
- [d1] Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310). Zenodo, 2023
- [i17] Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li:
Ripple sparse self-attention for monaural speech enhancement. CoRR abs/2305.08541 (2023)
- [i16] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli:
Scaling Speech Technology to 1,000+ Languages. CoRR abs/2305.13516 (2023)
- [i15] William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. CoRR abs/2306.06672 (2023)
- [i14] Yangyang Shi, Gaël Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest N. Iandola, Yang Liu, Vikas Chandra:
Enhance audio generation controllability through representation similarity regularization. CoRR abs/2309.08773 (2023)
- [i13] Gaël Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest N. Iandola, Vikas Chandra:
Stack-and-Delay: a new codebook pattern for music generation. CoRR abs/2309.08804 (2023)
- [i12] Xinhao Mei, Varun Nagaraja, Gaël Le Lan, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra:
FoleyGen: Visually-Guided Audio Generation. CoRR abs/2309.10537 (2023)
- [i11] Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis:
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. CoRR abs/2310.17864 (2023)
- [i10] Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest N. Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra:
On The Open Prompt Challenge In Conditional Audio Generation. CoRR abs/2311.00897 (2023)
- 2022
- [c10] Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair:
Torchaudio: Building Blocks for Audio and Speech Processing. ICASSP 2022: 6982-6986
- [c9] Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, Haizhou Li:
Time-Frequency Attention for Monaural Speech Enhancement. ICASSP 2022: 7852-7856
- [c8] Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge. ICASSP 2022: 9201-9205
- [c7] Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. INTERSPEECH 2022: 5458-5462
- [i9] Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge. CoRR abs/2202.12298 (2022)
- [i8] Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. CoRR abs/2207.09514 (2022)
- 2021
- [c6] Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shi-Xiong Zhang, Dong Yu, Michael I. Mandel:
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation. SLT 2021: 817-824
- [i7] Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi:
TorchAudio: Building Blocks for Audio and Speech Processing. CoRR abs/2110.15018 (2021)
- [i6] Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, Haizhou Li:
Time-Frequency Attention for Monaural Speech Enhancement. CoRR abs/2111.07518 (2021)
- 2020
- [c5] Zhaoheng Ni, Michael I. Mandel:
Mask-Dependent Phase Estimation for Monaural Speaker Separation. ICASSP 2020: 7269-7273
- [i5] Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shi-Xiong Zhang, Dong Yu, Michael I. Mandel:
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation. CoRR abs/2011.09162 (2020)
- [i4] Félix Grèzes, Zhaoheng Ni, Viet Anh Trinh, Michael I. Mandel:
Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks. CoRR abs/2012.01576 (2020)
- [i3] Zhaoheng Ni, Félix Grèzes, Viet Anh Trinh, Michael I. Mandel:
Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks. CoRR abs/2012.02191 (2020)
- [i2] Félix Grèzes, Zhaoheng Ni, Viet Anh Trinh, Michael I. Mandel:
Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement. CoRR abs/2012.03388 (2020)
2010 – 2019
- 2019
- [i1] Zhaoheng Ni, Michael I. Mandel:
Onssen: an open-source speech separation and enhancement library. CoRR abs/1911.00982 (2019)
- 2018
- [c4] Zhaoheng Ni, Rutuja Ubale, Yao Qian, Michael I. Mandel, Su-Youn Yoon, Abhinav Misra, David Suendermann-Oeft:
Unusable Spoken Response Detection with BLSTM Neural Networks. ISCSLP 2018: 255-259
- [c3] Weicheng Ma, Kai Cao, Zhaoheng Ni, Peter Chin, Xiang Li:
Sound Signal Processing with Seq2Tree Network. LREC 2018
- 2017
- [c2] Zhaoheng Ni, Ahmet Cem Yuksel, Xiuyan Ni, Michael I. Mandel, Lei Xie:
Confused or not Confused?: Disentangling Brain Activity from EEG Data Using Bidirectional LSTM Recurrent Neural Networks. BCB 2017: 241-246
- [c1] Xiuyan Ni, Weicheng Ma, Zhaoheng Ni, Robert M. Haralick:
A Sep2Tree Model for Recognizing Synthetic Bach Chorales. ICMC 2017
last updated on 2024-11-14 22:05 CET by the dblp team
all metadata released as open data under CC0 1.0 license