default search action
Xiaofei Wang 0009
Person information
- affiliation: Microsoft, One Microsoft Way, Redmond, WA, USA
Other persons with the same name
- Xiaofei Wang (aka: Xiao-Fei Wang) — disambiguation page
- Xiaofei Wang 0001 — Tianjin University, Tianjin Key Laboratory of Advanced Networking, China (and 2 more)
- Xiaofei Wang 0002 — Air Force Engineering University, Institute of Aeronautics Engineering, Xian, China
- Xiaofei Wang 0003 — Chengdu Normal University, Department of Computer Science, China
- Xiaofei Wang 0004 — Beihang University, School of Electronic and Information Engineering, Beijing, China
- Xiaofei Wang 0005 — University of Chinese Academy of Science, National Computer Network Intrusion Protection Center, Beijing, China
- Xiaofei Wang 0006 — Tsinghua University, Beijing, China (and 1 more)
- Xiaofei Wang 0007 — Johns Hopkins University, Baltimore, USA (and 1 more)
- Xiaofei Wang 0008 — University of California, Berkeley, CA, USA
- Xiaofei Wang 0010 — Heilongjiang University, College of Electrics Engineering, Harbin, China (and 1 more)
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [c27]Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
Diarist: Streaming Speech Translation with Speaker Diarization. ICASSP 2024: 10866-10870 - [i33]Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng:
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like. CoRR abs/2402.07383 (2024) - [i32]Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng:
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations. CoRR abs/2404.06690 (2024) - [i31]Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng:
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation. CoRR abs/2405.17809 (2024) - [i30]Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda:
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS. CoRR abs/2406.05699 (2024) - [i29]Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda:
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS. CoRR abs/2406.18009 (2024) - [i28]Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda:
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech. CoRR abs/2407.12229 (2024) - [i27]Jiaqi Li, Dongmei Wang, Xiaofei Wang, Yao Qian, Long Zhou, Shujie Liu, Midia Yousefi, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yanqing Liu, Junkun Chen, Sheng Zhao, Jinyu Li, Zhizheng Wu, Michael Zeng:
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation. CoRR abs/2409.04016 (2024) - 2023
- [c26]Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech Separation with Large-Scale Self-Supervised Learning. ICASSP 2023: 1-5 - [c25]Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition. ICASSP 2023: 1-5 - [c24]Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Vararray Meets T-Sot: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. ICASSP 2023: 1-5 - [c23]Heming Wang, Yao Qian, Hemin Yang, Nauyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang, Yiming Wang, Shujie Liu, Zhuo Chen, DeLiang Wang, Michael Zeng:
DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks. ICASSP 2023: 1-5 - [c22]Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR. ICASSP 2023: 1-5 - [c21]Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen, Xiaofei Wang, Takuya Yoshioka:
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach. INTERSPEECH 2023: 3502-3506 - [i26]Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. CoRR abs/2308.06873 (2023) - [i25]Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
DiariST: Streaming Speech Translation with Speaker Diarization. CoRR abs/2309.08007 (2023) - 2022
- [c20]Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang:
Personalized speech enhancement: new models and Comprehensive evaluation. ICASSP 2022: 356-360 - [c19]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang:
Picknet: Real-Time Channel Selection for Ad Hoc Microphone Arrays. ICASSP 2022: 921-925 - [c18]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. ICASSP 2022: 6027-6031 - [c17]Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-Neural Beamformer for Continuous Speech Separation. ICASSP 2022: 6032-6036 - [c16]Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang:
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction. ICASSP 2022: 6062-6066 - [c15]Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR. ICASSP 2022: 8082-8086 - [c14]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. INTERSPEECH 2022: 521-525 - [c13]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. INTERSPEECH 2022: 3774-3778 - [c12]Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. INTERSPEECH 2022: 3814-3818 - [i24]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang:
PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays. CoRR abs/2201.09586 (2022) - [i23]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. CoRR abs/2202.00842 (2022) - [i22]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. CoRR abs/2203.16685 (2022) - [i21]Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. CoRR abs/2204.03232 (2022) - [i20]Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. CoRR abs/2209.04974 (2022) - [i19]Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating realistic speech overlaps improves multi-talker ASR. CoRR abs/2210.15715 (2022) - [i18]Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech separation with large-scale self-supervised learning. CoRR abs/2211.05172 (2022) - [i17]Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition. CoRR abs/2211.05564 (2022) - [i16]Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka:
Breaking trade-offs in speech separation with sparsely-gated mixture of experts. CoRR abs/2211.06493 (2022) - 2021
- [c11]Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. ASRU 2021: 296-303 - [c10]Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng:
Continuous Speech Separation with Ad Hoc Microphone Arrays. EUSIPCO 2021: 1100-1104 - [c9]Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. ICASSP 2021: 6503-6507 - [c8]Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings. ICASSP 2021: 6763-6767 - [c7]Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka:
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement. Interspeech 2021: 2686-2690 - [c6]Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. Interspeech 2021: 3430-3434 - [c5]Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. Interspeech 2021: 4413-4417 - [c4]Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. SLT 2021: 809-816 - [c3]Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription. SLT 2021: 833-840 - [i15]Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings. CoRR abs/2101.01853 (2021) - [i14]Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng:
Continuous Speech Separation with Ad Hoc Microphone Arrays. CoRR abs/2103.02378 (2021) - [i13]Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. CoRR abs/2103.16776 (2021) - [i12]Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. CoRR abs/2104.02128 (2021) - [i11]Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. CoRR abs/2107.02852 (2021) - [i10]Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR. CoRR abs/2110.03151 (2021) - [i9]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. CoRR abs/2110.05745 (2021) - [i8]Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-neural beamformer for continuous speech separation. CoRR abs/2110.06428 (2021) - [i7]Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang:
Personalized Speech Enhancement: New Models and Comprehensive Evaluation. CoRR abs/2110.09625 (2021) - [i6]Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang:
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction. CoRR abs/2110.15430 (2021) - 2020
- [c2]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. INTERSPEECH 2020: 36-40 - [c1]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. INTERSPEECH 2020: 2797-2801 - [i5]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. CoRR abs/2003.12687 (2020) - [i4]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers. CoRR abs/2006.10930 (2020) - [i3]Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. CoRR abs/2008.04546 (2020) - [i2]Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. CoRR abs/2011.02921 (2020) - [i1]Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription. CoRR abs/2011.03110 (2020)
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-11-15 19:35 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint