Natural Language Processing

  • 2021-03-18
  • Yinung Chen

The Natural Language Processing Center of Yang Ming Chiao Tung University is led by Professor Sin-Horng Chen. Team members include Professor Yih-Ru Wang, Professor Shaw-Hwa Hwang, and Professor Yuan-Fu Liao, among others, along with 7 post-doctoral researchers and 30 master's-level researchers. The center is located at 9F.-3, No. 23, Sec. 1, Chang'an E. Rd., Zhongshan Dist., Taipei City, where its office provides a total of 100 square meters of research space.
The center's main research directions are Automatic Speech Recognition (ASR), Speech Synthesis (Text to Speech, TTS), and Natural Language Processing (NLP).
The research team is currently carrying out a four-year project for the Ministry of Economic Affairs (ROC 107/10/1 to 111/9/30, i.e. October 1, 2018 to September 30, 2022). The project is funded at 60 million per year and yields 6-10 million in technology transfers and 6 patent applications per year.
The team's speech recognition research covers Mandarin, English, and Taiwanese. The team has collected more than 10,000 hours of Mandarin, English, and Taiwanese speech corpora with corresponding transcripts. For Mandarin, the recognition rate is currently close to 93% for conference speech and has reached 85% for telephone speech. The technology has already been applied to real-time subtitles for the presidential election debates. Its use at the press conferences of the CDC Central Epidemic Center of the Ministry of Health and Welfare was awarded a certificate of appreciation by the President. That application has subtitled the press conferences of Minister Chen Shi-Zhong since February of last year (ROC 109, i.e. 2020), so that hearing-impaired viewers can follow the content at a glance. The average accuracy of the real-time subtitles has reached 93%, which is considered a successful application.
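The article does not say how the recognition rates above are measured. As a rough illustration only, one common convention for Mandarin is character accuracy, defined as one minus the character error rate (edit distance between the reference transcript and the recognizer output, divided by the reference length). The sketch below assumes that convention; the function names and the sample sentences are hypothetical and are not taken from the team's system.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Return 1 - CER, where CER = edit distance / reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one wrong character in a ten-character reference.
reference = "今天下午兩點召開記者"
hypothesis = "今天下午兩點招開記者"
print(f"{character_accuracy(reference, hypothesis):.0%}")  # prints 90%
```

Under this assumed definition, a 93% rate would correspond to roughly 7 character errors per 100 reference characters.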
The team's speech synthesis research and development likewise covers Chinese, English, and Taiwanese. Some media outlets have already adopted this synthesis technology for virtual anchors.
For language understanding and response, the team has collected tens of billions of Chinese articles over more than 30 years of research and used them for grammatical analysis, word segmentation research, and AI learning. This work produced a Chinese thesaurus of more than 120,000 entries. The thesaurus and the AI language model will be of great help to future speech recognition and even robotic dialogue systems.

 

Real-time subtitle technology for the CDC press conferences was awarded a certificate of appreciation by the President

Real-time subtitles for the presidential election debate

Real-time subtitles for the CDC press conference
 
Media Exposure