China Telecom Artificial Intelligence Research Institute (TeleAI) recently released the industry's first large speech recognition model to support free mixing of 30 dialects: the Xingchen Super Multi-dialect Speech Recognition Large Model. The model can recognize and understand more than 30 dialects at once, including Cantonese, Shanghainese, Sichuanese, and Wenzhounese. It is currently the large speech recognition model that supports the most dialects in China.
TeleAI is reported to have built a high-quality dialect database covering more than 30 dialects and totaling more than 300,000 hours. The "distillation + expansion" joint training algorithm adopted by the R&D team addresses the problem of pre-training collapse when training on large-scale, multi-scenario datasets with a large number of parameters, and it enables stable training of an 80-layer model with 1B parameters.
China Telecom said that the Xingchen speech large model is the industry's first open-source large speech recognition model based on discrete speech representations. Its new "speech to token to text" modeling paradigm significantly reduces the bit rate at which speech is transmitted during inference.
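To give a rough sense of why sending discrete tokens instead of raw audio lowers the transmission bit rate, the arithmetic can be sketched as below. The sample rate, token rate, and codebook size are hypothetical values chosen for illustration; they are not figures published by China Telecom or TeleAI.

```python
import math


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bits per second needed to send uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def token_bitrate(tokens_per_second: float, codebook_size: int) -> float:
    """Bits per second needed to send one codebook index per token."""
    return tokens_per_second * math.log2(codebook_size)


# Hypothetical settings for illustration only (not TeleAI's actual configuration).
raw_bps = pcm_bitrate(sample_rate_hz=16_000, bits_per_sample=16)
tok_bps = token_bitrate(tokens_per_second=50, codebook_size=1024)
reduction = raw_bps / tok_bps

print(f"raw PCM: {raw_bps} bit/s, tokens: {tok_bps:.0f} bit/s, ratio: {reduction:.0f}x")
```

Under these assumed numbers, each token carries log2(1024) = 10 bits, so the token stream needs far fewer bits per second than 16-bit PCM sampled at 16 kHz. The actual saving depends on the tokenizer the model uses.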
According to China Telecom, the Xingchen speech large model has been opened to external users and is deployed in pilot applications of China Telecom's Wanhao intelligent customer service in Fujian, Jiangxi, Guangxi, Beijing, Inner Mongolia, and other regions. With the model integrated, the intelligent customer service can quickly understand 30 dialects and handles about 2 million calls per day, greatly improving service efficiency and user experience.