Speech synthesis
A high-fidelity, flexibly configurable speech synthesis product that closes the loop of human-computer interaction and gives applications a lifelike voice. A variety of voices are available, and speaking rate, pitch, and volume can be adjusted. For private (on-premises) deployment requirements and business inquiries, contact: nls_support@service.aliyun.com
Activities and promotion

Product specification

Speech synthesis
Long text speech synthesis
Speech synthesis: 30,000 calls
Up to 300 characters of text can be submitted per request and converted into natural, fluent speech; speaking rate, pitch, and volume can be adjusted
30,000 calls
Try more than 70 voices
For intelligent customer service, voice assistants, outbound call notifications, and other scenarios
Free trial for new customers (0 yuan)
From 100.00 yuan
Speech synthesis: 1,000,000 calls
Up to 300 characters of text can be submitted per request and converted into natural, fluent speech; speaking rate, pitch, and volume can be adjusted
1,000,000 calls
Try more than 70 voices
For intelligent customer service, voice assistants, outbound call notifications, and other scenarios
Free trial for new customers (0 yuan)
From 1,800.00 yuan
Speech synthesis: 10,000,000 calls
Up to 300 characters of text can be submitted per request and converted into natural, fluent speech; speaking rate, pitch, and volume can be adjusted
10,000,000 calls
Try more than 70 voices
For intelligent customer service, voice assistants, outbound call notifications, and other scenarios
Free trial for new customers (0 yuan)
From 15,000.00 yuan
Speech synthesis: 84,000,000 calls
Up to 300 characters of text can be submitted per request and converted into natural, fluent speech; speaking rate, pitch, and volume can be adjusted
84,000,000 calls
182000 calls
300000 calls
Try more than 70 voices
For intelligent customer service, voice assistants, outbound call notifications, and other scenarios
Free trial for new customers (0 yuan)
From 100,000.00 yuan
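
The package descriptions above cap a single request at 300 units of text, so longer input has to be split before it is submitted. The following is a minimal sketch of one way to do that, not the service's own method. It assumes the limit counts characters as Python's `len()` does; how the service actually counts is not stated on this page. The function name `split_for_synthesis` is hypothetical.

```python
import re

MAX_CHARS = 300  # per-request limit quoted in the package descriptions


def split_for_synthesis(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Zero-width split after sentence-ending punctuation (Latin and CJK),
    # so every character of the input is preserved in some chunk.
    sentences = [s for s in re.split(r"(?<=[.!?。!?])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request. The split is purely punctuation-based, so a decimal such as "3.14" may be divided at the period when it happens to fall on a chunk boundary.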
Product experience

Volume: 50
Pitch: 0
Speech rate: 0
See Introduction to SSML Markup Language to learn how to use SSML to enrich synthesis results.
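
As a rough illustration of what SSML input looks like, the sketch below builds a small document with Python's standard `xml.etree.ElementTree`, which also takes care of escaping characters such as `&`. The `<speak>` root and `<break time="...">` element come from the W3C SSML specification; which tags and attributes this service accepts is an assumption here and should be checked against the SSML introduction referenced above. The helper name `build_ssml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ssml(sentences, pause_ms=500):
    """Join sentences into one SSML string with a pause between them."""
    speak = ET.Element("speak")
    for index, sentence in enumerate(sentences):
        if index == 0:
            # Text before the first child element lives on the root.
            speak.text = sentence
        else:
            # Each later sentence follows a <break> element as its tail text.
            pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
            pause.tail = sentence
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(["Your order has shipped.", "Thank you for shopping with us."])
```

The resulting string would be sent as the text of a synthesis request in place of plain text.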

Product advantages

Advanced technology
Multi-level prosodic pauses are modeled so that synthesized prosody sounds natural. Acoustic and linguistic parameters are combined to build deep-learning-based automatic prediction models.
Multi-domain coverage
A large vocabulary has been accumulated in smart home, in-vehicle, navigation, finance, banking, insurance, securities, telecom operator, logistics, real estate, education, and many other fields, making Alibaba Cloud speech synthesis more accurate across fields and industries.
Natural-sounding speech
The pronunciation model is trained on massive amounts of audio data, so the synthesized voice is realistic, full, rhythmic, and expressive, with a MOS score at the professional level in the industry.
Rich voice library
The voice library provides about 110 voices in a range of styles, including standard male and female voices and soft, sweet female voices. Synthesis with a markup language (SSML) is supported, and emotion, volume, speaking rate, and pitch can be adjusted dynamically.
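
The adjustable parameters named above can be grouped into one validated settings object before a request is built. The sketch below is illustrative only: the defaults (volume 50, pitch 0, speech rate 0) mirror the demo panel on this page, while the field names, the voice name, and the value ranges in `LIMITS` are assumptions that are not stated here and should be replaced with the values from the service documentation.

```python
from dataclasses import asdict, dataclass

# Assumed limits, for illustration only; not taken from this page.
LIMITS = {"volume": (0, 100), "pitch": (-500, 500), "speech_rate": (-500, 500)}


@dataclass(frozen=True)
class VoiceSettings:
    voice: str = "example_voice"  # hypothetical voice name
    volume: int = 50              # demo panel default
    pitch: int = 0                # demo panel default
    speech_rate: int = 0          # demo panel default

    def __post_init__(self):
        # Reject out-of-range values early instead of at request time.
        for name, (low, high) in LIMITS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")

    def as_request_fields(self):
        """Return the settings as a plain dict ready to merge into a request."""
        return asdict(self)


faster = VoiceSettings(speech_rate=100)
```

Making the object immutable means a given voice configuration can be shared safely between requests, with a new instance created for each variation.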

Product Functions

Algorithmic capability
Knowledge-Aware Neural TTS (KAN-TTS): this speech synthesis technology converts text into speech by combining neural networks with domain knowledge, and offers accurate pronunciation, natural prosody, high voice fidelity, and strong expressiveness.
Multiple languages, dialects, and mixed Chinese-English reading: Japanese and several Southeast Asian languages are currently supported, along with Cantonese, Tianjin, Hunan, Northeastern, and other dialects. Multiple voice models support mixed Chinese-English reading.
Engineering capability
Word-level timestamps: can be used to align audio with subtitles in video dubbing, to drive lip sync for virtual avatars, and so on.
Fast dynamic parameter adjustment: the voice, speaking rate, volume, pitch, sampling rate, and audio encoding format can all be adjusted dynamically. SSML is supported, as is streaming synthesis, so audio can be played while it is still being synthesized.
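
To show how word-level timestamps can drive subtitle alignment, the sketch below turns a list of timed words into SRT subtitle text. The input shape, a list of `(word, begin_ms, end_ms)` tuples, is an assumption made for illustration; the actual format in which the service returns timestamps is not described on this page. The function names are hypothetical.

```python
def to_srt_time(ms):
    """Format a millisecond offset as an SRT timecode (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words, max_words=8):
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for start_index in range(0, len(words), max_words):
        group = words[start_index:start_index + max_words]
        begin = group[0][1]   # start time of the first word in the cue
        end = group[-1][2]    # end time of the last word in the cue
        text = " ".join(word for word, _, _ in group)
        cues.append(
            f"{len(cues) + 1}\n{to_srt_time(begin)} --> {to_srt_time(end)}\n{text}\n"
        )
    return "\n".join(cues)


timed_words = [("Hello", 0, 400), ("world", 450, 900)]
subtitles = words_to_srt(timed_words)
```

Joining words with spaces suits languages written with spaces; for Chinese text the words would be joined without a separator.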

Application scenarios

Intelligent customer service
Intelligent device
Navigation broadcast
News broadcast
Audio book reading
Advertising
Intelligent customer service
In customer service robots, service robots, and similar scenarios, speech synthesis works together with speech recognition, natural language processing, and other modules to close the loop of human-computer interaction, giving robots a high-quality voice and making the interaction smoother and more natural.
Problems solved
Provides speech synthesis for intelligent customer service across multiple industries and scenarios;
Improves answering efficiency and customer satisfaction;
Reduces call center labor costs.
Intelligent device
In smart home, smart speaker, in-vehicle, and wearable device scenarios, the machine's responses are delivered to the user in high-quality speech, and phoneme boundaries can be used to bring a virtual avatar to life.
Problems solved
Improves answering efficiency and customer satisfaction;
Gives smart devices a warm voice and a more memorable persona.
Navigation broadcast
In driving, walking, and cycling navigation scenarios, users can freely choose the voice that reads out the navigation prompts.
Problems solved
Lets users navigate smoothly by voice without looking at the screen;
Adds freshness and fun to an otherwise monotonous drive;
Improves user retention and usage frequency.
News broadcast
In news and information apps, speech synthesis can quickly generate high-quality broadcast audio. A variety of voices suit different kinds of copy, from calm and formal to lively and relaxed.
Problems solved
Frees the user's hands and eyes;
Provides news broadcasts in a variety of speaking styles;
Delivers a premium media experience.
Audio book reading
Electronic textbooks, novels, and other text materials are imported as text files into a speech synthesis engine built on Knowledge-Aware Neural TTS technology, producing complete, replayable audio textbooks or audio novels that users can listen to at any time.
Problems solved
Uses dedicated high-quality voices suited to the scenario;
Well suited to reading novels, articles, and similar content.
Advertising
A digital host promotes products in live-stream rooms or announces advertising information in physical stores in place of human promoters.
Problems solved
Broadcasts brand and performance advertisements to attract consumers to buy;
Virtual hosts sell products in live-stream rooms, reducing the associated labor costs.

Customer Stories

More products and services

Tongyi Tingwu
Tongyi Tingwu supports real-time transcription and audio/video-to-text conversion for courses, meetings, interviews, and similar settings, along with automatic summary generation and real-time translation that breaks down cross-language communication barriers. It also supports quick marking of key information, and transcripts and notes can be easily downloaded and shared with other users.
Short sentence recognition
Recognizes short speech (up to one minute). It suits short voice interaction scenarios such as voice search, voice commands, and voice messages, and can be integrated into apps, smart appliances, smart assistants, and other products.
Real-time speech recognition
Recognizes audio streams of unlimited length in real time, producing text as the speaker talks. Built-in intelligent sentence segmentation provides the start and end time of each sentence. It can be used for real-time live video subtitles, real-time meeting records, real-time court hearing records, intelligent voice assistants, and other scenarios.
Recording file recognition
Recognizes recording files uploaded by the user, completing recognition within 3 hours of upload and returning the recognized text. It can be used for call center voice quality inspection, court trial database entry, meeting minutes, hospital medical record entry, and other scenarios.

Documentation and Tools