Built on Deep Peak2 end-to-end modeling, the service transcribes an audio stream into text in real time and returns the start and end time of each sentence. It is suited to long-form speech input, audio and video subtitles, meetings, and similar scenarios.
Built on Deep Peak2 end-to-end modeling, trained on more than 100,000 hours of data, and using acoustic models that cover multiple sampling rates and scenes, the service reaches 98% accuracy for near-field Mandarin recognition.
Multilingual recognition
Supports Mandarin recognition, including speech with a slight accent, as well as English recognition.
Intelligent language processing
Language models trained on large-scale datasets correct intermediate recognition results and insert appropriate punctuation (, . ! ?) based on an understanding of the speech content and its pauses.
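To make the role of pauses concrete, here is a deliberately naive, pause-only heuristic. It is not the service's method: the service combines content understanding with pauses using trained language models, while this sketch looks only at silence gaps. The `Word` type, its millisecond fields, and both gap thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int  # hypothetical: word start time in milliseconds
    end_ms: int    # hypothetical: word end time in milliseconds


def punctuate_by_pause(words, comma_gap_ms=300, period_gap_ms=700):
    """Insert ',' after short pauses and '.' after long ones (toy heuristic)."""
    if not words:
        return ""
    parts = []
    for current, nxt in zip(words, words[1:]):
        gap = nxt.start_ms - current.end_ms  # silence between consecutive words
        if gap >= period_gap_ms:
            parts.append(current.text + ".")
        elif gap >= comma_gap_ms:
            parts.append(current.text + ",")
        else:
            parts.append(current.text)
    parts.append(words[-1].text + ".")  # the last word always closes the sentence
    return " ".join(parts)
```

A rule like this cannot tell a question from a statement or a hesitation from a clause boundary, which is why a language model that understands the content is needed on top of pause cues.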
Multiple integration methods
Available through a WebSocket API and Android, iOS, and Linux SDKs, so it can be called from a wide range of operating systems and devices. It is fast and easy to integrate.
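Streaming recognition over a WebSocket connection generally means sending raw audio in small, evenly paced frames. The sketch below assumes 16 kHz, 16-bit, mono PCM and a 160 ms frame; these values are assumptions for illustration, so check the official API documentation for the formats and frame sizes it actually accepts. The `send` callback is hypothetical and stands in for whatever WebSocket client you use.

```python
import time


def frame_bytes(sample_rate_hz=16000, sample_width_bytes=2, channels=1, frame_ms=160):
    """Number of bytes of raw PCM audio covering frame_ms milliseconds."""
    return sample_rate_hz * sample_width_bytes * channels * frame_ms // 1000


def iter_frames(pcm: bytes, size: int):
    """Yield consecutive fixed-size chunks; the last chunk may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


def stream(pcm: bytes, send, frame_ms=160, realtime=True):
    """Send audio frame by frame; `send` is a hypothetical WebSocket send callback."""
    size = frame_bytes(frame_ms=frame_ms)
    for chunk in iter_frames(pcm, size):
        send(chunk)
        if realtime:
            # Pace frames at roughly the speed they would be captured live.
            time.sleep(frame_ms / 1000)
```

With the defaults, each frame is 16000 × 2 × 1 × 0.16 = 5,120 bytes.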
Millisecond-level real-time recognition of audio streams
The first packet is answered within milliseconds, and intermediate text results are displayed in real time as the audio stream is recognized.
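A client that displays intermediate results typically replaces the current in-progress hypothesis on each update and only commits text once a sentence is final. The message shape used here (`{"type": "partial" | "final", "text": ...}`) is a hypothetical example, not the service's documented response format.

```python
class TranscriptView:
    """Keeps finalized sentences plus the latest intermediate hypothesis."""

    def __init__(self):
        self.committed = []  # finalized sentences, in order
        self.pending = ""    # current intermediate result, overwritten on each update

    def update(self, message: dict) -> str:
        kind = message["type"]  # hypothetical field name
        if kind == "partial":
            # A newer intermediate result supersedes the previous one for the same sentence.
            self.pending = message["text"]
        elif kind == "final":
            self.committed.append(message["text"])
            self.pending = ""
        else:
            raise ValueError(f"unknown message type: {kind!r}")
        return self.render()

    def render(self) -> str:
        shown = self.committed + ([self.pending] if self.pending else [])
        return " ".join(shown)
```

Intermediate text can change as more audio arrives (for example after language-model correction), so it is rendered but never appended to the committed transcript.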
Timestamps on recognition results
Each returned text result carries timestamps marking the start and end time of the sentence as segmented by VAD (voice activity detection), which simplifies building features on top of the results.
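Sentence-level start and end times map directly onto subtitle files. This sketch converts a list of sentences into SubRip (SRT) text; the input field names `text`, `start_ms`, and `end_ms` are hypothetical and would need to be mapped from the actual response fields.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(sentences) -> str:
    """Build SRT text: index line, time range line, subtitle text, blank line."""
    blocks = []
    for index, s in enumerate(sentences, start=1):
        time_range = f"{ms_to_srt_time(s['start_ms'])} --> {ms_to_srt_time(s['end_ms'])}"
        blocks.append(f"{index}\n{time_range}\n{s['text']}\n")
    return "\n".join(blocks)
```

For example, 3,723,456 ms formats as `01:02:03,456`.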
Application scenarios
Real-time voice input
Live video subtitles
On-screen subtitles for presentations
Real-time meeting minutes
Classroom audio recognition
Real-time voice input
Accurate, efficient, hands-free voice input: speech appears on screen in real time, keeping chats smooth.
Key advantages
Leading recognition accuracy
Built on Deep Peak2 end-to-end modeling, with acoustic models covering multiple sampling rates and scenes, the service reaches 98% accuracy for near-field Mandarin recognition.
Multi-device support
Can be called through the WebSocket API or the Android, iOS, and Linux SDKs, so it works across many operating systems and devices.
Stable, efficient service
Enterprise-grade stability guarantees, with dedicated clusters that handle high volumes of concurrent traffic efficiently and flexibly.
Self-service model optimization
The Mandarin (Putonghua) model can be trained on the speech self-training platform without writing any code. Uploading a text corpus can improve recognition accuracy for business-specific vocabulary by 5-25%.
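Before uploading a corpus, it usually helps to clean it. This is a minimal sketch assuming a plain UTF-8 text file with one sentence or term per line; that format, and the helper names `prepare_corpus` and `write_corpus`, are hypothetical, so confirm the platform's actual upload requirements before relying on it.

```python
def prepare_corpus(lines, vocabulary=()):
    """Collapse whitespace, drop empty lines and duplicates, keep first-seen order."""
    seen = set()
    cleaned = []
    for line in list(lines) + list(vocabulary):
        text = " ".join(line.split())  # normalize runs of spaces/tabs
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def write_corpus(path, lines):
    """Write one entry per line as UTF-8 (assumed upload format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Listing domain terms alongside full sentences that use them gives the language model both the vocabulary and the contexts in which it appears.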
Product pricing
General scenario model
Audio and video scenario model
Prepaid hourly package
Suited to enterprises that can estimate their call duration in advance
Free duration: 10 hours
Validity: 1 year
Concurrency: 50 (expansion available)
Service stability: 99.9%
Technical support: 24/7 response
1,000 hours: 1,800 yuan
Pay-as-you-go by call duration
Suited to enterprises that cannot estimate their call duration
Free duration: 10 hours
Concurrency: 50 (expansion available)
Service stability: 99.9%
Technical support: 24/7 response
Price: 3 yuan/hour
Subscription payment
Model training
Suited to customers who need large-scale training of the speech recognition language model because domain-specific terms are recognized inaccurately
Upload text and vocabulary for training
Professional evaluation and targeted improvement
Trained models are deployed automatically for your exclusive use
Pricing notes
The free call duration is available as soon as the product is activated. Once it is used up, you can choose between two billing methods: a prepaid hourly package or pay-as-you-go hourly billing. Billable call duration is deducted from the hourly package quota first, and any excess is charged by duration.
Billing standards
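The billing order can be expressed as a small calculation. This sketch uses the listed prices (1,000 hours for 1,800 yuan; 3 yuan/hour pay-as-you-go; 10 free hours) and assumes that free hours are consumed before the package, that usage is billed in fractional hours, and that package expiry (1 year) can be ignored; all three are assumptions, not documented rules.

```python
PACKAGE_HOURS = 1000           # prepaid package size from the price list
PACKAGE_PRICE_YUAN = 1800      # price of that package
PAYG_RATE_YUAN_PER_HOUR = 3    # pay-as-you-go rate
FREE_HOURS = 10                # free duration on activation


def overage_cost(billable_hours: float, package_hours_left: float,
                 free_hours_left: float = FREE_HOURS) -> float:
    """Yuan charged at the hourly rate once free time and package quota are used up."""
    after_free = max(0.0, billable_hours - free_hours_left)       # assumed: free hours go first
    after_package = max(0.0, after_free - package_hours_left)     # package quota is consumed next
    return after_package * PAYG_RATE_YUAN_PER_HOUR


def break_even_hours() -> float:
    """Usage at which the package costs the same as paying by the hour."""
    return PACKAGE_PRICE_YUAN / PAYG_RATE_YUAN_PER_HOUR
```

For example, 1,200 billable hours with a full 1,000-hour package leaves 190 hours charged at 3 yuan/hour, or 570 yuan. The package works out to 1.8 yuan/hour and breaks even at 600 hours of use within its validity period.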
Start using the recognition service
Register to claim the free trial package