Real-Time Speech Recognition | Baidu AI Open Platform


Features

Leading technology, accurate recognition

Built on Deep Peak2 end-to-end modeling and trained on more than 100,000 hours of data, with acoustic models for multiple sampling rates and scenarios, the service reaches 98% recognition accuracy for near-field Mandarin.

Multilingual recognition

Supports Mandarin, including Chinese spoken with a slight accent, as well as English.

Intelligent language processing

Language models trained on large-scale datasets intelligently correct intermediate recognition results, and punctuation (commas, periods, question marks, exclamation marks) is inserted automatically based on the content of the speech and its pauses.

Multiple access methods

The service can be called through the WebSocket API or through the Android, iOS, and Linux SDKs, so it works across a variety of operating systems and devices and is quick and easy to integrate.
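
Streaming audio over a WebSocket usually means sending raw audio in small, fixed-duration frames. The sketch below shows only the framing arithmetic. The audio format (16 kHz, 16-bit, mono PCM) and the 160 ms frame length are assumptions chosen for illustration, not values confirmed by this page, and no Baidu-specific message format is implied.

```python
from typing import Iterator


def frame_size_bytes(sample_rate_hz: int, sample_width_bytes: int,
                     channels: int, frame_ms: int) -> int:
    """Number of PCM bytes covering frame_ms milliseconds of audio."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * sample_width_bytes * channels


def iter_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield consecutive frames of pcm; the final frame may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    # Assumed format: 16 kHz, 16-bit (2-byte) samples, mono, 160 ms per frame.
    size = frame_size_bytes(16000, 2, 1, 160)
    one_second = bytes(16000 * 2)  # one second of silence in that format
    frames = list(iter_frames(one_second, size))
    print(size, len(frames), len(frames[-1]))  # 5120 7 1280
```

Each frame would then typically be sent as one binary WebSocket message, paced at roughly the frame duration so the server receives audio at real-time speed. The handshake, start/finish messages, and request parameters must come from the official API documentation.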

Millisecond-level real-time recognition of audio streams

The first packet is answered within milliseconds, and intermediate text results are displayed in real time as the audio stream is recognized.
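
Streaming recognizers typically emit intermediate hypotheses that are later superseded by a final result for the same sentence. A minimal sketch of keeping an on-screen transcript consistent, assuming each incoming message has been reduced to a hypothetical (text, is_final) pair; these names are illustrative and are not Baidu's actual response schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptBuffer:
    """Finalized sentences plus the latest intermediate (provisional) hypothesis."""
    finals: List[str] = field(default_factory=list)
    partial: str = ""

    def update(self, text: str, is_final: bool) -> None:
        if is_final:
            # The final result supersedes any intermediate text for this sentence.
            self.finals.append(text)
            self.partial = ""
        else:
            # Each new intermediate result replaces the previous one rather than appending.
            self.partial = text

    def display(self, sep: str = " ") -> str:
        # Use sep="" for Chinese text, which is not space-delimited.
        parts = self.finals + ([self.partial] if self.partial else [])
        return sep.join(parts)


if __name__ == "__main__":
    # Hypothetical (text, is_final) messages in arrival order.
    messages = [("hello", False), ("hello world", False),
                ("Hello world.", True), ("How are", False)]
    buf = TranscriptBuffer()
    for text, is_final in messages:
        buf.update(text, is_final)
        print(buf.display())
```

Replacing rather than appending intermediate text is what keeps the display from accumulating stale hypotheses.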

Timestamps in recognition results

Each returned text result carries timestamps marking the start and end of the sentence as segmented by VAD (voice activity detection), which makes downstream feature development easier.
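
Sentence-level start and end times map naturally onto subtitle cues, for example for live video subtitles. A sketch that formats millisecond offsets as SRT cues; the (start_ms, end_ms, text) tuple is a hypothetical input shape, since this page does not specify the field names or time unit of the returned timestamps.

```python
from typing import Iterable, Tuple


def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: Iterable[Tuple[int, int, str]]) -> str:
    """Build SRT text from (start_ms, end_ms, text) sentence segments."""
    cues = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, text, then a blank separator line.
        cues.append(f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    print(srt_timestamp(3723004))  # 01:02:03,004
    print(to_srt([(0, 1500, "Hello."), (1800, 4200, "Welcome.")]))
```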

Application scenarios

Real-time voice input

Live video subtitles

On-screen subtitles for speeches

Real-time meeting minutes

Classroom audio recognition

Real-time voice input

Accurate, efficient, hands-free voice input: speech appears on screen in real time, keeping chats flowing smoothly.

Key advantages

Leading recognition accuracy

Built on Deep Peak2 end-to-end modeling, with acoustic models for multiple sampling rates and scenarios, the service reaches 98% recognition accuracy for near-field Mandarin.

Multi-device support

The service can be called through the WebSocket API or the Android, iOS, and Linux SDKs, across multiple operating systems and device types.

Stable and efficient service

Enterprise-grade stability guarantees, with dedicated clusters that carry high-concurrency traffic efficiently and flexibly.

Self-service model optimization

The Mandarin Chinese model can be trained with zero code on the Voice Self-Training Platform. Uploading a text corpus can improve recognition accuracy for business-specific vocabulary by 5% to 25%.

Product pricing

General scenario model

Audio and video scenario model

Prepaid hourly package
For enterprises that can predict their call duration
Free duration: 10 hours
Validity: 1 year
Concurrency: 50 (can be expanded)
Service stability: 99.9%
Technical support: 24/7 response
Package: 1,000 hours for 1,800 yuan


Pay-as-you-go by call duration
For enterprises that cannot estimate their call duration
Free duration: 10 hours
Concurrency: 50 (can be expanded)
Service stability: 99.9%
Technical support: 24/7 response
Price: 3 yuan/hour


Model training
For customers whose specialized terminology is recognized inaccurately and who need large-scale training of the speech recognition language model
Upload text and vocabulary for training
Professional evaluation and targeted improvement
The trained model is deployed automatically for your exclusive use


Pricing notes

The free call duration becomes available as soon as the service is activated. Once it is used up, you can choose between two billing methods: prepaid hourly packages or post-paid hourly billing. Billable call duration is deducted from the hourly package quota first, and any excess is charged by duration.
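
The deduction order can be checked with a small calculation. This sketch assumes the listed figures (10 free hours, a 1,000-hour package at 1,800 yuan, pay-as-you-go at 3 yuan/hour) and that free hours are consumed before the package; it ignores package expiry, rounding of partial hours, and promotions, none of which this page specifies.

```python
PACKAGE_HOURS = 1000        # hours in one prepaid package (from the price list)
PACKAGE_PRICE_YUAN = 1800   # price of one prepaid package
RATE_YUAN_PER_HOUR = 3      # pay-as-you-go price
FREE_HOURS = 10             # free duration granted on activation (assumed used first)


def overage_hours(usage_hours: float, package_hours: float) -> float:
    """Hours left to bill by duration after free hours, then package quota."""
    after_free = max(usage_hours - FREE_HOURS, 0)
    return max(after_free - package_hours, 0)


def total_cost_yuan(usage_hours: float, packages: int = 0) -> float:
    """Package purchases plus duration-based charges for the excess."""
    excess = overage_hours(usage_hours, packages * PACKAGE_HOURS)
    return packages * PACKAGE_PRICE_YUAN + excess * RATE_YUAN_PER_HOUR


if __name__ == "__main__":
    print(total_cost_yuan(1200, packages=1))  # 2370
    print(total_cost_yuan(1200, packages=0))  # 3570
```

At these list prices a package works out to 1.8 yuan per hour versus 3 yuan per hour pay-as-you-go, so it is cheaper only if more than 600 of its hours are actually used within the validity period.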

Customer service phone: 400-920-8999


QQ support groups

Baidu Voice: 588369236
Text recognition: 1055623827
Custom template OCR: 1055402721
Face recognition: 692450852
Human body analysis: 860337848
Content review: 375765194
PaddlePaddle: 778260830
Image recognition: 312156782
EasyDL: 614951239
Image search: 1067276154
Video analysis: 632473158
Baidu AR: 472081119
Natural language: 1051436514
UNIT: 1074410189
Baidu Translate: 214857706
Image effect enhancement: 1092338829
Data intelligence: 650596829
Knowledge graph: 655854786
DuerOS: 604592023
Baidu AI open platform: 224994340
Intelligent writing: 743926523
EdgeBoard: 1060623352
Voice self-training platform: 686267521
Far-field voice development kit: 210093204

Business consulting

Pre-sales consultation

Submit your business requirements and a dedicated account manager will contact you as soon as possible to provide one-on-one consulting.

After-sales intelligent assistant

Intelligent diagnosis to quickly resolve usage problems.

Contact sales

For more information, call 400-920-8999 and press 1.
