
FAQ

To help you resolve problems faster, we have trained the Baidu Brain Assistant, which can answer frequently asked questions.

If some questions are not covered, we also welcome you to supplement and refine this FAQ; as a thank-you, we will send gifts such as gift cards, keyboard-and-mouse sets, and small speakers.

Account login

Q: What account should I use to log in?
A: You need a Baidu account to log in to Baidu Cloud. You can register a Baidu account here. If you already have a Baidu Promotion account, you can also use it to log in to Baidu Cloud.

Q: What should I do if I do not receive the verification code when registering my Baidu account?
A: The verification code may fail to arrive for reasons such as an overdue or suspended phone plan, a full SMS inbox, or network delay. Please check your phone and your account balance to make sure the phone can receive messages normally, then try requesting the verification code again.

Q: Do the AI services support the use of promotion accounts?
A: Yes, promotion accounts are supported.

API calls

Q: What capabilities has Baidu Brain opened up?
A: Baidu Brain is Baidu's core AI technology engine, covering vision, speech, natural language processing, knowledge graph, deep learning, and other core AI technologies, together with the AI open platform. Internally, Baidu Brain supports all of Baidu's businesses; externally, it is fully open to partners and developers, accelerating the adoption of AI technology and enabling the transformation and upgrading of all industries and industrial customers.

Q: Is the request quota for each service free?
A: At present, each API service under each account has a fixed free request quota so you can experience the service and debug your application. During the free trial phase, both successful and failed calls count as valid calls and consume the free test resources.

Q: Is there a limit on the request quota for each service?
A: Under the same account, you can view the quota of each service in its console. Paid services have no limit on the number of requests; usage is deducted as you call.

Q: What should I do if I exceed the QPS limit?
A: If you exceed the QPS limit, you can purchase additional QPS as needed. See the detailed price list of Baidu AI technical services: https://ai.baidu.com/ai-doc/REFERENCE/hk3dwjfzo
The price list covers free test resources, a billing introduction, billing price tables, cost examples, etc.

Q: What languages do the server-side SDKs support?
A: Server-side SDKs in Java, PHP, Python, C#, and Node.js are available for most services. Some technologies also provide server-side SDKs for C++, iOS, Android, and other platforms. For details, refer to the SDK documentation of each technical service.

Q: Is there any difference between the invited beta, public beta, and commercial interfaces?

A: "Invited beta" means that when an AI capability engine has just launched, users are invited to test it on a small scale; to participate you need to submit a work order or make a business inquiry directly through the product page. "Public beta" follows the invited beta: the engine is open to all platform users and provides a certain amount of free calls, and users can submit a work order to apply for additional free test resources. "Commercial" means the engine is open to all platform users and charges a fee, while the platform still provides a certain amount of free testing.

Technical Q&A

Face recognition

Q: How can I input the image to be recognized?
A: The face recognition interfaces currently accept base64-encoded images or image URLs.

Q: What is base64 encoding and how do I provide it?
A: Base64 encoding of an image means encoding the image's binary data into a string and passing this string instead of an image address. You can read the image's binary data and then encode it in Base64. Note: the base64 string must not include the data-URI header (e.g. "data:image/jpg;base64,").
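As a minimal sketch (Python; the function name is our own), reading an image file and producing a header-free base64 string:

```python
import base64

def image_to_base64(path: str) -> str:
    """Read an image file and return its bare base64 string.

    Note: no "data:image/jpg;base64," header is prepended --
    the face APIs expect only the base64 payload itself.
    """
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```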

Q: What are the requirements on the image format to be recognized?
A: PNG, JPG, JPEG, BMP, and other static formats are supported; GIF animations are not.

Q: What is the request image size limit for the face services?
A: The total image data size must not exceed 10 MB.

Q: What score threshold should I use for face recognition and verification?
A: A similarity score of 80 or above can be judged as the same person; the corresponding false recognition rate is about 1 in 10,000. You can also choose a more suitable threshold based on your business requirements.

Q: What is the difference between face recognition and face authentication?
A: Face recognition requires you to specify a group in the face database to search, whereas face authentication requires a specific user ID rather than a group. In practice, face authentication requires the user or the system to supply the ID first, which increases authentication security but also adds complexity. Which interface to use depends on your business scenario.

Q: What are the face databases, user groups, users, and faces under users of face recognition?
A: You can refer to the following hierarchical relationships:

|- Face database
   |- User group 1
      |- User 01
         |- Face
      |- User 02
         |- Face
         |- Face
      ...
   |- User group 2
   |- User group 3
   |- User group 4
   ...

Q: What are the restrictions on face database settings?
A: The restrictions are as follows:

  • Each appid corresponds to one face database; face databases are not shared between different appids.
  • Multiple user groups can be created under each face database, with no limit on the number of user groups.
  • Each user group (group) can hold an unlimited number of user_ids and faces (note: to ensure query speed, it is recommended to keep a single group under 800,000 faces).
  • Each user (uid) can register at most 20 faces.

Note: A newly registered face generally takes effect within 5s, after which recognition or authentication can be performed.

Note: To ensure good results in subsequent recognition, it is recommended to register frontal face images.

For more questions, join the discussion here: https://ai.baidu.com/forum/topic/list/165

Character recognition

Q: What is the maximum concurrency limit for character recognition?
A: Most character recognition interfaces provide a 2 QPS quota before payment is enabled; after enabling payment the quota rises to 10 QPS. If you need higher concurrency, you can purchase a QPS expansion package. Products that are online but not yet priced are temporarily in test status; if the test quota is insufficient, you can submit a work order to apply, providing your appid, a description of your business scenario, the interface name, and the concurrency you are applying for.

Q: How do I purchase or increase the number of calls?
A: For interfaces with online billing, you can purchase a quota package or enable pay-as-you-go directly in the console; see the product prices. For interfaces without online billing, submit a work order to apply, providing your appid, a description of your business scenario, the interface name, and the number of calls you are applying for.

Q: What are the format and size requirements of character recognition for uploaded images?
A: Images in JPG, JPEG, PNG, BMP, TIF, WebP, and other static formats are supported; GIF animations are not.
In general, the image must be smaller than 4 MB after base64 encoding (1 MB or less is recommended); the minimum side length must be at least 15 px and the maximum side length at most 4096 px (1024 px or less is recommended). Images larger than 1 MB after encoding, or with a maximum side over 1024 px, are compressed proportionally; controlling the input image size helps reduce network transmission and interface processing time. Different functional interfaces may have different requirements; the description of the image and url parameters in the API documentation prevails.
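A small sketch of the proportional-scaling rule above (pure arithmetic, our own helper; the actual resize would be done with an image library of your choice):

```python
def fit_longest_side(width: int, height: int, max_side: int = 1024) -> tuple:
    """Proportionally scale (width, height) so the longest side is at most max_side.

    Mirrors the doc's advice to keep the longest side within 1024 px;
    the 15 px minimum-side rule should still be checked separately.
    """
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)
    scale = max_side / longest
    # round to whole pixels, never below 1 px
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```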

Q: What is base64 encoding and how do I provide it?
A: Base64 encoding of an image means encoding the image's binary data into a string. Most programming languages provide built-in Base64 encoding functions that can be called directly.

Note: After base64 encoding, any data-URI header such as "data:image/jpg;base64," must be removed, and the string must be urlencoded before uploading.
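A minimal sketch of base64-encoding plus urlencoding a form field, using only the Python standard library (the field name "image" follows the parameter name mentioned above; the helper itself is our own):

```python
import base64
from urllib.parse import urlencode

def make_ocr_body(image_bytes: bytes) -> str:
    """Build an application/x-www-form-urlencoded body for an OCR request:
    base64-encode the image (no data-URI header), then urlencode the field."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    # urlencode escapes the '+', '/', '=' characters that appear in base64
    return urlencode({"image": b64})
```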

Q: How can I improve recognition accuracy and speed?
A: Character recognition accuracy is affected by lighting, background, clarity, and other factors. It is recommended to upload JPG images no larger than 1 MB, to make the text area as large as possible at capture time, and to ensure the text in the image is clearly legible to the human eye, with any tilt kept within roughly 30 degrees. Properly compressing the image can also significantly shorten recognition time.

Q: What languages does character recognition support?
A: Different functional interfaces support different languages. The common multilingual interfaces are:

Universal character recognition (standard version) and universal character recognition (standard with location version): simplified Chinese, traditional Chinese, English, Japanese, Korean, French, Spanish, Portuguese, German, Italian, and Russian.
Universal character recognition (high-precision version) and universal character recognition (high-precision with location version): simplified Chinese, traditional Chinese, English, Japanese, Korean, French, Spanish, Portuguese, German, Italian, Russian, Danish, Dutch, Malay, Swedish, Indonesian, Polish, Romanian, Turkish, Greek, and Hungarian.
Other interfaces (except those for domestic cards and bills) generally support Chinese and English content; see the API documentation.

If you need support for other languages, you can submit a work order to contact us.

Q: Does character recognition support images that are rotated or in different orientations?
A: Most character recognition capabilities support automatic image-orientation correction and can correctly recognize rotated images; you can switch this on or off with the detect_direction parameter (true/false). If some rotated images still cannot be recognized correctly, you can submit a work order to let us know so we can optimize.

Q: Can character recognition distinguish an original card or bill from a copy?
A: ID card recognition includes a risk detection function that can distinguish an original ID card from a photocopy; see the API documentation for details. If you need original-versus-copy detection for other character recognition services, submit a work order to contact us.

Q: Can character recognition verify the authenticity of cards and bills?
A: ID card recognition can raise alarms for re-shot, Photoshopped, and photocopied images. You can also use the public-security verification interface of the Face Verification service to check that a name and ID number are authentic and consistent. If you need authenticity checks for cards and bills in other character recognition services, submit a work order to contact us.

Q: Can character recognition process images in batches?
A: Not at the moment; a single call recognizes a single image. However, you can make multithreaded calls within your QPS limit.
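As a hedged illustration of multithreaded calling within a QPS quota (Python; recognize_one stands in for your own single-image call and is an assumption, not a platform API — capping the worker count at the QPS quota is a simple, coarse way to stay within it):

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_batch(images, recognize_one, qps_limit=2):
    """Recognize several images with at most qps_limit concurrent calls.

    images: iterable of inputs for recognize_one (paths, bytes, ...).
    recognize_one: your single-image OCR call.
    Results are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=qps_limit) as pool:
        return list(pool.map(recognize_one, images))
```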

Q: Can the recognition result be converted to Word or TXT?
A: Results are returned in JSON format after OCR extraction; saving them as Word or TXT must be done in your own business logic.
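A sketch of that post-processing step, assuming the common general-OCR response shape with a "words_result" list of "words" entries (check the response format of your specific interface):

```python
import json

def ocr_json_to_text(response_json: str) -> str:
    """Join the recognized lines of a general-OCR JSON response into plain text.

    Assumes the response shape {"words_result": [{"words": "..."}, ...]}.
    """
    data = json.loads(response_json)
    return "\n".join(item["words"] for item in data.get("words_result", []))

# Saving as TXT is then ordinary file I/O:
# with open("result.txt", "w", encoding="utf-8") as f:
#     f.write(ocr_json_to_text(resp))
```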

Q: Can I add a scan box to my application's character recognition interface?
A: Baidu provides only the character recognition API; the application UI, including any scan box, can be developed by yourself as needed.

Q: Is CAPTCHA recognition supported?
A: CAPTCHA recognition involves network security issues, and Baidu does not provide a dedicated CAPTCHA recognition service.

Q: What is the response speed of character recognition?
A: Generally within 1 s. Recognition time is affected by image size and the amount of text, but will not exceed 7 s; beyond 7 s a "timeout" error is returned automatically and the call is not charged.

Note: Because the network conditions of data transmission are outside Baidu's control, the response time you actually observe is Baidu's model recognition time plus data transmission time. If a large number of requests take too long, check your server's network, increase bandwidth appropriately, or compress images before uploading; if necessary, submit a work order to contact us.

Q: Can the character recognition interface be called online from outside China?
A: Yes, but latency will be higher.

Q: Why is the character recognition result inaccurate?
A: There are several possible reasons:

(1) The image is too small: images with a side length under 15 px cannot be recognized.

(2) The image quality is too poor, for example the image is too dark and the text is illegible.

(3) The text is covered by watermarks, seals, folds, etc.

(4) The image type does not match what the interface supports. For example, ID card recognition only supports second-generation resident ID cards, not passports or bank cards.

(5) If an error code is returned, refer to the error code list to troubleshoot.

Q: What should I do when a call to the character recognition API fails?

A: Troubleshooting steps:

(1) Check the returned result or error code to find the cause.

(2) Check whether the API is being called correctly (refer to the AI Access Guide).

For more questions, join the discussion here: https://ai.baidu.com/forum/topic/list/164

Image audit

Q: Are there any restrictions on image format and resolution?
A: Supported formats are currently PNG, JPG, JPEG, BMP, GIF (only the first frame is reviewed), WebP, and TIFF. After base64 encoding the image must be between 5 KB and 4 MB, and the shortest edge must be between 128 px and 4096 px.

Q: Is there a limit on the image size when passing a URL?
A: When the image is requested in the form of a URL, the URL must be urlencoded. The image must still be between 5 KB and 4 MB after base64 encoding, with the shortest edge between 128 px and 4096 px.
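The size and edge limits above can be checked before upload; a small sketch (our own helper, not a platform API):

```python
def audit_image_ok(base64_size_bytes: int, width: int, height: int) -> bool:
    """Check the documented image-audit limits:
    5 KB <= base64 size <= 4 MB, and shortest edge between 128 and 4096 px."""
    size_ok = 5 * 1024 <= base64_size_bytes <= 4 * 1024 * 1024
    shortest = min(width, height)
    edge_ok = 128 <= shortest <= 4096
    return size_ok and edge_ok
```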

Q: How can I input the image to be reviewed?
A: Both Base64-encoded images and image URLs are accepted.

Q: What is base64 encoding and how do I provide it?
A: Base64 encoding of an image means encoding the image's binary data into a string and passing this string instead of an image address. You can read the image's binary data and then Base64-encode it. Note: the base64 string must not include a data-URI header such as "data:image/jpg;base64,".

Q: Do you support passing images via URL?

A: Yes. You can pass image URLs through the image audit composite service interface.

Q: Can one API call return the results of multiple models?

A: Yes. The image audit composite service interface lets you flexibly select the model capabilities you need.

Q: How do I purchase image audit products online?

A: You can purchase image audit products in any quantity via online payment. For the detailed process, see: https://ai.baidu.com/ai-doc/ANTIPORN/Fkp5jux3p

For more questions, join the discussion here: https://ai.baidu.com/forum/topic/list/172

Image recognition

Q: What images can the image recognition interfaces recognize?

A: The image recognition interfaces support general object and scene recognition, brand logo recognition, animal recognition, plant recognition, vegetable recognition, landmark recognition, fruit and vegetable recognition, red wine recognition, currency recognition, image main-subject detection, re-shoot recognition, FMCG product detection, and more. Details: https://ai.baidu.com/tech/imagerecognition

Q: Does image recognition work offline?

A: Not at present. For offline use we recommend EasyDL Image, which supports offline deployment.

Q: What can vehicle image analysis recognize?

A: Currently supported: vehicle model recognition, vehicle detection, traffic flow statistics, vehicle attribute recognition, vehicle damage recognition, and vehicle segmentation. Details: https://ai.baidu.com/tech/vehicle

Q: What are the image requirements for customized image recognition?

A: To ensure training quality, submit images from your actual business as the training set wherever possible, covering samples with different lighting, angles, and backgrounds. If you need staff to assist with sample collection, apply via "Cooperation Consultation" in the floating window at the lower right of the official website.

Q: How should samples be organized for customized image recognition?

A: ① Sort out the most fine-grained list of recognition targets;
② collect or organize the training samples. If you need staff to assist with sample collection, apply via "Cooperation Consultation" in the floating window at the lower right of the official website.

Q: How do I upload training samples for customized image recognition? What are the upload requirements?

A: ① You can upload original images to the platform and complete annotation with the built-in annotation tool, or upload images together with their annotations. Flexible upload methods are supported, including local import (image import, archive import, API import) and network file import (Baidu Object Storage BOS import, shared-link import), etc.
② Uploaded data must be named after the classification labels used in your actual business scenario, and at least two classes must be uploaded for training (if your scenario needs to recognize an "other" result, upload an "other" class as a supplementary training set).
③ Currently supported image types are png, jpg, bmp, and jpeg; each image is limited to 14 MB, with an aspect ratio within 3:1, the longest side under 4096 px, and the shortest side over 30 px.
④ Training images should match the shooting conditions of the images to be recognized in the real scene. For example, if the images to be recognized are taken from overhead by a camera, the training images should not be frontal images of the target downloaded from the Internet. The images of each label should cover the variation of the real scene, such as changes in shooting angle and lighting; the more scenarios the training set covers, the stronger the model's generalization ability.

Q: Why does customized image recognition report that model training failed?

A: There may be the following reasons:
① The submitted training file is corrupted
② The training set was not submitted in folders (e.g. a batch of loose images was submitted)
③ The submitted archive cannot be decompressed (e.g. the browser was closed during upload, leaving an incomplete archive, or the archive format is wrong)
④ The submitted images are in an unsupported format
⑤ The training set contains only one category folder
⑥ For other exceptions, ask in the "Image Recognition" section of the Baidu AI Community: http://ai.baidu.com/forum/topic/list/171

Q: What should I do if customized image recognition training fails?

A: Training failure is usually caused by the training samples. Referring to the failure reasons above, check whether the uploaded archive has problems such as corrupted files, missing folder structure, or wrong image formats, then create a new model and upload the training data again.

Q: How can the trained model be optimized?

A: ① Add more images to the training set
② Improve image quality
③ Refine the classification rules of the training set
④ For targeted tuning or questions, visit the "Image Recognition" section of the Baidu AI Community: http://ai.baidu.com/forum/topic/list/171

Q: For customized recognition of flat graphics, which is largely unaffected by angle and lighting, are 200+ images still needed?

A: If the objects differ greatly from each other, a few dozen images per class can be enough; if the differences are subtle, our experience is that 200+ images per class give noticeably better recognition.

For more questions, join the discussion here: https://ai.baidu.com/forum/topic/list/171

Image search

Q: How is image search charged?

A: The image search service includes a certain amount of free calls; after the free resources are used up, usage is charged. To pay, you can purchase a call-count package or enable a pay-as-you-go plan. For detailed prices, see the product price document: https://ai.baidu.com/ai-doc/IMAGESEARCH/Zk3bczq54

Q: What information does image search use to determine similarity?

A: We compare the features of the query image with those of the stored images. Features represent the general semantics of an image, such as color, subject, and composition.

Q: In which scenarios does similar-image search work best?

A: Everyday photos, web images, and artwork are recognized well, e.g. design materials and UGC content. Keep the query image consistent in scene with the stored originals: if the stored originals are standard advertising images without background clutter, the query images should likewise avoid background and other interfering features; otherwise the similarity computation will be noisy and the results inaccurate.

Q: Can similar-image search find the image I want in Baidu's image library?

A: Image search only finds target images in the image library you build on Baidu AI; the web results of Baidu Images cannot be used as the search library.

Q: Where is the self-built library stored in image search?

A: The library is stored on Baidu's servers as a private cloud service inaccessible to others.

Q: When my company calls Baidu's interface from another cloud service, must the library be built at Baidu? Can data on other ECS servers be queried directly?

A: The library must be stored on Baidu's servers; directly retrieving data from other cloud services is not supported, because the algorithm runs on Baidu's servers and feature extraction must be performed when images are added to the library.

Q: Does product search support matching partial images against full images?

A: Yes, searching between partial and complete images is supported.

Q: How do I create a self-built library for image search?

A: Organize your existing images and create an image library according to your actual needs. See: http://ai.baidu.com/forum/topic/show/496543

For more questions, join the discussion here: https://ai.baidu.com/forum/topic/list/170

Video technology

Q: How do I access the video content review and video cover selection services?

A: Please first submit your business requirements via "Business Cooperation" on the page; we will contact you after receiving them and provide test documentation and interfaces.

Q: How do I use the video comparison and retrieval service?

A: Please first submit your business requirements via "Cooperation Consultation" on the page; we will contact you once we receive them.

For more questions, join the discussion here: https://ai.baidu.com/forum/topic/list/173

Language processing basics

Q: What input encodings are supported?
A: Currently GBK and UTF-8 are supported.

Q: What do the POS tags in lexical analysis results mean?
A: See the table below; for details refer to the API documentation.

Tag  Meaning             Tag  Meaning             Tag  Meaning             Tag  Meaning
n    common noun         f    locality noun       s    place noun          t    time noun
nr   person name         ns   place name          nt   organization name   nw   work title
nz   other proper noun   v    common verb         vd   verb-adverb         vn   noun-verb
a    adjective           ad   adverb-adjective    an   noun-adjective      d    adverb
m    numeral             q    measure word        r    pronoun             p    preposition
c    conjunction         u    particle            xc   other function word w    punctuation

Q: How many dimensions do the word vectors have?

A: We provide 1024-dimensional word vectors; future versions may offer reduced-dimension variants for different scenarios.

Q: What are the text-length limits of the Chinese DNN language model, and what input encoding does it take?

A: The maximum length is 10,240 bytes, about 5,120 Chinese characters. GBK-encoded input is supported, and no word segmentation is required.

Q: What if the Chinese DNN language model input mixes in English?

A: The model vocabulary contains common high-frequency English words, so these can also be matched.

Q: Why do many word pairs have a similarity of 1?

A: Although the word-vector vocabulary contains millions of words, out-of-vocabulary words still occur, and all of them are mapped to a single OOV (out-of-vocabulary) token. Therefore, when both words of a pair are OOV, their similarity is 1.

Q: What is the character limit for short-text similarity?

A: The maximum length is 512 bytes, about 256 Chinese characters in GBK; texts that are too long or too short will slightly affect the results.
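A quick way to check the 512-byte limit, assuming GBK encoding (2 bytes per Chinese character); the helper is our own:

```python
def gbk_length_ok(text: str, limit: int = 512) -> bool:
    """Check that the text fits within the byte limit when GBK-encoded.

    Each Chinese character takes 2 bytes in GBK, ASCII takes 1.
    """
    return len(text.encode("gbk")) <= limit
```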

Q: How is short-text similarity computed, and how is mixed Chinese and English handled?

A: The model vocabulary contains common high-frequency English words, so it matches mixed Chinese-English text in a Chinese context well.

Q: Why does short-text similarity sometimes return no result?

A: A result is returned only when the words in the text are in the vocabulary. Although the vocabulary is large (millions of words), out-of-vocabulary words still occasionally occur; when every word in the text is out of vocabulary, no result is returned.

Q: Is there a length limit on comments input for opinion extraction?

A: It is recommended to keep the input within 150 characters, i.e. the typical length of a comment. In theory there is no limit, but the platform caps strings at 10,240 characters; longer input is truncated.

Q: Can comment opinion extraction mark the position of a mined opinion in the text?

A: Yes, the output includes the position of the opinion tag in the original text; for example, it can mark where "good service" appears in a hotel review.

Q: Does comment opinion extraction support uploading user-defined dictionaries?

A: The customized version supports uploading custom comment glossaries for 13 industries, which effectively improves the accuracy and recall of opinion extraction, and users can also define "normalized tags" for opinions.

Q: Can comments be uploaded and summarized in batches?

A: This can be achieved through the API: each call extracts the tags and analyzes the polarity of one comment, and multiple calls realize tag mining and analysis over multiple comments.

Q: What sentiment types can sentiment analysis distinguish?

A: Sentiment polarity is currently classified as positive, negative, or neutral.

Q: What is the difference between sentiment analysis and conversational emotion recognition?

A: Conversational emotion recognition detects the positive/neutral/negative tone of utterances in a dialogue (e.g. "you're awesome" / "this is so boring"), while sentiment analysis leans toward the likes/dislikes expressed about an object (e.g. a movie or book). Each performs best in its own scenario; used outside it, recognition accuracy will suffer to some extent.

For more questions, join the discussion here: https://ai.baidu.com/forum/topic/list/169

Speech Recognition and Synthesis

Q: What are the daily call limits of the speech recognition and synthesis interfaces, and how do I apply to raise them?
A: The speech recognition and synthesis interfaces have certain test limits. Completing personal real-name verification or enterprise verification raises the QPS limit; details can be viewed on the console. If you need higher QPS, you can enable a paid interface in the console, or contact us via cooperation consulting.

Q: What are the "speech recognition thesaurus" and "semantic parsing" settings under advanced settings on the console's speech application details page?
A: The offline command-word recognition service can recognize predefined fixed phrases while the device is offline (this function works only when the device is offline). Offline command words suit scenarios such as recognizing a spoken contact name or specific voice commands without a network connection. Local semantic parsing parses the recognized text and automatically converts it into structured data so you can obtain the intent of the text (this function is independent of the device's network state); it suits scenarios such as calling a contact in the address book or opening a mobile app by voice.

Q: What audio formats and sampling rates does the speech recognition REST API support?
A: Raw PCM audio must be 16 kHz or 8 kHz sampling rate, 16-bit depth, mono. Supported formats: pcm (uncompressed), wav (uncompressed, PCM-encoded), amr (compressed), and m4a (compressed; supported only by the extreme-speed model).
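A sketch that checks a wav file against these requirements using Python's standard wave module (the helper name is our own):

```python
import wave

def wav_meets_rest_api_spec(path: str) -> bool:
    """Check a wav file against the REST API requirements:
    8 kHz or 16 kHz sampling rate, 16-bit samples, mono."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() in (8000, 16000)
                and w.getsampwidth() == 2   # 16 bit = 2 bytes per sample
                and w.getnchannels() == 1)
```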

Q: How long a recording can the speech recognition REST API handle?
A: Recording files of up to 60 s.

Q: What is the speech recognition REST API, and what should I watch out for?
A: The speech recognition REST API works on all platforms over HTTP and can be used from any programming language that can issue HTTP requests. With the REST API you must implement recording, compression, and uploading yourself, and REST API speech recognition does not yet support semantic parsing.

Q: Is the voice service free, or does it require payment?
A: The speech recognition and synthesis interfaces have trial quotas. Completing personal real-name authentication or enterprise authentication raises the QPS limit; the exact limits can be viewed in the console. If you need higher QPS, you can enable a paid interface in the console. If you need to purchase voice services, please see the documents for detailed pricing:
Speech recognition pricing: https://ai.baidu.com/ai-doc/SPEECH/ck38lxnx8
Speech synthesis pricing: https://ai.baidu.com/ai-doc/SPEECH/Nk38y8pjq

Q: What languages do speech recognition and synthesis support?
A: Speech recognition supports Mandarin, Sichuan dialect, Cantonese, and English.
Speech synthesis supports Chinese and English; other languages are not supported yet. Please watch the official website for updates.

Q: The SDK or code reports an error and won't run normally?
A: Please test the official demo first; once the demo works, add your own code incrementally, which usually isolates the problem. If the problem persists, you can discuss it in the AI community or open a work order, and we will take a closer look at the cause.

Q: How do I report a problem?

  1. First, confirm whether it is a code problem by testing our demo.
  2. Use the official website search and enter keywords to search the documents and FAQs.
  3. If you still cannot find the answer, choose a feedback channel.

Q: What can I do to improve the recognition accuracy of certain words?
A: You can continuously improve recognition accuracy by training a dedicated language model on the EasyDL speech self-training platform; see https://ai.baidu.com/easydl/audio/ for details. For a small number of business-specific terms, you can also quickly improve accuracy with a custom vocabulary. The entry point is: Console > speech application details page > advanced settings > speech recognition vocabulary settings.

Q: What is the difference between the speech recognition REST API and the SDK?
A: REST API: an HTTP interface; the developer uploads the recording, Baidu Voice recognizes it, and the recognition result is returned to the developer. SDK: code integration is required; Baidu Voice provides an end-to-end solution from recording through return of the recognition result.

Q: Which interface does speech recognition use to obtain the audio data?
A: Android SDK: use the CALLBACK_EVENT_ASR_AUDIO callback, or set the OUT_FILE parameter to the path where the audio should be saved. In addition, ACCEPT_AUDIO_DATA must be set to true.

iOS SDK: implement MVoiceRecognitionClientDelegate's - (void)VoiceRecognitionClientWorkStatus:(int)aStatus obj:(id)aObj; when aStatus is EVoiceRecognitionClientWorkStatusNewRecordData, aObj is the audio data as NSData. The data format is PCM; the sample rate can be obtained through VoiceRecognitionConfig.getSampleRate() or [[BDVoiceRecognitionClient sharedInstance] getCurrentSampleRate]. The audio obtained is 16-bit depth, mono.

Q: How can we improve the accuracy of speech recognition?
A: It is recommended to upload your business text and train a language model on the EasyDL speech self-training platform to improve recognition accuracy; see https://ai.baidu.com/easydl/audio/ . You can also customize the speech recognition settings: open the Baidu Open Cloud Platform, customize the settings under your current application, upload the recognition keyword text, and save for it to take effect.

Q: How to resolve conflicts between the Baidu Voice SDK and other Baidu SDKs, or other third-party SDKs?
A: Conflicts with other Baidu SDKs are generally caused by sharing the same base library, galaxy.jar; check whether the jar package is imported more than once. Conflicts with third-party SDKs are generally caused by inconsistent architectures of the .so libraries; make sure the .so files under the project's libs/armeabi, armeabi-v7a, x86, and mips directories are consistent. If consistency cannot be guaranteed, all SDKs usually have to fall back to the armeabi .so libraries.

Q: There is a long delay when the voice recognition function is started for the first time. How can this be reduced?
A: The first-time delay is usually caused by permission verification. You can call the verification interface in advance: - (int)verifyApiKey:(NSString *)apiKey withSecretKey:(NSString *)secretKey;. That way, no verification request needs to be sent when voice recognition is first started, which reduces its startup delay.

Q: How to reduce the installation package size for ASR on Android?
A: To reduce the installation package size, you can keep only the armeabi directory; the performance loss is small.
If only the online recognition function is needed, only two .so files are required.

Q: How can I specify the pronunciation of a word in speech synthesis?
A: The speech synthesis interface lets you control pronunciation by adding a pinyin annotation with a tone digit after the character to be synthesized. For example, to make 重 in "重音" read as "chóng", annotate it as "chong2", where 2 marks the second tone; the digits 1, 2, 3, and 4 correspond to the first through fourth tones.
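As a small illustration, the annotation described above can be applied programmatically before sending text to synthesis. This is a hypothetical helper, not part of any SDK, and the parenthesized markup form it emits is an assumption; consult the synthesis documentation for the exact markup your interface accepts.

```python
TONES = {1, 2, 3, 4}  # Mandarin tones: high level, rising, dipping, falling

def annotate_pronunciation(text, char, pinyin, tone):
    """Append a pinyin-plus-tone-digit annotation after `char` in `text`.

    Hypothetical helper: wraps the annotation in parentheses after the
    character, e.g. 重 with chong + tone 2 becomes 重(chong2).
    """
    if tone not in TONES:
        raise ValueError("tone must be an integer from 1 to 4")
    return text.replace(char, f"{char}({pinyin}{tone})")

print(annotate_pronunciation("重音", "重", "chong", 2))  # → 重(chong2)音
```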

Q: What languages and timbres does speech synthesis support?
A: At present, speech synthesis supports Mandarin Chinese and English broadcasting. Timbres include male, female, and child voices, such as Duya, Duxiaoyao, Dubowen, Dumiduo, Duxiaotong, Duxiaomeng, and Duxiaojiao.

Q: How to complete formal authorization of the speech synthesis SDK?
A: Test the demo first. For authorization, set the APPID, package/bundle name, and AK/SK; it is enough for the first online call to succeed. Specific steps:

  1. In the application on the official voice website, confirm the package name. The Android SDK demo's package name is com.baidu.tts.sample.
  2. After startup, check that the offline resource files exist and are readable. Once the first online call succeeds, subsequent use continues to work.

Q: Does the SDK automatically pause speech synthesis playback when a call comes in, or do I need to handle it myself?
A: The Android SDK does not handle audio focus. You need to implement this logic yourself using the pause and resume methods.

Q: How to save the synthesized audio in speech synthesis?
A: To save the synthesized audio, taking the Python SDK as an example, the output file path can be customized. As long as the local path is valid, the file is generated automatically when the script runs; you do not need to save it manually, and an existing file with the same name will be overwritten. You need a local Python environment, and then call the API as described in the document: http://ai.baidu.com/docs#/TTS-Online-Python-SDK/top
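A minimal sketch of the save step, assuming the Python SDK's synthesis call returns the audio bytes on success and an error dict on failure (the helper itself is illustrative, not part of the SDK; in real use `result` would come from a call like `client.synthesis(...)`):

```python
def save_tts_result(result, path):
    """Write a synthesis result to disk, checking for an error response first."""
    if not isinstance(result, (bytes, bytearray)):
        # On failure the SDK returns a dict describing the error instead of audio.
        raise RuntimeError(f"synthesis failed: {result}")
    with open(path, "wb") as f:
        f.write(result)  # overwrites any existing file with the same name
    return path

# Simulated success result standing in for real synthesized audio bytes.
fake_audio = b"fake-audio-bytes"
save_tts_result(fake_audio, "audio.mp3")
```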

Q: What about the error "Unable to execute dex: Multiple dex files define Lcom/baidu/android/common/logging/Log"?
A: You have integrated other Baidu SDKs into your application, causing a common-library conflict. Please delete galaxy_lite.jar from the voice SDK.

Q: How should speech synthesis be called when reading novels, broadcasting news, or other scenarios that continuously synthesize large passages of text?
A: You can segment the text at punctuation marks and synthesize it sentence by sentence, which gives faster synthesis. While the first sentence is playing, begin synthesizing and caching the second; when the first finishes, play the cached audio directly, which gives seamless playback. The Baidu speech synthesis SDK also provides a batch text synthesis method; see the technical documents for details.
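The segmentation step above can be sketched as follows. This is a simple illustration: the punctuation set and the `max_len` guard (against a long unpunctuated run) are arbitrary choices here, not SDK parameters.

```python
import re

# Split after common Chinese and Western sentence-ending punctuation,
# keeping each punctuation mark attached to its sentence.
_SENT_END = re.compile(r"(?<=[。！？!?；;.])")

def split_for_tts(text, max_len=60):
    """Split long text into sentence-sized chunks for per-sentence synthesis."""
    sentences = [s for s in _SENT_END.split(text) if s.strip()]
    chunks = []
    for s in sentences:
        # Hard-wrap any sentence that exceeds max_len characters.
        while len(s) > max_len:
            chunks.append(s[:max_len])
            s = s[max_len:]
        chunks.append(s)
    return chunks

print(split_for_tts("第一句。第二句！第三句？"))  # → ['第一句。', '第二句！', '第三句？']
```

Each chunk can then be synthesized in turn while the previous chunk's audio is playing.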

Q: In fields such as ride hailing, express delivery, and smart hardware, where network stability cannot be guaranteed (frequent disconnections), how should the service be called?
A: In this case, the pure offline speech synthesis SDK provided by Baidu Voice is recommended.
If you use the Android SDK, TTSMODE_OFFLINE mode is recommended, or MIX_MODE_HIGH_SPEED_SYNTHESIZE mode (online synthesis is preferred on WiFi/4G/3G/2G, offline synthesis is used on other networks, and if the online connection to Baidu's server fails or times out after 1.2 s, it switches to offline synthesis).
If you do not want to consume data on mobile networks, or have stronger response-speed requirements, choose another mode according to your business needs. If you use the iOS SDK, the default settings are recommended; to adjust the online request timeout, set the BDS_SYNTHESIZER_PARAM_ONLINE_REQUEST_TIMEOUT parameter. See the technical documentation for specifics.

Q: The Baidu Android TTS synthesis SDK conflicts with the map navigation SDK. How to resolve this?
A: In the Android navigation SDK, the built-in TTS is a complete, self-contained aar file, and all its dependent jars are inside that aar. As long as the aar is not added to the project, there will be no conflict with an external TTS SDK. See http://lbsyun.baidu.com/index.php?title=android-navsdk/guide/projectConfiguration

Q: What should I do if some characters are pronounced incorrectly during speech synthesis?
A: You can specify the pronunciation yourself using polyphonic-character tagging: add a pinyin annotation with a tone digit after the character to be synthesized. For example, to make 重 in "重音" read as "chóng", annotate it as "chong2", where 2 marks the second tone; the digits 1, 2, 3, and 4 correspond to the first through fourth tones.
You can also report mispronunciation cases via the QQ group or a work order, in the following format:
[interface] [voice] [online/offline] [error type]
Error types: polyphone, prosody, retroflex, TN (errors converting numbers and special symbols), English badcase, others
Feedback example:
[rest api] [standard female voice] [online] [polyphone 行]
Text: 一行白鹭上青天 ("A line of egrets rises into the blue sky")
Problem: 行 read as xíng
Expected: 行 read as háng
More questions are discussed here: https://ai.baidu.com/forum/topic/list/166
