Universal character recognition (high-precision version with location information)

Updated: 2024-05-10

Interface description

This interface provides high-precision text detection and recognition over the whole image, across multiple scenarios and languages. It supports rare-character recognition and 25 languages. Compared with universal character recognition (version with location information), it is more accurate, but recognition takes slightly longer.

Online debugging

You can debug the interface in the Sample Code Center, where you can perform signature verification, view the request content and returned results of online calls, and automatically generate sample code.

Request Description

Request Example

HTTP method: POST

Request URL: https://aip.baidubce.com/rest/2.0/ocr/v1/accurate

URL parameter:

parameter	value
access_token	The access_token obtained with your API Key and Secret Key; see "Access Token acquisition"

The headers are as follows:

parameter	value
Content-Type	application/x-www-form-urlencoded
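
As a minimal sketch of how the URL parameter and header above fit together, the following Python snippet assembles the request URL and the header dictionary. The function name build_request_target and the token value are placeholders introduced for illustration only; they are not part of the interface.

```python
from urllib.parse import urlencode

# Endpoint as given in the request description.
ENDPOINT = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate"


def build_request_target(access_token):
    """Return (url, headers) for a POST to the interface.

    build_request_target is a hypothetical helper name, not an official API.
    """
    # access_token travels in the query string, not in the body.
    url = ENDPOINT + "?" + urlencode({"access_token": access_token})
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return url, headers


# "24.abc" is a made-up token used only to show the resulting URL shape.
url, headers = build_request_target("24.abc")
print(url)
```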

Place the request parameters in the body. The details of the parameters are as follows:

Request Parameters

parameter	Required	type	Optional value range	description
image	One of image/url/pdf_file/ofd_file	string	-	Image data, base64-encoded and then urlencoded. The size after base64 encoding and urlencoding must not exceed 10M; the shortest side must be at least 15px and the longest side at most 8192px. Supported formats: jpg/jpeg/png/bmp. Priority: image > url > pdf_file > ofd_file. When the image field is present, the url, pdf_file, and ofd_file fields are ignored
url	One of image/url/pdf_file/ofd_file	string	-	The complete URL of the image, no longer than 1024 bytes. The image at the URL must not exceed 10M after base64 encoding; the shortest side must be at least 15px and the longest side at most 8192px. Supported formats: jpg/jpeg/png/bmp. Priority: image > url > pdf_file > ofd_file. When the image field is present, the url field is ignored. Disable hotlink protection for the URL
pdf_file	One of image/url/pdf_file/ofd_file	string	-	PDF file, base64-encoded and then urlencoded. The size after base64 encoding and urlencoding must not exceed 10M; the shortest side must be at least 15px and the longest side at most 8192px. Priority: image > url > pdf_file > ofd_file. When the image or url field is present, the pdf_file field is ignored
pdf_file_num	no	string	-	Page number of the PDF file to recognize. When the pdf_file parameter is in effect, the content of the given page is recognized. If omitted, the first page is recognized by default
ofd_file	One of image/url/pdf_file/ofd_file	string	-	OFD file, base64-encoded and then urlencoded. The size after base64 encoding and urlencoding must not exceed 10M; the shortest side must be at least 15px and the longest side at most 8192px. Priority: image > url > pdf_file > ofd_file. When the image, url, or pdf_file field is present, the ofd_file field is ignored
ofd_file_num	no	string	-	Page number of the OFD file to recognize. When the ofd_file parameter is in effect, the content of the given page is recognized. If omitted, the first page is recognized by default
language_type	no	string	auto_detect CHN_ENG ENG JAP KOR FRE SPA POR GER ITA RUS DAN DUT MAL SWE IND POL ROM TUR GRE HUN THA VIE ARA HIN	Recognition language type; the default is CHN_ENG. Optional values: auto_detect: automatically detect the language and recognize; CHN_ENG: Chinese and English; ENG: English; JAP: Japanese; KOR: Korean; FRE: French; SPA: Spanish; POR: Portuguese; GER: German; ITA: Italian; RUS: Russian; DAN: Danish; DUT: Dutch; MAL: Malay; SWE: Swedish; IND: Indonesian; POL: Polish; ROM: Romanian; TUR: Turkish; GRE: Greek; HUN: Hungarian; THA: Thai; VIE: Vietnamese; ARA: Arabic; HIN: Hindi
eng_granularity	no	string	word/letter	Whether English single-character results are output at word or letter granularity when the recognition language type is Chinese and English (CHN_ENG). Takes effect when recognize_granularity=small
recognize_granularity	no	string	big/small	Whether to locate single-character positions. big: do not locate single-character positions (default); small: locate single-character positions
detect_direction	no	string	true/false	Whether to detect the orientation of the image. The default is false (no detection). Orientation refers to whether the input image is upright or rotated 90/180/270 degrees counterclockwise. Optional values: true: detect orientation; false: do not detect orientation. If the input images may not be upright, setting this parameter to true is recommended for better recognition
vertexes_location	no	string	true/false	Whether to return the vertex positions of the polygon enclosing the text. Single-character positions are not supported. The default is false
paragraph	no	string	true/false	Whether to output paragraph information
probability	no	string	true/false	Whether to return the confidence of each line in the recognition result
char_probability	no	string	true/false	Whether to return single-character confidence. Not returned by default. Takes effect when recognize_granularity=small. Optional values: true: return single-character confidence; false: do not return single-character confidence
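
The encoding and priority rules in the table above can be sketched in Python as follows. This is an illustration under stated assumptions, not an official client: build_body is a hypothetical helper name, and "10M" is read here as 10 * 1024 * 1024 bytes, which the table does not spell out.

```python
import base64
from urllib.parse import quote_plus, urlencode

# Assumption: the "10M" limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_ENCODED_SIZE = 10 * 1024 * 1024
# Priority order stated in the parameter table.
SOURCE_PRIORITY = ("image", "url", "pdf_file", "ofd_file")


def build_body(sources, **options):
    """Build the form body from the highest-priority source plus optional parameters.

    sources maps a field name to raw file bytes, or to a str for "url".
    """
    chosen = next((name for name in SOURCE_PRIORITY if sources.get(name)), None)
    if chosen is None:
        raise ValueError("one of image/url/pdf_file/ofd_file is required")

    value = sources[chosen]
    if chosen == "url":
        if len(value.encode("utf-8")) > 1024:
            raise ValueError("url is longer than 1024 bytes")
    else:
        # Files are base64-encoded first; urlencode() below percent-encodes the result.
        value = base64.b64encode(value).decode("ascii")
        if len(quote_plus(value)) > MAX_ENCODED_SIZE:
            raise ValueError(chosen + " exceeds the size limit after encoding")

    # Lower-priority sources are dropped because the service ignores them anyway.
    return urlencode({chosen: value, **options})


body = build_body({"image": b"hello"}, language_type="ENG", detect_direction="true")
print(body)
```

Optional parameters are passed as strings (for example detect_direction="true") because the table declares every parameter as type string.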

Request Code Example

Note 1: Before using the sample code, remember to replace the sample token and the image path or Base64 data with your own.

Note 2: Some languages depend on additional classes or libraries. See the download links in the code comments.

 curl -i -k 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate?access_token=[Token obtained by calling the authentication interface]' --data 'image=[Base64-encoded image, urlencoded]' -H 'Content-Type:application/x-www-form-urlencoded'

 # encoding:utf-8

 import base64

 import requests

 # Universal character recognition (high-precision version with location information)
 request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate"

 # Open the image file in binary mode and base64-encode its content
 with open('[Local file]', 'rb') as f:
     img = base64.b64encode(f.read())

 params = {"image": img}
 access_token = '[Token obtained by calling the authentication interface]'
 request_url = request_url + "?access_token=" + access_token
 headers = {'content-type': 'application/x-www-form-urlencoded'}
 response = requests.post(request_url, data=params, headers=headers)
 if response:
     print(response.json())

 package com.baidu.ai.aip;

 import com.baidu.ai.aip.utils.Base64Util;
 import com.baidu.ai.aip.utils.FileUtil;
 import com.baidu.ai.aip.utils.HttpUtil;

 import java.net.URLEncoder;

 /**
  * Universal character recognition (high-precision version with location information)
  */
 public class Accurate {

     /**
      * Important: utility classes required by this code
      * FileUtil, Base64Util, HttpUtil, GsonUtils
      * https://ai.baidu.com/file/658A35ABAB2D404FBF903F64D47C1F72
      * https://ai.baidu.com/file/C8D81F3301E24D2892968F09AE1AD6E2
      * https://ai.baidu.com/file/544D677F5D4E4F17B4122FBD60DB82B3
      * https://ai.baidu.com/file/470B3ACCA3FE43788B5A963BF0B625F3
      * Download them from the links above
      */
     public static String accurate() {
         // Request url
         String url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate";
         try {
             // Local file path
             String filePath = "[Local file path]";
             byte[] imgData = FileUtil.readFileByBytes(filePath);
             String imgStr = Base64Util.encode(imgData);
             String imgParam = URLEncoder.encode(imgStr, "UTF-8");

             String param = "image=" + imgParam;

             // Note: to keep the code simple, the access_token is obtained for every request here. In production the access_token has an expiration time; the client can cache it and fetch a new one after it expires.
             String accessToken = "[Token obtained by calling the authentication interface]";

             String result = HttpUtil.post(url, accessToken, param);
             System.out.println(result);
             return result;
         } catch (Exception e) {
             e.printStackTrace();
         }
         return null;
     }

     public static void main(String[] args) {
         Accurate.accurate();
     }
 }

 #include <cstdio>
 #include <iostream>
 #include <string>
 #include <curl/curl.h>

 // Download link of the libcurl library: https://curl.haxx.se/download.html
 // Download link of the jsoncpp library: https://github.com/open-source-parsers/jsoncpp/
 const static std::string request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate";
 static std::string accurate_result;

 /**
  * Callback invoked by libcurl when response data arrives. The returned JSON body is
  * stored in the global static variable.
  * @param see the libcurl documentation for the parameter definitions
  * @return see the libcurl documentation for the definition of the return value
  */
 static size_t callback(void *ptr, size_t size, size_t nmemb, void *stream) {
     // The received data is in ptr; append it as a string, since libcurl may deliver the body in several chunks
     accurate_result.append((char *) ptr, size * nmemb);
     return size * nmemb;
 }

 /**
  * Universal character recognition (high-precision version with location information)
  * @return 0 if the call succeeds; another error code if an error occurs
  */
 int accurate(std::string &json_result, const std::string &access_token) {
     std::string url = request_url + "?access_token=" + access_token;
     CURL *curl = NULL;
     CURLcode result_code;
     int is_success;
     curl = curl_easy_init();
     if (curl) {
         curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
         curl_easy_setopt(curl, CURLOPT_POST, 1L);
         curl_httppost *post = NULL;
         curl_httppost *last = NULL;
         curl_formadd(&post, &last, CURLFORM_COPYNAME, "image", CURLFORM_COPYCONTENTS, "[base64_img]", CURLFORM_END);

         curl_easy_setopt(curl, CURLOPT_HTTPPOST, post);
         curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);
         accurate_result.clear();
         result_code = curl_easy_perform(curl);
         if (result_code != CURLE_OK) {
             fprintf(stderr, "curl_easy_perform() failed: %s\n",
                     curl_easy_strerror(result_code));
             curl_easy_cleanup(curl);
             is_success = 1;
             return is_success;
         }
         json_result = accurate_result;
         curl_easy_cleanup(curl);
         is_success = 0;
     } else {
         fprintf(stderr, "curl_easy_init() failed.");
         is_success = 1;
     }
     return is_success;
 }

 <?php
 /**
  * Send an HTTP POST request (REST API) and return the result of the request
  * @param string $url
  * @param string $param
  * @return - http response body if succeeds, else false.
  */
 function request_post($url = '', $param = '')
 {
     if (empty($url) || empty($param)) {
         return false;
     }

     $postUrl = $url;
     $curlPost = $param;
     // Initialize curl
     $curl = curl_init();
     curl_setopt($curl, CURLOPT_URL, $postUrl);
     curl_setopt($curl, CURLOPT_HEADER, 0);
     // Return the result as a string instead of outputting it directly
     curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
     // Submit with POST
     curl_setopt($curl, CURLOPT_POST, 1);
     curl_setopt($curl, CURLOPT_POSTFIELDS, $curlPost);
     // Run curl
     $data = curl_exec($curl);
     curl_close($curl);

     return $data;
 }

 $token = '[Token obtained by calling the authentication interface]';
 $url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate?access_token=' . $token;
 $img = file_get_contents('[Local file path]');
 $img = base64_encode($img);
 $bodys = array(
     'image' => $img
 );
 $res = request_post($url, $bodys);

 var_dump($res);

 using System;
 using System.IO;
 using System.Net;
 using System.Text;
 using System.Web;

 namespace com.baidu.ai
 {
     public class Accurate
     {
         // Universal character recognition (high-precision version with location information)
         public static string accurate()
         {
             string token = "[Token obtained by calling the authentication interface]";
             string host = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate?access_token=" + token;
             Encoding encoding = Encoding.Default;
             HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
             request.Method = "post";
             request.KeepAlive = true;
             // Base64 encoding of the image
             string base64 = getFileBase64("[Local picture file]");
             String str = "image=" + HttpUtility.UrlEncode(base64);
             byte[] buffer = encoding.GetBytes(str);
             request.ContentLength = buffer.Length;
             request.GetRequestStream().Write(buffer, 0, buffer.Length);
             HttpWebResponse response = (HttpWebResponse)request.GetResponse();
             StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.Default);
             string result = reader.ReadToEnd();
             Console.WriteLine("Universal character recognition (high-precision version with location information):");
             Console.WriteLine(result);
             return result;
         }

         public static String getFileBase64(String fileName)
         {
             FileStream filestream = new FileStream(fileName, FileMode.Open);
             byte[] arr = new byte[filestream.Length];
             filestream.Read(arr, 0, (int)filestream.Length);
             string baser64 = Convert.ToBase64String(arr);
             filestream.Close();
             return baser64;
         }
     }
 }

Return description

Return parameters

field	Required	type	description
log_id	yes	uint64	Unique log ID, used for troubleshooting
direction	no	int32	Image orientation. Returned when detect_direction=true. -1: undefined; 0: upright; 1: rotated 90 degrees counterclockwise; 2: rotated 180 degrees counterclockwise; 3: rotated 270 degrees counterclockwise
words_result_num	yes	uint32	Number of recognition results, i.e. the number of elements in words_result
words_result	yes	array[]	Array of recognition results
+ words	no	string	Recognition result string
+ location	yes	array[]	Position array (the coordinate origin is the top-left corner)
++ left	yes	uint32	Horizontal coordinate of the top-left vertex of the bounding rectangle
++ top	yes	uint32	Vertical coordinate of the top-left vertex of the bounding rectangle
++ width	yes	uint32	Width of the bounding rectangle
++ height	yes	uint32	Height of the bounding rectangle
+ chars	no	array[]	Single-character results. Returned when recognize_granularity=small
++ char	no	string	Single-character recognition result. Returned when recognize_granularity=small
++ char_prob	no	uint32	Single-character confidence. Returned when recognize_granularity=small and char_probability=true
++ location	no	array[]	Position array (the coordinate origin is the top-left corner). Returned when recognize_granularity=small
+++ left	no	uint32	Horizontal coordinate of the top-left vertex of the bounding rectangle. Returned when recognize_granularity=small
+++ top	no	uint32	Vertical coordinate of the top-left vertex of the bounding rectangle. Returned when recognize_granularity=small
+++ width	no	uint32	Width of the bounding rectangle. Returned when recognize_granularity=small
+++ height	no	uint32	Height of the bounding rectangle. Returned when recognize_granularity=small
+ probability	no	object	Confidence of each line in the recognition result, containing average: mean line confidence; variance: variance of line confidence; min: minimum line confidence. Returned when probability=true
+ vertexes_location	no	array[]	Vertex coordinates of the bounding quadrilateral of each line in the recognition result. Returned when vertexes_location=true
++ x	no	uint32	Horizontal coordinate (the coordinate origin is the top-left corner)
++ y	no	uint32	Vertical coordinate (the coordinate origin is the top-left corner)
+ finegrained_vertexes_location	no	array[]	Polygon contour point coordinates of each line in the recognition result. Returned when vertexes_location=true
++ x	no	uint32	Horizontal coordinate (the coordinate origin is the top-left corner)
++ y	no	uint32	Vertical coordinate (the coordinate origin is the top-left corner)
+ min_finegrained_vertexes_location	no	array[]	Vertex coordinates of the minimum bounding rectangle corresponding to finegrained_poly_location. Returned when vertexes_location=true
++ x	no	uint32	Horizontal coordinate (the coordinate origin is the top-left corner)
++ y	no	uint32	Vertical coordinate (the coordinate origin is the top-left corner)
paragraphs_result_num	no	uint32	Number of paragraph results, i.e. the number of elements in paragraphs_result. Returned when paragraph=true
paragraphs_result	no	array[]	Paragraph detection results. Returned when paragraph=true
+ words_result_idx	no	array[]	Indices of the lines contained in the paragraph. Returned when paragraph=true
+ finegrained_vertexes_location	no	array[]	Polygon contour point coordinates of each line in the recognition result. Returned when paragraph=true and vertexes_location=true
++ x	no	uint32	Horizontal coordinate (the coordinate origin is the top-left corner)
++ y	no	uint32	Vertical coordinate (the coordinate origin is the top-left corner)
+ min_finegrained_vertexes_location	no	array[]	Polygon contour point coordinates of each line in the recognition result. Returned when paragraph=true and vertexes_location=true
++ x	no	uint32	Horizontal coordinate (the coordinate origin is the top-left corner)
++ y	no	uint32	Vertical coordinate (the coordinate origin is the top-left corner)
pdf_file_size	no	string	Total number of pages of the submitted PDF file. Returned when the pdf_file parameter is in effect
ofd_file_size	no	string	Total number of pages of the submitted OFD file. Returned when the ofd_file parameter is in effect
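
To illustrate how the optional fields relate to each other, here is a small Python sketch that maps direction to a rotation angle and rebuilds paragraph text from words_result_idx. It assumes direction takes the values -1, 0, 1, 2, 3 (undefined, upright, 90/180/270 degrees counterclockwise) and that words_result_idx holds zero-based indices into words_result; the response dictionary is made up for the demonstration, and the function names are hypothetical.

```python
# Assumed meaning of the direction values; -1 (undefined) has no entry.
DIRECTION_DEGREES = {0: 0, 1: 90, 2: 180, 3: 270}


def rotation_degrees(response):
    """Counterclockwise rotation in degrees, or None when absent or undefined."""
    return DIRECTION_DEGREES.get(response.get("direction"))


def paragraphs_text(response, sep=" "):
    """Join the lines of each paragraph using paragraphs_result[].words_result_idx."""
    lines = [item.get("words", "") for item in response.get("words_result", [])]
    return [
        sep.join(lines[i] for i in paragraph.get("words_result_idx", []))
        for paragraph in response.get("paragraphs_result", [])
    ]


# Hypothetical response, shaped like a call with detect_direction=true and paragraph=true.
response = {
    "direction": 1,
    "words_result": [{"words": "first line"}, {"words": "second line"}, {"words": "title"}],
    "paragraphs_result": [{"words_result_idx": [0, 1]}, {"words_result_idx": [2]}],
}

print(rotation_degrees(response))
print(paragraphs_text(response))
```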

Return Example

 {
     "log_id": 1390584857033179136,
     "words_result_num": 2,
     "words_result": [
         {
             "words": " OCR",
             "location": {
                 "top": 19,
                 "left": 54,
                 "width": 119,
                 "height": 46
             }
         },
         {
             "words": "Baidu Universal Character Recognition High Precision Edition",
             "location": {
                 "top": 85,
                 "left": 54,
                 "width": 206,
                 "height": 37
             }
         }
     ]
 }
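
As a short sketch of consuming such a response, the Python snippet below converts each location rectangle (left, top, width, height) into corner coordinates. The values are copied from the example above; to_corners is a hypothetical helper name chosen for this illustration.

```python
# Values taken from the return example.
example = {
    "log_id": 1390584857033179136,
    "words_result_num": 2,
    "words_result": [
        {"words": " OCR",
         "location": {"top": 19, "left": 54, "width": 119, "height": 46}},
        {"words": "Baidu Universal Character Recognition High Precision Edition",
         "location": {"top": 85, "left": 54, "width": 206, "height": 37}},
    ],
}


def to_corners(location):
    """Return (left, top, right, bottom), with the origin at the top-left corner."""
    left, top = location["left"], location["top"]
    return (left, top, left + location["width"], top + location["height"])


for item in example["words_result"]:
    print(item["words"].strip(), to_corners(item["location"]))
```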
