Office Document Recognition

Interface description

This interface performs layout analysis and character recognition on a variety of office documents. It outputs layout elements such as figures, tables, seals, and titles together with their positions, and returns the text recognition results organized by layout. It supports more than 20 languages, including Chinese, English, Japanese, Korean, and French, and handles printed, handwritten, and mixed print-and-handwriting content.

Online debugging

You can debug the interface in the Sample Code Center, where you can verify signatures, view the request content and returned results of online calls, and automatically generate sample code.

Request Description

Request Example

HTTP method: POST

Request URL: https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office

URL parameter:

parameter value
access_token The access_token obtained with your API Key and Secret Key; see "Access Token acquisition"

The headers are as follows:

parameter value
Content-Type application/x-www-form-urlencoded
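
As a rough illustration of the URL parameter and header described above, the following Python sketch assembles a request with the standard library only, without sending it. The helper name build_request, the placeholder token, and the fake image bytes are assumptions made for this example and are not part of the API.

```python
import base64
import urllib.parse
import urllib.request

API_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office"


def build_request(access_token: str, image_bytes: bytes) -> urllib.request.Request:
    """Assemble a POST request; build_request is a hypothetical helper name."""
    # access_token travels in the query string, not in the body.
    url = API_URL + "?" + urllib.parse.urlencode({"access_token": access_token})
    # Base64-encode the image, then URL-encode the whole form body.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    body = urllib.parse.urlencode({"image": image_b64}).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder token and bytes, used only to show the shape of the request.
req = build_request("EXAMPLE_TOKEN", b"\xff\xd8\xff fake image bytes")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the sketch stops short of that so it can be inspected offline.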

Place the request parameters in the body. The details of the parameters are as follows:

Request Parameters

Parameter Required Type Value range Description
image One of image/url/pdf_file/ofd_file string - Image data, base64-encoded and then URL-encoded. The size after base64 encoding and URL encoding must not exceed 10M; the shortest side must be at least 15px and the longest side at most 8192px. Supported formats: jpg/jpeg/png/bmp
Priority: image > url > pdf_file > ofd_file. When the image field is present, the url, pdf_file, and ofd_file fields are ignored
url One of image/url/pdf_file/ofd_file string - Complete URL of the image, no longer than 1024 bytes. The image at the URL must not exceed 10M after base64 encoding; the shortest side must be at least 15px and the longest side at most 8192px. Supported formats: jpg/jpeg/png/bmp
Priority: image > url > pdf_file > ofd_file. When the image field is present, the url field is ignored
Disable hotlink protection on the URL
pdf_file One of image/url/pdf_file/ofd_file string - PDF file, base64-encoded. The size after encoding must not exceed 10M; the shortest side must be at least 15px and the longest side at most 8192px
Priority: image > url > pdf_file > ofd_file. When the image or url field is present, the pdf_file field is ignored
pdf_file_num no string - Page number of the PDF file to recognize. When the pdf_file parameter is in effect, the page with the given number is recognized. If omitted, the first page is recognized by default
ofd_file One of image/url/pdf_file/ofd_file string - OFD file, base64-encoded and then URL-encoded. The size after base64 encoding and URL encoding must not exceed 10M; the shortest side must be at least 15px and the longest side at most 8192px
Priority: image > url > pdf_file > ofd_file. When the image, url, or pdf_file field is present, the ofd_file field is ignored
ofd_file_num no string - Page number of the OFD file to recognize. When the ofd_file parameter is in effect, the page with the given number is recognized. If omitted, the first page is recognized by default
language_type no string auto_detect
CHN_ENG
ENG
JAP
KOR
FRE
SPA
POR
GER
ITA
RUS
DAN
DUT
MAL
SWE
IND
POL
ROM
TUR
GRE
HUN
THA
VIE
ARA
HIN
Language to recognize. Default: CHN_ENG
Optional values include:
- auto_detect: detect the language automatically, then recognize
- CHN_ENG: Chinese and English
- ENG: English
- JAP: Japanese
- KOR: Korean
- FRE: French
- SPA: Spanish
- POR: Portuguese
- GER: German
- ITA: Italian
- RUS: Russian
- DAN: Danish
- DUT: Dutch
- MAL: Malay
- SWE: Swedish
- IND: Indonesian
- POL: Polish
- ROM: Romanian
- TUR: Turkish
- GRE: Greek
- HUN: Hungarian
- THA: Thai
- VIE: Vietnamese
- ARA: Arabic
- HIN: Hindi
result_type no string big/small Whether to return line-level results only or single-character results as well. Default: big
- big: return line recognition results
- small: return single-character results in addition to the line recognition results
char_probability no string true/false Whether to return the confidence of each character. Not returned by default. Takes effect only when result_type=small. Optional values include:
- true: return single-character confidence
- false: do not return single-character confidence
detect_direction no string true/false Whether to detect the orientation of the image. Default: false (no detection). Orientation indicates whether the input image is upright or rotated 90/180/270 degrees counterclockwise. The detected values are:
- 0: upright
- 1: rotated 90 degrees counterclockwise
- 2: rotated 180 degrees counterclockwise
- 3: rotated 270 degrees counterclockwise
line_probability no string true/false Whether to return the confidence level of recognition results of each line. Default is false
disp_line_poly no string true/false Whether to return the four corner coordinates of each line. Default is false
words_type no string handwring_only/handprint_mix Text type.
Default: mixed handwriting and print recognition
- handwring_only: handwritten text recognition
- handprint_mix: mixed handwritten and printed text recognition
layout_analysis no string true/false Whether to analyze the document layout. The output covers layout elements (figure, table, title, paragraph, directory, seal) and page attributes (column, header, footer, page number, footnote). Default is false
recg_tables no string true/false Whether to identify and output the table related information, including cell content. Default is false
recog_seal no string true/false Whether seal related information is recognized and output. Default is false
recg_formula no string true/false Whether to detect and identify the formula. The formula is returned in Latex format. Default is false
erase_seal no string true/false Whether to erase the watermark and seal before identifying the document. Default is false
disp_underline_analysis no string true/false Whether to recognize and output the underline, the default is false
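
The priority rule and the string-typed options in the table above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official client: build_payload is a hypothetical helper, the 10M limit is read as 10 MiB, and the size check covers only the base64 step (URL encoding is left to the HTTP library).

```python
import base64
from typing import Dict, Optional

# Assumption: the documented "10M" limit is interpreted as 10 MiB.
MAX_ENCODED_BYTES = 10 * 1024 * 1024
SOURCE_PRIORITY = ("image", "url", "pdf_file", "ofd_file")


def build_payload(
    image: Optional[bytes] = None,
    url: Optional[str] = None,
    pdf_file: Optional[bytes] = None,
    ofd_file: Optional[bytes] = None,
    **options: object,
) -> Dict[str, str]:
    """Return the form fields for one request (hypothetical helper)."""
    sources = {"image": image, "url": url, "pdf_file": pdf_file, "ofd_file": ofd_file}
    # Keep only the highest-priority source, mirroring image > url > pdf_file > ofd_file.
    chosen = next((name for name in SOURCE_PRIORITY if sources[name] is not None), None)
    if chosen is None:
        raise ValueError("one of image, url, pdf_file, ofd_file is required")
    value = sources[chosen]
    if chosen == "url":
        if len(value.encode("utf-8")) > 1024:
            raise ValueError("url must not exceed 1024 bytes")
        payload = {"url": value}
    else:
        encoded = base64.b64encode(value).decode("ascii")
        if len(encoded) > MAX_ENCODED_BYTES:
            raise ValueError(f"{chosen} exceeds the size limit after base64 encoding")
        payload = {chosen: encoded}
    # Every optional parameter is a string; booleans become "true"/"false".
    for name, option in options.items():
        payload[name] = str(option).lower() if isinstance(option, bool) else str(option)
    return payload


# Both image and pdf_file are supplied; only image survives.
payload = build_payload(
    image=b"raw image bytes",
    pdf_file=b"%PDF-1.4 ...",
    layout_analysis=True,
    language_type="auto_detect",
)
print(sorted(payload))
```

Dropping the lower-priority fields on the client side avoids uploading data that the service would ignore anyway.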

Request Code Example

Note 1: Before using the sample code, remember to replace the sample token and the image address or Base64 data.

Note 2: Some languages depend on additional classes or libraries. Check the download addresses in the code comments.

 # Office document recognition
 curl -i -k 'https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office?access_token=[Token obtained by calling the authentication interface]' --data 'image=[Base64-encoded picture, URL-encoded]' -H 'Content-Type:application/x-www-form-urlencoded'

 # encoding:utf-8

 import base64

 import requests

 '''
 Office document recognition
 '''

 request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office"
 # Open the picture file in binary mode and base64-encode its content
 with open('[Local file]', 'rb') as f:
     img = base64.b64encode(f.read())

 params = {"image": img}
 access_token = '[Token obtained by calling the authentication interface]'
 request_url = request_url + "?access_token=" + access_token
 headers = {'content-type': 'application/x-www-form-urlencoded'}
 response = requests.post(request_url, data=params, headers=headers)
 if response:
     print(response.json())

 package com.baidu.ai.aip;

 import com.baidu.ai.aip.utils.Base64Util;
 import com.baidu.ai.aip.utils.FileUtil;
 import com.baidu.ai.aip.utils.HttpUtil;

 import java.net.URLEncoder;

 /**
  * Office document recognition
  */
 public class AnalysisOffice {

     /**
      * Important: this code requires the utility classes
      * FileUtil, Base64Util, HttpUtil, GsonUtils.
      * Download them from:
      * https://ai.baidu.com/file/658A35ABAB2D404FBF903F64D47C1F72
      * https://ai.baidu.com/file/C8D81F3301E24D2892968F09AE1AD6E2
      * https://ai.baidu.com/file/544D677F5D4E4F17B4122FBD60DB82B3
      * https://ai.baidu.com/file/470B3ACCA3FE43788B5A963BF0B625F3
      */
     public static String analysisOffice() {
         // Request URL
         String url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office";
         try {
             // Local file path
             String filePath = "[Local file path]";
             byte[] imgData = FileUtil.readFileByBytes(filePath);
             String imgStr = Base64Util.encode(imgData);
             String imgParam = URLEncoder.encode(imgStr, "UTF-8");

             String param = "image=" + imgParam;

             // Note: to keep the example simple, the access_token is obtained anew for every request.
             // In production the access_token has an expiration time, so the client can cache it
             // and fetch a new one after it expires.
             String accessToken = "[Token obtained by calling the authentication interface]";

             String result = HttpUtil.post(url, accessToken, param);
             System.out.println(result);
             return result;
         } catch (Exception e) {
             e.printStackTrace();
         }
         return null;
     }

     public static void main(String[] args) {
         AnalysisOffice.analysisOffice();
     }
 }

 #include <cstdio>
 #include <iostream>
 #include <string>
 #include <curl/curl.h>

 // Download link of the libcurl library: https://curl.haxx.se/download.html
 // Download link of the jsoncpp library: https://github.com/open-source-parsers/jsoncpp/
 const static std::string request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office";
 static std::string analysisOffice_result;

 /**
  * Callback invoked by curl while the HTTP response is received.
  * The returned body (JSON text) is stored in the global static variable.
  * @param See the libcurl documentation for the parameter definitions
  * @return See the libcurl documentation for the definition of the return value
  */
 static size_t callback(void *ptr, size_t size, size_t nmemb, void *stream) {
     // The received chunk is in ptr; append it to the result as a string
     analysisOffice_result.append((char *) ptr, size * nmemb);
     return size * nmemb;
 }

 /**
  * Office document recognition
  * @return 0 if the call succeeds; a non-zero error code otherwise
  */
 int analysisOffice(std::string &json_result, const std::string &access_token) {
     std::string url = request_url + "?access_token=" + access_token;
     CURL *curl = NULL;
     CURLcode result_code;
     int is_success;
     curl = curl_easy_init();
     if (curl) {
         analysisOffice_result.clear();
         curl_easy_setopt(curl, CURLOPT_URL, url.data());
         curl_easy_setopt(curl, CURLOPT_POST, 1L);
         curl_httppost *post = NULL;
         curl_httppost *last = NULL;
         curl_formadd(&post, &last, CURLFORM_COPYNAME, "image", CURLFORM_COPYCONTENTS, "【base64_img】", CURLFORM_END);

         curl_easy_setopt(curl, CURLOPT_HTTPPOST, post);
         curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);
         result_code = curl_easy_perform(curl);
         if (result_code != CURLE_OK) {
             fprintf(stderr, "curl_easy_perform() failed: %s\n",
                     curl_easy_strerror(result_code));
             is_success = 1;
         } else {
             json_result = analysisOffice_result;
             is_success = 0;
         }
         // Release the form and the handle on both the success and the failure path
         curl_formfree(post);
         curl_easy_cleanup(curl);
     } else {
         fprintf(stderr, "curl_easy_init() failed.");
         is_success = 1;
     }
     return is_success;
 }

 <?php
 /**
  * Send an HTTP POST request (REST API) and return the result of the request
  * @param string $url
  * @param string $param
  * @return - http response body if succeeds, else false.
  */
 function request_post($url = '', $param = '')
 {
     if (empty($url) || empty($param)) {
         return false;
     }

     $postUrl = $url;
     $curlPost = $param;
     // Initialize curl
     $curl = curl_init();
     curl_setopt($curl, CURLOPT_URL, $postUrl);
     curl_setopt($curl, CURLOPT_HEADER, 0);
     // Return the result as a string instead of printing it directly
     curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
     // Submit with POST
     curl_setopt($curl, CURLOPT_POST, 1);
     curl_setopt($curl, CURLOPT_POSTFIELDS, $curlPost);
     // Run curl
     $data = curl_exec($curl);
     curl_close($curl);

     return $data;
 }

 $token = '[Token obtained by calling the authentication interface]';
 $url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office?access_token=' . $token;
 $img = file_get_contents('[Local file path]');
 $img = base64_encode($img);
 $bodys = array(
     'image' => $img
 );
 $res = request_post($url, $bodys);

 var_dump($res);

 using System;
 using System.IO;
 using System.Net;
 using System.Text;
 using System.Web;

 namespace com.baidu.ai
 {
     public class AnalysisOffice
     {
         // Office document recognition
         public static string analysisOffice()
         {
             string token = "[Token obtained by calling the authentication interface]";
             string host = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office?access_token=" + token;
             Encoding encoding = Encoding.Default;
             HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
             request.Method = "post";
             request.ContentType = "application/x-www-form-urlencoded";
             request.KeepAlive = true;
             // Base64 encoding of the picture
             string base64 = getFileBase64("[Local picture file]");
             String str = "image=" + HttpUtility.UrlEncode(base64);
             byte[] buffer = encoding.GetBytes(str);
             request.ContentLength = buffer.Length;
             request.GetRequestStream().Write(buffer, 0, buffer.Length);
             HttpWebResponse response = (HttpWebResponse)request.GetResponse();
             StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.Default);
             string result = reader.ReadToEnd();
             Console.WriteLine("Office document recognition:");
             Console.WriteLine(result);
             return result;
         }

         public static String getFileBase64(String fileName)
         {
             FileStream filestream = new FileStream(fileName, FileMode.Open);
             byte[] arr = new byte[filestream.Length];
             filestream.Read(arr, 0, (int)filestream.Length);
             string baser64 = Convert.ToBase64String(arr);
             filestream.Close();
             return baser64;
         }
     }
 }
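
The samples above all send the image field. For a multi-page PDF, one possible approach is to request the first page, read pdf_file_size from the reply, and then request the remaining pages one at a time. The Python sketch below follows that idea under several assumptions: recognize_pdf and http_post are hypothetical helper names, pages are assumed to be numbered from 1, and the demonstration uses an offline stand-in instead of a real call.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

API_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office"


def http_post(url: str, fields: Dict[str, str]) -> dict:
    """Send a form-encoded POST and decode the JSON reply."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def recognize_pdf(
    pdf_bytes: bytes,
    access_token: str,
    post: Callable[[str, Dict[str, str]], dict] = http_post,
) -> List[dict]:
    """Recognize every page of a PDF, one request per page (hypothetical helper)."""
    url = API_URL + "?" + urllib.parse.urlencode({"access_token": access_token})
    pdf_b64 = base64.b64encode(pdf_bytes).decode("ascii")
    # Assumption: page numbers start at 1, since the first page is the default.
    first = post(url, {"pdf_file": pdf_b64, "pdf_file_num": "1"})
    # pdf_file_size is documented as a string holding the total page count.
    total_pages = int(first.get("pdf_file_size", "1"))
    pages = [first]
    for page in range(2, total_pages + 1):
        pages.append(post(url, {"pdf_file": pdf_b64, "pdf_file_num": str(page)}))
    return pages


def fake_post(url: str, fields: Dict[str, str]) -> dict:
    """Offline stand-in for http_post; the "page" key is invented for this demo."""
    return {"pdf_file_size": "3", "results_num": 0, "results": [], "page": fields["pdf_file_num"]}


pages = recognize_pdf(b"%PDF-1.4 placeholder", "EXAMPLE_TOKEN", post=fake_post)
print([p["page"] for p in pages])
```

Injecting the transport as a parameter keeps the paging logic testable without network access; omitting the post argument falls back to the real HTTP call.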

Return description

Return parameters

Field Required Type Description
log_id yes uint64 Unique log ID for problem location
img_direction no int32 Returned when detect_direction=true. Detected image orientation: 0: upright; 1: rotated 90 degrees counterclockwise; 2: rotated 180 degrees counterclockwise; 3: rotated 270 degrees counterclockwise
results_num yes uint32 Number of recognition results, representing the number of elements of results
results yes array[] Array of recognition results. When recg_formula=true, the results include formulas
+ words_type yes string Text type: handwritten or printed (for example, print)
+ words yes array[] Recognition results of the whole line
++ line_probability no array[] Returned when line_probability=true. Confidence values for each line in the recognition result, including average (mean line confidence) and min (minimum line confidence)
+++ average no float Average confidence of the line
+++ min no float Lowest confidence of any character in the line
++ word yes string Recognized text of the whole line
++ poly_location no array[] Coordinates of the four corners of each line, listed clockwise from the top-left corner. Returned when disp_line_poly=true
++ words_location yes array[] Bounding rectangle of the whole line (the coordinate origin is the top-left corner)
+++ left yes uint32 The horizontal coordinate of the top left vertex of the rectangle representing the positioning position
+++ top yes uint32 The vertical coordinate of the top left vertex of the rectangle representing the positioning position
+++ width yes uint32 The width of the rectangle representing the positioning position
+++ height yes uint32 Height of rectangle representing position
+ chars no array[] Returned when result_type=small. Array of single-character results
++ char no string Returned when result_type=small. Content of each character
++ char_prob no uint32 Returned when result_type=small and char_probability=true. Single-character confidence
++ chars_location no array[] Bounding rectangle of each character (the coordinate origin is the top-left corner)
+++ left no uint32 The horizontal coordinate of the top left vertex of the rectangle representing the positioning position
+++ top no uint32 The vertical coordinate of the top left vertex of the rectangle representing the positioning position
+++ width no uint32 The width of the rectangle representing the positioning position
+++ height no uint32 Height of rectangle representing position
underline no array[] Underline recognition results. Returned when disp_underline_analysis=true
+points no array[] Underline coordinate information
++end_x no uint32 X coordinate of underline end point
++end_y no uint32 Y coordinate of underline end point
++start_x no uint32 Underline starting point x coordinate
++start_y no uint32 Y coordinate of underline starting point
+prob no array[] Underline confidence, value range is between [0, 1]
layouts_num no uint32 Number of layout analysis results, that is, the number of layout elements. Returned when layout_analysis=true
layouts no array[] Array of document layout modules. Each "column: section" can contain 9 kinds of modules: table, figure, paragraph text, paragraph title, table title, figure title, document title, directory, and seal. Each entry gives the coordinates of the module and, for paragraph text and table text, the IDs of the corresponding lines. Returned when layout_analysis=true
+ layout no string Label of the layout element. Possible values: table (table), figure (figure), text (paragraph text), title (paragraph title), contents (directory), seal (seal), table_title (table title), figure_title (figure title), doc_title (document title)
+layout_prob no float Confidence of the current layout detection box
+ layout_location no array[] Position of the layout element, given as four vertices: top left, top right, bottom right, bottom left
++ x no uint32 Horizontal coordinate (the coordinate origin is the top-left corner)
++ y no uint32 Vertical coordinate (the coordinate origin is the top-left corner)
+ layout_idx no array[] Positions in results of the text that belongs to this layout element: if a line ID is n, the corresponding text is the (n+1)-th item in results
sec_rows no uint32 All "column: section" content in the layout is represented as an M x N grid; sec_rows=M. Returned when layout_analysis=true
sec_cols no uint32 All "column: section" content in the layout is represented as an M x N grid; sec_cols=N. Returned when layout_analysis=true
sections no array[] Array of the five page attributes found in an image: column, header, footer, page number, and footnote. Each entry contains the attribute label, the attribute position, and the IDs of the text content the attribute contains.
The column (section) attribute contains the 9 kinds of modules: table, figure, paragraph text, paragraph title, table title, figure title, document title, directory, and seal (output in the layouts return parameter). Returned when layout_analysis=true
+ attribute no string Attribute label from layout analysis: section (column), header (header), footer (footer), number (page number), footnote (footnote)
+sections_prob no float Confidence of the current attribute detection box
+ attri_location no object Position of the attribute, given as four vertices: top left, top right, bottom right, bottom left
++ x no uint32 Horizontal coordinate (the coordinate origin is the top-left corner)
++ y no uint32 Vertical coordinate (the coordinate origin is the top-left corner)
+ sec_idx no object Index information for the content contained in each of the five page attributes in sections
++ idx no string IDs of the text lines contained in each of the five page attributes in sections
++ para_idx no string Returned only when attribute=section. Indices of the modules (table, figure, paragraph text, paragraph title, table title, figure title, document title, directory, seal) contained in this column, that is, the positions of those modules in the layouts return parameter
++ row_idx no string Returned only when attribute=section. With all columns represented as an M x N grid, the ID of the grid row this column belongs to
++ col_idx no string Returned only when attribute=section. With all columns represented as an M x N grid, the ID of the grid column this column belongs to
table_num yes int The number of detected tables, which is returned when recg_tables=true
tables_result yes array[] The content array of each table, which is returned when recg_tables=true
+table_location yes array[] Single table position, x, y coordinates of four corners
+header yes array[] Header information
++ location yes array[] Head position, x, y coordinates of four corners
++words yes string Header information, split by line
+body yes array[] Cell Information
++cell_location yes array[] The x, y coordinates of the four corners of the cell
++row_start yes array[] Starting row number of the cell; horizontal lines are numbered from 0
++ row_end yes array[] Ending row number of the cell
++ col_start yes array[] Starting column number of the cell; vertical lines are numbered from 0
++ col_end yes array[] Ending column number of the cell
++ words yes string Cell text content
++contents no array[] Text information in cells, displayed in rows
+++poly_location no array[] Position information of each line of text in the cell
+++ word no string Text content of each cell line
+footer yes array[] Footer information
++ location yes array[] Table tail position, x, y coordinates of four corners
++ words yes string Footer information, split by line
seal_recog_num no uint32 The number of seal results identified, which is returned when recog_seal=true
seal_recog_results no array[] Seal content array, returned when recog_seal=true
+location no object Seal position information (coordinate 0 point is the upper left corner)
++left no uint32 The horizontal coordinate of the top left vertex of the rectangle representing the positioning position
++top no uint32 The vertical coordinate of the top left vertex of the rectangle representing the positioning position
++width no uint32 The width of the rectangle representing the positioning position
++height no uint32 The height of the rectangle representing the positioning position
+probability no float Confidence value of each seal
+type no string Types of seals include circle, ellipse and rectangle
+major no object Main field information in the seal
++words no string Recognized content of the main field, that is, the curved text along the upper ring of the seal
++probability no float Confidence of main field identification content
+minor no array[] Other field information in the seal, that is, the identification content other than the main field is placed in this parameter and returned. If there are no other fields in the seal, this parameter is empty
++words no string Other field identification content
++probability no float Confidence of other field identification content
formula_result no array[] The recognized formula array, including the formula position and formula content, is returned when recg_formula=true
+ form_location no array[] Position information of formula, rectangular box coordinate array (coordinate 0 point is the upper left corner)
+ form_words no string The formula content information is returned in Latex format
pdf_file_size no string The total number of pages of the incoming PDF file. This field is returned when the pdf_file parameter is valid
ofd_file_size no string The total number of pages of the incoming OFD file. This field is returned when the ofd_file parameter is valid
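
To show how layout_idx ties layouts back to results, here is a minimal Python sketch that groups recognized text by layout label. The function names are hypothetical, the sample dictionary is hand-written to follow the field table rather than taken from a real response, and layout_idx entries are assumed to be 0-based integers (or numeric strings).

```python
from typing import Dict, List


def line_text(result: dict) -> str:
    """Return the text of one results entry."""
    words = result.get("words", {})
    # The field table lists words as an array while the return example
    # shows a single object, so both shapes are accepted here.
    if isinstance(words, list):
        return "".join(item.get("word", "") for item in words)
    return words.get("word", "")


def text_by_layout(response: dict) -> List[Dict[str, str]]:
    """Pair each layout label with the text lines its layout_idx points to."""
    results = response.get("results", [])
    grouped = []
    for layout in response.get("layouts", []):
        # A line ID of n refers to the (n+1)-th item of results, i.e. index n.
        indices = [int(i) for i in layout.get("layout_idx", [])]
        lines = [line_text(results[i]) for i in indices if 0 <= i < len(results)]
        grouped.append({"layout": layout.get("layout", ""), "text": "\n".join(lines)})
    return grouped


# Hand-written sample shaped after the field table; not real API output.
sample = {
    "results_num": 3,
    "results": [
        {"words_type": "print", "words": {"word": "Trip sheet"}},
        {"words_type": "print", "words": {"word": "Day 1: arrival"}},
        {"words_type": "print", "words": {"word": "Day 2: campus tour"}},
    ],
    "layouts_num": 2,
    "layouts": [
        {"layout": "doc_title", "layout_idx": [0]},
        {"layout": "text", "layout_idx": [1, 2]},
    ],
}

grouped = text_by_layout(sample)
for block in grouped:
    print(block["layout"], "->", block["text"].replace("\n", " | "))
```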

Return Example

 {
     "results_num": 5,
     "log_id": "1410491260247950412",
     "results": [
         {
             "words_type": "print",
             "words": {
                 "words_location": {
                     "top": 88,
                     "left": 442,
                     "width": 142,
                     "height": 49
                 },
                 "word": "Trip sheet"
             }
         },
         {
             "words_type": "print",
             "words": {
                 "words_location": {
                     "top": 241,
                     "left": 439,
                     "width": 393,
                     "height": 37
                 },
                 "word": "8 days and 7 nights for famous schools on the east coast of the United States"
             }
         },
         {
             "words_type": "print",
             "words": {
                 "words_location": {
                     "top": 318,
                     "left": 436,
                     "width": 774,
                     "height": 31
                 },
                 "word": "The Capitol is located on the Capitol Hill, 25 meters high in Washington. It is the heart of the United States."
             }
         },
         {
             "words_type": "print",
             "words": {
                 "words_location": {
                     "top": 374,
                     "left": 434,
                     "width": 805,
                     "height": 31
                 },
                 "word": "On the big dome of the central attic stands a bronze statue of the Statue of Liberty 6 meters high."
             }
         },
         {
             "words_type": "print",
             "words": {
                 "words_location": {
                     "top": 431,
                     "left": 436,
                     "width": 556,
                     "height": 31
                 },
                 "word": "The eastern lawn is where all previous presidents held their inaugurations."
             }
         }
     ]
 }
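
A response like the one above could be consumed along the following lines. This Python sketch parses an abridged copy of the example (first two entries only, with results_num adjusted to match) and converts each words_location into corner coordinates. The name extract_lines and the (left, top, right, bottom) box layout are choices made for this illustration.

```python
import json
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom


def extract_lines(response_text: str) -> List[Tuple[str, Box]]:
    """Return (text, box) pairs for each recognized line."""
    response = json.loads(response_text)
    lines = []
    for result in response.get("results", []):
        words = result["words"]
        # Accept both the object form shown in the example and a list form.
        for item in words if isinstance(words, list) else [words]:
            loc = item["words_location"]
            box = (
                loc["left"],
                loc["top"],
                loc["left"] + loc["width"],
                loc["top"] + loc["height"],
            )
            lines.append((item["word"], box))
    return lines


# Abridged copy of the return example: first two entries only.
raw = """
{
    "results_num": 2,
    "log_id": "1410491260247950412",
    "results": [
        {"words_type": "print",
         "words": {"words_location": {"top": 88, "left": 442, "width": 142, "height": 49},
                   "word": "Trip sheet"}},
        {"words_type": "print",
         "words": {"words_location": {"top": 241, "left": 439, "width": 393, "height": 37},
                   "word": "8 days and 7 nights for famous schools on the east coast of the United States"}}
    ]
}
"""

lines = extract_lines(raw)
for text, box in lines:
    print(box, text)
```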