Test paper analysis and recognition

Updated: 2025-06-11

Interface description

The interface analyzes the document layout, outputs the positions of figures, tables, titles, and text blocks, and returns OCR results for the content of each section. It supports Chinese and English, scenes that mix handwriting and print, formula recognition, and recognition of handwritten vertical arithmetic (long division).

Online debugging

You can debug this interface in the Sample Code Center, where you can perform signature verification, view the request content and response of online calls, and automatically generate sample code.

Request Description

Request Example

HTTP method: POST

Request URL: https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis

URL parameter:

parameter	value
access_token	Access token obtained with your API Key and Secret Key; see "Access Token acquisition"

The headers are as follows:

parameter	value
Content-Type	application/x-www-form-urlencoded
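Because the body is form-urlencoded, the Base64 string must itself be URL-encoded: Base64 output can contain `+`, `/`, and `=`, which would otherwise be misread in form data. A minimal Python sketch using only the standard library; `build_request` is a hypothetical helper, and reading "10M" as 10 MiB (10 * 1024 * 1024 bytes) is an assumption.

```python
import base64
from urllib.parse import quote_plus, urlencode

API_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis"
MAX_ENCODED_BYTES = 10 * 1024 * 1024  # assumption: "10M" read as 10 MiB


def build_request(access_token: str, image_bytes: bytes, **options: str) -> tuple[str, str]:
    """Return (url, body) for a form-urlencoded doc_analysis call (hypothetical helper)."""
    # access_token goes in the query string; all other parameters go in the body
    url = API_URL + "?" + urlencode({"access_token": access_token})
    # Base64-encode the raw file bytes
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    # The size limit applies after Base64 + URL encoding; urlencode() also uses quote_plus
    encoded_size = len(quote_plus(image_b64))
    if encoded_size > MAX_ENCODED_BYTES:
        raise ValueError(f"encoded image is {encoded_size} bytes, over the 10M limit")
    body = urlencode({"image": image_b64, **options})
    return url, body
```

For example, `build_request("my-token", b"\xfb\xff", language_type="CHN_ENG")` produces the body `image=%2B%2F8%3D&language_type=CHN_ENG`. The pair can be sent with any HTTP client, for instance `requests.post(url, data=body, headers={"Content-Type": "application/x-www-form-urlencoded"})`.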

Place the request parameters in the body. The details of the parameters are as follows:

Request Parameters

Parameter	Required	Type	Allowed values	Description
image	One of image/url/pdf_file	string	-	Base64-encoded image data, URL-encoded. After Base64 encoding and URL encoding, the data must not exceed 10 MB. The shortest side must be at least 15 px and the longest side at most 8192 px. Supported formats: jpg/jpeg/png/bmp. Priority: image > url > pdf_file; when image is present, url and pdf_file are ignored
url	One of image/url/pdf_file	string	-	Complete image URL, no longer than 1024 bytes. The image it points to must not exceed 10 MB after Base64 encoding; the shortest side must be at least 15 px and the longest side at most 8192 px. Supported formats: jpg/jpeg/png/bmp. Priority: image > url > pdf_file; when image is present, url is ignored. Disable hotlink protection on the URL
pdf_file	One of image/url/pdf_file	string	-	Base64-encoded PDF file, URL-encoded. After Base64 encoding and URL encoding, the file must not exceed 10 MB; the shortest side must be at least 15 px and the longest side at most 8192 px. Priority: image > url > pdf_file; when image or url is present, pdf_file is ignored
pdf_file_num	no	string	-	Page number of the PDF file to recognize. Applies only when pdf_file is in effect; if omitted, the first page is recognized
language_type	no	string	CHN_ENG/ENG	Recognition language. Default: CHN_ENG. CHN_ENG: Chinese and English; ENG: English
result_type	no	string	big/small	Granularity of the result. Default: big. big: line results only; small: per-character results in addition to line results
detect_direction	no	string	true/false	Whether to detect image orientation. Default: false (not detected). Orientation is the counterclockwise rotation of the input relative to upright: 0: upright; 1: rotated 90 degrees counterclockwise; 2: rotated 180 degrees counterclockwise; 3: rotated 270 degrees counterclockwise
line_probability	no	string	true/false	Whether to return the confidence of each recognized line. Default: false
disp_line_poly	no	string	true/false	Whether to return the four corner coordinates of each line. Default: false
words_type	no	string	handwring_only/handprint_mix	Text type. Default (omitted): printed text recognition. handwring_only: handwritten text recognition; handprint_mix: recognition of mixed handwritten and printed text
layout_analysis	no	string	true/false	Whether to analyze the document layout: outputs layout elements (figure, table, title, paragraph text, table of contents) and page attributes (column, header, footer, page number, footnote)
recg_formula	no	string	true/false	Whether to detect and recognize formulas; formulas are returned as LaTeX text. Default: false. true: detect and recognize formulas; false: do not
recg_long_division	no	string	true/false	Whether to detect and recognize handwritten vertical arithmetic (long division). Default: false. true: detect and recognize; false: do not
disp_underline_analysis	no	string	true/false	Whether to enable underline recognition. true: enabled, underline information is returned in the underline field; false (default): disabled, no underline information is returned
recg_alter	no	string	true/false	Whether to return recognition results for altered (crossed-out) text. true: enabled, altered parts are returned as "☰"; false (default): disabled, no alteration results are returned
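The source priority and size limits in the table above can be checked on the client before a request is sent. A minimal Python sketch; the helper names are hypothetical, the limits are copied from the table, and measuring the URL length in UTF-8 bytes is an assumption.

```python
from typing import Mapping

# Documented priority when several sources are sent: image > url > pdf_file
SOURCE_PRIORITY = ("image", "url", "pdf_file")


def effective_source(params: Mapping[str, str]) -> str:
    """Name of the source field the service is documented to use."""
    for field in SOURCE_PRIORITY:
        if field in params:
            return field
    raise ValueError("one of image, url or pdf_file is required")


def check_dimensions(width: int, height: int) -> None:
    """Raise ValueError unless shortest side >= 15 px and longest side <= 8192 px."""
    shortest, longest = min(width, height), max(width, height)
    if shortest < 15:
        raise ValueError(f"shortest side {shortest}px is below 15px")
    if longest > 8192:
        raise ValueError(f"longest side {longest}px exceeds 8192px")


def check_url(url: str) -> None:
    """Raise ValueError if the url is longer than 1024 bytes (UTF-8, an assumption)."""
    if len(url.encode("utf-8")) > 1024:
        raise ValueError("url exceeds 1024 bytes")


def flag(enabled: bool) -> str:
    """Boolean options are typed as strings, so send 'true' or 'false'."""
    return "true" if enabled else "false"
```

Because the priority is applied silently by the service, sending only one of the three source fields avoids surprises; `effective_source` simply makes the documented rule explicit.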

Request Code Example

Tip 1: Before using the sample code, replace the sample token, image path, or Base64 data.

Tip 2: Some languages depend on additional classes or libraries; see the download links in the code comments.

 # Test paper analysis and recognition
 curl -i -k 'https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis?access_token=[Token obtained by calling the authentication interface]' --data 'language_type=CHN_ENG&result_type=big&image=[Picture Base64 encoding, UrlEncode required]' -H 'Content-Type:application/x-www-form-urlencoded'

 # encoding:utf-8
 """Test paper analysis and recognition"""
 import base64
 import requests

 request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis"
 # Open the image file in binary mode and Base64-encode it
 with open('[Local file]', 'rb') as f:
     img = base64.b64encode(f.read())
 params = {"image": img, "language_type": "CHN_ENG", "result_type": "big"}
 access_token = '[Token obtained by calling the authentication interface]'
 request_url = request_url + "?access_token=" + access_token
 headers = {'content-type': 'application/x-www-form-urlencoded'}
 response = requests.post(request_url, data=params, headers=headers)
 if response:
     print(response.json())

 package com.baidu.ai.aip;

 import com.baidu.ai.aip.utils.Base64Util;
 import com.baidu.ai.aip.utils.FileUtil;
 import com.baidu.ai.aip.utils.HttpUtil;

 import java.net.URLEncoder;

 /**
  * Document layout analysis and recognition
  */
 public class DocAnalysis {

     /**
      * Important: this code requires the utility classes FileUtil, Base64Util, HttpUtil, GsonUtils.
      * Download:
      * https://ai.baidu.com/file/658A35ABAB2D404FBF903F64D47C1F72
      * https://ai.baidu.com/file/C8D81F3301E24D2892968F09AE1AD6E2
      * https://ai.baidu.com/file/544D677F5D4E4F17B4122FBD60DB82B3
      * https://ai.baidu.com/file/470B3ACCA3FE43788B5A963BF0B625F3
      */
     public static String docAnalysis() {
         // Request URL
         String url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis";
         try {
             // Local file path
             String filePath = "[Local file path]";
             byte[] imgData = FileUtil.readFileByBytes(filePath);
             String imgStr = Base64Util.encode(imgData);
             String imgParam = URLEncoder.encode(imgStr, "UTF-8");

             String param = "language_type=" + "CHN_ENG" + "&result_type=" + "big" + "&image=" + imgParam;

             // For simplicity, the access_token is obtained for every request here. In production the
             // access_token expires, so the client should cache it and fetch a new one after expiry.
             String accessToken = "[Token obtained by calling the authentication interface]";

             String result = HttpUtil.post(url, accessToken, param);
             System.out.println(result);
             return result;
         } catch (Exception e) {
             e.printStackTrace();
         }
         return null;
     }

     public static void main(String[] args) {
         DocAnalysis.docAnalysis();
     }
 }

 #include <cstdio>
 #include <iostream>
 #include <string>
 #include <curl/curl.h>

 // Download link of libcurl library: https://curl.haxx.se/download.html
 // Download link of jsoncpp library: https://github.com/open-source-parsers/jsoncpp/
 const static std::string request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis";
 static std::string docAnalysis_result;
 /**
  * Callback invoked by libcurl while receiving the HTTP response. Each chunk of the JSON body
  * is appended to the global static string docAnalysis_result.
  * @param See the libcurl documentation for parameter definitions
  * @return See the libcurl documentation for the return value
  */
 static size_t callback(void *ptr, size_t size, size_t nmemb, void *stream) {
     // The received data is in ptr; append it, since the body may arrive in several chunks
     docAnalysis_result.append((char *) ptr, size * nmemb);
     return size * nmemb;
 }
 /**
  * Document layout analysis and recognition
  * @return 0 on success, or a non-zero error code on failure
  */
 int docAnalysis(std::string &json_result, const std::string &access_token) {
     std::string url = request_url + "?access_token=" + access_token;
     CURL *curl = curl_easy_init();
     if (!curl) {
         fprintf(stderr, "curl_easy_init() failed.\n");
         return 1;
     }
     docAnalysis_result.clear();
     curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
     curl_easy_setopt(curl, CURLOPT_POST, 1L);
     curl_httppost *post = NULL;
     curl_httppost *last = NULL;
     curl_formadd(&post, &last, CURLFORM_COPYNAME, "language_type", CURLFORM_COPYCONTENTS, "CHN_ENG", CURLFORM_END);
     curl_formadd(&post, &last, CURLFORM_COPYNAME, "result_type", CURLFORM_COPYCONTENTS, "big", CURLFORM_END);
     curl_formadd(&post, &last, CURLFORM_COPYNAME, "image", CURLFORM_COPYCONTENTS, "[base64_img]", CURLFORM_END);
     curl_easy_setopt(curl, CURLOPT_HTTPPOST, post);
     curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);
     CURLcode result_code = curl_easy_perform(curl);
     // Release the form and the handle on both success and failure
     curl_formfree(post);
     curl_easy_cleanup(curl);
     if (result_code != CURLE_OK) {
         fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(result_code));
         return 1;
     }
     json_result = docAnalysis_result;
     return 0;
 }

 <?php
 /**
  * Send an HTTP POST request (REST API) and return the response
  * @param string $url
  * @param string $param
  * @return string|false HTTP response body on success, false on failure
  */
 function request_post($url = '', $param = '')
 {
     if (empty($url) || empty($param)) {
         return false;
     }

     $postUrl = $url;
     $curlPost = $param;
     // Initialize curl
     $curl = curl_init();
     curl_setopt($curl, CURLOPT_URL, $postUrl);
     curl_setopt($curl, CURLOPT_HEADER, 0);
     // Return the response as a string instead of printing it
     curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
     // Submit with POST
     curl_setopt($curl, CURLOPT_POST, 1);
     curl_setopt($curl, CURLOPT_POSTFIELDS, $curlPost);
     // Run curl
     $data = curl_exec($curl);
     curl_close($curl);

     return $data;
 }

 $token = '[Token obtained by calling the authentication interface]';
 $url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis?access_token=' . $token;
 $img = file_get_contents('[Local file path]');
 $img = base64_encode($img);
 $bodys = array(
     'language_type' => "CHN_ENG",
     'result_type' => "big",
     'image' => $img
 );
 // http_build_query() URL-encodes the fields, so the body is sent as application/x-www-form-urlencoded
 $res = request_post($url, http_build_query($bodys));

 var_dump($res);

 using System;
 using System.IO;
 using System.Net;
 using System.Text;
 using System.Web;

 namespace com.baidu.ai {
     public class DocAnalysis
     {
         // Document layout analysis and recognition
         public static string docAnalysis()
         {
             string token = "[Token obtained by calling the authentication interface]";
             string host = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis?access_token=" + token;
             Encoding encoding = Encoding.UTF8;
             HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
             request.Method = "POST";
             request.ContentType = "application/x-www-form-urlencoded";
             request.KeepAlive = true;
             // Base64 encoding of the image
             string base64 = getFileBase64("[Local picture file]");
             string str = "language_type=" + "CHN_ENG" + "&result_type=" + "big" + "&image=" + HttpUtility.UrlEncode(base64);
             byte[] buffer = encoding.GetBytes(str);
             request.ContentLength = buffer.Length;
             using (Stream requestStream = request.GetRequestStream())
             {
                 requestStream.Write(buffer, 0, buffer.Length);
             }
             using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
             using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
             {
                 string result = reader.ReadToEnd();
                 Console.WriteLine("Document layout analysis and recognition:");
                 Console.WriteLine(result);
                 return result;
             }
         }

         public static string getFileBase64(string fileName) {
             // Read the whole file and Base64-encode it
             byte[] arr = File.ReadAllBytes(fileName);
             return Convert.ToBase64String(arr);
         }
     }
 }

Return description

Return parameters

Field	Required	Type	Description
log_id	yes	uint64	Unique log ID, used for troubleshooting
img_direction	no	int32	Returned when detect_direction=true. Detected image orientation: 0: upright; 1: rotated 90 degrees counterclockwise; 2: rotated 180 degrees counterclockwise; 3: rotated 270 degrees counterclockwise
results_num	yes	uint32	Number of recognition results, i.e. the number of elements in results
results	yes	array[]	Array of recognition results
+ words_type	yes	string	Text attribute: handwriting or print
+ words	yes	array[]	Recognition result for the whole line
++ line_probability	no	array[]	Returned when line_probability=true. Confidence of the recognized line, containing average (mean confidence of the line) and min (minimum confidence in the line)
+++ average	no	float	Average confidence of the line
+++ min	no	float	Lowest confidence of any character in the line
++ word	yes	string	Recognized text of the whole line
++ poly_location	no	array[]	Coordinates of the four corners of the line; returned when disp_line_poly=true
++ words_location	yes	array[]	Bounding rectangle of the whole line (origin at the top-left corner of the image)
+++ left	yes	uint32	Horizontal coordinate of the top-left corner of the bounding rectangle
+++ top	yes	uint32	Vertical coordinate of the top-left corner of the bounding rectangle
+++ width	yes	uint32	Width of the bounding rectangle
+++ height	yes	uint32	Height of the bounding rectangle
+ chars	no	array[]	Returned when result_type=small. Array of per-character results
++ char	no	string	Returned when result_type=small. Content of each character
++ chars_location	no	object	Bounding rectangle of each character (origin at the top-left corner of the image)
+++ left	no	uint32	Horizontal coordinate of the top-left corner of the bounding rectangle
+++ top	no	uint32	Vertical coordinate of the top-left corner of the bounding rectangle
+++ width	no	uint32	Width of the bounding rectangle
+++ height	no	uint32	Height of the bounding rectangle
formula_result	no	array[]	Formulas in the recognition result, including position and content; returned when recg_formula=true
+ form_location	no	array[]	Bounding rectangle of the formula (origin at the top-left corner of the image)
+ form_words	no	string	Recognized content of the formula
words_result	no	array[]	Recognition results with ordinary text and formulas merged; returned when recg_formula=true
+ location	no	array[]	Bounding rectangle of the whole line (origin at the top-left corner of the image)
+ words	no	string	Recognized content of the whole line
+ chars	no	array[]	Array of per-character results; a formula counts as a single character. Returned when result_type=small
++ char	no	string	Content of each character
++ chars_location	no	object	Bounding rectangle of each character (origin at the top-left corner of the image)
layouts_num	no	uint32	Number of layout analysis results, i.e. the number of layout elements
layouts	no	array[]	Document layout elements within each "column: section". There are five element types: table, figure, paragraph text, title, and table of contents. Each element includes its coordinates and, for paragraph text and tables, the IDs of the lines that hold its text
+ layout	no	string	Layout label: table, figure, text, title, contents
+ layout_location	no	array[]	Position of the layout element as four vertices: top left, top right, bottom right, bottom left
++ x	no	uint32	Horizontal coordinate (origin at the top-left corner of the image)
++ y	no	uint32	Vertical coordinate (origin at the top-left corner of the image)
+ layout_idx	no	array[]	Positions in results of the text lines belonging to this layout element: a line ID of n refers to item n+1 of results
sec_rows	no	uint32	All "column: section" content on the page is represented as an M x N grid; sec_rows = M
sec_cols	no	uint32	All "column: section" content on the page is represented as an M x N grid; sec_cols = N
sections	no	array[]	The five page attributes found in an image: column (section), header, footer, page number, and footnote. Each entry contains the attribute label, its position, and the IDs of the text lines it contains. A section in turn contains the five layout elements (table, figure, paragraph text, title, table of contents), which are output in layouts
+ attribute	no	string	Attribute label from layout analysis: section (column), header, footer, number (page number), footnote
+ attri_location	no	array[]	Position of the attribute as four vertices: top left, top right, bottom right, bottom left
++ x	no	uint32	Horizontal coordinate (origin at the top-left corner of the image)
++ y	no	uint32	Vertical coordinate (origin at the top-left corner of the image)
+ sec_idx	no	string	Sequence-number identifiers of the content contained in each of the five attributes in sections
++ idx	no	string	IDs of the text lines contained in each of the five attributes in sections
++ para_idx	no	string	Returned only when attribute=section. Sequence numbers of the layout elements (table, figure, paragraph text, title, table of contents) within this section, i.e. their positions in layouts
++ row_idx	no	string	Returned only when attribute=section. Row ID of this section in the M x N grid
++ col_idx	no	string	Returned only when attribute=section. Column ID of this section in the M x N grid
+ long_division	no	array[]	Handwritten vertical arithmetic (long division) results; returned when recg_long_division=true
+ location	no	object	Bounding rectangle of the vertical arithmetic (origin at the top-left corner of the image)
+ words	no	object	Text inside the vertical arithmetic, output line by line
++ word	no	string	Content of each line of text
++ words_location	no	object	Bounding rectangle of each line (origin at the top-left corner of the image)
+ long_division_num	no	uint32	Number of vertical arithmetic results, i.e. the number of elements in long_division; returned when recg_long_division=true
underline	no	array[]	Recognized underlines; returned when disp_underline_analysis=true
+ points	no	object	Underline coordinates
++ start_x	no	uint32	x coordinate of the underline start point
++ start_y	no	uint32	y coordinate of the underline start point
++ end_x	no	uint32	x coordinate of the underline end point
++ end_y	no	uint32	y coordinate of the underline end point
+ prob	no	uint32	Underline confidence, in the range [0, 1]
pdf_file_size	no	string	Total number of pages of the input PDF file; returned when pdf_file is in effect
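The layout_idx and img_direction descriptions above can be turned into a small post-processing step: a line ID n is item n+1 of results, i.e. zero-based index n, and each img_direction step is 90 degrees counterclockwise. A minimal Python sketch; the helper names are hypothetical, layout_idx entries are assumed to be integers (or integer strings), and since the table lists `words` as an array while the response example shows an object, both shapes are accepted.

```python
from typing import Any


def line_text(entry: dict[str, Any]) -> str:
    """Text of one element of `results`."""
    words = entry["words"]
    # Table says array, example shows object: accept either (assumption)
    if isinstance(words, list):
        return " ".join(w["word"] for w in words)
    return words["word"]


def layout_texts(resp: dict[str, Any]) -> list[tuple[str, str]]:
    """Pair each layout label with the text of the result lines it references."""
    results = resp.get("results", [])
    pairs = []
    for layout in resp.get("layouts", []):
        # A line ID n refers to item n+1 of results, i.e. zero-based index n
        indices = [int(i) for i in layout.get("layout_idx", [])]
        text = " ".join(line_text(results[i]) for i in indices if 0 <= i < len(results))
        pairs.append((layout.get("layout", ""), text))
    return pairs


def detected_rotation(resp: dict[str, Any]) -> int:
    """Counterclockwise rotation in degrees implied by img_direction (0-3)."""
    # Only present when detect_direction=true; a missing field is treated as upright
    return 90 * int(resp.get("img_direction", 0))
```

To straighten an image reported with `detected_rotation(resp) == 90`, rotate it 90 degrees clockwise before further processing.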

Response Example

 {
     "results_num": 6,
     "log_id": "4488766695474114139",
     "img_direction": 0,
     "layouts_num": 0,
     "results": [
         {
             "words_type": "print",
             "words": {
                 "words_location": {"top": 124, "left": 136, "width": 418, "height": 65},
                 "word": "Five dictations (4 points)"
             }
         },
         {
             "words_type": "print",
             "words": {
                 "words_location": {"top": 246, "left": 136, "width": 37, "height": 45},
                 "word": "1"
             }
         },
         {
             "words_type": "handwriting",
             "words": {
                 "words_location": {"top": 195, "left": 237, "width": 469, "height": 104},
                 "word": "Picking chrysanthemums under the east fence"
             }
         },
         {
             "words_type": "print",
             "words": {
                 "words_location": {"top": 241, "left": 889, "width": 287, "height": 52},
                 "word": "See Nanshan leisurely?"
             }
         },
         {
             "words_type": "print",
             "words": {
                 "words_location": {"top": 415, "left": 134, "width": 472, "height": 52},
                 "word": "2. Businesswomen don't know the hatred of national subjugation"
             }
         },
         {
             "words_type": "handwriting",
             "words": {
                 "words_location": {"top": 377, "left": 607, "width": 556, "height": 93},
                 "word": "Across the river you still sing backyard flowers."
             }
         }
     ],
     "formula_result": [
         {
             "form_location": {"top": 0, "left": 97, "width": 151, "height": 77},
             "form_words": " x = \\frac { 1 } { n - 1 } - 1 1 \\frac { \\frac { 5 } { 2 } } { 5 }"
         },
         {
             "form_location": {"top": 119, "left": 118, "width": 115, "height": 80},
             "form_words": " = \\sqrt { \\frac { x } { 2 } ( x - 1 ) ^ { 2 } }"
         },
         {
             "form_location": {"top": 196, "left": 78, "width": 17, "height": 24},
             "form_words": " x ^ { 2 }"
         },
         {
             "form_location": {"top": 244, "left": 79, "width": 103, "height": 70},
             "form_words": " s = \\frac { \\sum _ { i = 0 } { m } \\cdot i v } { - 1 }"
         }
     ],
     "words_result": [
         {
             "location": {"top": 164, "left": 255, "width": 111, "height": 16},
             "words": "Where m represents the examinee"
         },
         {
             "location": {"top": 198, "left": 24, "width": 341, "height": 18},
             "words": "The number of people x ^ {2} represents the equal score of the first question in the exam."
         }
     ]
 }
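To pick out handwritten answers from a response such as the one above, filter results on words_type and derive each line's right and bottom edges from left/top/width/height. A minimal Python sketch; `handwritten_lines` and `formula_latex` are hypothetical helper names, and it assumes `words` is an object and that "handwriting" is the label for handwritten lines, as in this example.

```python
from typing import Any


def handwritten_lines(resp: dict[str, Any]) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Return (text, (left, top, right, bottom)) for each line tagged as handwriting."""
    lines = []
    for entry in resp.get("results", []):
        if entry.get("words_type") != "handwriting":
            continue
        words = entry["words"]  # an object, as in the response example
        loc = words["words_location"]
        # right/bottom come from the top-left corner plus width/height
        box = (loc["left"], loc["top"], loc["left"] + loc["width"], loc["top"] + loc["height"])
        lines.append((words["word"], box))
    return lines


def formula_latex(resp: dict[str, Any]) -> list[str]:
    """LaTeX strings from formula_result, with surrounding whitespace removed."""
    return [f["form_words"].strip() for f in resp.get("formula_result", [])]
```

With the Python request example, `resp` is simply `response.json()`; for the example above, `handwritten_lines` yields the two handwriting lines with boxes (237, 195, 706, 299) and (607, 377, 1163, 470).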
