Brief introduction
Awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes those fields.
There are three versions of awk: awk, nawk and gawk. Unless stated otherwise, awk here refers to gawk, the GNU version of AWK.
Awk takes its name from the first letters of the surnames of its creators, Alfred Aho, Peter Weinberger and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, generate reports, and much more.
Usage
awk 'pattern {action}' {filenames}
However complex the operation, the syntax always has this form: pattern is what awk looks for in the data, and action is a series of commands executed when a match is found. The curly braces ({}) need not always appear in a program; they group the series of instructions that belong to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
The basic function of awk language is to browse and extract information from files or strings based on specified rules. After awk extracts information, other text operations can be performed. A complete awk script is usually used to format information in a text file.
awk normally processes a file one line at a time: it reads a line of the file and then executes the corresponding commands to process that text.
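As a minimal sketch of the pattern and action form described above, the following feeds three made-up lines to awk. The sample text and the variable name matched are illustrative assumptions, not part of the original article. Only lines matching /error/ trigger the action.

```shell
# Hypothetical sample input: three log-like lines.
matched=$(printf 'info disk ok\nerror disk full\nerror net down\n' |
    awk '/error/ {print $2, $3}')   # pattern /error/, action prints fields 2 and 3
printf '%s\n' "$matched"
```

The first input line does not match the pattern, so the action is skipped for it.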
Invoking awk
There are three ways to invoke awk:
1. Command line mode
awk [-F field-separator] 'commands' input-file(s)
Here, commands is the actual awk program and [-F field-separator] is optional. input-file(s) is the file or files to be processed.
In awk, each item in a line delimited by the field separator is called a field. If no -F field separator is specified, the default field separator is whitespace.
2. Shell script mode
Put all the awk commands into a file and make the file executable, with the awk interpreter named on the first line of the script, so that the script can be run by typing its name.
This corresponds to the first line of a shell script: #!/bin/sh
which is replaced by: #!/bin/awk -f (the -f option tells awk to read the program from the script file; the path to awk may differ between systems)
3. Put all the awk commands into a separate file, and then call:
awk -f awk-script-file input-file(s)
Here, the -f option loads the awk script in awk-script-file, and input-file(s) is the same as above.
This chapter focuses on the command line mode.
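For comparison with the command line mode, here is a small sketch of the third method (a program file loaded with -f). The temporary file created by mktemp, the two input lines, and the variable names are assumptions made for illustration.

```shell
# Write a small awk program to a temporary file (name generated by mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first field of every input line.
{ print $1 }
EOF

# Run the program file with -f against two made-up input lines.
first_fields=$(printf 'alpha 1\nbeta 2\n' | awk -f "$script")
printf '%s\n' "$first_fields"
rm -f "$script"
```

Keeping the program in a file avoids shell quoting problems once the awk code grows beyond a single line.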
Getting Started Example
Assume that the output of last -n 5 is as follows:
# last -n 5    (shows only the first five entries)
root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root tty1 Fri Sep 5 14:09 - 14:10 (00:01)
To display only the accounts of the five most recent logins:
#last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
The awk workflow is as follows: read one record, delimited by the '\n' newline character, then split the record into fields using the specified field separator and fill in the field variables: $0 holds the entire record, $1 the first field, and $n the nth field. The default field separator is the space or the tab, so $1 is the login user, $3 is the login user's IP address, and so on.
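The field splitting described above can be sketched on a single made-up line in the style of the last output. The line deliberately mixes runs of spaces and a tab to show that any run of whitespace counts as one separator by default. The variable name user_ip is an illustrative assumption.

```shell
# One hypothetical record; fields are separated by several spaces or a tab.
user_ip=$(printf 'root   pts/1\t192.168.1.100  Tue Feb 10 11:21\n' |
    awk '{print $1, $3, NF}')   # first field, third field, number of fields
printf '%s\n' "$user_ip"
```

The record splits into seven fields, so this prints root, the IP address, and 7.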
To display only the account names in /etc/passwd:
#cat /etc/passwd |awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk with an action only. The action {print $1} is executed for every line.
-F specifies the field separator, here ':'.
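To try the -F option without depending on the contents of a real /etc/passwd, the sketch below uses two made-up passwd-style lines. The helper function passwd_sample and the variable accounts are hypothetical names introduced only for this example.

```shell
# Hypothetical stand-in for /etc/passwd: two colon-separated records.
passwd_sample() {
    printf 'root:x:0:0:root:/root:/bin/bash\n'
    printf 'daemon:x:1:1:daemon:/usr/sbin:/bin/sh\n'
}

# -F ':' makes the colon the field separator, so $1 is the account name.
accounts=$(passwd_sample | awk -F ':' '{print $1}')
printf '%s\n' "$accounts"
```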
To display only the accounts in /etc/passwd and the shell for each account, separated by a tab:
#cat /etc/passwd |awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh
To display only the accounts in /etc/passwd and their shells, separated by a comma, with a header line "name,shell" added before all lines and "blue,/bin/nosh" appended as the last line:
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh
The awk workflow is as follows: first execute BEGIN, then read the file. Read one record, delimited by the \n newline character, split it into fields using the specified field separator and fill in the field variables ($0 holds the entire record, $1 the first field, $n the nth field), then execute the action corresponding to the pattern. Then read the second record, and so on until all records have been read, and finally execute the END action.
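The order of execution described above can be made visible with a small trace. This is only a sketch: the two-line input and the variable name trace are assumptions for illustration.

```shell
# Trace the BEGIN / per-record / END order on two made-up records.
trace=$(printf 'a\nb\n' | awk '
    BEGIN { print "begin" }                       # runs once, before any input is read
          { print "record " NR ": " $0 }          # runs once for every record
    END   { print "end after " NR " records" }    # runs once, after the last record
')
printf '%s\n' "$trace"
```

In END, NR still holds the number of the last record read, which is why it can be used as a total.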
To search for all lines in /etc/passwd that contain the keyword root:
#awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern. Only lines that match the pattern (root here) have the action executed; since no action is specified, the default is to print the whole line.
Regular expressions are supported. For example, to find lines that start with root: awk -F: '/^root/' /etc/passwd
To search for all lines in /etc/passwd that contain the keyword root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Action {print $7} is specified here
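The difference between an unanchored and an anchored pattern can be sketched with three made-up passwd-style lines. The helper passwd_sample and the variable names are hypothetical; the operator line is included because it contains root in its home directory but does not start with it.

```shell
# Hypothetical sample records in /etc/passwd format.
passwd_sample() {
    printf 'root:x:0:0:root:/root:/bin/bash\n'
    printf 'operator:x:11:0:operator:/root:/sbin/nologin\n'
    printf 'daemon:x:1:1:daemon:/usr/sbin:/bin/sh\n'
}

any_root=$(passwd_sample | awk -F: '/root/ {print $1}')       # root anywhere in the line
starts_root=$(passwd_sample | awk -F: '/^root/ {print $1}')   # line begins with root
printf 'anywhere:\n%s\nat start:\n%s\n' "$any_root" "$starts_root"
```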
Awk built-in variables
Awk has many built-in variables that hold environment information, and they can be changed. The most commonly used ones are listed below.
ARGC        Number of command-line arguments
ARGV        Array of command-line arguments
ENVIRON     Array giving access to the system environment variables
FILENAME    Name of the file awk is currently reading
FNR         Number of records read from the current file
FS          Input field separator, equivalent to the -F command-line option
NF          Number of fields in the current record
NR          Number of records read so far
OFS         Output field separator
ORS         Output record separator
RS          Input record separator
In addition, $0 refers to the entire record, $1 is the first field of the current line, $2 is the second field of the current line, and so on.
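A short sketch of a few of these variables working together, assuming two made-up colon-separated records. FS is set in BEGIN instead of using -F, and OFS is changed so that its effect on print is visible. The variable name summary is illustrative.

```shell
# FS splits the input on ":"; OFS joins the comma-separated print arguments with "-".
summary=$(printf 'root:x:0:0\ndaemon:x:1:1\n' |
    awk 'BEGIN {FS = ":"; OFS = "-"} {print NR, NF, $1}')
printf '%s\n' "$summary"
```

Each output line shows the record number, the number of fields in that record, and the first field.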
To report, for /etc/passwd, the file name, the line number of each line, the number of columns in each line, and the complete line contents:
#awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
Using printf instead of print can make the code more concise and readable:
awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
Print and printf
awk provides two output statements, print and printf.
The arguments of print can be variables, numbers or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them and can no longer be told apart. The comma here plays the role of the output field separator, which is a space by default.
printf works much like printf in C and formats its output according to a format string. For complex output, printf is easier to use and produces code that is easier to read.
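The three behaviours just described can be compared side by side. This is a sketch on one made-up record; the variable name out and the %-8s column width are arbitrary choices for illustration.

```shell
out=$(printf 'root:/bin/bash\n' | awk -F: '{
    print $1, $2                    # comma: joined with OFS, a single space by default
    print $1 $2                     # no comma: plain string concatenation
    printf("%-8s|%s\n", $1, $2)     # printf: explicit format and explicit newline
}')
printf '%s\n' "$out"
```

Unlike print, printf does not append a newline on its own, so the format string has to end with \n.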
Awk programming
Variables and Assignment
In addition to awk's built-in variables, awk can also customize variables.
The following counts the number of accounts in /etc/passwd:
awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is  40
count is a user-defined variable. The earlier action {} blocks contained only a single print, but print is just one statement; an action {} can contain multiple statements, separated by semicolons (;).
count is not initialized here. Although an uninitialized variable defaults to 0, it is better practice to initialize it to 0 explicitly:
awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd
[start]user count is  0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is  40
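The default value of an uninitialized variable can be checked directly. The sketch below uses hypothetical variable names (unset_var, n) and a made-up three-line input; it shows that such a variable behaves as an empty string in string context and as 0 in numeric context, which is why the counter works without initialization.

```shell
# An unset awk variable is "" as a string and 0 as a number.
defaults=$(awk 'BEGIN { print "[" unset_var "]", unset_var + 0 }')

# Counting records with a variable that is never explicitly initialized.
count=$(printf 'a\nb\nc\n' | awk '{n++} END {print n}')
printf '%s\n%s\n' "$defaults" "$count"
```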
Count the number of bytes occupied by files in a folder
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'
[end]size is  8657198
To display the size in MB:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'
[end]size is  8.25889 M
Note that the total does not include the contents of subdirectories.
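Because the output of ls -l differs from one directory to the next, the same summing technique is sketched here on a made-up two-column listing (name, size in bytes). The file names, sizes, and the variable total are assumptions for illustration.

```shell
# Sum the second column of a hypothetical "name size" listing.
total=$(printf 'a.txt 1024\nb.txt 2048\nc.txt 1024\n' |
    awk 'BEGIN {size = 0} {size += $2} END {print size, size / 1024 " K"}')
printf '%s\n' "$total"
```

Division binds more tightly than string concatenation, so size / 1024 is computed first and the " K" suffix is appended to the result.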
Conditional statement
The conditional statements in awk are borrowed from the C language. They take the following forms:
if (expression) {
    statement;
    statement;
    ... ...
}
if (expression) {
    statement1;
} else {
    statement2;
}
if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
Count the number of bytes occupied by files in a folder, excluding entries whose size is 4096 bytes (usually directories):
ls -l | awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}'
[end]size is  8.22339 M
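The if / else if / else form can be sketched on a made-up listing of names and sizes. The labels, the sample entries, and the variable classified are hypothetical and chosen only to exercise each branch once.

```shell
classified=$(printf 'notes.txt 120\nbackup 4096\nempty.log 0\n' | awk '{
    if ($2 == 4096) {
        kind = "directory-sized"    # same size test as the ls -l example
    } else if ($2 == 0) {
        kind = "empty"
    } else {
        kind = "regular"
    }
    print $1, kind
}')
printf '%s\n' "$classified"
```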
Loop statement
The loop statements in awk are also borrowed from C: while, do/while, for, break, and continue are supported, with the same semantics as in C.
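A brief sketch of these loop constructs, run entirely inside a BEGIN block so that no input file is needed. The variable names (odds, n, sum, loops) and the numeric limits are arbitrary choices for illustration.

```shell
loops=$(awk 'BEGIN {
    # for loop with continue: skip even numbers
    for (i = 1; i <= 5; i++) {
        if (i % 2 == 0) continue
        odds = odds " " i
    }
    print "odds:" odds

    # while loop with break: stop once the running sum reaches 6
    n = 0; sum = 0
    while (1) {
        n++
        sum += n
        if (sum >= 6) break
    }
    print n, sum
}')
printf '%s\n' "$loops"
```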
Arrays
Because array subscripts in awk can be numbers or strings, the subscripts are usually called keys. Keys and values are stored in an internal hash table. Since a hash table is not ordered, the contents of an array may not be displayed in the order you expect. Like variables, arrays are created automatically when first used, and awk automatically determines whether they store numbers or strings. In general, arrays in awk are used to collect information from records: computing totals, counting words, tracking how many times a pattern is matched, and so on.
To display the accounts in /etc/passwd:
awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < count; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......
Here, a for loop is used to traverse the array.
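Arrays keyed by strings are the usual way to count occurrences. The following sketch counts how many accounts use each shell in three made-up records; the sample data and the names count, sh and shell_counts are assumptions. Because for (key in array) visits keys in no guaranteed order, the output is piped through sort to make it stable.

```shell
# Count records per shell using the shell path as the array key.
shell_counts=$(printf 'root:/bin/bash\ndaemon:/bin/sh\nbin:/bin/sh\n' |
    awk -F: '{count[$2]++} END {for (sh in count) print sh, count[sh]}' |
    LC_ALL=C sort)
printf '%s\n' "$shell_counts"
```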
There is much more to awk programming; only simple, commonly used features are listed here. For more information, see http://www.gnu.org/software/gawk/manual/gawk.html
Link: http://www.cnblogs.com/ggjucheng/archive/2013/01/13/2858470.html