Learn the Linux awk Command in 10 Minutes

Brief introduction

Awk is a powerful text analysis tool. Where grep searches and sed edits, awk is particularly strong at analyzing data and generating reports. In simple terms, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes the resulting fields.

There are three common versions of awk: awk, nawk and gawk. Unless otherwise stated, awk here refers to gawk, the GNU version of AWK.

Awk takes its name from the first letters of the surnames of its creators: Alfred Aho, Peter Weinberger and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, generate reports, and much more.

Usage

awk 'pattern {action}' filenames

However complex the operation, the syntax always has this form, where pattern is what awk looks for in the data and action is a series of commands executed when a match is found. The curly braces ({}) need not always appear in a program; they group a series of instructions that belong to a particular pattern. The pattern is usually a regular expression enclosed in slashes.

The basic function of the awk language is to browse files or strings and extract information from them according to specified rules. Once awk has extracted the information, further text operations can be performed on it. A complete awk script is typically used to format the information in a text file.

Typically awk processes a file one line at a time: it reads a line of the file, then executes the corresponding commands to process that text.
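As a minimal sketch of the pattern and action idea described above, the following feeds three made-up lines to awk through a pipe. The sample text and the variable name result are assumptions chosen only for illustration.

```shell
# Three hypothetical input lines; only the one matching /error/ triggers the action.
result=$(printf 'ok start\nerror disk full\nok done\n' | awk '/error/ {print "found:", $0}')
echo "$result"
```

awk still reads every line, but the action runs only for the line that matches the pattern.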

Calling awk

There are three ways to call awk:

1. Command line mode

awk [-F field-separator] 'commands' input-file(s)

Here, commands is the actual awk program, [-F field-separator] is optional, and input-file(s) is the file or files to be processed.

In awk, each item in a line of the file that is delimited by the field separator is called a field. When no separator is given with -F, the default field separator is whitespace.
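To make the effect of -F concrete, here is a small sketch that splits the same record with and without a field separator. The record and the variable names are hypothetical; the record only imitates the shape of a line in /etc/passwd.

```shell
# A hypothetical colon-separated record, similar in shape to a line of /etc/passwd.
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Without -F the whole line is a single field, because it contains no whitespace.
default_first=$(echo "$line" | awk '{print $1}')

# With -F ':' the line is split on colons, so $1 is the account name.
colon_first=$(echo "$line" | awk -F ':' '{print $1}')

echo "$default_first"
echo "$colon_first"
```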

2. Shell script mode

Put all the awk commands into a file and make that file executable, with the awk interpreter named on the first line of the script; the script can then be run by typing its name.

This is the counterpart of the first line of a shell script: #!/bin/sh

which is replaced by: #!/bin/awk -f (the -f option is needed so that awk reads the script file as its program; the path to awk may differ between systems)

3. Put all the awk commands into a separate file, then call:

awk -f awk-script-file input-file(s)

Here, the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
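The third mode can be tried without touching any real file. The sketch below writes a tiny awk program to a temporary file and runs it with -f; the program, the sample input and the variable names are assumptions made for the example.

```shell
# Write a tiny awk program to a temporary file (the file name is generated by mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every line.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the program from the file with -f; the input comes from a pipe here.
names=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk -f "$script")
echo "$names"
rm -f "$script"
```

Keeping the program in its own file avoids shell quoting problems once the script grows beyond a line or two.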

This article focuses on the command line mode.

Getting Started Example

Assume that the output of last -n 5 is as follows:

#last -n 5        <== only the first five lines are taken

root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in

root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)

root     pts/1   192.168.1.100  Mon Feb  9 11:41 - 18:30  (06:48)

dmtsai   pts/1   192.168.1.100  Mon Feb  9 11:41 - 11:41  (00:00)

root     tty1                   Fri Sep  5 14:09 - 14:10  (00:01)

To display only the accounts of the 5 most recent logins:

#last -n 5 | awk '{print $1}'

root

root

root

dmtsai

root

The awk workflow is as follows: read in a record terminated by a '\n' newline character, split the record into fields according to the specified field separator, and fill in the fields: $0 represents the whole record, $1 the first field, and $n the nth field. The default field separator is a space or a tab, so $1 is the login user, $3 is the login user's IP address, and so on.
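The field variables can be checked on a single record. This sketch uses one hypothetical line in the style of the last output shown earlier; the variable names are made up for the example.

```shell
# One hypothetical record in the style of the "last" output.
record='root pts/1 192.168.1.100 Tue Feb 10 11:21'

first=$(echo "$record" | awk '{print $1}')   # first field: login user
third=$(echo "$record" | awk '{print $3}')   # third field: IP address
whole=$(echo "$record" | awk '{print $0}')   # $0: the entire record

echo "$first / $third"
echo "$whole"
```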

To display only the accounts in /etc/passwd:

#cat /etc/passwd |awk -F ':' '{print $1}'

root

daemon

bin

sys

This is an example of awk + action: the action {print $1} is executed for every line.

-F specifies the field separator, here ':'.

To display only the accounts in /etc/passwd and the shell of each account, with the account and the shell separated by a tab:

#cat /etc/passwd |awk  -F ':'  '{print $1"\t"$7}'

root     /bin/bash

daemon   /bin/sh

bin      /bin/sh

sys      /bin/sh

To display only the accounts in /etc/passwd and the shell of each account, separated by a comma, with the header line "name,shell" added before all lines and "blue,/bin/nosh" added after the last line:

cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'

name,shell

root,/bin/bash

daemon,/bin/sh

bin,/bin/sh

sys,/bin/sh

....

blue,/bin/nosh

The awk workflow is as follows: first execute BEGIN, then read the file. Read in a record terminated by the \n newline character, split the record into fields according to the specified field separator, and fill in the fields ($0 represents the whole record, $1 the first field, $n the nth field), then execute the action that corresponds to the pattern. Then read the second record, and so on until all records have been read, and finally execute the END block.
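The order of execution described above can be observed directly. The following sketch runs BEGIN, a per-record action and END over two made-up input lines; the input and the variable name out are assumptions for illustration.

```shell
# Hypothetical two-line input; BEGIN runs before it is read, END after the last record.
out=$(printf 'a\nb\n' | awk 'BEGIN {print "start"} {print "line:", $0} END {print "end"}')
echo "$out"
```

The BEGIN block is a natural place for headers and initial values, and the END block for totals.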

Search for all lines in /etc/passwd that contain the keyword root:

#awk -F: '/root/' /etc/passwd

root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern. The action is executed only for lines that match the pattern (root here). No action is specified, so by default the whole line is printed.

Regular expressions are supported; for example, to find lines that start with root: awk -F: '/^root/' /etc/passwd

Search for all lines in /etc/passwd that contain the keyword root, and display the corresponding shell:

# awk -F: '/root/{print $7}' /etc/passwd

/bin/bash

The action {print $7} is specified here.
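The difference between the patterns /root/ and /^root/ shows up as soon as a second line mentions root somewhere other than at the start. The two passwd-style lines below are made up for the example, as are the variable names.

```shell
# Two hypothetical passwd-style lines; both contain "root", only one starts with it.
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin'

any=$(echo "$data" | awk -F: '/root/ {print $1}')        # "root" anywhere in the line
anchored=$(echo "$data" | awk -F: '/^root/ {print $1}')  # "root" only at the start

echo "$any"
echo "$anchored"
```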

Awk built-in variables

Awk has many built-in variables to set environment information. These variables can be changed. Here are some of the most commonly used variables.

ARGC                Number of command line arguments

ARGV                Array of command line arguments

ENVIRON             Array of system environment variables

FILENAME            Name of the file awk is currently reading

FNR                 Number of records read so far in the current file

FS                  Input field separator, equivalent to the command line -F option

NF                  Number of fields in the current record

NR                  Number of records read so far

OFS                 Output field separator

ORS                 Output record separator

RS                  Input record separator

In addition, the $0 variable refers to the entire record, $1 to the first field of the current line, $2 to the second field of the current line, and so on.
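A few of these variables can be seen working together in one short command. In this sketch FS and OFS are set in a BEGIN block, and NR and NF are printed for each record; the two input lines and the "-" output separator are arbitrary choices for the example.

```shell
# Hypothetical colon-separated input; OFS is set so that print puts "-" between items.
out=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN {FS = ":"; OFS = "-"} {print NR, NF, $1}')
echo "$out"
```

Each output line reads: record number, number of fields in that record, first field.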

Report on /etc/passwd: the file name, the line number of each line, the number of columns in each line, and the complete content of the line:

#awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd

filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash

filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh

filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh

filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

Using printf instead of print can make the code more concise and readable:

awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd

Print and printf

awk provides two output functions: print and printf.

The arguments to print can be variables, numbers or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated and can no longer be told apart. The comma here plays the same role as the output field separator, which is a space by default.

printf is used in much the same way as printf in C. It formats strings, and when the output is complex, printf is easier to use and makes the code easier to understand.
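The three behaviours just described (comma, no comma, and printf) can be compared side by side. This is only a sketch; the strings and the format "%-5s|%3d" are arbitrary choices for the example.

```shell
# Comma: items are joined with the output field separator (a space by default).
with_comma=$(awk 'BEGIN {print "a", "b"}')

# No comma: the two strings are concatenated into one.
no_comma=$(awk 'BEGIN {print "a" "b"}')

# printf: explicit format; unlike print, the newline must be written as \n.
formatted=$(awk 'BEGIN {printf("%-5s|%3d\n", "ab", 7)}')

echo "$with_comma"
echo "$no_comma"
echo "$formatted"
```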

Awk programming

Variables and Assignment

In addition to the built-in variables, awk also supports user-defined variables.

The following counts the number of accounts in /etc/passwd:

awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd

root:x:0:0:root:/root:/bin/bash

......

user count is  40

count is a user-defined variable. The earlier action {} blocks contained only a single print; in fact print is just one statement, and an action {} can contain several statements separated by semicolons (;).

count is not initialized here. Although its default value is 0, it is better practice to initialize it to 0 explicitly:

awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd

[start]user count is  0

root:x:0:0:root:/root:/bin/bash

...

[end]user count is  40
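The default value of an uninitialized variable can be checked in isolation. In this sketch the variable names n, s and count are arbitrary, and the three input lines are made up.

```shell
# An unset variable acts as 0 in a numeric context and as "" in a string context.
numeric=$(awk 'BEGIN {print n + 0}')
text=$(awk 'BEGIN {print "[" s "]"}')

# Counting three hypothetical input lines without initializing the counter.
lines=$(printf 'x\ny\nz\n' | awk '{count++} END {print count}')

echo "$numeric $text $lines"
```

Initializing in BEGIN still matters when the END block may run without any input: an untouched count would then print as an empty string rather than 0.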

Count the number of bytes occupied by the files in a folder:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'

[end]size is  8657198

To display the result in M:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'

[end]size is  8.25889 M

Note that the total does not include the contents of subdirectories.
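Because real directory listings differ from machine to machine, the same summing technique is sketched here on two made-up lines in the format of ls -l. The file names, owners and sizes are assumptions chosen so the arithmetic is easy to follow (1 M and 2 M).

```shell
# Hypothetical "ls -l"-style lines; the fifth field is the size in bytes.
listing='-rw-r--r-- 1 user user 1048576 Jan  1 00:00 a.bin
-rw-r--r-- 1 user user 2097152 Jan  1 00:00 b.bin'

total=$(echo "$listing" | awk '{size += $5} END {print size}')
megabytes=$(echo "$listing" | awk '{size += $5} END {print size / 1024 / 1024}')

echo "$total bytes = $megabytes M"
```

With real ls -l output, the leading "total" line has no fifth field, so it adds 0 to the sum.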

Conditional statement

The conditional statements in awk are borrowed from the C language. They take the following forms:

if (expression) {

     statement;

     statement;

     ... ...

}

if (expression) {

     statement;

} else {

     statement2;

}

if (expression) {

     statement1;

} else if (expression1) {

     statement2;

} else {

     statement3;

}

Count the number of bytes occupied by the files in a folder, excluding entries whose size is 4096 (usually folders):

ls -l | awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}'

[end]size is  8.22339 M
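The full if / else if / else form can be exercised on a few numbers. This is a sketch only: the three sizes and the labels skip, small and large are invented for the example.

```shell
# Classify hypothetical sizes (one number per line) with if / else if / else.
out=$(printf '10\n4096\n500000\n' | awk '{
    if ($1 == 4096) {
        label = "skip"
    } else if ($1 < 1024) {
        label = "small"
    } else {
        label = "large"
    }
    print $1, label
}')
echo "$out"
```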

Loop statement

The loop statements in awk are also borrowed from C: while, do/while, for, break and continue are supported, and these keywords have the same semantics as in C.
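As a brief sketch of these keywords, the first command below uses for together with continue and break, and the second uses while. The numbers are arbitrary and the variable names are made up for the example.

```shell
# for loop with continue and break: sum the odd numbers from 1, stopping once i exceeds 7.
for_sum=$(awk 'BEGIN {
    for (i = 1; i <= 9; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # leave the loop when i reaches 9
        sum += i
    }
    print sum
}')

# while loop: count how many times 100 can be halved before it drops below 1.
halvings=$(awk 'BEGIN {
    n = 100
    while (n >= 1) { n = n / 2; steps++ }
    print steps
}')

echo "$for_sum $halvings"
```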

Arrays

Because array subscripts in awk can be numbers or strings, the subscripts are usually called keys. Keys and values are stored in an internal hash table. Since a hash table does not keep its entries in order, you may find that array contents are not displayed in the order you expected. Like variables, arrays are created automatically when they are first used, and awk automatically determines whether they hold numbers or strings. In general, arrays in awk are used to collect information from records; they can be used to compute totals, count words, track how many times a pattern has matched, and so on.

Display the accounts in /etc/passwd:

awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < count; i++) print i, name[i]}' /etc/passwd

0 root

1 daemon

2 bin

3 sys

4 sync

5 games

......

Here, a for loop is used to traverse the array.

There is much more to awk programming; only simple and commonly used features are listed here. For more information, see http://www.gnu.org/software/gawk/manual/gawk.html
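Arrays keyed by strings are what make counting tasks short. The sketch below counts how many accounts use each shell in three made-up passwd-style lines; the data and the array name shells are assumptions for the example.

```shell
# Hypothetical passwd-style lines; count how many accounts use each shell.
data='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

# shells[] is keyed by the shell path. A "for (k in shells)" loop visits the keys
# in an unspecified order, so the result is piped through sort to make it stable.
counts=$(echo "$data" | awk -F: '{shells[$7]++} END {for (k in shells) print k, shells[k]}' | sort)
echo "$counts"
```

The numeric index loop shown earlier keeps input order; the for-in form is the one to use when the keys are not consecutive numbers.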

Link: http://www.cnblogs.com/ggjucheng/archive/2013/01/13/2858470.html
