Awk command details

August 1, 2016

Quick start:

awk processes a file one line (record) at a time.

Use ':' to split each line and print its first and fourth fields:

 awk -F: '{print $1,$4}' input-file


Awk command introduction

The most basic job of the awk language is to browse and extract information from files or strings according to specified rules. Once awk has extracted the information, other text operations can be performed on it. Complete awk scripts are typically used to format information in text files.

1. Invoking awk:

First, the command-line form:
awk [-F field-separator] 'commands' input-file(s)

Here commands is the actual awk program, and [-F field-separator] is optional. By default awk splits text on whitespace, so if the fields in the text you are browsing are separated by spaces, you do not need this option. However, to browse a file such as passwd, whose fields are separated by colons, you must use the -F option: awk -F : 'commands' input-file

Second, put all the awk commands in a file, make the file executable, and make the awk interpreter line (for example #!/bin/awk -f) the first line of the script; you can then run it by typing the script name.

Third, put all the awk commands in a separate file and then call it like this:

 awk -f awk-script-file input-file

The -f option tells awk to read its program from the file awk-script-file, and input-file is the file awk browses.
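A minimal sketch of the first and third invocation styles, assuming a POSIX-compatible awk (for example gawk or mawk) on the PATH. The script name fields.awk and the input record are hypothetical:

```shell
# Style 1: program on the command line, ':' as the field separator
printf 'root:x:0:0\n' | awk -F: '{print $1, $4}'

# Style 3: program stored in a file and loaded with -f
cat > fields.awk <<'EOF'
# print the first and fourth colon-separated fields
{ print $1, $4 }
EOF
printf 'root:x:0:0\n' | awk -F: -f fields.awk
```

Both commands print the same line, root 0, because the program text is identical; only where awk reads it from differs.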

2. Awk script:

An awk script is composed of patterns and actions. Each record is split according to the field separator (set with -F; whitespace by default), the pieces are placed in the corresponding fields in order, and records are read line by line until the end of the file.

2.1. Patterns and actions

Every awk statement consists of a pattern and an action, and an awk script may contain many statements. The pattern determines when the action is triggered; the action is the operation performed on the data. If the pattern is omitted, the action is executed for every record.

A pattern can be any conditional or compound expression, or a regular expression. There are also two special patterns, BEGIN and END. BEGIN is typically used to initialize counters and print headers: its action runs before any input is read, after which the remaining actions are applied to the input file. END is typically used to print totals and a final status after awk has finished reading the input. Every action must be enclosed in {}.

The action is written inside braces {} and is most often a print, but it can also contain longer code such as if statements, loops, and loop exits. If no action is specified, awk prints every record that matches the pattern.
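A small sketch of how BEGIN, an expression pattern and END fit together, run on hypothetical name/age records. It assumes a POSIX-compatible awk; n is an illustrative counter that awk initializes to zero on first use:

```shell
printf 'alice 30\nbob 17\ncarol 45\n' |
awk 'BEGIN { print "adults:" }
     $2 >= 18 { print $1; n++ }
     END { print n " total" }'

# pattern without an action: matching records are printed unchanged
printf 'alice 30\nbob 17\n' | awk '$2 >= 18'
```

The first command prints adults:, alice, carol and 2 total; the second prints only alice 30.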

2.2. Fields and records:

When awk runs, the fields of each record are referred to as $1, $2 ... $n; these are called field variables. Use $1 and $3 to refer to fields 1 and 3 (in a print statement, separate them with commas), and $0 to refer to the whole record. For example:

 awk '{print $0}' temp.txt > sav.txt

prints every record and redirects the output to sav.txt

 awk '{print $0}' temp.txt|tee sav.txt

Same as above, but tee also displays the output on the screen

 awk '{print $1,$4}' temp.txt

Print only fields 1 and 4

 awk 'BEGIN {print "NAME  GRADE\n----"} {print $1"\t"$4}' temp.txt

prints a header first: the lines "NAME  GRADE" and "----" appear before the input, and fields 1 and 4 of each record are separated by a tab

 awk 'BEGIN {print "begin"} {print $1} END {print "end"}' temp

Prints both a header and a trailer.
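To try the field examples above without the author's temp file, the sketch below writes two hypothetical records to sample.txt (a made-up file name) and runs the same kinds of commands on it, assuming a POSIX-compatible awk:

```shell
# hypothetical sample data with four whitespace-separated fields
printf 'M.Tansley 05/99 48311 Green\nJ.Lulu 06/99 48317 green\n' > sample.txt

awk '{print $1,$4}' sample.txt        # fields 1 and 4, joined by OFS (a space)
awk 'BEGIN {print "NAME  GRADE\n----"} {print $1"\t"$4}' sample.txt
awk 'BEGIN {print "begin"} {print $1} END {print "end"}' sample.txt
```

Writing $1"\t"$4 concatenates the fields with a tab, while $1,$4 inserts the output field separator between them.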

2.3. Comparison operators:

<, <=, ==, !=, >=, >; ~ matches a regular expression; !~ does not match a regular expression

Match: awk '{if ($4~/ASIMA/) print $0}' temp   prints the entire record if the fourth field contains ASIMA

Exact match: awk '$3=="48" {print $0}' temp   prints only records whose field 3 equals "48"

Mismatch: awk '$0 !~ /ASIMA/' temp   prints entire records that do not contain ASIMA

Not equal to: awk '$1 != "asima"' temp   prints records whose first field is not asima

Less than: awk '{if ($1<$2) print $1 " is smaller"}' temp

Character class: awk '/[Gg]reen/' temp   prints every record containing Green or green

Any character: awk '$1 ~/^...a/' temp   prints records whose first field has "a" as its fourth character; "^" anchors the match at the start of the field, and "." matches any single character

Alternation: awk '$0~/(abc)|(efg)/' temp   when you use |, grouping each alternative in parentheses keeps the expression clear

Logical AND: awk '{if ( $1=="a" && $2=="b" ) print $0}' temp

Logical OR: awk '{if ($1=="a" || $1=="b") print $0}' temp
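A runnable sketch of several of the operators above, assuming a POSIX-compatible awk. The three records in data are hypothetical and chosen so that each command selects something:

```shell
# hypothetical records: name, then two numbers
data='asima 10 48
green 50 12
Brown 7 9'

printf '%s\n' "$data" | awk '$3 == "48"'          # exact match on field 3
printf '%s\n' "$data" | awk '$1 !~ /asima/'       # field 1 does not match asima
printf '%s\n' "$data" | awk '$1 ~ /^...e/'        # 4th character of field 1 is e
printf '%s\n' "$data" | awk '{if ($2 < $3) print $1 " is smaller"}'
printf '%s\n' "$data" | awk '$1 == "asima" || $1 == "Brown"'
```

Here $2 < $3 compares numerically because both fields look like numbers; if either field were non-numeric text, awk would compare them as strings.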

2.4. Awk built-in variables:

ARGC      Number of command-line arguments
ARGV      Array of command-line arguments
ENVIRON   Array holding the system environment variables
FILENAME  Name of the file awk is currently reading
FNR       Number of records read from the current file
FS        Input field separator, same as the -F option
NF        Number of fields in the current record
NR        Number of records read so far
OFS       Output field separator
ORS       Output record separator
RS        Input record separator

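For example:
awk 'END {print NR}' temp   prints the number of records read, at the end

awk '{print NF,NR,$0} END {print FILENAME}' temp   prints the field count, record number and record for each line, then the file name

awk '{if (NR>0 && $4~/Brown/) print $0}' temp   prints records whose fourth field contains Brown (NR>0 holds once at least one record has been read)

Another use of NF: echo $PWD | awk -F/ '{print $NF}' displays the name of the current directory ($NF is the last field)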
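A short sketch of several built-in variables, assuming a POSIX-compatible awk; vars.txt is a hypothetical two-line file created for the example:

```shell
printf 'a b c\nd e\n' > vars.txt              # hypothetical input file
awk '{print NR, NF, $NF} END {print FILENAME, NR}' vars.txt
echo /usr/local/bin | awk -F/ '{print $NF}'   # last path component: bin
awk 'BEGIN {print ENVIRON["HOME"]}'           # same value as $HOME
# OFS is used when a record is rebuilt; $1=$1 forces the rebuild
echo 'a b c' | awk 'BEGIN {OFS="-"} {$1=$1; print}'
```

The first awk command prints 1 3 c, 2 2 e and finally vars.txt 2; the last one prints a-b-c.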

2.5. Awk operators:

In awk, basic expressions can be numbers, strings, variables, fields, or array elements.

Assigning fields to variables:

 awk '{name=$1;six=$3; if (six=="man") print name " is " six}' temp

Comparing a field with a value (BASE is assigned as a number so that the comparison is numeric; BASE="27" would force a string comparison):

 awk 'BEGIN {BASE=27} {if ($4<BASE) print $0}' temp

Modifying a numeric field (the original input file is not changed):

 awk '{if ($1=="asima") $6=$6-1;print $1,$6,$7}' temp

Modifying a text field:

 awk '{if ($1=="asima") $1="desc"; print $1}' temp

Displaying only the modified records (only what you need is displayed; note the {} that distinguishes this from the previous command):

 awk '{if ($1=="asima") {$1="desc"; print $1}}' temp

Creating a new output field:

 awk '{$4=$3-$2; print $4}' temp

Summing a column:

 awk '(tot+=$3); END {print tot}' temp   # prints each record while the running total is non-zero, then the total

 awk '{tot+=$3} END {print tot}' temp   # prints only the final total
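A sketch of field modification and column summing on hypothetical one-line and three-line inputs, assuming a POSIX-compatible awk. The average line divides by NR, the number of records read:

```shell
# hypothetical record: a name followed by five numbers and a tag
rec='asima 1 2 3 4 10 x'

echo "$rec" | awk '{if ($1=="asima") $6=$6-1; print $1,$6,$7}'   # asima 9 x
echo "$rec" | awk '{$4=$3-$2; print $4}'                          # 1

# sum and average of field 3
printf 'x 1 10\ny 2 20\nz 3 30\n' | awk '{tot+=$3} END {print tot}'      # 60
printf 'x 1 10\ny 2 20\nz 3 30\n' | awk '{tot+=$3} END {print tot/NR}'   # 20
```

Assigning to a field only changes the copy of the record awk holds in memory; the input is never rewritten.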

Summing the sizes of regular files (in ls -l output, field 5 is the size in bytes):

 ls -l | awk '/^-/ {print $9"\t"$5; tot+=$5} END {print "total bytes: " tot}'

Listing only file names:

 ls -l | awk '{print $9}'   # the file name is usually field 9

2.6. Awk built-in string functions:

gsub(r, s) replaces every match of r with s throughout $0

awk 'gsub(/name/,"xingming") {print $0}' temp   prints the records in which a replacement was made

gsub(r, s, t) replaces every match of r with s throughout t

index(s, t) returns the first position of string t in s (0 if t does not occur)

awk 'BEGIN {print index("sunny", "ny")}' temp returns 4

length(s) returns the length of s

match(s, r) tests whether s contains a substring that matches r, and returns its starting position (0 if there is none)

awk '$1=="J.Lulu" {print match($1, "u")}' temp returns 4

split(s, a, fs) splits s into array a on the separator fs and returns the number of elements

awk 'BEGIN {print split("12#345#6789", myarray, "#")}'

returns 3, and sets myarray[1]="12", myarray[2]="345", myarray[3]="6789"

sprintf(fmt, exp) returns exp formatted according to fmt

sub(r, s) replaces the leftmost longest substring of $0 that matches r with s (only the first match is replaced)

substr(s, p) returns the suffix of s starting at position p

substr(s, p, n) returns the substring of s of length n starting at position p
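A minimal sketch exercising each string function listed above inside a BEGIN block, so no input file is needed. It assumes a POSIX-compatible awk; RSTART and RLENGTH are the built-in variables that match() sets, and the test strings are illustrative:

```shell
awk 'BEGIN {
  s = "name: name"
  n = gsub(/name/, "xingming", s); print n, s        # 2 xingming: xingming
  t = "aaa"; sub(/a/, "b", t); print t               # baa (first match only)
  print index("sunny", "ny"), length("sunny")        # 4 5
  print match("J.Lulu", "u"), RSTART, RLENGTH        # 4 4 1
  c = split("12#345#6789", arr, "#")
  print c, arr[1], arr[2], arr[3]                    # 3 12 345 6789
  print substr("hello", 2), substr("hello", 2, 3)    # ello ell
  print sprintf("%05.1f", 3.14159)                   # 003.1
}'
```

gsub and sub modify their third argument in place, which is why t and s are ordinary variables rather than string constants.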

2.7. Using the printf function:

Character conversion:

 echo "65" | awk '{printf "%c\n", $0}'   # outputs A
 awk 'BEGIN {printf "%f\n", 999}'   # outputs 999.000000

Formatted output:

 awk '{printf "%-15s %s\n",$1,$3}' temp

"%-15s" left-aligns the first field in a 15-character column.
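A one-line sketch combining the conversions above, assuming a POSIX-compatible awk; the | characters are only there to make the padding visible:

```shell
awk 'BEGIN { printf "%c|%f|%-6s|%6s|\n", 65, 999, "ab", "cd" }'
# A|999.000000|ab    |    cd|
```

A minus sign in the width pads on the right (left alignment); without it the value is padded on the left.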

2.8. Other awk usage:

Passing values to a one-line awk command:

 awk '{if ($5<AGE) print $0}' AGE=10 temp
 who | awk '{if ($1 == user) print $1 " are in " $2}' user=$LOGNAME   # using an environment variable
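A sketch contrasting the two ways of passing a value, assuming a POSIX-compatible awk; ages.txt is a hypothetical file. An assignment written among the file operands takes effect only when awk reaches it while reading input, so it is not yet visible in BEGIN, whereas -v sets the variable before BEGIN runs:

```shell
printf 'a 5\nb 15\n' > ages.txt                   # hypothetical sample file
awk '$2 < AGE {print $1}' AGE=10 ages.txt         # prints: a
awk 'BEGIN {print "[" AGE "]"}' AGE=10 ages.txt   # prints: [] (not set yet)
awk -v AGE=10 'BEGIN {print "[" AGE "]"}'         # prints: [10]
```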

Awk script files:

Start the script with #!/bin/awk -f; without this line, a self-contained script cannot be executed directly.

#!/bin/awk -f
# all comment lines must start with a hash '#'
# name: student_tot.awk
# to call: student_tot.awk grade.txt
# prints total and average of club student points

# print a header first
BEGIN {
print "Student    Date   Member No.  Grade  Age  Points  Max"
print "Name  Joined Gained  Point Available"
}

# let's add the scores of points gained (field 6)
{ tot += $6 }

# finished processing now let's print the total and average point
END {
    print "Club student total points :" tot
    print "Average Club Student points :" tot/NR
}

2.9. Awk arrays:

The basic loop structure for arrays in awk (the order in which for...in visits the elements is unspecified):

 for (element in array) print array[element]

 awk 'BEGIN {record="123#456#789"; split(record, myarray, "#")}
 END {for (i in myarray) {print myarray[i]}}' /dev/null
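A sketch of two everyday array idioms, assuming a POSIX-compatible awk and hypothetical input: counting occurrences with an associative array, and testing membership with in. The output of the first command is piped through sort because for...in order is unspecified:

```shell
# count how often each word occurs
printf 'a\nb\na\n' | awk '{count[$1]++} END {for (k in count) print k, count[k]}' | sort
# "in" tests membership without creating the element
awk 'BEGIN {x["k"]=1; if ("k" in x) print "yes"; if (!("z" in x)) print "no"}'
```

The first command prints a 2 and b 1; the second prints yes and no.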

3.0 Control statements in awk
1. Conditional statement (if)

if (expression)    # or: if (variable in array)
    statement 1
else
    statement 2

In this format, "statement 1" can be several statements. To make the structure unambiguous for awk and easier for you to read, it is best to enclose multiple statements in {}. awk branch statements can be nested, in the following form:

if (expression)
    {statement 1}
else if (expression)
    {statement 2}
else
    {statement 3}

 [chengmo@localhost nginx]# awk 'BEGIN{
test=100;
if(test>90)
    print "very good";
else if(test>60)
    print "good";
else
    print "no pass";
}'
very good

Each statement can be terminated with a ";".

2. Loop statements (while, for, do)

1. while statement

while (expression)
    {statement}

 [chengmo@localhost nginx]# awk 'BEGIN{
test=100; total=0;
while(i<=test) { total+=i; i++; }
print total;
}'
5050

2. For loop

The for loop has two formats:

Format 1:

for (variable in array)
    {statement}

 [chengmo@localhost nginx]# awk 'BEGIN{
for(k in ENVIRON)
    print k"="ENVIRON[k];
}'
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
LANG=zh_CN.GB2312

Note: ENVIRON is a built-in awk variable: an associative array indexed by environment variable name.

Format 2:

for (variable; condition; expression)
    {statement}

 [chengmo@localhost nginx]# awk 'BEGIN{
total=0;
for(i=0;i<=100;i++) total+=i;
print total;
}'
5050

3. do loop

do
    {statement}
while (condition)

 [chengmo@localhost nginx]# awk 'BEGIN{
total=0; i=0;
do { total+=i; i++; } while(i<=100)
print total;
}'
5050

These are awk's flow-control statements. As the syntax above shows, they are the same as in C. With them, much of the work done by shell programs can be handed over to awk, and it runs fast.

break exits the enclosing while or for loop.
continue skips to the next iteration of the enclosing while or for loop.
next reads the next input record and starts again at the top of the script, so no further patterns or actions are applied to the current record.
exit stops the main input loop and transfers control to the END block, if there is one. If there is no END block, or exit is executed inside END, the script terminates.
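A sketch of next, exit, continue and break, assuming a POSIX-compatible awk and hypothetical numeric input:

```shell
printf '1\n2\n3\n4\n5\n' | awk '
$1 == 2 { next }          # skip the remaining rules for record 2
$1 == 4 { exit }          # stop reading input; END still runs
{ print "record", $1 }
END { print "done" }'

awk 'BEGIN {
  for (i = 1; i <= 10; i++) {
    if (i % 2) continue   # skip odd numbers
    if (i > 6) break      # leave the loop once i reaches 8
    printf "%d ", i
  }
  print ""
}'
```

The first command prints record 1, record 3 and done; record 5 is never read because exit ended the input loop at record 4. The second prints 2 4 6.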

NR and FNR
A. With multiple input files, awk applies the program to the first file (reading one line at a time), then to the second file, then to the third, and so on.
B. This raises a line-numbering question: when awk finishes the first file and starts reading the second, what number does the first line of the second file get? If counting restarted at 1, there would be two lines numbered 1 (the first file also has a line 1). This is what distinguishes NR from FNR.
NR: the global line number (lines of the second file are numbered on from the last line of the first file)
FNR: the line number within the current file (independent of how many lines the earlier input files had)
For example, if data1.txt has 40 lines and data2.txt has 50 lines, then for awk '{}' data1.txt data2.txt
NR takes the values 1, 2, ..., 40, 41, 42, ..., 90
FNR takes the values 1, 2, ..., 40, 1, 2, ..., 50
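A small sketch of the difference, assuming a POSIX-compatible awk; f1.txt and f2.txt are hypothetical two-line and three-line files:

```shell
printf 'a\nb\n' > f1.txt
printf 'c\nd\ne\n' > f2.txt
# columns: file name, NR (global), FNR (per file), record
awk '{print FILENAME, NR, FNR, $0}' f1.txt f2.txt
```

The output runs f1.txt 1 1 a, f1.txt 2 2 b, f2.txt 3 1 c, f2.txt 4 2 d, f2.txt 5 3 e. Because NR equals FNR only while the first file is being read, the condition NR==FNR is commonly used to tell the first input file apart from the rest (this breaks down if the first file is empty).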
The getline function:
awk's getline statement reads a single record. getline is especially useful when one logical data record spans two physical lines. It splits the record into fields as usual (plain getline sets $0, NF, NR and FNR). It returns 1 on success, 0 at the end of the file, and -1 on error.
A. In general, getline should be understood as follows:
When there is no redirection (| or <) around it, getline acts on the current input file: it reads the next line of that file into the variable var that follows it, or into $0 if no variable is given. Note that because awk has already read one line before it executes getline, awk's own reads and getline's reads alternate. When there is a redirection (| or <) around it, getline acts on the redirected input. That input has just been opened and awk has not read anything from it, so getline returns its first line, with no alternation.

B. getline usage falls roughly into three categories, each with two sub-cases (with or without a variable), for six usages in total. The code is as follows:

 nawk 'BEGIN{"cat data.txt" | getline d; print d}' data2.txt
 nawk 'BEGIN{"cat data.txt" | getline; print $0}' data2.txt
 nawk 'BEGIN{getline d < "data.txt"; print d}' data2.txt
 nawk 'BEGIN{getline < "data.txt"; print $0}' data2.txt

Each of the four lines above prints only the first line of data.txt (to print all lines, use a loop), for example:
 nawk 'BEGIN{FS=":"; while ((getline < "/etc/passwd") > 0) {print $1}}' data.txt


 nawk '{getline d; print d "#" $3}' data.txt

awk first reads the first line, then executes getline, which assigns the next line to the variable d; d is printed, followed by "#" and $3. Because getline d does not change $0, $3 is still the third field of the first line. (If the lines end with a carriage return, as in a file with DOS line endings, the "#" and $3 are drawn over d on the terminal.)

 nawk '{getline; print $0 "#" $3}' data.txt

awk first reads the first line, then executes getline, which assigns the next line to $0. $0 is now the second line, and the $3 that follows is taken from this new $0. (With DOS line endings, the "#" and $3 would again be drawn over $0 on the terminal.)
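A sketch of the pairing behaviour of getline on the current input, assuming a POSIX-compatible awk and a hypothetical four-line input. Each main-loop read is followed by a getline read, so the lines are consumed two at a time:

```shell
printf 'a 1 x\nb 2 y\nc 3 z\nd 4 w\n' | awk '{getline d; print d "#" $3}'
# b 2 y#x
# d 4 w#z
printf 'a 1 x\nb 2 y\nc 3 z\nd 4 w\n' | awk '{getline; print $0 "#" $3}'
# b 2 y#y
# d 4 w#w
```

In the first command $3 still comes from the line awk itself read; in the second, getline replaced $0, so $3 comes from the line getline read.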
In awk, you sometimes need to call system tools to do work that awk is not good at. The system() function can run an external command, but it cannot capture the command's output. Fortunately, getline can meet this need, for example:

   # BSD date syntax; olddatestr is an awk variable holding the date to convert
   datecommand="/bin/date -j -f \"%d/%b/%Y:%H:%M:%S\" " olddatestr " \"+%Y%m%d %H%M%S\"";
   datecommand | getline newdatestr
   close(datecommand)

Running an external command makes awk use a file descriptor, and the number of files awk can have open at once is limited, sometimes to a small number (for example 16). It is therefore good practice to close the command when you are done with it. Defining the command string as a variable also makes it easy to close.
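A sketch of the two redirected forms, assuming a POSIX-compatible awk. pw.txt is a hypothetical passwd-like file, and echo hello stands in for any external command. getline var < file does not split the line into fields, so split() is used to pick out the first one:

```shell
printf 'root:x:0\nbin:x:1\n' > pw.txt          # hypothetical passwd-like file
awk 'BEGIN {
  # read every line of pw.txt until getline returns 0 (EOF) or -1 (error)
  while ((getline line < "pw.txt") > 0) {
    split(line, f, ":")
    print f[1]
  }
  close("pw.txt")

  # capture the output of an external command, then release its descriptor
  cmd = "echo hello"
  cmd | getline out
  close(cmd)
  print out
}'
```

This prints root, bin and hello. The parentheses around (getline line < "pw.txt") keep the > 0 comparison from being read as an output redirection.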


