Awk command details

August 1, 2016

Quick start:

Awk processes a file one line at a time.

Use ':' as the field separator and print the first and fourth fields of each line:

 awk -F: '{print $1,$4}' input-file


Awk command introduction

The most basic function of the awk language is to browse files or strings and extract information from them according to specified rules. Once awk has extracted the information, other text operations can be carried out on it. Complete awk scripts are usually used to format the information in text files.

1. Calling awk:

There are three ways to call awk. The first is the command-line form:

 awk [-F field-separator] 'commands' input-file(s)

Here commands is the actual awk program, and [-F field-separator] is optional. By default awk splits each line on blanks (spaces and tabs), so you do not need this option for text whose fields are separated by blanks. To browse a file such as passwd, whose fields are separated by colons, you must name the separator with the -F option:

 awk -F: 'commands' input-file
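As a small illustration of the -F option described above, the sketch below feeds awk two made-up records in passwd format and then one blank-separated line. The sample data and the variable names (sample, passwd_fields, blank_fields) are assumptions chosen for the example.

```shell
# Two made-up records in /etc/passwd format (illustrative data, not real accounts).
sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/sh'

# -F: makes the colon the field separator; $1 is the login name, $4 the group ID.
passwd_fields=$(printf '%s\n' "$sample" | awk -F: '{print $1, $4}')
printf '%s\n' "$passwd_fields"

# Without -F, fields are split on runs of blanks (spaces and tabs).
blank_fields=$(printf 'a   b\tc\n' | awk '{print $2}')
printf '%s\n' "$blank_fields"
```

The first command prints "root 0" and "alice 1000"; the second prints "b".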

The second is to put all the awk commands into a file, make that file executable, and name the awk interpreter on the first line of the script; the script is then called by typing its name.

The third is to put all the awk commands into a separate file and call it like this:

 awk -f awk-script-file input-file

The -f option names the file that contains the awk script (awk-script-file), and input-file is the file that awk browses.
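The -f calling mode can be tried without touching any existing files. This is only a sketch: the script name first_field.awk, the data file and their contents are hypothetical, and both are created in a temporary directory.

```shell
# Hypothetical script file and data file, created in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/first_field.awk" <<'EOF'
# print the first field of every record
{ print $1 }
EOF
printf 'asima 48\nbrown 27\n' > "$workdir/temp"

# -f names the file that holds the awk program; the data file follows it.
script_out=$(awk -f "$workdir/first_field.awk" "$workdir/temp")
printf '%s\n' "$script_out"
rm -rf "$workdir"
```

This prints "asima" and "brown", one per line.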

2. Awk script:

An awk script is made up of patterns and actions. Awk reads the input record by record (line by line) until the end of the file, splits each record on the field separator (set with the -F option; blanks by default), and places the pieces in the corresponding fields in order.

2.1. Patterns and actions

Every awk statement consists of a pattern and an action, and an awk script may contain many statements. The pattern decides when the action is triggered; the action operates on the data. If the pattern is omitted, the action is executed for every record.

A pattern can be any conditional expression, compound expression or regular expression. There are also two special patterns, BEGIN and END. BEGIN is typically used to initialize counters and print a header; it runs before any input is read, after which the remaining statements are applied to the input file. END is typically used to print totals and closing markers; it runs after awk has finished browsing the input. Every action must be enclosed in {}.

Actions are mostly used for printing, but they can also hold longer code such as if statements, loops and loop-exit constructs. If no action is given, awk prints every record that the pattern selects.
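The interplay of BEGIN, ordinary patterns and END can be seen in one short program. The input records and the counter name small are made up for this sketch.

```shell
pattern_out=$(printf 'asima 48\nbrown 27\nasima 30\n' | awk '
BEGIN    { print "start" }          # runs once, before any input is read
/asima/                             # pattern only: matching records are printed
$2 < 30  { small++ }                # pattern and action: count records with field 2 below 30
END      { print "small:", small }  # runs once, after the last record
')
printf '%s\n' "$pattern_out"
```

The output is "start", the two asima records, and finally "small: 1".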

2.2. Fields and records:

As awk reads a record, it labels the fields $1, $2 ... $n. Use $1 and $3, for example, to refer to the first and third fields; separate the fields with a comma when printing several of them, and use $0 to refer to the whole record. For example:

 awk '{print $0}' temp.txt > sav.txt

prints every record in full and redirects the result to sav.txt.

 awk '{print $0}' temp.txt | tee sav.txt

does the same as the previous example, but also displays the output on the screen.

 awk '{print $1,$4}' temp.txt

prints only fields 1 and 4.

 awk 'BEGIN {print "NAME  GRADE\n----"} {print $1"\t"$4}' temp.txt

prints a header first: the lines "NAME  GRADE" and "----" appear before the first line of output, and the two fields of each record are separated by a tab.

 awk 'BEGIN {print "begin"} {print $1} END {print "end"}' temp

prints both a header and a trailer.

2.3. Conditional operators:

<, <=, ==, !=, >=, >, ~ (matches a regular expression), !~ (does not match a regular expression)

Match: awk '{if ($4~/ASIMA/) print $0}' temp   # prints the whole record if the fourth field contains ASIMA

Exact match: awk '$3=="48" {print $0}' temp   # prints only the records whose third field equals "48"

No match: awk '$0 !~ /ASIMA/' temp   # prints the records that do not contain ASIMA

Not equal to: awk '$1 != "asima"' temp

Less than: awk '{if ($1<$2) print $1 " is smaller"}' temp

Either case: awk '/[Gg]reen/' temp   # prints the records that contain Green or green

Any character: awk '$1 ~ /^...a/' temp   # prints the records whose first field has a as its fourth character; '^' anchors the match at the start of the field and '.' stands for any single character

Or matching: awk '$0~/(abc)|(efg)/' temp   # when you use |, enclose the alternatives in parentheses

And: awk '{if ( $1=="a" && $2=="b" ) print $0}' temp

Or: awk '{if ($1=="a" || $1=="b") print $0}' temp
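To see a few of these operators act on actual input, here is a sketch with three invented records. The data and the variable names (regex_match, regex_nomatch, combined) are assumptions made for the example.

```shell
data='asima 48 green
brown 27 Green
white 30 red'

# ~ tests a field against a regular expression; [Gg] accepts either case.
regex_match=$(printf '%s\n' "$data" | awk '$3 ~ /^[Gg]reen$/ {print $1}')

# !~ selects the records that do not match.
regex_nomatch=$(printf '%s\n' "$data" | awk '$3 !~ /[Gg]reen/ {print $1}')

# && combines a numeric comparison with a string comparison.
combined=$(printf '%s\n' "$data" | awk '$2 >= 30 && $1 != "asima" {print $1}')

printf '%s\n' "$regex_match" "$regex_nomatch" "$combined"
```

The first command selects asima and brown, the second and the third select only white.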

2.4. Awk built-in variables:

Variable   Meaning
ARGC       Number of command-line arguments
ARGV       Array of command-line arguments
ENVIRON    Array holding the environment variables
FILENAME   Name of the file awk is currently browsing
FNR        Number of records read from the current file
FS         Input field separator, same as the -F option
NF         Number of fields in the current record
NR         Number of records read so far
OFS        Output field separator
ORS        Output record separator
RS         Input record separator

For example:

 awk 'END {print NR}' temp   # prints the number of records read, at the end

 awk '{print NF,NR,$0} END {print FILENAME}' temp   # prints the field count, record number and text of each record, then the file name

 awk '{if (NR>0 && $4~/Brown/) print $0}' temp   # prints the records whose fourth field contains Brown, provided at least one record has been read

Another use of NF:

 echo $PWD | awk -F/ '{print $NF}'   # displays the name of the current directory
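The following sketch prints a few of the built-in variables for a two-line input, so their values can be compared with the descriptions above. The input text, the sample path and the variable names are assumptions made for the example.

```shell
# NR counts records, NF counts fields, and $NF is the last field of the record.
vars_out=$(printf 'a b c\nd e\n' | awk '{print NR, NF, $NF} END {print "records:", NR}')
printf '%s\n' "$vars_out"

# With / as the separator, $NF is the last component of a path.
dir_name=$(echo /usr/share/awk | awk -F/ '{print $NF}')
printf '%s\n' "$dir_name"

# OFS is placed between the arguments of print that are separated by commas.
ofs_out=$(printf 'a b\n' | awk 'BEGIN {OFS="-"} {print $1, $2}')
printf '%s\n' "$ofs_out"
```

The three commands print "1 3 c", "2 2 e" and "records: 2", then "awk", then "a-b".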

2.5. Awk operators:

Awk expressions are built with operators from numbers, strings, variables, fields and array elements.

Assign input fields to variable names:

 awk '{name=$1;six=$3; if (six=="man") print name " is " six}' temp

Compare field values:

 awk 'BEGIN {BASE=27} {if ($4<BASE) print $0}' temp   # BASE is a number, so $4 is compared numerically

Modify the value of a numeric field (the original input file is not changed):

 awk '{if ($1=="asima") $6=$6-1;print $1,$6,$7}' temp

Modify a text field:

 awk '{if ($1=="asima") $1="desc"; print $1}' temp

Display only the modified records (the print is now inside the {} of the if, unlike the previous command):

 awk '{if ($1=="asima") {$1="desc"; print $1}}' temp

Create a new output field:

 awk '{$4=$3-$2; print $4}' temp

Total a column:

 awk '(tot+=$3); END {print tot}' temp   # prints every record (while the running total is non-zero), then the total
 awk '{tot+=$3} END {print tot}' temp    # prints only the final total

Add up file sizes:

 ls -l|awk '/^[^d]/ {print $9"\t"$5} {tot+=$5} END{print "totKB:" tot}'

List only file names:

 ls -l | awk '{print $9}'
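A short sketch of the operators above applied to invented data: it creates a new field, totals a column, and contrasts string comparison with numeric comparison. The records and variable names are assumptions made for the example.

```shell
data='asima 10 25
brown 5 9'

# Assigning to $4 creates a new field in awk's copy of the record; the input is untouched.
diff_out=$(printf '%s\n' "$data" | awk '{$4 = $3 - $2; print $1, $4}')

# Accumulate the third column and print the total once, in END.
total=$(printf '%s\n' "$data" | awk '{tot += $3} END {print tot}')

# Two string constants are compared character by character, two numbers by value.
compare=$(awk 'BEGIN { s = ("9" < "27"); n = (9 < 27); print s, n }')

printf '%s\n' "$diff_out" "$total" "$compare"
```

The output is "asima 15", "brown 4", the total 34, and "0 1": as strings "9" sorts after "27", while as numbers 9 is smaller.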

2.6. Awk built-in string functions:

gsub(r, s) replaces every match of r with s in $0

 awk 'gsub(/name/,"xingming") {print $0}' temp

gsub(r, s, t) replaces every match of r with s in t

index(s, t) returns the position of the first occurrence of the string t in s

 awk 'BEGIN {print index("sunny", "ny")}' temp   # returns 4

length(s) returns the length of s

match(s, r) tests whether s contains a string that matches r, and returns its position (0 if there is none)

 awk '$1=="J.Lulu" {print match($1,"u")}' temp   # returns 4

split(s, a, fs) splits s into the array a on the separator fs, and returns the number of elements

 awk 'BEGIN {print split("12#345#6789",myarray,"#")}'

returns 3, and sets myarray[1]="12", myarray[2]="345", myarray[3]="6789"

sprintf(fmt, exp) returns exp formatted according to fmt

sub(r, s) replaces the leftmost, longest match of r in $0 with s (only the first match is replaced)

substr(s, p) returns the suffix of the string s that starts at position p

substr(s, p, n) returns the part of the string s that starts at position p and has length n
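The string functions listed above can be exercised together in a BEGIN block, which needs no input file. This is a sketch on a made-up string; the variable names s, t, n and parts are chosen for the example.

```shell
string_out=$(awk 'BEGIN {
    s = "sunny day"
    print index(s, "ny")            # position of the first "ny"
    print length(s)                 # number of characters
    print substr(s, 7)              # suffix starting at position 7
    print substr(s, 1, 3)           # 3 characters starting at position 1
    print match(s, /n+/), RLENGTH   # start and length of the leftmost match
    n = split("12#345#6789", parts, "#")
    print n, parts[2]               # element count and second element
    t = s; sub(/n/, "N", t);  print t   # first match only
    t = s; gsub(/n/, "N", t); print t   # every match
    print sprintf("%05.1f", 3.14159)    # formatted into a string
}')
printf '%s\n' "$string_out"
```

The lines printed are 4, 9, day, sun, "3 2", "3 345", "suNny day", "suNNy day" and 003.1.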

2.7. Use of printf function:

Character conversion:

 echo "65" | awk '{printf "%c\n", $0}'   # outputs A
 awk 'BEGIN {printf "%f\n", 999}'        # outputs 999.000000

Formatted output:

 awk '{printf "%-15s %s\n",$1,$3}' temp

prints the first field left-aligned in a 15-character column, followed by the third field.
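Width and alignment are easier to judge on a concrete table. The sketch below formats two invented records with a left-aligned string, a right-aligned integer and a fixed-point number; the column widths are arbitrary choices for the example.

```shell
# %-8s pads the name on the right, %5d pads the integer on the left,
# and %6.2f prints two decimals in a field 6 characters wide.
table=$(printf 'asima 48\nbrown 7\n' | awk '{printf "%-8s|%5d|%6.2f\n", $1, $2, $2 / 3}')
printf '%s\n' "$table"
```

The two lines printed are "asima   |   48| 16.00" and "brown   |    7|  2.33".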

2.8. Other awk uses:

Pass a value to a one-line awk command:

 awk '{if ($5<AGE) print $0}' AGE=10 temp
 who | awk '{if ($1==user) print $1 " are in " $2}' user=$LOGNAME   # using an environment variable
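One detail worth checking when passing values this way is when the assignment takes effect. The sketch below compares a var=value operand with the standard -v option; the file contents and the limit of 30 are made up for the example.

```shell
workdir=$(mktemp -d)
printf 'asima 48\nbrown 27\n' > "$workdir/temp"

# var=value written among the file operands is assigned just before that file is read.
operand_form=$(awk '$2 < AGE {print $1}' AGE=30 "$workdir/temp")

# -v assigns before the program starts, so the value is already visible in BEGIN.
v_form=$(awk -v AGE=30 'BEGIN {print "limit", AGE} $2 < AGE {print $1}' "$workdir/temp")

# In a BEGIN-only program an operand assignment has not happened yet, so AGE is empty.
begin_operand=$(awk 'BEGIN {print "[" AGE "]"}' AGE=30 "$workdir/temp")

printf '%s\n' "$operand_form" "$v_form" "$begin_operand"
rm -rf "$workdir"
```

The first form prints "brown", the second prints "limit 30" and "brown", and the third prints "[]".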

Awk script files:

Start the script with #!/bin/awk -f. Without this line, a self-contained script cannot be executed.

 #!/bin/awk -f
 # all comment lines must start with a hash '#'
 # name: student_tot.awk
 # to call: student_tot.awk grade.txt
 # prints total and average of club student points
 # print a header first
 BEGIN {
     print "Student    Date   Member No.  Grade  Age  Points  Max"
     print "Name  Joined Gained  Point Available"
     print "========================================================="
 }
 # let's add the scores of points gained (the bare pattern also prints each record)
 (tot+=$6);
 # finished processing now let's print the total and average point
 END {
     print "Club student total points :" tot
     # NR holds the number of records read, so tot/NR is the average
     print "Average Club Student points :" tot/NR
 }

2.9. Awk arrays:

The basic structure for looping over an array in awk:

 for (element in array) print array[element]

 awk 'BEGIN {record="123#456#789";split(record,myarray,"#")} END { for (i in myarray) {print myarray[i]} }' temp
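Awk arrays are indexed by strings, which makes them convenient for counting. The sketch below tallies invented words; the array name seen is an assumption, and because the order in which for (... in ...) visits elements is unspecified, the output is sorted to make it reproducible.

```shell
counts=$(printf 'red\ngreen\nred\nblue\nred\n' | awk '
{ seen[$1]++ }                                      # the index is the word itself
END { for (word in seen) print word, seen[word] }   # visit every index once
' | LC_ALL=C sort)
printf '%s\n' "$counts"
```

After sorting, the output is "blue 1", "green 1" and "red 3".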

3. Awk flow-control statements

1 Conditional statement (if)

 if (expression)   # e.g. if (variable in array)
     statement 1
 else
     statement 2

"Statement 1" in this format can be several statements. To make the grouping unambiguous for awk and easier for you to read, enclose multiple statements in {}. The if statement can also be nested, in this form:

 if (expression)
     {statement 1}
 else if (expression)
     {statement 2}
 else
     {statement 3}

 [chengmo@localhost nginx]# awk 'BEGIN { test=100; if (test>90) { print "very good"; } else if (test>60) { print "good"; } else { print "no pass"; } }'
 very good

Each statement can be terminated with ";".
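The same if / else if / else chain can be applied to every input record instead of a fixed value. This is a sketch with invented names and scores; the variable grade is an assumption made for the example.

```shell
grades=$(printf 'ann 95\nbob 72\ncid 40\n' | awk '{
    if ($2 > 90) {
        grade = "very good"
    } else if ($2 > 60) {
        grade = "good"
    } else {
        grade = "no pass"
    }
    print $1 ": " grade
}')
printf '%s\n' "$grades"
```

It prints "ann: very good", "bob: good" and "cid: no pass".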

2 Loop statements (while, for, do)

1. While statement

 while (expression)
     {statement}

 [chengmo@localhost nginx]# awk 'BEGIN { test=100; total=0; while (i<=test) { total+=i; i++; } print total; }'
 5050

2. For loop

The for loop has two formats.

Format 1:

 for (variable in array)
     {statement}

 [chengmo@localhost nginx]# awk 'BEGIN { for (k in ENVIRON) { print k"="ENVIRON[k]; } }'
 AWKPATH=.:/usr/share/awk
 OLDPWD=/home/web97
 SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
 SELINUX_LEVEL_REQUESTED=
 SELINUX_ROLE_REQUESTED=
 LANG=zh_CN.GB2312

Note: ENVIRON is a built-in awk variable; it is a typical associative array.

Format 2:

 for (variable; condition; expression)
     {statement}

 [chengmo@localhost nginx]# awk 'BEGIN { total=0; for (i=0;i<=100;i++) { total+=i; } print total; }'
 5050

3. Do loop

 do
     {statement}
 while (condition)

 [chengmo@localhost nginx]# awk 'BEGIN { total=0; i=0; do { total+=i; i++; } while (i<=100); print total; }'
 5050

These are awk's flow-control statements. As the syntax shows, they are the same as in C. With these statements, much of the work of a shell program can be handed over to awk, which performs it very quickly.

break: used inside a while or for statement, it exits the loop.
continue: used inside a while or for statement, it moves the loop on to its next iteration.
next: reads in the next input line and returns to the top of the script, so that no further operations are performed on the current input line.
exit: leaves the main input loop and transfers control to the END rule, if there is one. If no END rule is defined, or the exit statement is used inside END, the script terminates.
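The four statements above can be watched in a single run. In this sketch the input words skip, keep and stop are invented markers, and the variable odd collects the loop values that survive continue and break.

```shell
flow_out=$(printf 'skip 1\nkeep 2\nstop 3\nkeep 4\n' | awk '
$1 == "skip" { next }             # drop this record and read the next one
$1 == "stop" { exit }             # leave the input loop; END still runs
{
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue  # even numbers: go to the next iteration
        if (i > 5) break          # leave the loop once i passes 5
        odd = odd " " i
    }
    print $1 ":" odd
    odd = ""
}
END { print "records seen:", NR }
')
printf '%s\n' "$flow_out"
```

Only the second record reaches the loop, so the output is "keep: 1 3 5" followed by "records seen: 3"; the fourth record is never read.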

NR and FNR

A. When awk is given several input files, it applies the code to the first file (reading it line by line), then applies the same code to the second file, then to the third, and so on.
B. This raises a question about line numbers. When awk finishes the first file and starts reading the second, what is the number of the second file's first line? If the count restarted at 1 there would be two line 1s, because the first file has a first line too. NR and FNR address this.
NR: the global line number (the first line of the second file continues the count from the last line of the first file)
FNR: the line number within the current file (independent of how many files came before it and how many lines they held)
For example, if data1.txt has 40 lines and data2.txt has 50 lines, then for awk '{}' data1.txt data2.txt
the values of NR are: 1, 2 ... 40, 41, 42 ... 90
the values of FNR are: 1, 2 ... 40, 1, 2 ... 50
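The same behaviour can be reproduced on a smaller scale. This sketch creates two short files in a temporary directory (2 and 3 lines instead of 40 and 50; the contents are made up) and prints FILENAME, NR and FNR for every record.

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/data1.txt"
printf 'c\nd\ne\n' > "$workdir/data2.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
nr_fnr=$(cd "$workdir" && awk '{print FILENAME, NR, FNR}' data1.txt data2.txt)
printf '%s\n' "$nr_fnr"
rm -rf "$workdir"
```

NR runs from 1 to 5, while FNR runs 1, 2 for data1.txt and 1, 2, 3 for data2.txt.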
Getline function description:

Awk's getline statement reads a single record. It is especially useful when one data record is spread over two physical lines. When used without a variable or redirection, it performs the usual field splitting and sets $0, NF, NR and FNR. It returns 1 on success, 0 at the end of the file, and -1 on error.
A. In outline, getline works as follows:
When there is no redirection (| or <) next to it, getline acts on the current input file and reads its next line into the variable that follows it, or into $0 if no variable is given. Note that because awk has already read a line before it executes getline, getline returns every other line. When there is a redirection (| or <) next to it, getline acts on the redirected input. Since that input has only just been opened and awk has not yet read a line from it, getline returns its first line rather than every other line.
B. Getline usage falls into three categories (each with two variants), six usages in total. For example:

 nawk 'BEGIN{"cat data.txt"|getline d; print d}' data2.txt
 nawk 'BEGIN{"cat data.txt"|getline; print $0}' data2.txt
 nawk 'BEGIN{getline d < "data.txt"; print d}' data2.txt
 nawk 'BEGIN{getline < "data.txt"; print $0}' data2.txt

Each of the four commands above prints only the first line of data.txt (to print all lines, use a loop), e.g.

 nawk 'BEGIN{FS=":";while((getline < "/etc/passwd") > 0){print $1}}' data.txt


 nawk '{getline d; print d"#"$3}' data.txt

Awk first reads in the first line, then executes getline, which stores the next line in the variable d and leaves $0 unchanged. The print therefore outputs d, then "#", then $3 of the line awk read first. If the file has Windows (CRLF) line endings, d ends with a carriage return, so on a terminal the "#" and $3 that follow are drawn over the start of d.

 nawk '{getline; print $0"#"$3}' data.txt

Awk first reads in the first line, then executes getline, which stores the next line in $0. $0 is now the next line, so the "#" and the $3 that follow it are taken from that line as well.

In awk it is sometimes necessary to call system tools for work that awk itself is not good at. Awk's system() function can run them, but it cannot capture their output. Fortunately, getline can be used for this. For example:
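The difference between getline with and without a variable shows up clearly on a four-line file. This is a sketch under assumed data: the file data.txt is created in a temporary directory, and the names pairs, firsts and line_count are chosen for the example.

```shell
workdir=$(mktemp -d)
printf '1 a\n2 b\n3 c\n4 d\n' > "$workdir/data.txt"

# getline var: the next record goes into d; $0 still holds the record awk read itself.
pairs=$(awk '{ if ((getline d) > 0) print $0 " | " d }' "$workdir/data.txt")

# plain getline: the next record replaces $0, so $1 now comes from that record.
firsts=$(awk '{ first = $1; if ((getline) > 0) print first, $1 }' "$workdir/data.txt")

# getline var < file in a loop reads a whole file, here from BEGIN.
line_count=$(awk -v f="$workdir/data.txt" 'BEGIN { while ((getline line < f) > 0) n++; close(f); print n }')

printf '%s\n' "$pairs" "$firsts" "$line_count"
rm -rf "$workdir"
```

The first command prints "1 a | 2 b" and "3 c | 4 d", the second prints "1 2" and "3 4", and the third prints 4. Testing the return value with > 0 keeps the loop from running forever if the file cannot be read, because getline then returns -1.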

 test.awk:
 {
     datecommand = "/bin/date -j -f \"%d/%b/%Y:%H:%M:%S\" " $olddatestr " \"+%Y%m%d %H%M%S\""
     datecommand | getline newdatestr
     close(datecommand)
 }

Each external command makes awk hold a file descriptor open, and the number of files awk can have open at once is limited and not large (for example, 16). It is therefore good practice to close the command when you are done with it. Storing the command string in a variable also makes it convenient to close.
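The command | getline pattern with close can be tried with a command whose output is predictable. In this sketch tr stands in for the date command, and the input words and the names cmd and result are assumptions made for the example. The words are pasted into a shell command unquoted, which is acceptable only for trusted input like this.

```shell
upper=$(printf 'asima\nbrown\n' | awk '{
    cmd = "echo " $1 " | tr a-z A-Z"   # command string kept in a variable
    if ((cmd | getline result) > 0)    # read the first line the command prints
        print $1, "->", result
    close(cmd)                         # release the pipe before the next record
}')
printf '%s\n' "$upper"
```

This prints "asima -> ASIMA" and "brown -> BROWN". Because each record builds a different command string, leaving out close(cmd) would keep one pipe open per record.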

