Details of awk command

August 1, 2016

Simple use:

awk: performs operations on each line of a file, one line at a time.

Split each line on ':' and print its first and fourth fields:

 awk -F ':' '{print $1,$4}' input-file

Detailed introduction:

Introduction to AWK commands

The basic job of the awk language is to scan files or strings and extract information according to rules you specify. Once the information has been extracted, further text processing can be done on it. Complete awk scripts are usually used to format the information in text files.

1. Invoking awk:

There are three ways to invoke awk. The first is the command-line form:
 awk [-F field-separator] 'commands' input-file(s)

Here, commands are the actual awk commands, and [-F field-separator] is optional. By default awk splits fields on whitespace (spaces or tabs), so if the fields of the text you are scanning are separated by spaces, you do not need this option. If you want to scan a file such as passwd, whose fields are separated by colons, you must use the -F option: awk -F : 'commands' input-file

Second, put all the awk commands in a file, make the file executable, and put the awk interpreter line (such as #!/bin/awk -f) as the first line of the script; you can then run it by typing the script name.

Third, put all the awk commands in a separate file and invoke it with -f:

 awk -f awk-script-file input-file

The -f option tells awk to read its program from awk-script-file; input-file is the file for awk to process.
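A minimal sketch of the command-line form and the -f form, assuming a POSIX awk on the PATH; the /tmp file names and the passwd-style sample lines are hypothetical stand-ins. The executable-script form is left out here because the path in its #! line depends on where awk is installed on your system.

```shell
# Hypothetical passwd-style sample: fields separated by ':'.
printf 'root:x:0:0:root:/root:/bin/sh\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n' > /tmp/demo_passwd

# Form 1: program on the command line; -F sets the field separator.
awk -F ':' '{print $1, $4}' /tmp/demo_passwd

# Form 3: the same program stored in a file and loaded with -f.
printf '{ print $1, $4 }\n' > /tmp/demo_prog.awk
awk -F ':' -f /tmp/demo_prog.awk /tmp/demo_passwd
```

Both commands print "root 0" and "alice 1000", one per line.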

2. awk script:

An awk script consists of patterns and actions. awk reads its input one record (line) at a time until the end of the file, and splits each record into fields according to the field separator (set with the -F option; whitespace by default).

2.1. Patterns and actions

Every awk statement consists of a pattern and an action, and an awk script may contain many statements. The pattern determines when the action is triggered; the action is the operation performed on the data. If the pattern is omitted, the action is executed for every input record.

The pattern can be any conditional expression, compound expression, or regular expression. There are also two special patterns, BEGIN and END. A BEGIN block runs before any input is read and is typically used to initialize counters and print headers; awk then starts processing the input files. An END block runs after all input has been processed and is typically used to print totals and closing status information. Every action must be enclosed in braces {}.

The action is written inside the braces {}. It is most often a print statement, but it can also be longer code such as if statements, loops, and loop exits. If no action is specified, awk prints every matching record by default.
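A small sketch of how patterns, actions, BEGIN and END fit together; the three input lines are made up for illustration. The third rule has a pattern but no action, so it falls back to printing the whole record.

```shell
# Hypothetical three-line input supplied on stdin.
printf 'apple 3\nbanana 7\ncherry 5\n' |
awk '
BEGIN  { print "start" }          # runs once, before any input is read
$2 > 4 { print $1 }               # pattern + action: records whose 2nd field exceeds 4
/^a/                              # pattern only: default action prints the whole record
END    { print NR " records" }    # runs once, after all input
'
```

This prints "start", "apple 3", "banana", "cherry" and "3 records", in that order: each record is tested against every rule in turn.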

2.2. Fields and records:

When awk runs, the fields of the current record are referred to as $1, $2, ... $n; these are called field identifiers. Use $1,$3 to refer to the first and third fields (note that a comma separates them), and $0 to refer to the whole record. For example:

 awk '{print $0}' temp.txt > sav.txt

prints every whole record and redirects the output to sav.txt.

 awk '{print $0}' temp.txt|tee sav.txt

Same as above, except that tee also displays the output on the screen.

 awk '{print $1,$4}' temp.txt

Print only the 1st and 4th fields

 awk 'BEGIN {print "NAME  GRADE\n----"} {print $1"\t"$4}' temp.txt

Adds a header: "NAME  GRADE" and a line of dashes are printed before the first line of output, and the two fields in each output line are separated by a tab.

 awk 'BEGIN {print "begin"} {print $1} END {print "end"}' temp

Prints both a header and a footer.
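The field and header examples above, run against a hypothetical two-line stand-in for temp (name, id, age, grade) so the output is easy to check; the /tmp paths are assumptions.

```shell
# Hypothetical stand-in for "temp": name, id, age, grade.
printf 'alice 01 17 A\nbob 02 18 B\n' > /tmp/demo_temp.txt

awk '{print $0}' /tmp/demo_temp.txt > /tmp/demo_sav.txt      # copy every whole record
awk '{print $0}' /tmp/demo_temp.txt | tee /tmp/demo_sav.txt  # same, and echo to the screen
awk '{print $1, $4}' /tmp/demo_temp.txt                      # alice A, then bob B
awk 'BEGIN {print "NAME\tGRADE\n----"} {print $1 "\t" $4}' /tmp/demo_temp.txt
awk 'BEGIN {print "begin"} {print $1} END {print "end"}' /tmp/demo_temp.txt
```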

2.3. Conditional operators:

<, <=, ==, !=, >=, > compare values; ~ matches a regular expression; !~ does not match a regular expression.

Match: awk '{if ($4~/ASIMA/) print $0}' temp prints the whole line if the fourth field contains ASIMA.

Exact match: awk '$3=="48" {print $0}' temp prints only the records whose third field equals "48".

No match: awk '$0 !~ /ASIMA/' temp prints the records that do not contain ASIMA.

Not equal: awk '$1 != "asima"' temp

Less than: awk '{if ($1<$2) print $1 " is smaller"}' temp

Character class: awk '/[Gg]reen/' temp prints the records containing Green or green.

Any character: awk '$1 ~ /^...a/' temp prints the records whose first field has "a" as its fourth character; '^' anchors the match at the start of the field and '.' matches any single character.

Alternation: awk '$0~/(abc)|(efg)/' temp prints the records containing abc or efg; | separates the alternatives, and parentheses can be used to group them.

AND: awk '{if ( $1=="a" && $2=="b" ) print $0}' temp

OR: awk '{if ($1=="a" || $1=="b") print $0}' temp
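The operators above can be tried on a hypothetical three-record file (name, colour, age, company); the data and path are illustrative only, and each command prints just the first field so the matches are easy to compare.

```shell
# Hypothetical records: name, colour, age, company.
printf 'asima Green 48 ASIMA-Ltd\nmona green 30 Other\ncarl red 48 ASIMA\n' > /tmp/demo_cond.txt

awk '$4 ~ /ASIMA/ {print $1}' /tmp/demo_cond.txt            # asima, carl
awk '$0 !~ /ASIMA/ {print $1}' /tmp/demo_cond.txt           # mona
awk '$3 == "48" {print $1}' /tmp/demo_cond.txt              # asima, carl
awk '/[Gg]reen/ {print $1}' /tmp/demo_cond.txt              # asima, mona
awk '$1 ~ /^...a/ {print $1}' /tmp/demo_cond.txt            # mona: 4th character is a
awk '$2 == "red" || $3 < 40 {print $1}' /tmp/demo_cond.txt  # mona, carl
```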

2.4. awk built-in variables:

ARGC      Number of command-line arguments
ARGV      Array of command-line arguments
ENVIRON   Array of the system environment variables
FILENAME  Name of the file awk is currently reading
FNR       Record number within the current file
FS        Input field separator (same as the -F option)
NF        Number of fields in the current record
NR        Number of records read so far
OFS       Output field separator
ORS       Output record separator
RS        Input record separator

Examples:

awk 'END {print NR}' temp prints the number of records read, at the end.

awk '{print NF,NR,$0} END {print FILENAME}' temp prints the field count, the record number, and the whole record for each line, then the file name at the end.

awk '{if (NR>0 && $4~/Brown/) print $0}' temp prints the records whose fourth field contains Brown (NR>0 checks that at least one record has been read).

Another use of NF: echo $PWD | awk -F/ '{print $NF}' prints the name of the current directory, because $NF is the last field.
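A sketch showing several built-in variables at once, using two small hypothetical files under /tmp. The last command relies on the standard rule that assigning to any field rebuilds $0 with OFS between the fields.

```shell
# Two hypothetical input files.
printf 'a b c\nd e\n' > /tmp/demo_f1.txt
printf 'x\n' > /tmp/demo_f2.txt

# FILENAME, global record number NR, per-file number FNR, and field count NF.
awk '{print FILENAME, NR, FNR, NF}' /tmp/demo_f1.txt /tmp/demo_f2.txt

awk 'END {print NR}' /tmp/demo_f1.txt            # 2
echo /usr/local/bin | awk -F/ '{print $NF}'      # bin: $NF is the last field

# Assigning to a field rebuilds $0 using OFS.
awk 'BEGIN {OFS="-"} {$1=$1; print}' /tmp/demo_f1.txt   # a-b-c, then d-e
```

The first command prints "/tmp/demo_f1.txt 1 1 3", "/tmp/demo_f1.txt 2 2 2" and "/tmp/demo_f2.txt 3 1 1".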

2.5. awk operators:

In awk, operators work on basic expressions, which can be numbers, strings, variables, fields, or array elements.

Assign input fields to variables:

 awk '{name=$1;six=$3; if (six=="man") print name " is " six}' temp

Compare field values (BASE is assigned the number 27, not the string "27", so the comparison is numeric):

 awk 'BEGIN {BASE=27} {if ($4<BASE) print $0}' temp

Modify the value of a numeric field (the input file itself is not changed):

 awk '{if ($1=="asima") $6=$6-1;print $1,$6,$7}' temp

Modify a text field:

 awk '{if ($1=="asima") $1="desc"; print $1}' temp

Print only the modified records (unlike the previous command, note the braces {}):

 awk '{if ($1=="asima") {$1="desc"; print $1}}' temp

Create a new output field:

 awk '{$4=$3-$2; print $4}' temp
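A sketch of the field operations above on a hypothetical two-line file (name, start, end). The second command shows why a numeric BASE matters: with BASE="20" POSIX awk compares the fields with the string "20" character by character rather than numerically.

```shell
# Hypothetical input: name, start, end.
printf 'asima 10 25\nbob 3 4\n' > /tmp/demo_ops.txt

awk 'BEGIN {BASE=20} $3 < BASE' /tmp/demo_ops.txt                 # numeric: bob 3 4
awk 'BEGIN {BASE="20"} $3 < BASE' /tmp/demo_ops.txt               # string comparison instead
awk '{if ($1=="asima") $2=$2-1; print $1, $2}' /tmp/demo_ops.txt  # asima 9, bob 3
awk '{if ($1=="asima") {$1="desc"; print $1}}' /tmp/demo_ops.txt  # desc
awk '{$4=$3-$2; print $1, $4}' /tmp/demo_ops.txt                  # asima 15, bob 1
```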

Sum a column:

 awk '(tot+=$3); END {print tot}' temp   # also prints each record once the running total is non-zero
 awk '{tot+=$3} END {print tot}' temp    # prints only the final total
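Summing a column on hypothetical data whose first value is 0, which makes the difference between the two forms visible: when tot+=$3 is used as a pattern, a record is printed only if the running total is non-zero at that point.

```shell
# Hypothetical data: the third column holds the values to add up.
printf 'a x 0\nb y 10\nc z 5\n' > /tmp/demo_sum.txt

awk '{tot += $3} END {print tot}' /tmp/demo_sum.txt               # 15
awk '{tot += $3} END {if (NR) print tot / NR}' /tmp/demo_sum.txt  # average: 5

# Pattern form: "a x 0" is skipped (total still 0), then
# "b y 10" and "c z 5" are printed, followed by 15.
awk '(tot += $3); END {print tot}' /tmp/demo_sum.txt
```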

Add up file sizes (ls -l reports sizes in bytes in the 5th field):

 ls -l | awk '/^[^d]/ {print $9"\t"$5} {tot+=$5} END {print "total bytes: " tot}'

List only file names:

 ls -l | awk '{print $9}'   # the file name is usually the 9th field

2.6. awk built-in string functions:

gsub(r, s) replaces every match of r with s throughout $0

 awk 'gsub(/name/,"xingming") {print $0}' temp

gsub(r, s, t) replaces every match of r with s throughout t

index(s, t) returns the first position of string t in s (0 if t does not occur)

 awk 'BEGIN {print index("Sunny", "ny")}' temp   # returns 4

length(s) returns the length of s

match(s, r) tests whether s contains a substring matching r and returns its starting position (0 if none)

 awk '$1=="J.Lulu" {print match($1, "u")}' temp   # returns 4

split(s, a, fs) splits s into the array a on fs and returns the number of elements

 awk 'BEGIN {print split("12#345#6789",myarray,"#")}'

returns 3, with myarray[1]="12", myarray[2]="345", myarray[3]="6789"

sprintf(fmt, exp) returns exp formatted according to fmt

sub(r, s) replaces the leftmost, longest substring of $0 that matches r with s (only the first match is replaced)

substr(s, p) returns the suffix of s starting at position p

substr(s, p, n) returns the substring of s of length n starting at position p
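The string functions above exercised in a single BEGIN block, so no input file is needed; the sample strings are illustrative. match() also sets RSTART and RLENGTH, which is why its result is stored before they are printed.

```shell
awk 'BEGIN {
    s = "name: name"
    n = gsub(/name/, "xingming", s)        # replace every match in s
    print n, s                             # 2 xingming: xingming
    t = "name: name"
    sub(/name/, "xingming", t)             # replace only the first match
    print t                                # xingming: name
    print index("Sunny", "ny")             # 4
    print length("Sunny")                  # 5
    m = match("J.Lulu", "u")               # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH               # 4 4 1
    k = split("12#345#6789", parts, "#")
    print k, parts[1], parts[3]            # 3 12 6789
    print substr("Sunny", 2), substr("Sunny", 2, 3)   # unny unn
    print sprintf("%05.1f", 3.14159)       # 003.1
}'
```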

2.7. Using printf:

Character conversion:

 echo "65" | awk '{printf "%c\n", $0}'   # outputs A
 awk 'BEGIN {printf "%f\n", 999}'        # outputs 999.000000

Formatted output:

 awk '{printf "%-15s %s\n",$1,$3}' temp

Left-aligns the first field in a 15-character-wide column, followed by the third field.
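A few more printf conversions, as a sketch; the sample names are hypothetical. The +0 in the first line forces a numeric value so that %c prints the character with that code on any awk.

```shell
echo 65 | awk '{printf "%c\n", $0+0}'                        # A (+0 forces a number)
awk 'BEGIN {printf "%f\n", 999}'                             # 999.000000
printf 'alice 01 17\nbob 02 18\n' | awk '{printf "%-15s %s\n", $1, $3}'
awk 'BEGIN {printf "[%5s][%-5s][%.2f]\n", "ab", "ab", 2.5}'  # [   ab][ab   ][2.50]
```

A minus sign left-aligns within the width, a plain width right-aligns, and .2 sets the number of digits after the decimal point.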

2.8. Other awk usage:

Passing values into an awk command line:

 awk '{if ($5<AGE) print $0}' AGE=10 temp
 who | awk '{if ($1==user) print $1 " is on " $2}' user=$LOGNAME   # uses an environment variable
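A sketch of when a command-line assignment such as AGE=10 takes effect, on a hypothetical file whose 5th field is an age. An assignment operand is processed only when awk reaches it among the file operands, so it is not yet visible in BEGIN; the standard -v option assigns the variable before BEGIN runs.

```shell
# Hypothetical file whose 5th field is an age.
printf 'a b c d 5\ne f g h 15\n' > /tmp/demo_age.txt

awk '$5 < AGE' AGE=10 /tmp/demo_age.txt          # a b c d 5

# AGE is still empty inside BEGIN here.
awk 'BEGIN {print "limit:", AGE}' AGE=10

# -v assigns AGE before BEGIN runs.
awk -v AGE=10 'BEGIN {print "limit:", AGE}'      # limit: 10
```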

awk script files:

A self-contained awk script starts with #!/bin/awk -f; without this line it cannot be executed directly. For example:

 #!/bin/awk -f
 # all comment lines must start with a hash '#'
 # name: student_tot.awk
 # to call: student_tot.awk grade.txt
 # prints total and average of club student points

 # print a header first
 BEGIN {
     print "Student    Date   Member No.  Grade  Age  Points  Max"
     print "Name  Joined Gained  Point Available"
     print "========================================================="
 }

 # let's add the scores of points gained
 (tot+=$6);

 # finished processing now let's print the total and average point
 END {
     print "Club student total points :" tot
     print "Average Club Student points :" tot/NR   # NR = number of records read
 }
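One way to try a script like this, as a sketch: it writes a shortened copy (one header line, no #! line) to a hypothetical /tmp path and runs it with awk -f, so it does not depend on where awk is installed. The two grade records are made up; field 6 holds the points gained.

```shell
# Shortened, hypothetical copy of the script (no #! line; run via awk -f).
cat > /tmp/student_tot.awk <<'EOF'
BEGIN { print "Name Points" }
(tot += $6);
END {
    print "Club student total points :" tot
    print "Average Club Student points :" tot/NR
}
EOF

# Hypothetical grade file: field 6 is the points gained.
printf 'alice 05/99 48311 Green 8 40 44\nbob 06/99 48317 green 9 24 26\n' > /tmp/grade.txt

awk -f /tmp/student_tot.awk /tmp/grade.txt
```

This prints the header, both records (the running total is non-zero for each), then a total of 64 and an average of 32.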

2.9. awk arrays:

The basic loop over the elements of an array:

 for (element in array) print array[element]

 awk 'BEGIN {record="123#456#789"; split(record, myarray, "#"); for (i in myarray) {print myarray[i]}}'

The order in which for (... in ...) visits the elements is unspecified.
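A sketch of common array operations, all inside BEGIN; the keys "x" and "y" are hypothetical. A numeric loop from 1 to the count returned by split() keeps the original order, whereas for-in does not guarantee one.

```shell
awk 'BEGIN {
    n = split("123#456#789", myarray, "#")
    for (i = 1; i <= n; i++) print i, myarray[i]   # numeric index: original order
    count["x"]++
    count["y"] += 2
    for (k in count) total += count[k]             # for-in: visiting order unspecified
    print "total", total                           # total 3
    if ("y" in count) print "y present"            # membership test
    delete count["y"]                              # remove one element
    if (!("y" in count)) print "y deleted"
}'
```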

3.0 Flow-control statements in awk
1. Conditional statement (if)

if (expression)   # or: if (variable in array)
statement 1
else
statement 2
"Statement 1" in this form can be several statements; to make the code easier for awk to parse and for people to read, enclose multiple statements in {}. awk's branch structure can be nested, in the form:

if (expression)

{statement 1}

else if (expression)
{statement 2}
else
{statement 3}

 [chengmo@localhost nginx]# awk 'BEGIN{ test=100; if(test>90) { print "very good"; } else if(test>60) { print "good"; } else { print "no pass"; } }'
 very good

Each statement can be terminated with a semicolon (;).

2. Loop statements (while, for, do)

1. while statement

Format:

while (expression)

{statement}
Example:

 [chengmo@localhost nginx]# awk 'BEGIN{ test=100; total=0; while(i<=test) { total+=i; i++; } print total; }'
 5050

2. for loop

The for loop has two forms:

Form 1:

for (variable in array)

{statement}

Example:

 [chengmo@localhost nginx]# awk 'BEGIN{ for(k in ENVIRON) { print k"="ENVIRON[k]; } }'
 AWKPATH=.:/usr/share/awk
 OLDPWD=/home/web97
 SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
 SELINUX_LEVEL_REQUESTED=
 SELINUX_ROLE_REQUESTED=
 LANG=zh_CN.GB2312
 ......

Note: ENVIRON is a built-in awk variable; it is an associative (dictionary-type) array.

Form 2:

for (variable; condition; expression)

{statement}

Example:

 [chengmo@localhost nginx]# awk 'BEGIN{ total=0; for(i=0;i<=100;i++) { total+=i; } print total; }'
 5050

3. do loop

Format:

do

{statement} while (condition)

Example:

 [chengmo@localhost nginx]# awk 'BEGIN{ total=0; i=0; do { total+=i; i++; } while(i<=100); print total; }'
 5050

These are awk's flow-control statements. As the syntax shows, they work the same way as in C. With them, many tasks usually written as shell scripts can be handed over to awk, which often runs them much faster.

break: when used in a while, for, or do loop, exits the loop.
continue: when used in a while, for, or do loop, skips to the next iteration of the loop.
next: reads the next input record and restarts the script from the top, so the remaining statements are not applied to the current record.
exit: leaves the main input loop and transfers control to the END rule, if there is one. If no END rule is defined, or exit is executed inside END, the script terminates.
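A sketch that uses all four statements on a hypothetical four-line input: next skips the comment line, continue and break shape the inner loop, and exit stops reading input but still runs END.

```shell
printf '# comment\n1\n2\n3\n' |
awk '
/^#/ { next }                      # skip comment lines: no further rules run for them
{
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 5) break           # stop the loop once i passes 5
        s = s i
    }
    if ($1 == 3) exit              # jump to END without reading more input
    print $1, s
    s = ""
}
END { print "done" }
'
```

This prints "1 135", "2 135" and "done"; the record 3 is never printed because exit runs first.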

NR and FNR:
A. When given several input files, awk applies the program to the first file (reading it line by line), then to the second file, then to the third, and so on.
B. This order raises a question about line numbers: when awk has finished the first file and starts reading the second, how is the first line of the second file numbered? If numbering restarted at 1, there would be two line 1s (the first file also has a line 1). NR and FNR distinguish the two cases.
NR: the global record number (numbering of the second file continues from the last line of the first file)
FNR: the record number within the current file (independent of how many lines the previous input files had)
For example, if data1.txt has 40 lines and data2.txt has 50 lines, then for awk '{}' data1.txt data2.txt
the NR values are: 1, 2 ... 40, 41, 42 ... 90
the FNR values are: 1, 2 ... 40, 1, 2 ... 50
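A common use of the NR/FNR difference, sketched on two hypothetical files: NR == FNR is true only while the first file is being read, so its keys can be stored before the second file is filtered. This idiom assumes the first file is not empty; otherwise NR == FNR would also hold for the second file.

```shell
printf 'alice\ncarol\n' > /tmp/demo_keys.txt
printf 'alice 90\nbob 80\ncarol 70\n' > /tmp/demo_scores.txt

awk '{print FILENAME, NR, FNR}' /tmp/demo_keys.txt /tmp/demo_scores.txt

# Remember the keys of the first file, then print the records of the
# second file whose first field was seen.
awk 'NR == FNR {seen[$1]; next} $1 in seen' /tmp/demo_keys.txt /tmp/demo_scores.txt
```

The last command prints "alice 90" and "carol 70".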
The getline function:
awk's getline statement reads a single record. It is particularly useful when one logical data record spans two physical records (lines). Plain getline performs the usual field splitting, setting $0, NF, NR and FNR. It returns 1 if a record was read, 0 at the end of the file, and -1 on error.
A. In general, getline works as follows:
When getline is used without a redirection (| or <), it reads from the current input file: the next line goes into the variable var that follows getline, or into $0 if no variable is given. Because awk has already read a line before it executes getline, getline returns the line after the current one, so the lines are consumed alternately.
When getline is used with a redirection (| or <), it reads from the redirected input instead. Since that input has just been opened and awk itself has not read from it, getline returns its first line, not an alternate line.
B. getline usage falls roughly into three categories, each with two variants (with and without a variable), for six forms in total. The four redirected forms are:

 nawk 'BEGIN{"cat data.txt" | getline d; print d}' data2.txt
 nawk 'BEGIN{"cat data.txt" | getline; print $0}' data2.txt
 nawk 'BEGIN{getline d < "data.txt"; print d}' data2.txt
 nawk 'BEGIN{getline < "data.txt"; print $0}' data2.txt

Each of the four lines above prints only the first line of data.txt. To print every line, use a loop, e.g.:
 nawk 'BEGIN{FS=":"; while ((getline < "/etc/passwd") > 0) {print $1}}' data.txt

 nawk '{getline d; print d"#"$3}' data.txt

awk reads the first line, then getline reads the next line into the variable d, and print outputs d, then #, then $3. getline var does not change $0, so $3 still comes from the line awk read first. (If the file has DOS-style line endings, the carriage return at the end of d makes the # and $3 appear to overwrite d on the terminal.)

 nawk '{getline; print $0"#"$3}' data.txt

awk reads the first line, then getline reads the next line into $0, so both $0 and $3 now come from that next line. (Again, with DOS-style line endings, the # and $3 appear to overwrite $0 on the terminal.)
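A sketch of the non-redirected and redirected getline forms on a hypothetical four-line file. It shows the alternating lines, the fact that getline var leaves $0 alone while plain getline replaces it (and advances NR), and a read loop that stops when getline returns 0 or -1.

```shell
printf 'l1 a\nl2 b\nl3 c\nl4 d\n' > /tmp/demo_pairs.txt

# getline var: d gets the next line; $0 and $1 still hold the current line.
awk '{getline d; print $1 "#" d}' /tmp/demo_pairs.txt     # l1#l2 b, then l3#l4 d

# plain getline: $0, NF, NR and FNR now describe the next line.
awk '{getline; print NR, $0}' /tmp/demo_pairs.txt         # 2 l2 b, then 4 l4 d

# Redirected form in a loop: read a whole file from BEGIN.
awk 'BEGIN {
    while ((getline line < "/tmp/demo_pairs.txt") > 0) n++
    close("/tmp/demo_pairs.txt")
    print n                                               # 4
}'
```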
Sometimes awk needs to call system tools to do work it is not good at. awk's system() function can run an external command but cannot capture its output; getline can be used to meet this need. For example (the -j and -f options belong to BSD/macOS date):

 # test.awk
 {
     datecommand = "/bin/date -j -f \"%d/%b/%Y:%H:%M:%S\" " olddatestr " \"+%Y%m%d %H%M%S\""
     datecommand | getline newdatestr
     close(datecommand)
 }

Each external command occupies a file descriptor in awk, and the number of files awk can have open at once is limited and may be small (for example, 16), so it is good practice to close the command when you are done. Storing the command string in a variable also makes it easy to close.
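A portable sketch of the same pattern, with echo standing in as a hypothetical external command instead of the BSD date call: cmd | getline captures the output, close(cmd) releases the pipe, and system() is shown for contrast because it returns only the exit status.

```shell
awk 'BEGIN {
    cmd = "echo hello world"          # hypothetical external command
    if ((cmd | getline out) > 0)      # read the first line of its output
        print "got: " out
    close(cmd)                        # release the pipe; cmd can then be run again
    status = system("true")           # system() returns only the exit status
    print "status: " status
}'
```

This prints "got: hello world" and "status: 0".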

Gcod

If life were always like our first meeting, why would the autumn wind grieve over the painted fan?
