Simple use:
awk processes a file one line at a time.
For example, use ':' as the separator to split each line, then print the first and fourth fields of that line.
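The command for this simple use is not shown above. A minimal sketch of it, assuming a passwd-style input line (the sample record is made up for illustration):

```shell
# Hypothetical passwd-style record; the values are illustrative only.
line='alice:x:1001:1001:Alice Example:/home/alice:/bin/sh'

# -F: sets the field separator to a colon; $1 and $4 are the first and fourth fields.
result=$(printf '%s\n' "$line" | awk -F: '{print $1, $4}')
echo "$result"
```

The comma in print inserts the output field separator (a space by default) between the two fields.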
Detailed introduction:
Introduction to AWK commands
The basic function of the awk language is to scan files or strings and extract information from them according to specified rules. Once awk has extracted the information, other text operations can be performed on it. Complete awk scripts are usually used to format the information in text files.
1. Call awk:
The first is command-line mode, for example:
awk [-F field-separator] 'commands' input-file(s)
Here commands is the actual awk program, and [-F field-separator] is optional. By default awk splits fields on whitespace (spaces and tabs), so you do not need this option for whitespace-delimited text. To scan a file such as passwd, where every field is separated by a colon, you must use the -F option: awk -F: 'commands' input-file
The second is to put all awk commands in a file whose first line names the awk interpreter (#!/bin/awk -f), make the file executable, and run it by typing the script name.
The third is to put all awk commands in a separate file and pass that file to awk, for example:
awk -f awk-script-file input-file
The -f option names the file that holds the awk script (awk-script-file); input-file is the file awk scans.
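As a rough illustration of the first and third ways of calling awk, the sketch below runs the same one-line program inline and from a script file. The temporary files and their contents are assumptions made for the example.

```shell
# Throwaway files created with mktemp; names and contents are illustrative only.
data=$(mktemp)
script=$(mktemp)
printf 'root:x:0:0\nbob:x:1000:1000\n' > "$data"
printf '{ print $1 }\n' > "$script"

# Way 1: program text given on the command line.
inline=$(awk -F: '{ print $1 }' "$data")
# Way 3: the same program text read from a file with -f.
fromfile=$(awk -F: -f "$script" "$data")

echo "$inline"
echo "$fromfile"
rm -f "$data" "$script"
```

Both invocations should print the same two names, since only the location of the program text differs.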
2. awk script:
An awk script consists of patterns and actions. awk reads the input one record (line) at a time until the end of the file, splits each record on the field separator (whitespace by default, or the value given with -F), and stores the pieces in the corresponding fields in order.
2.1. Patterns and actions
Every awk statement is made up of a pattern and an action, and a script may contain many statements. The pattern decides when the action is triggered; the action is the operation performed on the data. If the pattern is omitted, the action runs for every record.
A pattern can be any conditional expression, compound expression, or regular expression. There are also two special patterns, BEGIN and END. A BEGIN block runs before any input is read and is typically used to initialize counters and print a header. An END block runs after awk has finished reading the input and is typically used to print totals and a closing line. Every action must be enclosed in {}.
The action itself goes inside the curly braces {}. It is most often a print statement, but it can also hold longer code such as if statements, loops, and loop exits. If no action is given, awk prints every record that matches the pattern.
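The pattern/action structure described above can be sketched as follows. The sample records (a name and a score) are made up for illustration.

```shell
# Made-up sample records: name and score.
report=$(printf 'ann 72\nbob 45\ncid 90\n' | awk '
BEGIN    { print "passed:" }     # runs once, before any input is read
$2 >= 60 { print $1; n++ }       # runs only for records that match the pattern
END      { print "count=" n }    # runs once, after the last record
')
echo "$report"
```

The middle rule has both a pattern and an action; BEGIN and END are the two special patterns.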
2.2. Fields and records:
When awk runs, it refers to the fields of the current record as $1, $2 ... $n; these are called field identifiers. Use $1, $3 to refer to the first and third fields; note the comma that separates them. $0 stands for the whole record. For example:
awk '{print $0}' temp.txt > sav.txt
Prints the whole of every record and redirects the result to sav.txt
awk '{print $0}' temp.txt | tee sav.txt
Same as the example above, except that the output is also shown on the screen
awk '{print $1,$4}' temp.txt
Prints only the first and fourth fields
awk 'BEGIN {print "NAME GRADE\n----"} {print $1"\t"$4}' temp.txt
Prints a header, "NAME GRADE\n----", before the first line of output, and separates the two fields with a tab
awk 'BEGIN {print "begin"} {print $1} END {print "end"}' temp
Prints both a header and a footer
2.3. Conditional operators:
<, <=, ==, !=, >=, >, ~, !~
~ matches a regular expression; !~ does not match a regular expression
Match: awk '{if ($4~/ASIMA/) print $0}' temp
Prints the whole line if the fourth field contains ASIMA
Exact match: awk '$3=="48" {print $0}' temp
Prints only the records whose third field equals "48"
No match: awk '$0 !~ /ASIMA/' temp
Prints every record that does not contain ASIMA
Not equal to: awk '$1 != "asima"' temp
Less than: awk '{if ($1<$2) print $1 " is smaller"}' temp
Either case: awk '/[Gg]reen/' temp
Prints every record containing Green or green
Any character: awk '$1 ~ /^...a/' temp
Prints the records whose first field has a as its fourth character; '^' anchors the match at the start of the field, and each '.' stands for any single character
Alternation: awk '$0~/(abc)|(efg)/' temp
| separates the alternatives; the parentheses group each one and are optional here, so /abc|efg/ matches the same records
AND: awk '{if ( $1=="a" && $2=="b" ) print $0}' temp
OR: awk '{if ($1=="a" || $1=="b") print $0}' temp
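The operators above can be tried against a small inline data set. This is only a sketch; the three records (name, colour, score) are invented for the example.

```shell
# Hypothetical sample data: name, colour, score.
data='asima green 48
lulu Brown 26
kim Green 30'

# ~ with a bracket expression: the second field is green in either case.
match_green=$(printf '%s\n' "$data" | awk '$2 ~ /^[Gg]reen$/ {print $1}')
# != combined with && : not asima, and a score above 28.
not_asima=$(printf '%s\n' "$data" | awk '$1 != "asima" && $3 > 28 {print $1}')
# !~ : records that do not contain the letters "reen".
no_green=$(printf '%s\n' "$data" | awk '$0 !~ /reen/ {print $1}')

echo "$match_green"
echo "$not_asima"
echo "$no_green"
```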
2.4. awk built-in variables:
ARGC      Number of command-line arguments
ARGV      Array of command-line arguments
ENVIRON   Associative array holding the environment variables
FILENAME  Name of the file awk is currently reading
FNR       Record number within the current file
FS        Input field separator, the same as the -F option
NF        Number of fields in the current record
NR        Number of records read so far
OFS       Output field separator
ORS       Output record separator
RS        Input record separator
Example:
awk 'END {print NR}' temp
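Prints the number of records read, once all input has been consumed
awk '{print NF,NR,$0} END {print FILENAME}' temp
Prints the field count, record number, and content of every record, then the file name
awk '{if (NR>0 && $4~/Brown/) print $0}' temp
Prints the records whose fourth field contains Brown, provided at least one record has been read
Another use of NF: echo $PWD | awk -F/ '{print $NF}'
Displays the name of the current directory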
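To see NF, NR, and $NF side by side, here is a small sketch on inline input (the two input lines and the sample path are arbitrary):

```shell
# NR is the record number, NF the field count, $NF the last field.
summary=$(printf 'a b c\nd e\n' | awk '{print NR ": " NF " fields, last=" $NF} END {print "records=" NR}')
echo "$summary"

# With / as the separator, $NF is the last path component.
base=$(echo /usr/local/bin | awk -F/ '{print $NF}')
echo "$base"
```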
2.5. awk operators:
Expressions in awk are built from numbers, strings, variables, fields, and array elements combined with operators
Assign input fields to variable names:
awk '{name=$1;six=$3; if (six=="man") print name " is " six}' temp
Compare a field with a value (BASE is set as a number so that the comparison is numeric):
awk 'BEGIN {BASE=27} {if ($4<BASE) print $0}' temp
Modify a numeric field: (the original input file is not changed)
awk '{if ($1=="asima") $6=$6-1;print $1,$6,$7}' temp
Modify a text field:
awk '{if ($1=="asima") ($1="desc");print $1}' temp
Display only the modified records: (unlike the previous command, print sits inside the {} of the if)
awk '{if ($1=="asima") {$1="desc";print $1}}' temp
Create a new output field:
awk '{$4=$3-$2; print $4}' temp
Sum a column:
awk '(tot+=$3); END {print tot}' temp # also prints each record while the running total is non-zero
awk '{(tot+=$3)}; END {print tot}' temp # prints only the final total
Sum file sizes:
ls -l|awk '/^[^d]/ {print $9"\t"$5} {tot+=$5} END{print "totKB:" tot}'
List only file names:
ls -l | awk '{print $9}' # in general, the file name is the 9th field
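The column-summing idiom can be checked on a tiny inline table. In this sketch the input values are made up, and the average is obtained by dividing the total by NR:

```shell
# Accumulate the second column, then print the total and the average in END.
totals=$(printf 'a 10\nb 20\nc 30\n' | awk '{tot += $2} END {print tot, tot/NR}')
echo "$totals"
```

Because tot += $2 sits inside an action block rather than being used as a pattern, only the END rule produces output.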
2.6. awk built-in string functions:
gsub(r, s) replaces every match of r with s throughout $0
awk 'gsub(/name/,"xingming") {print $0}' temp
gsub(r, s, t) replaces every match of r with s throughout t
index(s, t) returns the position of the first occurrence of string t in s
awk 'BEGIN {print index("Sunny", "ny")}' returns 4
length(s) returns the length of s
match(s, r) tests whether s contains a substring matching r and returns its position
awk '$1=="J.Lulu" {print match($1, "u")}' temp returns 4
split(s, a, fs) splits s into array a on the separator fs
awk 'BEGIN {print split("12#345#6789",myarray,"#")}'
Returns 3, with myarray[1]="12", myarray[2]="345", myarray[3]="6789"
sprintf(fmt, exp) returns exp formatted according to fmt
sub(r, s) replaces the leftmost, longest match of r in $0 with s (only the first match is replaced)
substr(s, p) returns the suffix of string s that starts at position p
substr(s, p, n) returns the substring of s of length n that starts at position p
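The string functions listed above can be exercised together in a BEGIN block, which needs no input file. The test string "hello world" is an arbitrary choice for this sketch.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                  # number of characters in s
    print index(s, "world")          # position where "world" starts
    print substr(s, 7)               # suffix from position 7
    print substr(s, 1, 5)            # 5 characters from position 1
    n = split("12#345#6789", parts, "#")
    print n, parts[2]                # piece count and second piece
    t = s; gsub(/o/, "0", t)         # replace every o in t
    print t
    u = s; sub(/o/, "0", u)          # replace only the first o in u
    print u
    print match(s, /wor/), RSTART, RLENGTH   # match also sets RSTART and RLENGTH
}')
echo "$out"
```

Copying s into t and u first keeps the original string intact, because sub and gsub modify their target in place.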
2.7. Using the printf function:
Character conversion:
echo "65" | awk '{printf "%c\n", $0}' # Output A
awk 'BEGIN {printf "%f\n", 999}' # Output 999.000000
Formatted output:
awk '{printf "%-15s %s\n",$1,$3}' temp
Left-aligns the first field in a column 15 characters wide
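A short sketch of width and precision in printf, using invented names and numbers: %-5s left-aligns a string in 5 columns, and %6.2f right-aligns a number in 6 columns with two decimals.

```shell
table=$(printf 'ann 7.5\ncindy 12\n' | awk '{printf "%-5s|%6.2f\n", $1, $2}')
echo "$table"
```

Unlike print, printf adds no newline of its own, so the format string ends with \n.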
2.8. Other awk usage:
Passing values into an awk one-liner:
awk '{if ($5<AGE) print $0}' AGE=10 temp
who | awk '{if ($1==user) print $1 " are in " $2}' user=$LOGNAME # uses an environment variable
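Besides the trailing name=value form shown above, POSIX awk also accepts -v name=value before the program text; that option is not mentioned in the original text and is included here for comparison. A sketch with made-up data, where max and limit are names chosen for the example:

```shell
limit=30
data='ann 25
bob 41'

# -v assigns the variable before the program starts (it is even visible in BEGIN).
with_v=$(printf '%s\n' "$data" | awk -v max="$limit" '$2 < max {print $1}')
# A trailing assignment takes effect when awk reaches it in the argument list;
# the "-" operand names standard input explicitly.
trailing=$(printf '%s\n' "$data" | awk '$2 < max {print $1}' max="$limit" -)

echo "$with_v"
echo "$trailing"
```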
awk script files:
Start the file with #!/bin/awk -f; without this line the self-contained script cannot be executed. For example:
#!/bin/awk -f
# all comment lines must start with a hash '#'
# name: student_tot.awk
# to call: student_tot.awk grade.txt
# prints total and average of club student points
# print a header first
BEGIN {
    print "Student Date Member No. Grade Age Points Max"
    print "Name Joined Gained Point Available"
    print "========================================================="
}
# let's add the scores of points gained
(tot+=$6);
# finished processing now let's print the total and average point
END {
    print "Club student total points :" tot
    print "Average Club Student points :" tot/NR
}
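2.9. awk arrays:
The basic loop over an awk array is:
for (element in array) print array[element]
awk 'BEGIN {record="123#456#789";split(record,myarray,"#")} END { for (i in myarray) {print myarray[i]} }' /dev/null
/dev/null is given as the input file so that awk reaches the END block without waiting on standard input. The order in which for (element in array) visits the elements is not specified.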
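A common use of awk arrays is counting how often each value occurs. The sketch below assumes one word per input line and pipes the result through sort, because the traversal order of for (k in seen) is unspecified; the array name seen is arbitrary.

```shell
# seen[$1]++ creates the element on first use and increments it afterwards.
counts=$(printf 'red\nblue\nred\n' | awk '{seen[$1]++} END {for (k in seen) print k, seen[k]}' | sort)
echo "$counts"
```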
3.0 Control statements in awk
1) Conditional statement (if)
if (expression) # for example: if (variable in array)
statement 1
else
statement 2
"statement 1" may be several statements, in which case they must be enclosed in {} so that awk treats them as one block; the braces also make the code easier to read. awk allows if branches to be nested, in this format:
if (expression)
{statement 1}
else if (expression)
{statement 2}
else
{statement 3}
[chengmo@localhost nginx]# awk 'BEGIN{ test=100; if(test>90) { print "very good"; } else if(test>60) { print "good"; } else { print "no pass"; } }'
very good
Each statement can be terminated with a ";".
2) Loop statements (while, for, do)
1. while statement
Format:
while (expression)
{statement}
Example:
[chengmo@localhost nginx]# awk 'BEGIN{ test=100; total=0; while(i<=test) { total+=i; i++; } print total; }'
5050
2. for loop
The for loop has two formats:
Format 1:
for (variable in array)
{statement}
Example:
[chengmo@localhost nginx]# awk 'BEGIN{ for(k in ENVIRON) { print k"="ENVIRON[k]; } }'
AWKPATH=.:/usr/share/awk
OLDPWD=/home/web97
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SELINUX_LEVEL_REQUESTED=
SELINUX_ROLE_REQUESTED=
LANG=zh_CN.GB2312
......
Note: ENVIRON is a built-in awk variable and a typical associative array.
Format 2:
for (variable; condition; expression)
{statement}
Example:
[chengmo@localhost nginx]# awk 'BEGIN{ total=0; for(i=0;i<=100;i++) { total+=i; } print total; }'
5050
3. do loop
Format:
do
{statement} while (condition)
Example:
[chengmo@localhost nginx]# awk 'BEGIN{ total=0; i=0; do { total+=i; i++; } while(i<=100); print total; }'
5050
These are awk's flow-control statements. As the syntax shows, they are the same as in the C language. With them, much of the work of a shell script can be handed over to awk, which is usually fast at this kind of text processing.
break: used inside a while or for loop, it exits the loop.
continue: used inside a while or for loop, it skips to the next iteration of the loop.
next: reads the next input line and returns to the top of the script, so no further rules are applied to the current input line.
exit: leaves the main input loop and transfers control to the END rule, if there is one. If no END rule is defined, or exit is used inside END, the script terminates.
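A minimal sketch of next and exit working together, on made-up input: comment lines are skipped with next, and the word stop ends the input loop with exit, after which END still runs.

```shell
flow=$(printf '# comment\n1\n2\nstop\n3\n' | awk '
/^#/         { next }        # skip this line; later rules never see it
$1 == "stop" { exit }        # leave the input loop and jump to END
             { sum += $1 }   # reached only by the remaining lines
END          { print "sum=" sum }
')
echo "$flow"
```

The line containing 3 is never added, because exit stops reading input before awk gets to it.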
NR and FNR:
A. When given several input files, awk applies the program to the first file (reading it line by line), then applies the same program to the second file, then to the third, and so on.
B. This order raises a question about line numbers. After the first file has been processed, how is the first line of the second file numbered? If it is numbered 1 again, there would be two line 1s (the first file also has a first line). This is the difference between NR and FNR.
NR: the global line number (the first line of the second file is numbered right after the last line of the first file)
FNR: the line number within the current file (regardless of how many lines the earlier input files had)
For example, if there are 40 rows in data1.txt and 50 rows in data2.txt, then awk '{}' data1.txt data2.txt
The NR values are: 1, 2... 40, 41, 42... 90
FNR values are: 1, 2... 40, 1, 2... 50
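The same behaviour can be observed on a smaller scale. This sketch uses two temporary files of two and three lines (created with mktemp purely for the example) and prints NR, FNR, and the line itself.

```shell
f1=$(mktemp); f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\nd\ne\n' > "$f2"

# NR keeps counting across files; FNR restarts at 1 for each file.
numbers=$(awk '{print NR, FNR, $0}' "$f1" "$f2")
echo "$numbers"
rm -f "$f1" "$f2"
```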
getline:
awk's getline statement reads one record. It is particularly useful when a logical data record spans two physical lines. In its plain form getline performs the usual field splitting (it sets $0, NF, NR, and FNR). It returns 1 on success, 0 at the end of the file, and -1 on error.
A. Overall, getline should be understood as follows:
When there is no redirection (| or <) around it, getline acts on the current input file and reads the next line into the variable var that follows it, or into $0 if no variable is given. Because awk has already read a line before it reaches getline, the lines that getline returns are every other line of the file.
When there is a redirection (| or <), getline acts on the redirected input. That input has just been opened and has been read only by getline, not by awk's main loop, so getline returns its first line rather than every other line.
B. getline usage falls roughly into three categories, each with two variants (with and without a variable), for six usages in total. The code is as follows:
nawk 'BEGIN{"cat data.txt"|getline d; print d}' data2.txt
nawk 'BEGIN{"cat data.txt"|getline; print $0}' data2.txt
nawk 'BEGIN{getline d < "data.txt"; print d}' data2.txt
nawk 'BEGIN{getline < "data.txt"; print $0}' data2.txt
All four lines above print only the first line of data.txt (to print every line, use a loop)
e.g. nawk 'BEGIN{FS=":";while((getline<"/etc/passwd")>0){print $1}}' data.txt
nawk '{getline d; print d"#"$3}' data.txt
awk first reads the first line, then runs getline, which assigns the next line to the variable d; $0 still holds the first line. print therefore outputs d, then #, then the third field of the first line. If the data file has Windows (CRLF) line endings, d ends with a carriage return, so on a terminal the # and $3 that follow appear to overwrite the beginning of d.
nawk '{getline; print $0"#"$3}' data.txt
awk first reads the first line, then runs getline, which replaces $0 with the next line. $0 is now the next line, so the # and $3 that follow are appended to it, and $3 is taken from the new $0. With CRLF line endings they likewise appear to overwrite the start of the line on a terminal.
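The difference between the two forms shows up clearly on a four-line input. In this sketch the data is invented, and $2 is used instead of $3 because each sample line has only two fields.

```shell
data='one 1
two 2
three 3
four 4'

# getline d: the next line goes into d, while $0 and its fields stay on the earlier line.
with_var=$(printf '%s\n' "$data" | awk '{getline d; print d "#" $2}')
# plain getline: the next line replaces $0, so $2 comes from that new line.
no_var=$(printf '%s\n' "$data" | awk '{getline; print $0 "#" $2}')

echo "$with_var"
echo "$no_var"
```

In both cases only two lines are printed, since each pass through the rule consumes two input lines.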
Sometimes awk needs to call system tools to do work that awk itself is not good at. awk's system() function can run a command, but it cannot capture the command's output. getline can, for example:
test.awk:
{
    datecommand = "/bin/date -j -f \"%d/%b/%Y:%H:%M:%S\" " $olddatestr " \"+%Y%m%d %H%M%S\""
    datecommand | getline newdatestr
    close(datecommand)
}
Each external command makes awk hold a file descriptor open, and the number of files awk can have open at once is limited and may be small (for example, 16), so it is good practice to call close when you are done. Storing the command string in a variable also makes it easy to pass the same string to close.
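The same command | getline followed by close pattern can be tried with portable tools. This sketch swaps in echo and tr for the date command, and assumes the input words contain no shell metacharacters, since they are pasted into a command line.

```shell
result=$(printf 'hello\nworld\n' | awk '{
    cmd = "echo " $1 " | tr a-z A-Z"   # build the command string once
    cmd | getline upper                # read the first line of its output
    close(cmd)                         # release the pipe before the next record
    print upper
}')
echo "$result"
```

Keeping the command in cmd means the string given to close is guaranteed to be the one used to open the pipe.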