Linux Commands – awk
Hello Everyone
Welcome to CloudAffaire and this is Debjeet.
In the last blog post, we discussed the grep command in Linux, which is used to match a PATTERN in a given file.
https://cloudaffaire.com/linux-commands-grep/
In this blog post, we will discuss awk command in Linux. AWK is a language designed for text processing and typically used as a data extraction and reporting tool. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.
Linux Commands – awk:
An AWK program is a series of pattern-action pairs, written as awk '/PATTERN/ { ACTION1; ACTION2; }' FILE, where the pattern is typically a regular expression or a conditional expression and the action is a series of commands. The input is split into records; by default, records are separated by newline characters, so each input line is one record. The program tests each record against each pattern in turn and executes the action for every pattern that matches. Either the pattern or the action may be omitted: an omitted pattern matches every record, and an omitted action prints the record. This is similar to the address-command structure used by sed.
```shell
##########################
## Linux Commands | awk ##
##########################

## Prerequisites: One Unix/Linux/POSIX-compliant operating system with bash shell

##----
## awk
##----

## awk '/PATTERN/ { ACTION1; ACTION2; }' FILE

## create files with dummy data
cd                                   ## go to the home directory
mkdir mydir && cd mydir
echo "debjeet 1001 5.9 10000 IT IND" > data1
echo "sam 1002 6.1 15000 FIN AUS" >> data1
echo "paul 1003 5.5 12000 IT US" >> data1
echo "martha 1004 5.7 10000 HR US" >> data1
echo "samual 1005 5.9 20000 CTO FR" >> data1
echo "1001,3,March,1986,A" > data2
echo "1002,17,November,1991,AB" >> data2
echo "1003,21,August,1975,O" >> data2

## field position
awk '{print}' data1                    ## returns all fields in data1
awk '{print $0}' data1                 ## returns all fields in data1
awk '{print $1}' data1                 ## returns the 1st field in data1
awk '{print $1 " " $2}' data1          ## returns the 1st and 2nd fields separated by a space
awk '{print NF}' data1 | uniq          ## returns the number of fields per line in data1
awk '{print $NF}' data1                ## returns the last field in data1
awk '{print $(NF-1) "\t" $NF}' data1   ## returns the last two fields separated by a tab
awk '/IT/ {print}' data1               ## returns all the lines containing "IT"
```
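To see the pattern and action defaults described above in isolation, here is a minimal sketch. It assumes bash and a POSIX awk; `sample` is a hypothetical helper that just prints the same five rows as data1, so the example runs without creating any files.

```shell
# Hypothetical helper: prints the five data1 rows so no file is needed.
sample() {
  printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
    "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
    "samual 1005 5.9 20000 CTO FR"
}

# Pattern only: the default action prints each matching record in full.
sample | awk '$6 == "US"'

# Action only: the missing pattern matches every record.
sample | awk '{ print $1 }'

# Pattern and action: the action runs only for records where the pattern is true.
sample | awk '$4 > 12000 { print $1, $4 }'
```

The first command uses the expression `$6 == "US"` rather than the regular expression /US/, because /US/ would also match the "AUS" row.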
AWK built-in variables:
The AWK programming language has some built-in variables such as NR, NF and FS, along with the field references $0, $1, $2 and so on. You can use these built-in variables to control an AWK program's behavior. AWK also supports user-defined variables. Below are the most commonly used built-in variables.
- Field Position: $1, $2, $3, and so on ($0 represents the entire record). They hold the text or values in the individual text-fields in a record.
- NR: ‘N’umber of ‘R’ecords: keeps a current count of the number of input records read so far from all data files. It starts at zero, but is never automatically reset to zero.
- FNR: ‘F’ile ‘N’umber of ‘R’ecords: keeps a current count of the number of input records read so far in the current file. This variable is automatically reset to zero each time a new file is started.
- NF: ‘N’umber of ‘F’ields: contains the number of fields in the current input record. The last field in the input record can be designated by $NF, the 2nd-to-last field by $(NF-1), the 3rd-to-last field by $(NF-2), etc.
- FILENAME: Contains the name of the current input-file.
- FS: ‘F’ield ‘S’eparator: contains the “field separator” used to divide fields in the input record. With the default value, “white space”, fields are separated by runs of spaces and tabs, and leading and trailing white space is ignored. FS can be reassigned to another character (or a regular expression) to change the field separator.
- RS: ‘R’ecord ‘S’eparator: stores the current “record separator” character. By default, each input line is one record, so the default record separator is a “newline”.
- OFS: ‘O’utput ‘F’ield ‘S’eparator: stores the “output field separator”, which separates the fields when Awk prints them. The default is a “space” character.
- ORS: ‘O’utput ‘R’ecord ‘S’eparator: stores the “output record separator”, which separates the output records when Awk prints them. The default is a “newline” character.
- OFMT: ‘O’utput ‘F’or’M’a’T’: stores the format for numeric output. The default format is “%.6g”.
```shell
## Built-In Variables
awk '{print $0}' data1                        ## returns all fields in data1
awk '{print $1}' data1                        ## returns the 1st field in data1
awk '{print NF}' data1 | uniq                 ## returns the number of fields in data1
awk '{print $NF}' data1                       ## returns the last field in data1
awk '{print NR "==>" $0}' data1               ## prefixes each line with its record number
awk '{print NR "==>" $0}' data1 data2         ## record number keeps counting across data1 and data2
awk '{print FNR "==>" $0}' data1 data2        ## record number restarts at 1 for each file
awk '{print FILENAME "==>" $0}' data1 data2   ## prefixes each line with its file name
awk '{print FS "==>" $1}' data2               ## default FS is white space, so $1 is the whole comma-separated line
awk -F"," '{print FS "==>" $1}' data2         ## -F"," changes FS to ",", so $1 is only the 1st field
awk 'BEGIN{FS=",";OFS="|"}{print OFS "==>" $1,$2,$3,$4,$5}' data2  ## prints the "," separated fields separated by "|"
awk 'BEGIN{FS=",";ORS="\n\n"}{print $1,$2,$3,$4,$5}' data2        ## changes the output record separator from "\n" to "\n\n"
awk '{print $2/$3}' data1                     ## prints a decimal quotient using the default OFMT "%.6g"
awk 'BEGIN{OFMT="%.f"}{print $2/$3}' data1    ## OFMT "%.f" rounds the printed quotient to a whole number
```
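The difference between NR and FNR is what makes a common two-file idiom work: while awk reads the first file, NR == FNR is true, so its lines can be stored in an array and then looked up while the second file is read. The sketch below joins data2 to data1 on the employee ID. It assumes bash, a POSIX awk and `mktemp`; `join_birth` is a hypothetical helper name. It also relies on command-line assignments (`FS=,` and `FS=' '`) placed between the file names, which awk applies just before reading the next file, so each file gets its own field separator.

```shell
# Recreate the sample files in a temporary directory (assumption: mktemp is available).
tmp=$(mktemp -d)
printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
  "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
  "samual 1005 5.9 20000 CTO FR" > "$tmp/data1"
printf '%s\n' "1001,3,March,1986,A" "1002,17,November,1991,AB" \
  "1003,21,August,1975,O" > "$tmp/data2"

# Hypothetical helper: print name, birth month and birth year for IDs present in both files.
join_birth() {
  awk '
    NR == FNR { birth[$1] = $3 " " $4; next }   # first file (data2): remember month and year by ID
    ($2 in birth) { print $1, birth[$2] }       # second file (data1): look the ID up
  ' FS=, "$tmp/data2" FS=' ' "$tmp/data1"
}

join_birth
```

martha (1004) and samual (1005) have no row in data2, so they are left out of the result.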
AWK External script:
Instead of passing the PATTERN and ACTIONS on the command line, you can also put them in an AWK script file and pass it with the -f option while executing the awk command. This comes in handy when you want to perform more complex actions on the data.
```shell
## AWK External script
## create the script file myscript.awk
cat > myscript.awk <<'EOF'
BEGIN { FS=","; OFS="|" }
{print OFS"==>"$1,$2,$3,$4,$5}
EOF

## run the script against data2
awk -f myscript.awk data2
```
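A script file becomes more reusable when its parameters come from the command line. The standard -v option assigns a variable before the program starts, so the same script can be run with different values. This is a minimal sketch assuming bash, a POSIX awk and `mktemp`; the script name `minsal.awk` and the variable `min` are hypothetical.

```shell
# Recreate data1 in a temporary directory (assumption: mktemp is available).
tmp=$(mktemp -d)
printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
  "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
  "samual 1005 5.9 20000 CTO FR" > "$tmp/data1"

# Hypothetical script: print name and salary for rows whose salary ($4) is at least "min".
cat > "$tmp/minsal.awk" <<'EOF'
# "min" is supplied with -v; "+ 0" forces a numeric comparison.
$4 >= min + 0 { print $1, $4 }
EOF

# The same script, run with two different thresholds.
awk -v min=12000 -f "$tmp/minsal.awk" "$tmp/data1"
awk -v min=20000 -f "$tmp/minsal.awk" "$tmp/data1"
```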
AWK BEGIN and END block:
The awk command processes the input data one line at a time and produces output according to the PATTERN and ACTIONS provided to it. Sometimes, however, you need to execute something before awk processes the first line. In this scenario, you can use a BEGIN block, which is executed first; for example, you can define the field separator in the BEGIN block. You may also need to perform some action at the end of the awk command execution, such as printing a summary or a total. AWK provides the END block to execute actions at the very end, after the last line has been read.
```shell
## BEGIN and END blocks
# BEGIN { ACTIONS }        <- runs once, before the first input line
# /PATTERN/ { ACTIONS }    <- runs for every matching input line
# END { ACTIONS }          <- runs once, after the last input line
awk 'BEGIN{print "something at start"}{print $0}END{print "something at end"}' data1

## create the script file myscript.awk
cat > myscript.awk <<'EOF'
BEGIN { print "Total Salary: "; }
{ salary+=$4; }
END { printf "%s\n", salary; }
EOF

awk -f myscript.awk data1   ## returns the total of the 4th field in data1
```
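The BEGIN and END blocks are a natural fit for initialising counters and reporting on them once all the input has been read. The sketch below extends the salary total to a count and an average, and uses END to avoid dividing by zero on empty input. It assumes bash and a POSIX awk; `sample` and `avg_salary` are hypothetical helper names.

```shell
# Hypothetical helper: prints the five data1 rows so no file is needed.
sample() {
  printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
    "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
    "samual 1005 5.9 20000 CTO FR"
}

# Hypothetical helper: count the rows and total/average the 4th field (salary).
avg_salary() {
  awk '
    BEGIN { total = 0; count = 0 }   # runs once, before the first record
    { total += $4; count++ }         # runs once per record
    END {                            # runs once, after the last record
      if (count > 0)
        printf "employees=%d total=%d average=%.2f\n", count, total, total / count
      else
        print "no records"           # avoids dividing by zero on empty input
    }'
}

sample | avg_salary
```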
AWK regular expression:
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions. Regular expressions use metacharacters that you can combine to build the search PATTERN. Below are the most commonly used metacharacters:
- . Period matches any single character.
- * Matches 0 or more repetitions of the preceding symbol.
- + Matches 1 or more repetitions of the preceding symbol.
- ? Makes the preceding symbol optional.
- ^ Matches the beginning of the string being tested.
- $ Matches the end of the string being tested.
- [ ] Matches any character contained between the square brackets.
- [^ ] Matches any character that is not contained between the square brackets.
- {n,m} Matches at least “n” but not more than “m” repetitions of the preceding symbol. Interval expressions are part of POSIX, but some older awk implementations do not support them.
- (xyz) Groups the characters xyz so that they are matched in that exact order and can be treated as one unit.
- | Matches either the expression before or the expression after the symbol.
- \ Escapes the following character, so that reserved characters such as [ ] ( ) { } . * + ? ^ $ \ | are matched literally.
```shell
## regular expressions
awk '/IT/{print $0}' data1     ## returns lines containing "IT"
awk '/^sam/{print $0}' data1   ## returns lines beginning with "sam"
```
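The patterns above match anywhere in the whole line, so /^sam/ selects both sam and samual. Applying a regular expression to a single field with the ~ operator gives tighter control. The sketch below exercises several metacharacters from the list (^, $, ?, ( ), |, [ ], + and \). It assumes bash and a POSIX awk; `sample` is a hypothetical helper that prints the data1 rows.

```shell
# Hypothetical helper: prints the five data1 rows so no file is needed.
sample() {
  printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
    "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
    "samual 1005 5.9 20000 CTO FR"
}

# ^ and $ anchor the whole 1st field; (ual)? makes "ual" optional -> sam, samual
sample | awk '$1 ~ /^sam(ual)?$/ { print $1 }'

# | alternation inside a group: 5th field is exactly IT or HR -> debjeet, paul, martha
sample | awk '$5 ~ /^(IT|HR)$/ { print $1 }'

# [ ] and +: salary is "1" followed by one or more digits -> debjeet, sam, paul, martha
sample | awk '$4 ~ /^1[0-9]+$/ { print $1 }'

# \ escapes the period so it matches a literal "." -> debjeet, samual
sample | awk '$3 ~ /^5\.9$/ { print $1 }'
```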
AWK operators:
AWK is a complete scripting language and, like any other language, supports different kinds of operators. Below is a list of the operators available in AWK.
- arithmetic operator
  - addition (+): adds the right operand to the left operand
  - subtraction (-): subtracts the right operand from the left operand
  - multiplication (*): multiplies the left operand by the right operand
  - division (/): divides the left operand by the right operand
  - modulus (%): returns the remainder of dividing the left operand by the right operand
- increment and decrement operator
  - pre-increment (++i): increments first and then returns the incremented value
  - post-increment (i++): returns the current value first and then increments
  - pre-decrement (--i): decrements first and then returns the decremented value
  - post-decrement (i--): returns the current value first and then decrements
- relational operator
  - equality (==): checks if the left operand is equal to the right operand
  - non-equality (!=): checks if the left operand is not equal to the right operand
  - less than (<): checks if the left operand is less than the right operand
  - less than or equal to (<=): checks if the left operand is less than or equal to the right operand
  - greater than (>): checks if the left operand is greater than the right operand
  - greater than or equal to (>=): checks if the left operand is greater than or equal to the right operand
  - match (~): checks if the left operand matches the regular expression given by the right operand
  - non-match (!~): checks if the left operand does not match the regular expression given by the right operand
- assignment operator
  - simple assignment (=): assigns the value of the right operand to the left operand
  - shorthand addition (+=): adds the right operand to the left operand and assigns the result to the left operand
  - shorthand subtraction (-=): subtracts the right operand from the left operand and assigns the result to the left operand
  - shorthand multiplication (*=): multiplies the left operand by the right operand and assigns the result to the left operand
  - shorthand division (/=): divides the left operand by the right operand and assigns the result to the left operand
  - shorthand modulus (%=): takes the remainder of the left operand divided by the right operand and assigns it to the left operand
  - shorthand exponential (^= or **=): raises the left operand to the power of the right operand and assigns the result to the left operand (^= is the POSIX form; **= is a common extension that is not part of POSIX)
- logical operator
  - logical AND (&&): true if both the left and the right expressions are true
  - logical OR (||): true if either the left or the right expression is true
  - logical NOT (!): true if the expression that follows it is false
- other operator
  - ternary operator (x?y:z): returns y if x is true, otherwise returns z
  - exponential operator (^ or **): raises the left operand to the power of the right operand (^ is the POSIX form; ** is a common extension that is not part of POSIX)
```shell
## awk operators
# arithmetic operator
awk '{print $4+$2}' data1
awk '{print $4-$2}' data1
awk '{print $4*$3}' data1
awk '{print $4/$3}' data1
awk '{print $4%$2}' data1

# increment and decrement operator (two lines in data1 contain "IT")
awk 'BEGIN{i=0;}/IT/{++i;}END{print i}' data1   ## prints 2
awk 'BEGIN{i=0;}/IT/{i++;}END{print i}' data1   ## prints 2
awk 'BEGIN{i=5;}/IT/{--i;}END{print i}' data1   ## prints 3
awk 'BEGIN{i=5;}/IT/{i--;}END{print i}' data1   ## prints 3

# relational operator
awk '$2 == 1003{print $0}' data1
awk '$2 != 1003{print $0}' data1
awk '$4 < 12000{print $0}' data1
awk '$4 <= 12000{print $0}' data1
awk '$4 > 12000{print $0}' data1
awk '$4 >= 12000{print $0}' data1
awk '$1 ~ "debjeet"{print $0}' data1
awk '$1 !~ "debjeet"{print $0}' data1

# assignment operator (each action runs once per line of data1, i.e. 5 times)
awk '{i=$2}END{print i}' data1                    ## simple assignment, prints 1005
awk 'BEGIN{i=5;j=10}{i+=j;}END{print i}' data1    ## prints 55
awk 'BEGIN{i=5;j=10}{i-=j;}END{print i}' data1    ## prints -45
awk 'BEGIN{i=5;j=10}{i*=j;}END{print i}' data1    ## prints 500000
awk 'BEGIN{i=5;j=10}{i/=j;}END{print i}' data1
awk 'BEGIN{i=100;j=13}{i%=j;}END{print i}' data1  ## prints 9
awk 'BEGIN{i=2;j=2}{i^=j;}END{print i}' data1     ## gawk also accepts i**=j

# logical operator
awk '($6 == "US" && $5 == "HR"){print $0}' data1  ## martha only
awk '($6 == "US" || $5 == "HR"){print $0}' data1  ## paul and martha
awk '!($5 == "HR"){print $0}' data1               ## everyone except martha

# other operator
awk '{sal = ($4 >= 15000) ? "sal >= 15000" : "sal < 15000"; print sal, $4}' data1  ## ternary
awk 'BEGIN{i=2;j=2}{k=(i^j);}END{print k}' data1  ## exponent, prints 4 (gawk also accepts i**j)
```
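In practice several operators usually appear in one program. The sketch below combines a relational operator in a pattern, += and ++ to accumulate, % inside a ternary expression, and / in the END block, guarded against a zero count. It assumes bash and a POSIX awk; `sample` and `ops_demo` are hypothetical helper names.

```shell
# Hypothetical helper: prints the five data1 rows so no file is needed.
sample() {
  printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
    "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
    "samual 1005 5.9 20000 CTO FR"
}

# Hypothetical helper: label each ID as even/odd and summarise the IT department.
ops_demo() {
  awk '
    $5 == "IT" { it_total += $4; it_count++ }   # == in the pattern; += and ++ in the action
    {
      parity = ($2 % 2 == 0) ? "even" : "odd"   # % (modulus) inside a ternary expression
      print $1, parity
    }
    END {
      if (it_count > 0)                         # guard the division below
        print "IT:", it_count, it_total, it_total / it_count
    }'
}

sample | ops_demo
```

The last line reports the IT headcount, total salary and average salary.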
AWK conditional statements:
AWK supports conditional statements such as if, if-else, and if-else if-else. Using conditional statements, you can run an AWK action only when a specific condition holds.
```shell
## conditions
# if
awk 'BEGIN {i=10; if (i > 5) print i " is greater than 5"}'

# if else
awk 'BEGIN {i=10; j=5; if (i > j) print i " is greater than " j; else print i " is not greater than " j}'

# if else-if else
awk 'BEGIN {i=10; j=5; k=15; if ((i > j) && (i > k)) print i " is greatest"; else if ((j > i) && (j > k)) print j " is greatest"; else print k " is greatest"}'
```
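The examples above test fixed values inside BEGIN; the same if / else if / else chain can also run once per input record. The sketch below assigns a salary band to each employee in the data1 rows. The band names and thresholds are hypothetical, chosen only for illustration; it assumes bash and a POSIX awk, and `sample` and `grade` are hypothetical helper names.

```shell
# Hypothetical helper: prints the five data1 rows so no file is needed.
sample() {
  printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
    "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
    "samual 1005 5.9 20000 CTO FR"
}

# Hypothetical helper: band A for salary >= 15000, B for >= 12000, C otherwise.
grade() {
  awk '{
    if ($4 >= 15000)      band = "A"   # the first true branch wins
    else if ($4 >= 12000) band = "B"
    else                  band = "C"   # everything below 12000
    print $1, band
  }'
}

sample | grade
```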
AWK loop statements:
AWK supports for, while, and do-while loops. Inside a loop, you can use the break statement to leave the loop and the continue statement to skip to the next iteration, while the exit statement stops the whole awk program (any END block still runs).
```shell
## loops
# for loop: prints 1 to 5
awk 'BEGIN {for (i = 1; i <= 5; ++i) print i}'

# while loop: prints 1 to 5
awk 'BEGIN {i = 1; while (i <= 5) {print i; ++i}}'

# do while loop: the body runs at least once, prints 1 to 5
awk 'BEGIN {i = 1; do {print i; ++i} while (i <= 5)}'

# break statement: leaves the loop, prints 1 to 4
awk 'BEGIN {for (i = 1; i <= 10; ++i) {if (i >= 5) break; print i}}'

# continue statement: skips even numbers, prints 1, 3, 5, 7, 9
awk 'BEGIN {for (i = 1; i <= 10; ++i) {if (i % 2 == 0) continue; print i}}'

# exit statement: stops the whole awk program, prints 1 to 4
awk 'BEGIN {for (i = 1; i <= 10; ++i) {if (i >= 5) exit; print i}}'
```
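Loops become especially useful together with NF, because they let a program visit every field of a record without knowing the field count in advance. The sketch below prints each record with its fields in reverse order, joined by OFS. It assumes bash and a POSIX awk; `sample` and `reverse_fields` are hypothetical helper names.

```shell
# Hypothetical helper: prints the five data1 rows so no file is needed.
sample() {
  printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
    "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
    "samual 1005 5.9 20000 CTO FR"
}

# Hypothetical helper: reverse the order of the fields in every record.
reverse_fields() {
  awk '{
    out = $NF                        # start with the last field
    for (i = NF - 1; i >= 1; i--)    # walk back to the first field
      out = out OFS $i               # join with the output field separator
    print out
  }'
}

sample | reverse_fields
```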
AWK array data type:
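AWK supports the array data type. AWK arrays are associative: an index can be any number or string, so there is no fixed starting index; the examples below simply use 0, 1, 2 and so on. AWK also emulates multidimensional arrays: a subscript such as myarray[i, j] is stored as a single key in which i and j are joined by the SUBSEP character. An array element can be deleted with the delete array[index] statement.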
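```shell
## array
# assign three elements and print the one stored under index 1 ("two")
awk 'BEGIN {myarray[0] = "one"; myarray[1] = "two"; myarray[2] = "three"; print myarray[1]}'
# fill elements 0 to 9 and print each one
awk 'BEGIN {for (i = 0; i < 10; i++) {myarray[i] = i; print myarray[i]}}'
# delete the odd elements; a deleted element reads as an empty string, so odd i print empty lines
awk 'BEGIN {for (i = 0; i < 10; i++) {myarray[i] = i; if (i % 2) {delete myarray[i]}; print myarray[i]}}'
```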
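Because AWK arrays are associative, field values can be used directly as keys, which makes grouping and counting straightforward. The sketch below totals salaries per department and also shows a (i, j) subscript being split back apart with SUBSEP. It assumes bash, a POSIX awk and `sort`; `sample`, `dept_totals` and `subsep_demo` are hypothetical helper names. The output of `for (key in array)` comes in an unspecified order, which is why the department totals are piped through sort.

```shell
# Hypothetical helper: prints the five data1 rows so no file is needed.
sample() {
  printf '%s\n' "debjeet 1001 5.9 10000 IT IND" "sam 1002 6.1 15000 FIN AUS" \
    "paul 1003 5.5 12000 IT US" "martha 1004 5.7 10000 HR US" \
    "samual 1005 5.9 20000 CTO FR"
}

# Hypothetical helper: headcount and total salary per department ($5).
dept_totals() {
  awk '
    { count[$5]++; total[$5] += $4 }   # department names are used directly as keys
    END {
      for (d in count)                 # for-in visits keys in an unspecified order...
        print d, count[d], total[d]
    }' | sort                          # ...so sort the lines for a stable result
}

# Hypothetical helper: a "multidimensional" subscript is one key joined by SUBSEP.
subsep_demo() {
  awk 'BEGIN {
    grid[1, 2] = "x"
    if ((1, 2) in grid) print "found"   # membership test without creating an element
    for (k in grid) {
      split(k, parts, SUBSEP)           # recover the two subscripts
      print parts[1], parts[2], grid[k]
    }
  }'
}

sample | dept_totals
subsep_demo
```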
Hope you have enjoyed this article. As mentioned earlier, AWK is a complete scripting language of its own with many features, and not all of them can be covered in a single blog post. Maybe in the future, we will have an AWK blog series. In the next blog post, we will discuss the sed command in Linux.