Linux Commands – awk

Hello Everyone

Welcome to CloudAffaire and this is Debjeet.

In the last blog post, we discussed the grep command in Linux, which is used to match a PATTERN in a given file.

https://cloudaffaire.com/linux-commands-grep/

In this blog post, we will discuss the awk command in Linux. AWK is a language designed for text processing and is typically used as a data extraction and reporting tool. AWK reads the input one line at a time. Each line is tested against every pattern in the program, and for each pattern that matches, the associated action is executed.

Linux Commands – awk:

An AWK program is a series of pattern-action pairs, written as awk '/PATTERN/ { ACTION1; ACTION2; }' FILE, where the pattern (also called the condition) is typically an expression and the action is a series of commands. The input is split into records; by default records are separated by newline characters, so each line is one record. The program tests each record against each of the patterns in turn and executes the action for every pattern that is true. Either the pattern or the action may be omitted. An omitted pattern matches every record, and an omitted action prints the record. This structure is similar to the address-command structure used by sed.
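As a minimal sketch of the three forms described above (pattern with action, pattern only, action only), the commands below run against a small sample file. The file name employees.txt and its contents are made up for illustration.

```shell
# Hypothetical sample data: name, department, salary
cat > employees.txt <<'EOF'
alice sales 3000
bob engineering 4500
carol sales 3500
EOF

# Pattern and action: print the first field of every record containing "sales"
awk '/sales/ { print $1 }' employees.txt

# Pattern only: the default action prints the whole matching record
awk '/engineering/' employees.txt

# Action only: the action runs for every record
awk '{ print $1, $3 }' employees.txt
```

The first command prints alice and carol, the second prints the full bob line, and the third prints the name and salary of all three records.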

AWK built-in variables:

The AWK programming language has some built-in variables like $0, $1, NR, FS etc. Note that only field references take a $ prefix; variables such as NR and FS are written without it. You can use these built-in variables to control AWK program behavior. AWK also supports user-defined variables. Below is a list of the most commonly used built-in variables.

  • Field Position: $1, $2, $3, and so on ($0 represents the entire record). They hold the text or values in the individual text-fields in a record.
  • NR: ‘N’umber of ‘R’ecords: keeps a current count of the number of input records read so far from all data files. It starts at zero, but is never automatically reset to zero.
  • FNR: ‘F’ile ‘N’umber of ‘R’ecords: keeps a current count of the number of input records read so far in the current file. This variable is automatically reset to zero each time a new file is started.
  • NF: ‘N’umber of ‘F’ields: contains the number of fields in the current input record. The last field in the input record can be designated by $NF, the 2nd-to-last field by $(NF-1), the 3rd-to-last field by $(NF-2), etc.
  • FILENAME: Contains the name of the current input-file.
  • FS: ‘F’ield ‘S’eparator: contains the “field separator” character used to divide fields in the input record. The default, “white space”, includes any space and tab characters. FS can be reassigned to another character to change the field separator.
  • RS: ‘R’ecord ‘S’eparator: stores the current “record separator” character. Since, by default, an input line is the input record, the default record separator character is a “newline”.
  • OFS: ‘O’utput ‘F’ield ‘S’eparator: stores the “output field separator”, which separates the fields when Awk prints them. The default is a “space” character.
  • ORS: ‘O’utput ‘R’ecord ‘S’eparator: stores the “output record separator”, which separates the output records when Awk prints them. The default is a “newline” character.
  • OFMT: ‘O’utput ‘F’or’M’a’T’: stores the format for numeric output. The default format is “%.6g”.
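A short sketch of how some of these variables behave in practice is shown below. The file users.csv and its contents are hypothetical sample data.

```shell
# Hypothetical comma-separated sample data
cat > users.csv <<'EOF'
alice,sales,3000
bob,engineering,4500
EOF

# -F sets FS; NR is the record number, NF the field count, $NF the last field
awk -F',' '{ print NR, NF, $1, $NF }' users.csv

# OFS is placed between the comma-separated arguments of print
awk -F',' -v OFS=' | ' '{ print $1, $2 }' users.csv

# With two input files, NR keeps counting while FNR restarts for each file
awk '{ print FILENAME, NR, FNR }' users.csv users.csv
```

Passing the same file twice in the last command is only a trick to show the difference between NR and FNR: the final line printed is "users.csv 4 2".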

AWK External script:

Instead of passing the PATTERN and ACTIONS on the command line, you can also put them in an AWK script file and pass it with the -f option while executing the awk command. This comes in handy when you want to perform some complex action on the data.
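For illustration, the sketch below writes a small script to a file and runs it with -f. The names report.awk and sales.txt, and the data itself, are assumptions made up for this example.

```shell
# Hypothetical sample data: region, amount
cat > sales.txt <<'EOF'
north 100
south 250
north 50
EOF

# Hypothetical script file holding the awk program
cat > report.awk <<'EOF'
# Add up the second field per region
{ total[$1] += $2 }
# After the last record, print the totals
END {
    print "north", total["north"]
    print "south", total["south"]
}
EOF

awk -f report.awk sales.txt
```

This prints "north 150" followed by "south 250". Keeping the program in a file also avoids shell quoting problems that show up with longer one-liners.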

AWK BEGIN and END block:

The awk command processes the input data one line at a time and produces output depending on the PATTERN and ACTIONS provided to it. But sometimes you may need to execute something before awk processes the first line. In this scenario, you can use a BEGIN block, which is executed before any input is read. For example, you can define the field separator in the BEGIN block. You may also need to perform some action after the last line has been processed, for example printing a summary or a total. AWK provides the END block for this.
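The following sketch sets the field separator and prints a header in BEGIN, accumulates a sum for every record, and prints the total in END. The file prices.csv is hypothetical sample data.

```shell
# Hypothetical sample data: item,price
cat > prices.csv <<'EOF'
apple,3
banana,2
cherry,5
EOF

awk 'BEGIN { FS = ","; print "item price" }   # runs once, before any input
     { sum += $2; print $1, $2 }              # runs for every record
     END { print "total", sum }               # runs once, after all input
' prices.csv
```

The last line of the output is "total 10".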

AWK regular expression:

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions. Regular expressions have several meta-characters which you can use to formulate the search PATTERN. Below are the most commonly used meta-characters:

  • . Period matches any single character except a line break.
  • * Matches 0 or more repetitions of the preceding symbol.
  • + Matches 1 or more repetitions of the preceding symbol.
  • ? Makes the preceding symbol optional.
  • ^ Matches the beginning of the string being tested (by default, the current record).
  • $ Matches the end of the string being tested (by default, the current record).
  • [ ] Matches any character contained between the square brackets.
  • [^ ] Matches any character that is not contained between the square brackets.
  • {n,m} Matches at least “n” but not more than “m” repetitions of the preceding symbol.
  • (xyz) Matches the characters xyz in that exact order.
  • | Matches either the characters before or the characters after the symbol.
  • \ This allows you to match reserved characters [ ] ( ) { } . * + ? ^ $ \ |
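A few of these meta-characters are tried out below against a small word list. The file words.txt is hypothetical sample data, and the comment after each command lists the lines that match.

```shell
# Hypothetical sample data, one word per line
cat > words.txt <<'EOF'
color
colour
cat
cut
42
a.b
EOF

awk '/^colou?r$/' words.txt     # color, colour  (? makes the "u" optional)
awk '/^c[au]t$/' words.txt      # cat, cut       ([ ] matches one listed character)
awk '/^[0-9]+$/' words.txt      # 42             (+ means one or more digits)
awk '/^(cat|42)$/' words.txt    # cat, 42        (| means either alternative)
awk '/a\.b/' words.txt          # a.b            (\ makes the period literal)
```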

AWK operators:

AWK is a complete scripting language and, like any other language, supports different kinds of operators to perform different actions. Below is the list of operators available in AWK.

  • arithmetic operator
    • addition (+): adds the right operand to the left operand
    • subtraction (-): subtracts the right operand from the left operand
    • multiplication (*): multiplies the left operand by the right operand
    • division (/): divides the left operand by the right operand
    • modulus (%): returns the remainder of dividing the left operand by the right operand
  • increment and decrement operator
    • pre-increment (++i): performs increment 1st and then returns the incremented value
    • post-increment (i++): returns the value 1st and then performs the increment
    • pre-decrement (--i): performs decrement 1st and then returns the decremented value
    • post-decrement (i--): returns the value 1st and then performs the decrement
  • relational operator
    • equality (==): checks if left operand is equal to right operand
    • non-equality (!=): checks if left operand is not equal to right operand
    • less than (<): checks if left operand is less than right operand
    • less than equal to (<=): checks if left operand is less than or equal to right operand
    • greater than (>): checks if left operand is greater than right operand
    • greater than equal to (>=): checks if left operand is greater than or equal to right operand
    • match (~): checks if left operand matches the regular expression given as right operand
    • non-match (!~): checks if left operand does not match the regular expression given as right operand
  • assignment operator
    • simple assignment (=): assigns the value of right operand to left operand
    • shorthand addition (+=): adds the right operand to the left operand and assigns the result to left operand
    • shorthand subtraction (-=): subtracts the right operand from the left operand and assigns the result to left operand
    • shorthand multiplication (*=): multiplies the left operand by the right operand and assigns the result to left operand
    • shorthand division (/=): divides the left operand by the right operand and assigns the result to left operand
    • shorthand modulus (%=): performs modulus of left operand with right operand and assigns the result to left operand
    • shorthand exponential (^= or **=): raises the left operand to the power of the right operand and assigns the result to left operand (^= is the POSIX form; **= is not supported by every awk implementation)
  • logical operator
    • logical AND (&&): true if both the left and the right expression are true
    • logical OR (||): true if either the left or the right expression is true
    • logical NOT (!): true if the expression to its right is false
  • other operator
    • ternary operator (x?y:z): if x true returns y else returns z
    • exponential operator (^ or **): raises the left operand to the power of the right operand (^ is the POSIX form; ** is not supported by every awk implementation)
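The sketch below exercises one or two operators from each group inside a BEGIN block, so no input file is needed. The script name operators.awk and the variable names are made up for this example, and only the portable ^ form of the exponential operator is used.

```shell
# Hypothetical script file; everything runs in BEGIN, so no input is read
cat > operators.awk <<'EOF'
BEGIN {
    a = 7; b = 2

    # arithmetic
    print a + b, a - b, a * b, a / b, a % b, a ^ b    # 9 5 14 3.5 1 49

    # increment: post-increment returns the old value, pre-increment the new one
    i = 5
    x = i++; print x, i                               # 5 6
    y = ++i; print y, i                               # 7 7

    # relational and match results are 1 (true) or 0 (false)
    gt = (a > b); eq = (a == b); print gt, eq         # 1 0
    m = ("hello" ~ /^he/); nm = ("hello" !~ /^he/)
    print m, nm                                       # 1 0

    # shorthand assignment: 10 -> 15 -> 12 -> 24 -> 6 -> 2 -> 8
    n = 10; n += 5; n -= 3; n *= 2; n /= 4; n %= 4; n ^= 3
    print n                                           # 8

    # logical
    t = (a > 5 && b < 5); o = (a < 5 || b < 5); nt = !(a > 5)
    print t, o, nt                                    # 1 1 0

    # ternary
    label = (a > b) ? "a is larger" : "b is larger"
    print label                                       # a is larger
}
EOF

awk -f operators.awk
```

Comparison results are stored in variables before printing because a bare > inside a print statement is treated as output redirection to a file.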

AWK conditional statements:

AWK supports conditional statements such as if, if-else, and if-else-if chains. Using conditional statements, you can perform an AWK action only when specific conditions are met.
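As a small sketch, the script below assigns a grade to each record with an if / else if / else chain. The file names scores.txt and grade.awk, the data, and the grade thresholds are all invented for this example.

```shell
# Hypothetical sample data: name, score
cat > scores.txt <<'EOF'
alice 91
bob 72
carol 48
EOF

# Hypothetical script file
cat > grade.awk <<'EOF'
{
    if ($2 >= 90)
        grade = "A"
    else if ($2 >= 60)
        grade = "B"
    else
        grade = "F"
    print $1, grade
}
EOF

awk -f grade.awk scores.txt
```

This prints "alice A", "bob B" and "carol F".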

AWK loop statements:

AWK supports looping with the for loop, the while loop, and the do-while loop. You can also use the break and continue statements to control a loop, and the exit statement to stop the program altogether.
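The three loop forms, together with continue and break, can be sketched as follows. The script name loops.awk and the variable names are made up for illustration.

```shell
# Hypothetical script file; everything runs in BEGIN, so no input is read
cat > loops.awk <<'EOF'
BEGIN {
    # for loop: add up the numbers 1 to 5
    for (i = 1; i <= 5; i++)
        sum += i
    print "for", sum            # for 15

    # while loop with continue: count the odd numbers from 1 to 10
    n = 0; odd = 0
    while (n < 10) {
        n++
        if (n % 2 == 0)
            continue            # skip even numbers
        odd++
    }
    print "while", odd          # while 5

    # do-while loop with break: the body always runs at least once
    k = 0
    do {
        k++
        if (k == 3)
            break               # leave the loop early
    } while (k < 100)
    print "do-while", k         # do-while 3
}
EOF

awk -f loops.awk
```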

AWK array data type:

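AWK supports the array data type. AWK arrays are associative: the index can be any number or string, so there is no fixed starting index (the built-in split function fills an array starting at index 1). AWK also supports multidimensional subscripts such as array[i, j], which standard AWK stores as a single index joined with the SUBSEP character. Array elements can be deleted using the delete array[element] statement.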
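A minimal sketch of array usage is shown below: string-keyed elements, a two-part subscript, the in operator, delete, and split. The file names orders.txt and arrays.awk and the data are assumptions made up for this example.

```shell
# Hypothetical sample data: customer, product, quantity
cat > orders.txt <<'EOF'
alice apples 3
bob pears 2
alice pears 4
EOF

# Hypothetical script file
cat > arrays.awk <<'EOF'
{
    qty[$1] += $3          # array indexed by a string (the customer name)
    cell[$1, $2] = $3      # two-part subscript, stored as one key joined by SUBSEP
}
END {
    print "alice", qty["alice"]                 # alice 7
    print "bob", qty["bob"]                     # bob 2

    if (("alice", "pears") in cell)
        print "alice pears", cell["alice", "pears"]   # alice pears 4

    delete qty["bob"]                           # remove a single element
    if (!("bob" in qty))
        print "bob deleted"

    n = split("x y z", parts, " ")              # split fills parts[1]..parts[n]
    print n, parts[1], parts[3]                 # 3 x z
}
EOF

awk -f arrays.awk orders.txt
```

The in operator is used for the membership tests because merely referencing a missing element, as in qty["bob"], would create it.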

Hope you have enjoyed this article. As mentioned earlier, AWK is a complete scripting language of its own and provides lots of features. Not all AWK features can be covered in a single blog post. Maybe in the future, we will have an AWK blog series. In the next blog post, we will discuss the sed command in Linux.
