AWK One-Liners

Simple AWK programs enclosed in single quotes can be typed and executed right at the Unix prompt. For example, consider the program

awk 'BEGIN { FS = ":" } { print $1 | "sort" }' /etc/passwd

This program prints a sorted list of the login names of all users.
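
To try the program without reading the real /etc/passwd, here is a minimal sketch that wraps it in a shell function (login_names is a hypothetical name, not a standard command) and feeds it a few made-up passwd-style lines. Because print $1 | "sort" sends every name down one pipe to a single sort process, the sorted list appears when awk closes that pipe at exit.

```shell
# login_names: hypothetical helper wrapping the one-liner above.
# Reads colon-separated, passwd-style lines from the named files or stdin.
login_names() {
  awk 'BEGIN { FS = ":" } { print $1 | "sort" }' "$@"
}

# Made-up sample entries standing in for /etc/passwd.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'zoe:x:1001:1001::/home/zoe:/bin/sh' \
  'alice:x:1000:1000::/home/alice:/bin/sh' | login_names
```

With no file arguments, "$@" expands to nothing and awk reads standard input, which is what lets the function sit at the end of a pipe.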

If no input file is named, awk reads from standard input; output goes to standard output unless it is redirected (for example with > or |).

AWK is very flexible about matching patterns. Patterns can be

  • regular expressions enclosed by slashes, e.g.: /regular expression/
  • relational expressions, e.g.: $3!=$4
  • pattern-matching expressions, e.g.: $1 !~ /string/
  • or any combination of these, e.g.: (substr($0,5,2)=="xx" && $3 ~ /nasty/) || /^The/ || /mean$/ || $4>2

(This last example selects lines where the two characters starting at the fifth column are xx and the third field matches nasty, plus lines beginning with The, plus lines ending with mean, plus lines in which the fourth field is greater than two.)
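
The combined pattern can be checked on a handful of invented lines. In this sketch the function name pick_lines and the sample text are hypothetical; note that anchoring a match to the end of the line needs the $ after the text, as in /mean$/.

```shell
# pick_lines: hypothetical wrapper around the combined pattern from the text.
# No action is given, so every line that matches is printed.
pick_lines() {
  awk '(substr($0,5,2) == "xx" && $3 ~ /nasty/) || /^The/ || /mean$/ || $4 > 2'
}

printf '%s\n' \
  'abcdxx foo nasty 0' \
  'abcdxx foo nice 0' \
  'The end' \
  'so very mean' \
  'a b c 5' \
  'a b c 1' | pick_lines
# Lines 1, 3, 4 and 5 are printed: line 2 fails the $3 test,
# and line 6 fails $4 > 2.
```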

AWK procedures are enclosed in {curly brackets}. Procedures can

(1) Assign variables or arrays. For example:

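BEGIN {FS = ","}
(resets the field separator to a comma before any input is read)

/string/ { count["string"]++ }
(counts the input lines that match string, keeping the tally in the array element count["string"])

split(substr($0,4,12), N, ",")
(splits the 12-character substring that starts at position 4 into the array elements N[1], N[2], …, breaking it at commas)

newvar = $4*sqrt($5)
(assigns the fourth field times the square root of the fifth field to the variable newvar)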
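
Several of these assignments can be combined in one program. This is a minimal sketch on made-up comma-separated records (the function name assign_demo and the data are hypothetical); it sets FS, tallies matching lines in an array and computes newvar for each record.

```shell
# assign_demo: hypothetical function; records look like name,label,a,b,c
assign_demo() {
  awk 'BEGIN { FS = "," }                # split fields at commas
       /string/ { count["string"]++ }     # tally lines containing "string"
       { newvar = $4 * sqrt($5)           # fourth field times root of fifth
         print $1, newvar }
       END { print "lines with string:", count["string"] }'
}

printf '%s\n' 'x,string,2,3,16' 'y,other,1,4,9' 'z,string,5,2,4' | assign_demo
# x 12
# y 12
# z 4
# lines with string: 2
```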

AWK operators in order of decreasing precedence:

Field reference: $
Increment or decrement: ++ --
Exponentiate: ^
Unary plus, minus, logical NOT: + - !
Multiply, divide, modulus: * / %
Add, subtract: + -
Concatenation: (blank space)
Relational: < <= > >= != ==
Match regular expression: ~ !~
Array membership: in
Logical AND: &&
Logical OR: ||
Conditional: ?:
C-style assignment: = += -= *= /= %= ^=
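
A few of these precedence rules are easy to trip over. The sketch below (prec_demo is a hypothetical name) feeds one made-up line to awk and prints expressions whose value depends on precedence or grouping.

```shell
# prec_demo: hypothetical function showing precedence effects on one input line.
prec_demo() {
  echo '10 20 30' | awk '{
    print $NF-1, $(NF-1)   # $ binds tightest: ($NF)-1 = 29, but $(NF-1) = 20
    print 1 " " 2 + 3      # + binds tighter than concatenation: "1 " then 5
    print 2 ^ 3 ^ 2        # ^ groups right to left: 2^(3^2) = 512
  }'
}
prec_demo
# 29 20
# 1 5
# 512
```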

AWK arithmetic functions: exp, int, log and sqrt (awk also provides atan2, cos, sin, rand and srand)
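
A quick way to see what the arithmetic functions return is to call them from a BEGIN block, which runs without reading any input. In this sketch, arith_demo is a hypothetical name; note that int truncates toward zero rather than rounding.

```shell
# arith_demo: hypothetical function; BEGIN runs before (and without) any input.
arith_demo() {
  # int() truncates toward zero; sqrt, exp and log are the usual math functions.
  awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), exp(0), log(1) }'
}
arith_demo
# 3 -3 4 1 0
```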

AWK string functions:

index(string,substring)
returns the position of the first occurrence of substring in string, or 0 if there is none

length[(argument)]
returns the length of argument if one is given, otherwise the length of $0.

split(string,array[,f])
splits string at separator f (or at the field separator FS, which defaults to blanks, if f is omitted) into array[1], array[2], …, and returns the number of pieces

substr(string,s[,l])
extracts the part of string that starts at position s and is l characters long (or runs to the end of string if l is omitted).
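
The four string functions can be exercised on a fixed string in a BEGIN block. This is a small sketch; the function name string_demo and the sample values are hypothetical.

```shell
# string_demo: hypothetical function exercising index, length, substr and split.
string_demo() {
  awk 'BEGIN {
    s = "spearfish"
    print index(s, "fish"), index(s, "cow")   # 6, and 0 when not found
    print length(s), substr(s, 1, 5), substr(s, 6)
    n = split("28,4,12", parts, ",")          # parts[1]="28", parts[2]="4", parts[3]="12"
    print n, parts[1] + parts[3]
  }'
}
string_demo
# 6 0
# 9 spear fish
# 3 40
```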

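(2) Print output. Output can be unformatted (print) or formatted (printf):

{ print $2 $1 ":" $4/$3 }

prints the first two fields in reverse order with nothing between them, then a colon and the ratio of the fourth field to the third. (Division in awk is floating-point, so the ratio is not truncated to an integer.)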

{ printf "Clump %d:\t%4.1f acres\t%s.\n", $2, $4*640, $6 }

prints formatted output: %d specifies a decimal integer format for the clump ID; %n.mf specifies a floating-point format at least n characters wide with m digits after the decimal point, used here for the acreage, converted from square miles (1 square mile = 640 acres); %s specifies a string format. \t is the tab character; \n is the newline character. The input line

28 4 12 0.072 vegcov forest spearfish

would be printed as the tab-aligned output line

Clump 4: 46.1 acres forest.
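
The printf example can be run directly on the sample input line. In this sketch the function name clump_line is hypothetical, and the field meanings in the comment are read off the explanation above; 0.072 square miles times 640 is about 46.08 acres, which %4.1f rounds to 46.1.

```shell
# clump_line: hypothetical wrapper around the printf example.
# $2 = clump ID, $4 = area in square miles (times 640 gives acres), $6 = cover type
clump_line() {
  awk '{ printf "Clump %d:\t%4.1f acres\t%s.\n", $2, $4*640, $6 }'
}
echo '28 4 12 0.072 vegcov forest spearfish' | clump_line
```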

(3) Perform flow control (you won't need these for this class):

For loops:

for ( [initial expression];
[test expression];
[increment counter expression] )
{ commands }

example: for (i = 1; i <= 20; i++) does 20 iterations

If-Then-Else:

if (condition)
{ commands1 }
[ else
{ commands2 } ]

does commands1 if condition is true, and commands2 (or nothing) if it is false; condition can be any expression, typically one built with relational or pattern-matching operators.
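
A for loop over the fields combined with if-else is a common pattern. This minimal sketch (count_big is a hypothetical name) counts how many fields on each line are greater than 10 and how many are not.

```shell
# count_big: hypothetical function combining a for loop with if-else.
count_big() {
  awk '{
    big = 0; small = 0
    for (i = 1; i <= NF; i++) {   # visit each field of the current line
      if ($i > 10)
        big++
      else
        small++
    }
    print big, small
  }'
}
echo '28 4 12 0.072' | count_big
# 2 2
```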

Other flow-control commands:

break exits from the innermost enclosing loop.
continue begins the next iteration of the innermost enclosing loop.
exit stops processing: the remaining rules are skipped, no more input is read, and the END procedure, if any, is executed.
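
The sketch below shows all three commands at work. The function names sum_until_stop and evens_demo and the STOP marker are hypothetical; the first function sums numbers until it sees STOP and still runs END, and the second uses continue and break inside a for loop.

```shell
# sum_until_stop: hypothetical; exit at the first STOP line, END still runs.
sum_until_stop() {
  awk '/STOP/ { exit }
       { sum += $1 }
       END { print "sum before STOP:", sum }'
}

# evens_demo: hypothetical; break and continue inside a for loop.
evens_demo() {
  awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
      if (i % 2) continue    # skip odd i
      if (i > 6) break       # leave the loop once i passes 6
      list = list " " i
    }
    print "evens:" list
  }'
}

printf '%s\n' 3 8 STOP 5 | sum_until_stop   # sum before STOP: 11
evens_demo                                  # evens: 2 4 6
```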

The following command runs a simple awk program that searches the input file /etc/passwd for the character string 'foo' (a grouping of characters is usually called a string; the term string is based on similar usage in English, such as "a string of pearls," or "a string of cars in a train"):

awk '/foo/ { print $0 }' /etc/passwd

When lines containing 'foo' are found, they are printed because 'print $0' means print the current line. (Just 'print' by itself means the same thing, so we could have written that instead.)

You will notice that slashes ('/') surround the string 'foo' in the awk program. The slashes indicate that 'foo' is the pattern to search for. This type of pattern is called a regular expression. The pattern is allowed to match parts of words, so lines containing words such as fooey or macfoo are selected too. There are single quotes around the awk program so that the shell won't interpret any of it as special shell characters.

Run on BBS-list, a sample file listing computer bulletin board systems, the same program prints:

$ awk '/foo/ { print $0 }' BBS-list
-| fooey 555-1234 2400/1200/300 B
-| foot 555-6699 1200/300 B
-| macfoo 555-6480 1200/300 A
-| sabafoo 555-2127 1200/300 C

In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.

Thus, we could leave out the action (the print statement and the curly braces) in the previous example and the result would be the same: all lines matching the pattern 'foo' are printed. By comparison, omitting the print statement but retaining the curly braces makes an empty action that does nothing (i.e., no lines are printed).
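
The three variants can be compared side by side. This sketch uses a few invented lines in the style of the BBS-list output above (the sample function is hypothetical): the explicit action and the omitted action print the same lines, while the empty action prints nothing.

```shell
# sample: hypothetical helper printing a few lines in the style of BBS-list.
sample() {
  printf '%s\n' \
    'fooey 555-1234 2400/1200/300 B' \
    'quux 555-0000 1200/300 A' \
    'macfoo 555-6480 1200/300 A'
}

sample | awk '/foo/ { print $0 }'   # explicit action
sample | awk '/foo/'                # action omitted: matching lines printed
sample | awk '/foo/ { }'            # empty action: prints nothing
```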

Many practical awk programs are just a line or two. Following is a collection of useful, short programs to get you started. Some of these programs contain constructs that haven’t been covered yet. (The description of the program will give you a good idea of what is going on, but please read the rest of the Web page to become an awk expert!) Most of the examples use a data file named data. This is just a placeholder; if you use these programs yourself, substitute your own file names for data. For future reference, note that there is often more than one way to do things in awk. At some point, you may want to look back at these examples and see if you can come up with different ways to do the same things shown here:

* Print the length of the longest input line:

awk '{ if (length($0) > max) max = length($0) }
END { print max }' data

* Print every line that is longer than 80 characters:

awk 'length($0) > 80' data

The sole rule has a relational expression as its pattern and no action, so the default action, printing the record, is used.

* Print the length of the longest line in data:

expand data | awk '{ if (x < length()) x = length() }
END { print "maximum line length is " x }'

The input is processed by the expand utility to change tabs into spaces, so the widths compared are actually the right-margin columns.

* Print every line that has at least one field:

awk 'NF > 0' data

This is an easy way to delete blank lines from a file (or rather, to create a new file similar to the old file but from which the blank lines have been removed).

* Print seven random numbers from 0 to 100, inclusive:

awk 'BEGIN { for (i = 1; i <= 7; i++)
print int(101 * rand()) }'

* Print the total number of bytes used by files:

ls -l files | awk '{ x += $5 }
END { print "total bytes: " x }'

* Print the total number of kilobytes used by files, rounded up to a whole kilobyte:

ls -l files | awk '{ x += $5 }
END { print "total K-bytes: " int((x + 1023) / 1024) }'

* Print a sorted list of the login names of all users:

awk -F: '{ print $1 }' /etc/passwd | sort

* Count the lines in a file:

awk 'END { print NR }' data

* Print the even-numbered lines in the data file:

awk 'NR % 2 == 0' data

If you use the expression 'NR % 2 == 1' instead, the program prints the odd-numbered lines.
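
To try several of the one-liners above without a real data file, this sketch builds a throwaway file in place of data. It assumes the mktemp utility is available (it is on common Linux and BSD/macOS systems, though it is not part of POSIX); the file contents are made up.

```shell
# Throwaway file standing in for "data"; removed when the script exits.
data=$(mktemp)
trap 'rm -f "$data"' EXIT
printf '%s\n' 'alpha beta' '' 'gamma' 'delta epsilon zeta' > "$data"

awk '{ if (length($0) > max) max = length($0) } END { print max }' "$data"  # 18
awk 'NF > 0' "$data"             # the three non-blank lines
awk 'END { print NR }' "$data"   # 4
awk 'NR % 2 == 0' "$data"        # line 2 (blank) and line 4
```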

awk '{ if (NF > max) max = NF }
END { print max }'

This program prints the maximum number of fields on any input line.

awk '{ if (NF > 0) print }'
This program also prints every line that has at least one field. Here we allow the rule to match every line, then decide in the action whether to print.
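
The pattern form and the action form can be checked against each other. In this sketch the sample function and its lines are hypothetical; with the default field separator, a line holding only spaces has NF equal to 0, so both programs drop it along with the empty line.

```shell
# sample: hypothetical input; the third line contains only spaces.
sample() {
  printf '%s\n' 'first line' '' '   ' 'last line'
}
sample | awk 'NF > 0'                  # the pattern decides
sample | awk '{ if (NF > 0) print }'   # the action decides
```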

awk '{ nlines++ }
END { print nlines }'

This program counts lines in a file.

awk 'END { print NR }'
This program also counts lines in a file, but lets awk do the work.
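
The two line-counting programs agree on normal input but differ on empty input: nlines is never assigned, so it prints as an empty string, whereas NR is 0. The function names in this sketch are hypothetical.

```shell
# Hypothetical wrappers around the two line-counting programs.
count_with_var() { awk '{ nlines++ } END { print nlines }'; }
count_with_nr()  { awk 'END { print NR }'; }

printf 'a\nb\nc\n' | count_with_var   # 3
printf 'a\nb\nc\n' | count_with_nr    # 3
printf '' | count_with_var            # prints an empty line
printf '' | count_with_nr             # 0
```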

awk '{ print NR, $0 }'
This program adds line numbers to all its input files, similar to 'cat -n'.
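
Because NR counts every record read so far, the numbering continues from one input file to the next rather than restarting. This sketch assumes mktemp is available and uses two made-up files.

```shell
# Two throwaway input files, removed when the script exits.
a=$(mktemp); b=$(mktemp)
trap 'rm -f "$a" "$b"' EXIT
printf '%s\n' apple banana > "$a"
printf '%s\n' cherry > "$b"

awk '{ print NR, $0 }' "$a" "$b"
# 1 apple
# 2 banana
# 3 cherry
```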
