awk
Great resources:
- Cheat sheet and support for awk variants -> textual form
- Guide/book
- String manipulation
- Regular expressions in awk
- Variable scope
- BEGIN/END - startup and clean-up functions
- awk programs -> putting it into a file
- Book: Effective awk Programming
This page is still a work in progress.
awk is a powerful line-by-line text processor.
There are several flavours:
- AWK - original from AT&T
- NAWK - A newer, improved version from AT&T
- GAWK - GNU AWK (from the Free Software Foundation)
This article covers gawk.
The documentation from GNU Awk is really good!
pattern { action }
pattern { action }
pattern { action }
...
Each input line is called a record. A pattern usually matches if any part of the record matches it; the matching record can then be processed further in the action.
/Hello/
# ==
/Hello/ {print}
# ==
/Hello/ {print $0}
Default Behavior:
- Pattern: If it matches, the entire line is printed
- No pattern provided: Every line is printed
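The two defaults can be tried out as follows. This is a minimal sketch; the three input lines are made up for illustration.

```shell
# Pattern without an action: every matching line is printed unchanged.
printf 'Hello world\nGoodbye\nHello again\n' | awk '/Hello/'
# prints:
# Hello world
# Hello again

# Action without a pattern: the action runs for every line.
printf 'Hello world\nGoodbye\nHello again\n' | awk '{ print $1 }'
# prints:
# Hello
# Goodbye
# Hello
```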
Records are split into fields. The default separator is whitespace.
echo "hello world" | awk '{print $2}'
# prints: `world` ($0 = whole line, $1 = first column considering separation, ...)
Changing the separator:
echo "one|two|three" | awk -F'|' '{print $2}'
# or
echo "one|two|three" | awk 'BEGIN {FS="|"} {print $2}'
Note: A separator longer than one character is interpreted as a regular expression (RegEx), so reserved regex characters in it must be escaped with \ (e.g. . -> \.). A single character other than a space, such as the | above, is used literally.
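A short sketch of regex separators, using made-up input strings. A bracket expression such as [.] is one way to match a reserved character literally without worrying about how many backslashes survive the shell and awk string processing.

```shell
# A multi-character FS is a regular expression: split on any run of digits.
echo "one12two345three" | awk -F'[0-9]+' '{ print $2 }'
# prints: two

# A bracket expression matches a literal dot without backslash escaping.
echo "one.two.three" | awk -F'[.]' '{ print $2 }'
# prints: two
```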
The FS variable works in scripts as well:
# test_sep1.awk
# BEGIN block (actions before processing)
BEGIN {
FS = "|"
}
# Main block
{
print $2
}
# Shown for illustration; can be omitted since empty:
END {
}
Execute (i.e. run the awk file):
echo "one|two|three" | awk -f test_sep1.awk
# or
awk -f test_sep1.awk input.txt
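To illustrate how the three blocks cooperate, here is a minimal sketch that sums the second column. The input and the variable name total are made up for this example.

```shell
# BEGIN runs once before any input is read, the main block once per line,
# and END once after the last line.
printf '1|10\n2|20\n3|30\n' | awk '
BEGIN { FS = "|"; total = 0 }    # set the separator before reading input
      { total += $2 }            # add up the second field of every line
END   { print "total:", total }  # report once all input is consumed
'
# prints: total: 60
```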
- FS: Field separator (default: whitespace)
- OFS: Output field separator (default: space)
- RS: Record separator (default: newline)
- ORS: Output record separator (default: newline)
- NR: Number of records processed so far
- NF: Number of fields in the current record
- $0: The entire current record
- $1, $2, …: The individual fields of the current record
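A small sketch showing several of these variables at work; the sample input is made up for illustration.

```shell
# OFS is placed between the comma-separated arguments of print;
# $NF refers to the last field of the current record.
printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $1, $NF }'
# prints:
# 1-3-a-c
# 2-2-d-e

# RS splits the input into records, ORS terminates every printed record.
printf 'a,b,c' | awk 'BEGIN { RS = ","; ORS = ";" } { print }'
# prints: a;b;c;
```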
TODO: add explanations for output parts
match(string, regexp [, array])
-> array is filled with the matched groups (the third argument is a gawk extension):
- array[0] is the whole match,
- array[1] is the first group,
- ...
Example:
echo "one tw_#_o three" | awk '{match($2, /\w+(_#_)(\w+)/, ary)} { print ary[1] }'
# first action ----------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^------- second action (no pattern given, so both run for every line)
# prints: `_#_`
# what would also work:
echo "one tw_#_o three" | awk '{match($2, /\w+(_#_)(\w+)/, ary)} { some_var = ary[1]; print some_var }'
# or:
echo "one tw_#_o three" | awk '{match($2, /\w+(_#_)(\w+)/, ary)} { $2 = ary[1]; print $2 }'
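Since the array argument is specific to gawk, the following sketch shows a portable alternative under the assumption that only the whole match is needed: match() also sets the built-in variables RSTART (start position) and RLENGTH (length of the match), which can be combined with substr().

```shell
echo "one tw_#_o three" | awk '{
    # match() returns 0 when nothing matches, otherwise the start position
    if (match($2, /_#_/))
        print substr($2, RSTART, RLENGTH), RSTART, RLENGTH
}'
# prints: _#_ 3 3
```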
gsub(regex, replacement [, target])
-> target: the string in which to replace (default: $0)
echo "one tw_#_o three" | awk '{ gsub(/_#_/, "", $2); print $2 }' # Here no pattern defined (takes every line)
# prints `two`
gsub always overwrites the target in place; to keep the original value, copy it to a variable first:
echo "one tw_#_o three" | awk '{ new_var = $2; gsub(/_#_/, "", new_var); print new_var }'
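Two related details, sketched with a made-up input: gsub() returns the number of replacements it made, and its sibling sub() replaces only the first match.

```shell
# gsub replaces every match and returns how many replacements were made.
echo "a-b-c" | awk '{ n = gsub(/-/, "+"); print n, $0 }'
# prints: 2 a+b+c

# sub replaces only the first match.
echo "a-b-c" | awk '{ sub(/-/, "+"); print }'
# prints: a+b-c
```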
# Skips lines starting with `//` (comments) and prints all other lines:
awk '/^\/\// {next} // { print }' ./someFile.txt
Prints columns 1 and 4 as aligned, space-padded columns (the syntax is similar to C's printf):
awk '{ printf("%-40s%s\n", $1, $4) }'
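A runnable sketch of the same idea with a made-up four-column input and a narrower column width: %-10s left-aligns the first value and pads it with spaces to 10 characters.

```shell
printf 'alpha 1 x 10\nbe 2 y 200\n' | awk '{ printf("%-10s%s\n", $1, $4) }'
# prints:
# alpha     10
# be        200
```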
$ grep -R "CI_TYPE =" ../modules/* | grep -v dummy | awk 'match($1, /modules\/(.*)\/main\.tf/, ary) { $1 = ary[1]; gsub(/"/,"", $4); printf("%-40s %s %.5f\n", $1, $4, 5); }'
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License *.
Code (snippets) are licensed under a MIT License *.
* Unless stated otherwise