Skip to content

dubiousjim/awkenough

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Awkenough
=========

This package includes:

1. a small set of Awk utility routines, in library.awk
2. a stub program runawk.c that makes it easier to write shell scripts with awk #!-lines
3. a few Awk scripts in utils/ that supply some common Unix utilities. If you've got a BSD base system, or you have Gnu coreutils installed, you'll already have these; but if you're using BusyBox, you may not.


Installation
============

1. gmake AWK=/usr/bin/awk
2. sudo gmake PREFIX=/usr/local DESTDIR=/ install

The displayed values for the make variables are the defaults, and can be omitted if you're happy with those values.

AWK can also be set to /bin/busybox; in that case, the runawk binary will invoke busybox with an "awk" argument. This permits building runawk so that it always invokes BusyBox awk, even if another awk is linked from /usr/bin/awk.



Runawk
======

This stub program makes it easy to work around a limitation in the way many Unix kernels (including Linux) handle shebang (#!) lines. They will parse the shebang line only into a maximum of two components, and will then invoke the first component with an ARGV[1] of the second component (when it's present) and an ARGV[2] of the path of the script file containing the shebang line. This works fine if you just want to begin an awk script like this:

    file /path/foo.awk
    ------------------
    #!/usr/bin/awk -f
    ...
    ------------------

If you try to exec /path/foo.awk [...args...], the kernel will read its shebang line and will then invoke:

    "/usr/bin/awk" "-f" "/path/foo.awk" [..args...]

(I included the quotes to make the parsing explicit.)

In such cases, everything works great. However, what if you want to do this:

    file /path/bar.awk
    ------------------
    #!/usr/bin/awk -vRS=; -f
    ...
    ------------------

If you try to exec /path/bar.awk [...args...], the kernel will read its shebang line and will then invoke:

    "/usr/bin/awk" "-vRS=; -f" "/path/bar.awk" [...args...]

Notice the parsing. The remainder of the shebang line is passed to awk as a single argument. Awk won't understand this and your script will fail.

This is a general limitation of how most kernels handle shebang lines and isn't specific to awk. There are workarounds. For example, bar.awk could be rewritten as:

    file /path/bar.awk
    ------------------
    #!/usr/bin/awk -f
    BEGIN { RS=";" }
    ...
    ------------------

Or alternatively, you could give up on trying to exec bar.awk, and instead exec a shell script that bootstraps bar.awk:

    file /path/bar.sh
    ------------------
    #!/bin/sh
    /usr/bin/awk -vRS=; -f /path/bar.awk "$@"
    ------------------

If you wanted to load additional awk scripts containing utility routines, like the library.awk provided with this package, the second workaround is the only way you could do it:

    ------------------
    #!/bin/sh
    /usr/bin/awk -f/usr/share/awkenough/library.awk -f /path/bar.awk "$@"
    ------------------

This is where the runawk stub program makes things easier. It is prepared for the kernel's handling of shebang lines and will automatically expand arguments it needs to before invoking awk. So you can just write your scripts like this:

    file /path/qux.awk
    ------------------
    #!/usr/bin/runawk -f /usr/share/awkenough/library.awk
    ...
    ------------------

(Note that here we leave off the terminal "-f".) When you try to exec qux.awk [...args...], the kernel will invoke:

    "/usr/bin/runawk" "-f /usr/share/awkenough/library.awk" "/path/qux.awk" [...args...]

and runawk will in turn invoke:

    "/usr/bin/awk" "-f" "/usr/share/awkenough/library.awk" "-f" "/path/qux.awk" [...args...]

so everything will work as you probably intended.


Runawk handles the same command-line options as most awks:

* -F arg
* -v VAR=value
* -f file.awk

It also accepts the long-form "--file file.awk" as an alternate for the third option, like gawk does.

Runawk also accepts gawk's

* -e '...awk script ...'

option (and its long-form --source '...awk script...'). The point of this option is that standard awk can be invoked like this:

    awk -f library.awk -f file2 argv1 ...

or like this:

    awk '...awk script...' argv1 ...

but can't be invoked like this:

    awk -f library.awk '...awk script...' argv1 ...

That is, the -f option can't be mixed with providing an awk script directly on the command line, rather than in a separate file. Gawk and runawk's -e option provide a way to do this:

    runawk -f library.awk -e '...awk script...' argv1 ...


Runawk also knows how to handle gawk's --exec option. See the source for details.

Runawk also accepts a further option not present in gawk. This is:

* --stdin

and its effect is to guarantee that the awk program being invoked is handed stdin for processing, after processing any other ARGVs. In many cases awk will already do that automatically, but not all; so if you know in advance you want your script to always receive stdin, this option may be useful. For example, the following script:

    file /path/greet.awk
    ------------------
    #!/usr/bin/runawk -f /usr/share/awkenough/library.awk
    BEGIN {
        print "Hello, " ARGV[1]
        ARGV[1] = ""
        print "Here is stdin:"
    }

    FILENAME == "-" { print }
    ------------------

works great if invoked as "/path/greet.awk MyName". But if it's invoked as "/path/greet.awk MyName /path/another/file", it won't behave as expected. True, it won't print the lines of "/path/another/file"; the FILENAME check prevents that. But neither will it print stdin, because the presence of a file in its ARGV list means that awk will deem it unnecessary to also run the script over its stdin.

You can work around this manually, by inspecting and rewriting all your ARGVs as necessary, or by reading stdin manually using getline instead of waiting for awk to pass it to the main body of your script. But the easiest fix is to just add a --stdin to the runawk shebang line:

    file /path/greet.awk
    ------------------
    #!/usr/bin/runawk -f /usr/share/awkenough/library.awk --stdin
    ...
    ------------------

* Any other short-form options passed to runawk (like "-x") raise an error.

* Any other long-form options are silently passed to the awk binary.

This runawk was adapted from a more full-featured one developed by Aleksey Cheusov <vle@gmx.net>. His is available at http://runawk.sourceforge.net/. I didn't want all the features in his implementation, and I wanted to do some things differently. But you should also have a look at his version and see if it fits your needs better. By default, our runawk binary installs as /usr/bin/runawklite, so as to avoid clashing with his.



Library routines
================

See the file README.lib.



Utilities
=========

Help is usually available by running `<utility> --help`.

column
fmt
join
paste
    These all work like their namesakes in Gnu coreutils or the BSD tool set.

members <group>
    Prints a list of all users who are members of <group>.

uuniq
    Eliminates duplicates like uniq does, without requiring the input to be sorted. Moreover, the output will preserve the order of all unique lines.


Links
=====

<http://wiki.alpinelinux.org/wiki/Awk> may also be useful to the audience of
this package: it summarizes some gritty details about awk semantics, the
behavior of some different awk implementations, and correlates functions
available in awk + awkenough to those available in lua.

About

basic libraries for awk

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published