A parallel/xargs like tool. Executes commands using sets of input.

imarsman/concur

concur

A parallel workalike written in Go, though it is not really a parallel workalike. It is more like a text line processing tool with optional shell execution and optional application of an awk script. One benefit is the ability to run commands against input in parallel, similar to the parallel and xargs utilities. The number of concurrent slots can be changed, and empty output lines can be kept or omitted. A null character is supported as the input line terminator, and output from parallel commands can either be written as it arrives or saved and printed per command run.

parallel excels at producing lists of text values that can be used to do many amazing things when they are integrated into shell commands. The implementation of concur is more deterministic, with one predictable set of inputs for each line processed. There is no real re-arranging of input lists beyond randomization.

concur takes one or more input lists and uses their values to produce text that is integrated into shell commands. It is less focused than parallel on producing varied sets of values to be used in commands. All lists in concur are cycled through together, with the longest list defining how many operations to perform. If a shorter list is used up first, it cycles back to its starting point.
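
The cycling rule can be sketched in Go roughly as follows. This is a minimal illustration, not concur's actual code: `buildTasks` is a hypothetical helper name, and the treatment of an empty list is an assumption.

```go
package main

import "fmt"

// buildTasks is a hypothetical helper (not taken from concur's source).
// It lines up items from several lists: the longest list sets the number
// of rows, and shorter lists wrap around to their first item.
func buildTasks(lists [][]string) [][]string {
	longest := 0
	for _, l := range lists {
		if len(l) > longest {
			longest = len(l)
		}
	}
	tasks := make([][]string, 0, longest)
	for i := 0; i < longest; i++ {
		row := make([]string, 0, len(lists))
		for _, l := range lists {
			if len(l) == 0 {
				// Assumption: an empty list contributes an empty value.
				row = append(row, "")
				continue
			}
			row = append(row, l[i%len(l)]) // modulo gives the wrap-around
		}
		tasks = append(tasks, row)
	}
	return tasks
}

func main() {
	tasks := buildTasks([][]string{
		{"apple", "banana", "kiwi", "plum", "grape"},
		{"a", "b", "c"},
	})
	for _, row := range tasks {
		fmt.Println(row)
	}
}
```

With a five-item list and a three-item list, the sketch produces five rows, and the second column restarts at "a" on the fourth row.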

Lists of input supplied with the -a flag (which can be repeated to produce separate input lists) are arbitrary literal lists or expansions of file globbing patterns. For example, -a '/var/log/*log' will result in a list of paths. Lists can also be supplied using shell calls. See below for examples.

Auto completion

concur uses the posener/complete library. To use it type COMP_INSTALL=1 concur to be prompted to add auto completion for concur to the end of your .zshrc file. After setup you can restart your terminal session then press tab to get auto completion.

Usage

$ concur -h
concur
------
Commit:       23cffc6
Commit date:  2022-04-24 14:27:48 -0400
Compile Date: 2022-04-25 21:05:57 -0400

Usage: concur [--arguments ARGUMENTS] [--awk AWK] [--dry-run] [--slots SLOTS] [--shuffle] [--ordered] [--keep-order] [--print-empty] [--exit-on-error] [--null] [--ignore-error] [--stdin] [COMMAND]

Positional arguments:
  COMMAND

Options:
  --arguments ARGUMENTS, -a ARGUMENTS
                         lists of arguments
  --awk AWK, -A AWK      process using awk script or a script filename.
  --dry-run, -d          show command to run but don't run
  --slots SLOTS, -s SLOTS
                         number of parallel tasks [default: 8]
  --shuffle, -S          shuffle tasks prior to running
  --ordered, -o          run tasks in their incoming order
  --keep-order, -k       don't keep output for calls separate
  --print-empty, -P      print empty lines
  --exit-on-error, -E    exit on first error
  --null, -0             split at null character
  --ignore-error, -i     Ignore errors
  --stdin, -I            send input to stdin
  --help, -h             display this help and exit

Splitting at null is useful when the input contains filenames that include newlines. The receiving side can then split on the null character and keep the newlines intact. This is an edge case but was fun to add.

Examples

If --ordered is selected, the effect is to make --slots equal to 1, meaning that the runs happen one at a time in input order.

$ find . -type f -name "*.yml" | concur 'yamllint -' -ordered -I
stdin
  1:1       warning  missing document start "---"  (document-start)
  1:15      error    no new line character at the end of file  (new-line-at-end-of-file)
stdin
  1:1       warning  missing document start "---"  (document-start)
  1:43      error    no new line character at the end of file  (new-line-at-end-of-file)
stdin
  1:1       warning  missing document start "---"  (document-start)
  1:54      error    no new line character at the end of file  (new-line-at-end-of-file)
stdin
  1:1       warning  missing document start "---"  (document-start)
  1:56      error    no new line character at the end of file  (new-line-at-end-of-file)
stdin
  1:1       warning  missing document start "---"  (document-start)
  1:52      error    no new line character at the end of file  (new-line-at-end-of-file)

Split at null

$ find /var/log -type f -name "*log" -print0 | concur -0
/var/log/fsck_apfs_error.log
/var/log/com.apple.xpc.launchd/launchd.log
/var/log/system.log
/var/log/fsck_apfs.log
/var/log/wifi.log
/var/log/acroUpdaterTools.log
/var/log/shutdown_monitor.log
/var/log/fsck_hfs.log
/var/log/install.log

The first time someone told me how to use find they specified -print0 so I am a bit nostalgic.

$ concur -a "$(seq 5)"
1
2
3
4
5

There is a simple sequence token that can be used as well:

$ concur -a '{1..5}'
2
5
3
4
1

concur can send either the set of incoming list items or the output of the command run to an awk interpreter (using the goawk library).

Note that the order of output is normally the result of parallel execution and as such is random. This can be overridden.

Tokens

Tokens can be used in the command input. If command input is used the result must be a valid shell call. If no command is supplied the result will be a list of the incoming list values.

Tokens that can be used in the command

  • {} or {1} - list 1 item
  • {.} or {1.} - list 1 item without its extension, or the same for a numbered list item
  • {/} or {1/} - basename of the list 1 item, or the same for a numbered list item
  • {//} or {1//} - dirname of the list 1 item, or the same for a numbered list item
  • {./} or {1./} - basename of the list 1 item without its extension, or the same for a numbered list item
  • {#} sequence number of the job
  • {%} job slot number (based on concurrency)
  • {1..10} - a range - specify in -a and make sure to quote
    • sequences can be used too such as seq 1 10 and '$({1..10})' (shell invocation)
    • multiple sequences can be used and for each -a will be added to a task list
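
The path-oriented tokens above map onto Go's standard path helpers. The following is a rough sketch under stated assumptions: `expandTokens` is a hypothetical function, it handles only list 1 tokens and {#}, and it is not taken from concur's source.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandTokens is a hypothetical helper that substitutes list 1 tokens and
// the {#} sequence token in a command template. Numbered tokens for other
// lists and the {%} slot token are left out to keep the sketch short.
func expandTokens(cmd, item string, seq int) string {
	ext := path.Ext(item)   // ".log" for "/var/log/system.log"
	base := path.Base(item) // "system.log"
	r := strings.NewReplacer(
		"{./}", strings.TrimSuffix(base, ext), // basename without extension
		"{//}", path.Dir(item), // dirname
		"{/}", base, // basename
		"{.}", strings.TrimSuffix(item, ext), // item without extension
		"{#}", fmt.Sprint(seq), // job sequence number
		"{1}", item,
		"{}", item,
	)
	return r.Replace(cmd)
}

func main() {
	fmt.Println(expandTokens("echo {#} {/} {//} {.} {./}", "/var/log/system.log", 3))
}
```

Applied to a value that is not a path, the same helpers still return something (for example, path.Dir returns "." for a bare word), which is why path tokens need care on non-path input.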

I still have to test and decide what to do with path and file oriented placeholders like {/} and {2/} where the value is not a path or file. Currently the path and file oriented substitutions are applied regardless. It is up to the writer of the call to be careful not to use path and file oriented tokens on values that are not paths or files.

Optimizations

If the command string contains only tokens, the tokens are substituted but no command is run. For example, concur '{#}' substitutes the {#} token for each incoming item and that is the extent of the work. This matters because running even a simple echo command on hundreds of thousands of lines can take very much longer (minutes compared to seconds). The substituted command line is used as the input for any awk script that is run.

Examples

Run a simple random fibonacci series several times

$ time concur './fibonacci.sh' -a '{1..10}'
13
2584
8
5
610
987
233
3
1
144

Echo a series of numbers

$ concur 'echo {}' -a '{0..9}'
7
0
5
4
2
6
1
3
9
8

If there is no command the output will just be the input. For example

$ concur -a '{0..9}'
0
3
8
9
2
7
5
6
4
1

This will show the sequence numbers and items for a list

$ concur 'echo {#} {}' -a '{0..9}' -o
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
10 9

Note the use of the -o (ordered) flag. In the code, --ordered limits the semaphore that guards command execution to a single slot, so only one command runs at a time.

See below for how to use more than one argument list and numbered tokens to produce output

$ concur 'echo {#} {1} {2}' -a '{0..9}' -a '{10..19}' -o
1 0 10
2 1 11
3 2 12
4 3 13
5 4 14
6 5 15
7 6 16
8 7 17
9 8 18
10 9 19

awk scripts

awk scripts can be run on the output of the initial stage (either the provision of all input fields or the running of a command).

$ concur -A 'BEGIN {FS="\\s+"; OFS=","} {print "got "$1}' -a '{1..10}'
got 2
got 8
got 10
got 1
got 3
got 6
got 4
got 7
got 5
got 9

In this example awk is used as a filter for lines. If the -P flag is used, empty lines are printed.

$ cat fruits.txt | concur -A 'BEGIN {FS="\\s+"; OFS=","} /red/ {print $1,$2,$3}'
apple,red,4
strawberry,red,3
raspberry,red,99

Note that empty lines are skipped by default. That can be overridden with the -P flag.

Here is the fruits.txt file

name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
raspberry  red    99
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5

$ cat fruits.txt | concur 'echo' -A 'BEGIN {FS="\\s+"; OFS=","} /red/ {print $1,$2,$3}' -P


raspberry,red,99
strawberry,red,3


apple,red,4



Here is an ordered version of the previous example, with no blank lines and no filtering

$ cat fruits.txt | concur 'echo' -A 'BEGIN {FS="\\s+"; OFS=","} {print $1,$2,$3}' -o
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
raspberry,red,99
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5

concur accepts the output of tail -f. awk does as well but goawk does not.

$ tail -f /var/log/*log | concur -A 'BEGIN {FS="\\s+"; OFS=","} /completed/ {print $0}' -o
/dev/rdisk3s3: fsck_apfs completed at Mon Mar  7 14:16:56 2022
/dev/rdisk3s3: fsck_apfs completed at Wed Mar 16 22:00:41 2022
fsck_apfs completed at Wed Mar 16 22:00:41 2022
/dev/rdisk4s2: fsck_hfs completed at Thu Mar 17 21:39:26 2022
/dev/rdisk4s2: fsck_hfs completed at Thu Mar 17 21:39:26 2022

$ tail -f /var/log/*log | concur -A 'BEGIN {FS="\\s+"; OFS=","} {print $1,$2,$3}'
==>,/var/log/acroUpdaterTools.log,<==
Jan,12,,2022
installer:,Upgrading,at
installer:,The,upgrade
Jan,12,,2022
Jan,12,,2022
Jan,12,,2022
Jan,12,,2022
Jan,12,,2022
Jan,12,,2022
==>,/var/log/fsck_apfs.log,<==
/dev/rdisk3s3:,fsck_apfs,started
/dev/rdisk3s3:,**,QUICKCHECK
/dev/rdisk3s3:,fsck_apfs,completed
/dev/rdisk3s3:,fsck_apfs,started
/dev/rdisk3s3:,**,QUICKCHECK
/dev/rdisk3s3:,fsck_apfs,completed
...
...

Here is an example of using both a standard input list and an additional list with awk

$ cat test/test.txt | concur 'echo {1} {2}' -o -a 'a b c' -A '{FS="\\s+"; OFS=" "} {print $1, $2, $3, $4}' -o
name color amount a
apple red 4 b
banana yellow 6 c
strawberry red 3 a
raspberry red 99 b
grape purple 10 c
apple green 8 a
plum purple 2 b
kiwi brown 4 c
potato brown 9 a
pineapple yellow 5 b

Ping some hosts and wait for the full output from each before printing. Notice the use of the --keep-order (-k) flag, which forces each command's output to be grouped.

$ concur 'ping -c 1 "{}"' -a '127.0.0.1 ibm.com cisco.com' -keep-order
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.084 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.084/0.084/0.084/0.000 ms
PING ibm.com (104.67.113.240): 56 data bytes
64 bytes from 104.67.113.240: icmp_seq=0 ttl=56 time=29.846 ms

--- ibm.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 29.846/29.846/29.846/0.000 ms
PING cisco.com (72.163.4.185): 56 data bytes
64 bytes from 72.163.4.185: icmp_seq=0 ttl=239 time=68.559 ms

--- cisco.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 68.559/68.559/68.559/0.000 ms

Escaping shell commands

The command can include shell calls that concur runs against each input. However, the calling shell will run such a call before concur is invoked unless it is escaped. Examples of characters and sequences that need to be escaped include ` and $(.

$ ls -1 /var/log/*log | concur 'echo count \`wc -l {1}\`'
count 32 /var/log/fsck_apfs_error.log
count 432 /var/log/acroUpdaterTools.log
count 524 /var/log/system.log
count 395 /var/log/wifi.log
count 357 /var/log/fsck_hfs.log
count 39 /var/log/shutdown_monitor.log
count 817 /var/log/fsck_apfs.log
count 140367 /var/log/install.log

$ ls -1 /var/log/*log | concur "echo count \$(wc -l {1})"
count 32 /var/log/fsck_apfs_error.log
count 432 /var/log/acroUpdaterTools.log
count 524 /var/log/system.log
count 395 /var/log/wifi.log
count 357 /var/log/fsck_hfs.log
count 39 /var/log/shutdown_monitor.log
count 817 /var/log/fsck_apfs.log
count 140367 /var/log/install.log

Note that the same result can be obtained without escaping by using single quotes around the command.

$ ls -1 /var/log/*log | concur 'echo count $(wc -l {1})'
count 0 /var/log/fsck_apfs_error.log
count 294 /var/log/system.log
count 432 /var/log/acroUpdaterTools.log
count 432 /var/log/acroUpdaterTools.log
count 294 /var/log/system.log
count 0 /var/log/fsck_apfs_error.log
count 153250 /var/log/install.log
count 153250 /var/log/install.log

Arguments

Lists in arguments need to be quoted, e.g. -a "{1..4}" -a "1 2 3 4". Each list is split into items separately.

The command to be run does not need to be quoted unless it contains characters like { and `.

Currently filenames will not result in special handling as files or a source of lines.

Simple sequences are supported

$ concur 'echo "Argument: {}"' -a "{1..4}"
Argument: 1
Argument: 4
Argument: 2
Argument: 3

Argument lists can be specified separated by spaces

$ concur 'echo "Argument: {}"' -a "1 2 3 4"
Argument: 1
Argument: 4
Argument: 2
Argument: 3

Argument lists can include literals and ranges

$ concur 'echo "Argument: {}"' -a '1 2 3 4 5 {6..10}'
Argument: 7
Argument: 2
Argument: 6
Argument: 4
Argument: 5
Argument: 1
Argument: 3
Argument: 8
Argument: 10
Argument: 9
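
Mixing literals and ranges in one list amounts to splitting on whitespace and expanding any {a..b} field. The sketch below is a guess at the general approach rather than concur's code: `expandList` is a hypothetical name, and it assumes only ascending, non-negative ranges are expanded.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// rangeRe matches a whole field of the form {start..end}.
var rangeRe = regexp.MustCompile(`^\{(\d+)\.\.(\d+)\}$`)

// expandList is a hypothetical helper that splits an argument string on
// whitespace and expands ascending {start..end} ranges into their members.
// Anything else, including a descending range, is kept as a literal
// (an assumption made for this sketch).
func expandList(arg string) []string {
	var out []string
	for _, field := range strings.Fields(arg) {
		m := rangeRe.FindStringSubmatch(field)
		if m == nil {
			out = append(out, field)
			continue
		}
		start, err1 := strconv.Atoi(m[1])
		end, err2 := strconv.Atoi(m[2])
		if err1 != nil || err2 != nil || start > end {
			out = append(out, field) // not expandable, keep the literal text
			continue
		}
		for n := start; n <= end; n++ {
			out = append(out, strconv.Itoa(n))
		}
	}
	return out
}

func main() {
	fmt.Println(expandList("1 2 3 4 5 {6..10}"))
}
```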

Shell calls can be made to create lists

$ concur echo "Argument: {1} {2}" -a "{0..9}" -a "$(echo {100..199})"
Argument: 1 100
Argument: 4 100
Argument: 5 100
Argument: 0 100
Argument: 6 100
Argument: 2 100
Argument: 7 100
Argument: 9 100
Argument: 3 100
Argument: 8 100

$ concur 'echo "{1} {2}"' -a "/var/log/*log" -a "$(echo {1..10..2})"
/var/log/wifi.log 5
/var/log/fsck_hfs.log 7
/var/log/shutdown_monitor.log 1
/var/log/fsck_apfs.log 3
/var/log/system.log 3
/var/log/acroUpdaterTools.log 1
/var/log/install.log 9
/var/log/fsck_apfs_error.log 5

$ concur 'echo Slot {%} {1}' -a '/var/log/*log' -slots  2
Slot 1 /var/log/acroUpdaterTools.log
Slot 2 /var/log/fsck_apfs.log
Slot 1 /var/log/fsck_apfs_error.log
Slot 2 /var/log/fsck_hfs.log
Slot 1 /var/log/install.log
Slot 2 /var/log/shutdown_monitor.log
Slot 1 /var/log/system.log
Slot 2 /var/log/wifi.log

Benchmarks

Initial benchmarks are encouraging, though parallel is written in Perl and does all kinds of cool things.

$ time parallel echo "Argument: {}" ::: 1 2 3 4 5 {6..10}
Argument: 1
Argument: 4
Argument: 2
Argument: 6
Argument: 5
Argument: 3
Argument: 8
Argument: 7
Argument: 9
Argument: 10

parallel echo "Argument: {}" ::: 1 2 3 4 5 {6..10}  0.33s user 0.19s system 241% cpu 0.216 total
$ time concur 'echo Argument: {}' -a '1 2 3 4 5 {6..10}'
Argument: 8
Argument: 1
Argument: 4
Argument: 5
Argument: 2
Argument: 6
Argument: 3
Argument: 7
Argument: 9
Argument: 10

concur 'echo Argument: {}' -a '1 2 3 4 5 {6..10}'  0.02s user 0.04s system 218% cpu 0.025 total

Trivia

In keeping with my recent trend when writing utilities, there are about 1,000 lines of Go code. I have moved towards keeping each package to about 400 lines of code, with more allowed if the package does one thing, such as implementing handler functions. A single 1,000-line package that defines data types, variables, and the functions that use all of them is not as readable.

$ gocloc Taskfile.yml README.md cmd
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Go                               9            234            227           1149
Markdown                         1             99              0            486
YAML                             1              3              3             53
-------------------------------------------------------------------------------
TOTAL                           11            336            230           1688
-------------------------------------------------------------------------------
