sop

A command line tool to perform set operations with files

Installation

You can install sop with Homebrew, with go install, or by building it from source.

Homebrew

> brew install marcosy/tap/sop

Go Install

> go install github.com/marcosy/sop@latest

Remember to add $GOPATH/bin to your PATH:

> export PATH="$GOPATH/bin:$PATH"

Build from source

You can also build sop from source. Clone the repository and run:

> git clone git@github.com:marcosy/sop.git
> cd sop
> make build

The binary will be saved at ./bin/sop. For other targets, run make help.

Usage

sop treats each file as a set of elements and performs set operations on those files.

sop [options] <operation> <filepath A> <filepath B>
  • operation can be one of:

    • union: Print elements that exist in file A or file B

    • intersection: Print elements that exist in both file A and file B

    • difference: Print elements that exist in file A and do not exist in file B

  • filepath A and filepath B are the paths to the files containing the elements to operate on. Elements are delimited by a separator string, which defaults to "\n".

  • options can be:

    • -s: String used as element separator (default "\n")
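As an illustration of how a file's content can be viewed as a set of elements delimited by a separator, here is a minimal Go sketch. It is not taken from sop's source: the function name toSet is hypothetical, and the choice to drop empty elements (such as the one left by a trailing newline) is an assumption of this sketch, not documented sop behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// toSet splits content on sep and returns the distinct elements as a set.
// Hypothetical helper: empty elements are dropped, which is an assumption
// of this sketch and not necessarily what sop does.
func toSet(content, sep string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, elem := range strings.Split(content, sep) {
		if elem != "" {
			set[elem] = struct{}{}
		}
	}
	return set
}

func main() {
	// Newline-separated content, like fileA.txt in the examples.
	a := toSet("Fox\nDuck\nDog\nCat\n", "\n")
	_, hasDog := a["Dog"]
	fmt.Println(len(a), hasDog) // 4 true

	// Comma-separated content with a duplicate element.
	csv := toSet("Dog,Cat,Dog", ",")
	fmt.Println(len(csv)) // 2
}
```

Because a set holds each element at most once, the duplicate "Dog" in the comma-separated input is counted a single time.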

Examples

Given two files A (fileA.txt) and B (fileB.txt):

fileA.txt:

Fox
Duck
Dog
Cat

fileB.txt:

Dog
Cat
Cow
Goat

sop can perform the following set operations on these files.

Operations

The available operations are: union, intersection and difference.

Union

The union of two sets A and B is the set of elements which are in A, in B, or in both A and B.

> sop union fileA.txt fileB.txt
Fox
Duck
Dog
Cat
Cow
Goat

Intersection

The intersection of two sets A and B is the set containing all elements of A that also belong to B or, equivalently, all elements of B that also belong to A.

> sop intersection fileA.txt fileB.txt
Dog
Cat

Difference

The difference (a.k.a. relative complement) of A and B is the set of all elements in A that are not in B.

> sop difference fileA.txt fileB.txt
Fox
Duck

Considerations

Separator

The separator used to delimit elements defaults to the newline character (\n), but it can be changed with the -s flag:

> sop -s , union fileA.csv fileB.csv

Sorting

The result sets are not ordered by default, so consecutive executions may return elements in a different order. To obtain a consistent order, pipe the output of sop to sort:

> sop intersection fileA.txt fileB.txt | sort