Ops4J is proudly developed with tools from the fine folks at ej-technologies.
PLEASE NOTE THAT THIS IS AN EARLY WORK IN PROGRESS AND THAT I AM STILL WORKING ON THE BASIC APPLICATION ARCHITECTURE. THE CODE WILL REMAIN RATHER FLUID FOR SOME TIME TO COME.
Ops4J is a framework well suited to rapid prototyping, experimentation, and innovation. Developers write small units of code that we refer to as operations.
Operations are:
- Singular of purpose
- Self-documenting, with command line help and descriptive names such as mongo:insert or benchmark
- Flexible and reusable
- Interoperable with other operations
- Interoperable with external programs such as command line utilities
Operations adhere to a JSON in / JSON out paradigm, which allows them to be orchestrated directly from the shell. They provide their own documentation in the form of online help and can be used like any other CLI-based utility.
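To make the JSON in / JSON out idea concrete, here is a minimal sketch in plain Java. It is not the Ops4J API: the `Op` interface, the `then`, `set`, and `requireKey` helpers, and the use of a `Map` as a stand-in for one JSON record are all assumptions made for illustration. The point is only that when every operation has the same record-in, records-out shape, any two operations can be chained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Ops4J API: a Map stands in for one JSON record.
class OperationSketch {
    /** One record in, zero or more records out. */
    interface Op {
        List<Map<String, Object>> execute(Map<String, Object> record);

        /** Chains two operations, feeding every output of this one into the next. */
        default Op then(Op next) {
            return record -> {
                List<Map<String, Object>> out = new ArrayList<>();
                for (Map<String, Object> r : execute(record)) {
                    out.addAll(next.execute(r));
                }
                return out;
            };
        }
    }

    /** Returns a copy of the record with one field added or overwritten. */
    static Op set(String key, Object value) {
        return record -> {
            Map<String, Object> copy = new LinkedHashMap<>(record);
            copy.put(key, value);
            return List.of(copy);
        };
    }

    /** Passes through only records that contain the given key. */
    static Op requireKey(String key) {
        return record -> record.containsKey(key) ? List.of(record) : List.of();
    }

    public static void main(String[] args) {
        Op pipeline = requireKey("id").then(set("checked", true));
        System.out.println(pipeline.execute(Map.of("id", 1)));     // record is kept and tagged
        System.out.println(pipeline.execute(Map.of("name", "x"))); // record is dropped
    }
}
```

Because a filter may emit nothing and other operations may emit several records, the sketch returns a list rather than a single record.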
Say we wish to scrub JSON content stored in a file named input.json, removing blanks and nulls, then insert the record into a collection called scrubbed in a database named test. In Ops4J this is a simple one-liner:
cat input.json | remove -blanks -nulls | mongo:insert -d test -c scrubbed -O NONE
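As a rough picture of what the scrub step does to a single record, the sketch below removes null values and blank strings from a flat record. It is an illustration only, not the code behind the remove operation: the `scrub` helper is hypothetical, a `Map` stands in for the JSON record, "blank" is assumed to mean an empty or whitespace-only string, and nested objects are not descended into.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Ops4J implementation of "remove".
class ScrubSketch {
    /**
     * Returns a copy of the record without null values (if nulls is set)
     * and without empty or whitespace-only strings (if blanks is set).
     * Only top-level fields are examined.
     */
    static Map<String, Object> scrub(Map<String, Object> record, boolean blanks, boolean nulls) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            Object value = e.getValue();
            if (nulls && value == null) {
                continue;
            }
            if (blanks && value instanceof String && ((String) value).trim().isEmpty()) {
                continue;
            }
            out.put(e.getKey(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "Ada");
        record.put("nickname", "   ");
        record.put("email", null);
        System.out.println(scrub(record, true, true));
    }
}
```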
We validate the data by streaming it back from Mongo as follows:
mongo:stream -d test -c scrubbed
We can drop a benchmark in anywhere we like. Here we test our ETL process performance.
cat input.json | remove -blanks -nulls | mongo:insert -d test -c scrubbed | benchmark -O NONE
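A benchmark can be dropped in anywhere because it passes records through unchanged and only observes them. The sketch below shows that idea in plain Java. The class name, the `pass` method, and the records-per-second metric are assumptions for illustration; the metrics the real benchmark operation reports are not described here.

```java
// Hypothetical sketch, not the Ops4J benchmark operation.
class BenchmarkSketch {
    private final long startNanos = System.nanoTime();
    private long count;

    /** Counts the record and hands it back unchanged, so the stage can sit anywhere in a chain. */
    <T> T pass(T record) {
        count++;
        return record;
    }

    long count() {
        return count;
    }

    /** Average throughput since this stage was created. */
    double recordsPerSecond() {
        long elapsed = Math.max(1, System.nanoTime() - startNanos);
        return count * 1_000_000_000.0 / elapsed;
    }

    public static void main(String[] args) {
        BenchmarkSketch bench = new BenchmarkSketch();
        for (int i = 0; i < 100_000; i++) {
            String record = "{\"n\":" + i + "}";
            bench.pass(record.trim()); // trim() is a placeholder for real upstream work
        }
        System.out.printf("%d records, %.0f records/s%n", bench.count(), bench.recordsPerSecond());
    }
}
```

Since the stage measures whatever flows past it, its position in the chain decides what is being timed.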
We can run things concurrently. A variety of concurrent operations are available. Here we use the parallel operation to execute our data scrub and load on four threads, while the benchmark runs in a single thread.
cat input.json | parallel -t 4 'remove -blanks -nulls | mongo:insert -d test -c scrubbed' | benchmark -O NONE
Small changes in pipeline structure can result in significant changes in pipeline architecture. Here, we move the benchmark into the parallel pipeline so that we can check for things such as starvation. We will receive separate benchmarks for each thread.
cat input.json | parallel -t 4 'remove -blanks -nulls | mongo:insert -d test -c scrubbed | benchmark' -O NONE
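The value of per-thread benchmarks is easier to see with a small model. In the sketch below, four workers drain a shared queue and each keeps its own count, so an uneven split between workers (one sign of starvation) shows up directly. This is plain `java.util.concurrent` code written for illustration. It says nothing about how the parallel operation is implemented, and the `run` helper and the `trim()` placeholder are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the Ops4J parallel operation.
class ParallelSketch {
    /** Processes all records on the given number of workers; returns how many each worker handled. */
    static long[] run(List<String> records, int threads) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>(records);
        Callable<Long> worker = () -> {
            long handled = 0;
            String record;
            while ((record = queue.poll()) != null) {
                record.trim(); // placeholder for the scrub-and-insert stages
                handled++;     // this worker's private "benchmark"
            }
            return handled;
        };
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(worker));
            }
            long[] perWorker = new long[threads];
            for (int i = 0; i < threads; i++) {
                perWorker[i] = futures.get(i).get(); // waits for worker i to finish
            }
            return perWorker;
        } catch (Exception e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add("{\"n\":" + i + "}");
        }
        // The split between workers varies from run to run.
        System.out.println(Arrays.toString(run(records, 4)));
    }
}
```

Summing the per-worker counts gives the single figure that a benchmark placed after the parallel stage would see.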
The same scrub, load, and benchmark pipeline can also be coded directly on the JVM (shown here without the parallel stage):
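// Stream JSON records from the input file.
JsonNodeInputStream it = JsonNodeInputStream.from(new FileInputStream("input.json"));

// Assemble the pipeline: scrub, insert into test.scrubbed, then benchmark.
Pipeline p = new Pipeline()
    .add(new RemoveJson().blanks(true).nulls(true))
    .add(new MongoInsert().db("test").collection("scrubbed"))
    .add(new Benchmark())
    .initialize()
    .open();

// Push each record through the pipeline.
while (it.hasNext())
{
    List<JsonNode> results = p.execute(it.next());
}

// Close the pipeline and clean up once the input is exhausted.
p.close().cleanup();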
We can construct pipelines in a number of ways.
Using the pipeline DSL:
Pipeline p = Pipeline.from("remove -nulls -blanks | " +
    "mongo:insert -d test -c scrubbed | benchmark");
Loaded from repository:
Pipeline p = Ops4J.repo().load("scrub");
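For readers curious how a DSL string such as the one passed to Pipeline.from might be broken into stages, here is a deliberately naive sketch. It is not the Ops4J parser: the `Stage` class and `parse` method are hypothetical, and the code simply splits on `|` and whitespace, so it would not handle quoted sub-pipelines like the one given to parallel in the shell examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the Ops4J DSL parser. No support for quoting.
class DslSketch {
    /** One pipeline stage: an operation name plus its raw arguments. */
    static final class Stage {
        final String name;
        final List<String> args;

        Stage(String name, List<String> args) {
            this.name = name;
            this.args = args;
        }

        @Override
        public String toString() {
            return name + args;
        }
    }

    /** Splits "op1 args | op2 args | ..." into stages, skipping empty segments. */
    static List<Stage> parse(String dsl) {
        List<Stage> stages = new ArrayList<>();
        for (String part : dsl.split("\\|")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            List<String> args = new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
            stages.add(new Stage(tokens[0], args));
        }
        return stages;
    }

    public static void main(String[] args) {
        String dsl = "remove -nulls -blanks | mongo:insert -d test -c scrubbed | benchmark";
        for (Stage stage : parse(dsl)) {
            System.out.println(stage);
        }
    }
}
```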