A fast stream-processing framework. Wallaroo makes it easy to react to data in real time. By eliminating infrastructure complexity, Wallaroo simplifies the path from prototype to production.
When we set out to build Wallaroo, we had several high-level goals in mind:
- Create a dependable and resilient distributed computing framework
- Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic
- Provide high-performance & low-latency data processing
- Be portable and deploy easily (i.e., run on-prem or any cloud)
- Manage in-memory state for the application
- Allow applications to scale as needed, even while they are live
You can learn more about Wallaroo from our "Hello Wallaroo!" blog post and the Wallaroo overview video.
Wallaroo is a little different from most stream processing tools. While most require the JVM, Wallaroo can be deployed as a separate binary, which means no more jar files. Wallaroo also isn't locked into Kafka as a source; use any source you like. Application logic can be written in Python 2, Python 3, or Pony.
Wallaroo can be installed via Docker, Vagrant, or (on Linux) our handy Wallaroo Up command.
As easy as:
docker pull wallaroo-labs-docker-wallaroolabs.bintray.io/release/wallaroo:latest
Check out our installation options page to learn more.
Once you've installed Wallaroo, take a look at some of our examples. A great place to start is our word_count or market spread example in Python.
"""
This is a complete example application that receives lines of text and counts each word.
"""
import string
import struct
import wallaroo


def application_setup(args):
    in_host, in_port = wallaroo.tcp_parse_input_addrs(args)[0]
    out_host, out_port = wallaroo.tcp_parse_output_addrs(args)[0]

    lines = wallaroo.source("Split and Count",
                            wallaroo.TCPSourceConfig(in_host, in_port,
                                                     decode_line))
    pipeline = (lines
                .to(split)
                .key_by(extract_word)
                .to(count_word)
                .to_sink(wallaroo.TCPSinkConfig(out_host, out_port,
                                                encode_word_count)))
    return wallaroo.build_application("Word Count Application", pipeline)


@wallaroo.computation_multi(name="split into words")
def split(data):
    punctuation = " !\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~"
    words = []
    for line in data.split("\n"):
        clean_line = line.lower().strip(punctuation)
        for word in clean_line.split(" "):
            clean_word = word.strip(punctuation)
            words.append(clean_word)
    return words


class WordTotal(object):
    count = 0


@wallaroo.state_computation(name="count word", state=WordTotal)
def count_word(word, word_total):
    word_total.count = word_total.count + 1
    return WordCount(word, word_total.count)


class WordCount(object):
    def __init__(self, word, count):
        self.word = word
        self.count = count


@wallaroo.key_extractor
def extract_word(word):
    return word


@wallaroo.decoder(header_length=4, length_fmt=">I")
def decode_line(bs):
    return bs.decode("utf-8")


@wallaroo.encoder
def encode_word_count(word_count):
    output = word_count.word + " => " + str(word_count.count) + "\n"
    return output.encode("utf-8")
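If you want a feel for what the word count example computes before starting a Wallaroo cluster, the following plain-Python sketch traces a couple of messages through the same stages. It is only an approximation under a few assumptions: `frame`, `unframe`, and `run_locally` are hypothetical helpers written for this illustration (they are not part of the Wallaroo API), the framing assumes that `header_length=4` with `length_fmt=">I"` means a 4-byte big-endian length prefix, and a single dictionary stands in for the per-key state that `key_by` and `count_word` provide.

```python
import struct
from collections import defaultdict

# Same punctuation set as the example's split computation.
PUNCTUATION = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"


def frame(text):
    """Hypothetical helper: prefix a UTF-8 payload with a 4-byte big-endian length."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def unframe(message):
    """Hypothetical helper: read the 4-byte header, then return the payload bytes."""
    (length,) = struct.unpack(">I", message[:4])
    return message[4:4 + length]


def split(data):
    """Lowercase each line, split on spaces, and strip surrounding punctuation."""
    words = []
    for line in data.split("\n"):
        clean_line = line.lower().strip(PUNCTUATION)
        for word in clean_line.split(" "):
            words.append(word.strip(PUNCTUATION))
    return words


def run_locally(messages):
    """Stand-in for the pipeline: decode, split, then keep a running count per word."""
    totals = defaultdict(int)  # one running total per key, playing the role of WordTotal
    outputs = []
    for message in messages:
        line = unframe(message).decode("utf-8")
        for word in split(line):
            totals[word] += 1
            outputs.append(word + " => " + str(totals[word]) + "\n")
    return outputs


if __name__ == "__main__":
    for out in run_locally([frame("Hello, world!"), frame("hello again")]):
        print(out, end="")
```

The sketch emits one `word => count` line per incoming word, so a repeated word shows its running total each time it appears. In a real deployment the ordering of output across different keys may differ, since this single-process loop does not model distribution.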
Are you the sort who just wants to get going? Dive right into our documentation then! It will get you up and running with Wallaroo.
More information is also on our blog. There you can find more insight into what we are working on and industry use-cases.
Wallaroo currently exists as a mono-repo: all of the Wallaroo source is located in this repo. See application structure for more information.
Trying to figure out how to get started?
- Check out the FAQ
- Drop us a line:
We welcome contributions. Please see our Contribution Guide.
For your pull request to be accepted, you will need to accept our Contributor License Agreement.
Wallaroo is licensed under the Apache version 2 license.