Skip to content

Latest commit

 

History

History
48 lines (38 loc) · 1.75 KB

Usage.md

File metadata and controls

48 lines (38 loc) · 1.75 KB

Usage

The main data structure for representing elements of a data stream is the Tuple class. Tuple<> represents a template class which can parametrized with the attribute types of the element. Note, that timestamps are not represented separately: timestamps can be derived from any attribute (or a combination of attributes). Furthermore, tuples are not copied around but only passed by reference. For this purpose, usually the TuplePtr<> template is used instead of Tuple which simply wraps a tuple with an intrusive pointer. Thus, a complete schema definition for a stream looks like the following:

typedef TuplePtr<int, std::string, double> T1;

Tuples can be constructed using the makeTuplePtr function which of course requires correctly typed parameters:

auto tp = makeTuplePtr(1, std::string("a string"), 2.0);

For accessing the individual components of an attribute we provide the get<> template function where the template parameter is the position of the attribute in the tuple. In order to access the string component of tuple tp the following code an be used:

auto s = get<1>(tp);

The recommended interface for implementing stream processing pipelines is the Topology class which allows to specify processing steps in a DSL very similar to Apache Spark. The following code snippet gives an example. See below for an explanation of the provided operators.

typedef TuplePtr<int, std::string, double> T1;
typedef TuplePtr<double, int> T2;

Topology t;
auto s = t.newStreamFromFile("file.csv")
  .extract<T1>(',')
  .where([](auto tp, bool outdated) { return get<0>(tp) % 2 == 0; } )
  .map<T2>([](auto tp, bool) -> T2 {
    return makeTuplePtr(get<2>(tp), get<0>(tp));
  })
  .print(std::cout);

t.start();