Usage

The main data structure for representing elements of a data stream is the Tuple class. Tuple<> represents a template class which can parametrized with the attribute types of the element. Note, that timestamps are not represented separately: timestamps can be derived from any attribute (or a combination of attributes). Furthermore, tuples are not copied around but only passed by reference. For this purpose, usually the TuplePtr<> template is used instead of Tuple which simply wraps a tuple with an intrusive pointer. Thus, a complete schema definition for a stream looks like the following:

typedef TuplePtr<int, std::string, double> T1;

Tuples can be constructed using the makeTuplePtr function which of course requires correctly typed parameters:

auto tp = makeTuplePtr(1, std::string("a string"), 2.0);

For accessing the individual components of an attribute we provide the get<> template function where the template parameter is the position of the attribute in the tuple. In order to access the string component of tuple tp the following code an be used:

auto s = get<1>(tp);

The recommended interface for implementing stream processing pipelines is the Topology class which allows to specify processing steps in a DSL very similar to Apache Spark. The following code snippet gives an example. See below for an explanation of the provided operators.

typedef TuplePtr<int, std::string, double> T1;
typedef TuplePtr<double, int> T2;

Topology t;
auto s = t.newStreamFromFile("file.csv")
  .extract<T1>(',')
  .where([](auto tp, bool outdated) { return get<0>(tp) % 2 == 0; } )
  .map<T2>([](auto tp, bool) -> T2 {
    return makeTuplePtr(get<2>(tp), get<0>(tp));
  })
  .print(std::cout);

t.start();

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Usage.md

Usage.md

Usage

Files

Usage.md

Latest commit

History

Usage.md

File metadata and controls

Usage