Updated API documentation for high and low level APIs.

vjagadish1989, nickpan47: Please take a look.

Author: Prateek Maheshwari <pmaheshwari@apache.org>
Reviewers: Jagadish <jagadish@apache.org>

Closes apache#802 from prateekm/api-docs

docs/learn/documentation/versioned/api/programming-model.md (+62, -56)
See the License for the specific language governing permissions and
limitations under the License.
-->

### Introduction

Samza provides multiple programming APIs to fit your use case:

1. Java APIs: Samza provides two Java programming APIs that are ideal for building advanced stream processing applications.

    1. [High Level Streams API](high-level-api.md): Samza's flexible High Level Streams API lets you describe your complex stream processing pipeline as a Directed Acyclic Graph (DAG) of operations on message streams. It provides a rich set of built-in operators that simplify common stream processing operations such as filtering, projection, repartitioning, joins, and windows.

    2. [Low Level Task API](low-level-api.md): Samza's powerful Low Level Task API lets you write your application in terms of the processing logic for each incoming message.

2. [Samza SQL](samza-sql.md): Samza SQL provides a declarative query language for describing your stream processing logic. It lets you manipulate streams using SQL predicates and UDFs instead of working with the physical implementation details.

3. Apache Beam API: Samza also provides an [Apache Beam runner](https://beam.apache.org/documentation/runners/capability-matrix/) to run applications written using the Apache Beam API. This is considered an extension to the operators supported by the High Level Streams API in Samza.

### Key Concepts

The following sections describe the key concepts for writing your Samza applications in Java.

#### Samza Applications

A [SamzaApplication](javadocs/org/apache/samza/application/SamzaApplication.html) describes the inputs, outputs, state, configuration and the logic for processing data from one or more streaming sources.

You can implement a [StreamApplication](javadocs/org/apache/samza/application/StreamApplication.html) and use the provided [StreamApplicationDescriptor](javadocs/org/apache/samza/application/descriptors/StreamApplicationDescriptor) to describe the processing logic using Samza's High Level Streams API in terms of [MessageStream](javadocs/org/apache/samza/operators/MessageStream.html) operators.
{% highlight java %}
public class MyStreamApplication implements StreamApplication {
  @Override
  public void describe(StreamApplicationDescriptor appDesc) {
    // Describe your application here
  }
}
{% endhighlight %}

Alternatively, you can implement a [TaskApplication](javadocs/org/apache/samza/application/TaskApplication.html) and use the provided [TaskApplicationDescriptor](javadocs/org/apache/samza/application/descriptors/TaskApplicationDescriptor) to describe it using Samza's Low Level Task API in terms of per-message processing logic.

{% highlight java %}
public class MyTaskApplication implements TaskApplication {
  @Override
  public void describe(TaskApplicationDescriptor appDesc) {
    // Describe your application here
  }
}
{% endhighlight %}
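The describe() callbacks above follow an inversion-of-control pattern: the framework instantiates your application, calls describe() once with a descriptor, and uses whatever you registered on it to plan execution. The sketch below illustrates that contract using simplified, hypothetical stand-ins (`StreamApplicationDescriptor` here is a toy class, not the real Samza type):

```java
import java.util.ArrayList;
import java.util.List;

public class DescribeContractDemo {

  // Toy descriptor: collects the streams the application declares.
  // (Hypothetical stand-in for Samza's StreamApplicationDescriptor.)
  static class StreamApplicationDescriptor {
    final List<String> inputs = new ArrayList<>();
    void withInputStream(String name) { inputs.add(name); }
  }

  // Toy application interface mirroring the shape of the describe() callback.
  interface StreamApplication {
    void describe(StreamApplicationDescriptor appDesc);
  }

  // The application only *declares* its topology; it never drives execution itself.
  static class MyStreamApplication implements StreamApplication {
    @Override
    public void describe(StreamApplicationDescriptor appDesc) {
      appDesc.withInputStream("page-views");
    }
  }

  // Mimics what a runner does: call describe() once, then read the
  // populated descriptor to plan execution.
  static List<String> plan(StreamApplication app) {
    StreamApplicationDescriptor desc = new StreamApplicationDescriptor();
    app.describe(desc);
    return desc.inputs;
  }

  public static void main(String[] args) {
    System.out.println(plan(new MyStreamApplication())); // prints [page-views]
  }
}
```

The same inversion-of-control shape applies to a TaskApplication: only the descriptor type and what you register on it differ.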

#### Streams and Table Descriptors

Descriptors let you specify the properties of various aspects of your application from within it.

[InputDescriptor](javadocs/org/apache/samza/system/descriptors/InputDescriptor.html)s and [OutputDescriptor](javadocs/org/apache/samza/system/descriptors/OutputDescriptor.html)s can be used to specify the Samza and implementation-specific properties of the streaming inputs and outputs for your application. You can obtain InputDescriptors and OutputDescriptors using a [SystemDescriptor](javadocs/org/apache/samza/system/descriptors/SystemDescriptor.html) for your system. A SystemDescriptor can be used to specify the Samza and implementation-specific properties of the producers and consumers for your I/O system. Most Samza system implementations come with their own SystemDescriptors, but if one isn't available, you can use the [GenericSystemDescriptor](javadocs/org/apache/samza/system/descriptors/GenericSystemDescriptor.html).

A [TableDescriptor](javadocs/org/apache/samza/table/descriptors/TableDescriptor.html) can be used to specify the Samza and implementation-specific properties of a [Table](javadocs/org/apache/samza/table/Table.html). You can use a local TableDescriptor (e.g. [RocksDbTableDescriptor](javadocs/org/apache/samza/storage/kv/descriptors/RocksDbTableDescriptor.html)) or a [RemoteTableDescriptor](javadocs/org/apache/samza/table/descriptors/RemoteTableDescriptor).

The following example illustrates how you can use input and output descriptors for a Kafka system, and a table descriptor for a local RocksDB table, within your application:
{% highlight java %}
public class MyStreamApplication implements StreamApplication {
  @Override
  public void describe(StreamApplicationDescriptor appDescriptor) {
    KafkaSystemDescriptor ksd = new KafkaSystemDescriptor("kafka");
    KafkaInputDescriptor<PageView> kid =
        ksd.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class));
    KafkaOutputDescriptor<DecoratedPageView> kod =
        ksd.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class));

    RocksDbTableDescriptor<String, Integer> td =
        new RocksDbTableDescriptor<>("viewCounts", KVSerde.of(new StringSerde(), new IntegerSerde()));

    // Implement your processing logic here
  }
}
{% endhighlight %}
The same code in the above describe method applies to TaskApplication as well.
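The descriptor pattern above is essentially a builder: a system descriptor carries system-wide properties and hands out stream descriptors bound to that system. A minimal sketch of that idea, with toy classes standing in for the real Samza descriptor types (the names and methods here are illustrative, not the actual Samza API):

```java
import java.util.HashMap;
import java.util.Map;

public class DescriptorDemo {

  // Toy SystemDescriptor: names a system, carries its config, and hands
  // out stream descriptors bound to it.
  static class SystemDescriptor {
    final String systemName;
    final Map<String, String> config = new HashMap<>();

    SystemDescriptor(String systemName) { this.systemName = systemName; }

    // Fluent configuration, as the real descriptors use.
    SystemDescriptor withConfig(String key, String value) {
      config.put(key, value);
      return this;
    }

    InputDescriptor getInputDescriptor(String streamId) {
      return new InputDescriptor(systemName, streamId);
    }
  }

  // Toy InputDescriptor: a stream identified relative to its system.
  static class InputDescriptor {
    final String systemName;
    final String streamId;

    InputDescriptor(String systemName, String streamId) {
      this.systemName = systemName;
      this.streamId = streamId;
    }

    String fullName() { return systemName + "." + streamId; }
  }

  public static void main(String[] args) {
    SystemDescriptor kafka = new SystemDescriptor("kafka")
        .withConfig("bootstrap.servers", "localhost:9092");
    InputDescriptor pageViews = kafka.getInputDescriptor("page-views");
    System.out.println(pageViews.fullName()); // prints kafka.page-views
  }
}
```

The design choice mirrored here is that stream descriptors are always obtained *from* a system descriptor, so every stream is unambiguously bound to the system that produces or consumes it.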

#### Stream Processing Logic

Samza provides two sets of APIs to define the main stream processing logic: the High Level Streams API and the Low Level Task API, via StreamApplication and TaskApplication respectively.

The High Level Streams API allows you to describe the processing logic as a connected DAG of transformation operators, as in the example below:
{% highlight java %}
public class BadPageViewFilter implements StreamApplication {
  @Override
  public void describe(StreamApplicationDescriptor appDesc) {
    KafkaSystemDescriptor ksd = new KafkaSystemDescriptor("kafka");
    InputDescriptor<PageView> pageViewInput = ksd.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class));
    OutputDescriptor<DecoratedPageView> pageViewOutput = ksd.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class));
    RocksDbTableDescriptor<String, Integer> viewCountTable = new RocksDbTableDescriptor<>(
        "pageViewCountTable", KVSerde.of(new StringSerde(), new IntegerSerde()));

    // Now, implement your main processing logic
    ...
  }
}
{% endhighlight %}
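A key property of the high-level style is that chaining operators like filter and map only *describes* a DAG; nothing is processed until messages flow through it. The toy MessageStream below illustrates that deferred-execution shape (it is a hypothetical simplification, not Samza's MessageStream):

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class OperatorDagDemo {

  // Toy MessageStream over strings: each operator returns a new node wrapping
  // the previous plan, so chaining builds a pipeline without running anything.
  static class MessageStream {
    private final Function<List<String>, List<String>> plan;

    private MessageStream(Function<List<String>, List<String>> plan) {
      this.plan = plan;
    }

    static MessageStream input() { return new MessageStream(msgs -> msgs); }

    MessageStream filter(Predicate<String> p) {
      return new MessageStream(msgs ->
          plan.apply(msgs).stream().filter(p).collect(Collectors.toList()));
    }

    MessageStream map(Function<String, String> f) {
      return new MessageStream(msgs ->
          plan.apply(msgs).stream().map(f).collect(Collectors.toList()));
    }

    // Only here does the described pipeline actually execute.
    List<String> run(List<String> msgs) { return plan.apply(msgs); }
  }

  static List<String> demo(List<String> pageViews) {
    // Drop "bad" page views, then decorate the rest -- analogous in shape to
    // a filter().map() chain in the high-level API.
    MessageStream decorated = MessageStream.input()
        .filter(pv -> !pv.startsWith("bad:"))
        .map(pv -> "decorated:" + pv);
    return decorated.run(pageViews);
  }

  public static void main(String[] args) {
    System.out.println(demo(List.of("bad:spam", "home"))); // prints [decorated:home]
  }
}
```

In real Samza, the described DAG is handed to the planner and executed continuously over unbounded streams rather than over a finite list as in this sketch.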

The Low Level Task API allows you to describe the processing logic in a custom StreamTaskFactory or AsyncStreamTaskFactory, as in the example below:
{% highlight java %}
  ...
  KafkaSystemDescriptor kafka = new KafkaSystemDescriptor("kafka");
  InputDescriptor<PageView> pageViewInput = kafka.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class));
  OutputDescriptor<DecoratedPageView> pageViewOutput = kafka.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class));
  RocksDbTableDescriptor<String, Integer> viewCountTable = new RocksDbTableDescriptor<>(
      "pageViewCountTable", KVSerde.of(new StringSerde(), new IntegerSerde()));

  // Now, implement your main processing logic
  ...
{% endhighlight %}
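In the low-level style, the unit of work is a single incoming message: the container loop delivers messages one at a time to your task's process method. The following sketch mirrors that per-message contract with toy interfaces (these are hypothetical stand-ins, not Samza's StreamTask or MessageCollector types):

```java
import java.util.ArrayList;
import java.util.List;

public class StreamTaskDemo {

  // Toy per-message task interface mirroring the low-level process() shape:
  // one incoming envelope in, zero or more outputs to the collector.
  interface StreamTask {
    void process(String envelope, List<String> collector);
  }

  // A task with the same logic as the high-level BadPageViewFilter example,
  // but written as explicit per-message processing.
  static class BadPageViewFilterTask implements StreamTask {
    @Override
    public void process(String envelope, List<String> collector) {
      // Each call handles exactly one incoming message.
      if (!envelope.startsWith("bad:")) {
        collector.add("decorated:" + envelope);
      }
    }
  }

  // Mimics the container loop: deliver messages one at a time to the task.
  static List<String> runTask(StreamTask task, List<String> incoming) {
    List<String> out = new ArrayList<>();
    for (String msg : incoming) {
      task.process(msg, out);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(runTask(new BadPageViewFilterTask(), List.of("home", "bad:spam")));
    // prints [decorated:home]
  }
}
```

The trade-off this illustrates: the low-level API gives you explicit control over each message at the cost of writing the plumbing (state, batching, forwarding) that the high-level operators would otherwise provide.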

#### Configuration for a Samza Application

To deploy a Samza application, you need to specify the implementation class for your application and the ApplicationRunner to launch it. The following is an incomplete example of the minimum configuration required to set up the Samza application and the runner. For additional configuration, see the Configuration Reference.

{% highlight jproperties %}
# This is the class implementing StreamApplication
...
{% endhighlight %}

docs/learn/documentation/versioned/api/samza-sql.md (+1, -1)

Note: The Samza SQL console right now doesn't support queries that need state ...

# Running Samza SQL on YARN

The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the Low Level Task API, the High Level Streams API, as well as Samza SQL.
This tutorial demonstrates a simple Samza application that uses SQL to perform stream processing.

Remote Table provides a unified abstraction for Samza applications to access any remote data store, through stream-table join in the High Level Streams API or direct access in the Low Level Task API. Remote Table is a store-agnostic abstraction that can be customized to access new types of stores by writing pluggable I/O "Read/Write" functions, implementations of the [`TableReadFunction`](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/table/remote/TableReadFunction.java) and [`TableWriteFunction`](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/table/remote/TableWriteFunction.java) interfaces. Remote Table also provides common functionality, e.g. rate limiting (built-in) and caching (hybrid).

The async APIs in Remote Table are recommended over the sync versions for higher throughput. They can be used with the Low Level Task API to achieve maximum throughput.
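The throughput argument above comes down to issuing remote lookups concurrently instead of blocking on each one. The sketch below illustrates that pattern with `CompletableFuture`; the `getAsync` function here is a toy stand-in that simulates a remote lookup, not Samza's actual TableReadFunction:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AsyncTableDemo {

  // Toy async read function mirroring a getAsync-style contract: it returns
  // a future instead of blocking the caller. Here the "remote lookup" is
  // simulated by returning the key's length.
  static CompletableFuture<Integer> getAsync(String key) {
    return CompletableFuture.supplyAsync(key::length);
  }

  // Issue all lookups concurrently, then wait once for all of them.
  // With a blocking get, total latency would be the *sum* of per-lookup
  // latencies; with async lookups it approaches the maximum of them.
  static List<Integer> lookupAll(List<String> keys) {
    List<CompletableFuture<Integer>> futures = keys.stream()
        .map(AsyncTableDemo::getAsync)
        .collect(Collectors.toList());
    return futures.stream()
        .map(CompletableFuture::join)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(lookupAll(List.of("a", "bb", "ccc"))); // prints [1, 2, 3]
  }
}
```

This is why the async Remote Table APIs pair naturally with the Low Level Task API's asynchronous processing: many in-flight lookups can be outstanding per task without blocking the message loop.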

Remote Tables are represented by class ...

... created during instantiation of the Samza container. The life of a table goes through a few phases:

1. **Declaration** - at first, one declares the table by creating a `TableDescriptor`. In both the Samza High Level Streams API and the Low Level Task API, the `TableDescriptor` is registered with the stream graph, internally converted to a `TableSpec`, and in return a reference to a `Table` object is obtained that can participate in building the DAG.
2. **Instantiation** - during the planning stage, configuration is ...

* In the Samza high level API, all table instances can be retrieved from `TaskContext` using ...
0 commit comments