
Commit 2437f64

Author: Donald Tregonning
Merge pull request #132 from splunk/issue117-rename-directories
Updated naming throughout project per issue #117
2 parents e24583a + e9d2944, commit 2437f64

6 files changed, +50 -50 lines changed


.gitignore

Lines changed: 1 addition & 1 deletion

````diff
@@ -23,7 +23,7 @@ hs_err_pid*
 target/*
 .idea/*
 
-kafka-connect-splunk/
+splunk-kafka-connect/
 pom.xml.versionsBackup
 .classpath
 .project
````

README.md

Lines changed: 32 additions & 32 deletions
````diff
@@ -1,6 +1,6 @@
-## Kafka Connect Splunk
+## Splunk Connect for Kafka
 
-A Kafka Connect Sink for Splunk features:
+Splunk Connect for Kafka is a Kafka Connect Sink for Splunk with the following features:
 
 * Data ingestion from Kafka topics into Splunk via [Splunk HTTP Event Collector(HEC)](http://dev.splunk.com/view/event-collector/SP-CAAAE6M).
 * In-flight data transformation and enrichment.
````
````diff
@@ -19,16 +19,16 @@ A Kafka Connect Sink for Splunk features:
 
 1. Clone the repo from https://github.com/splunk/kafka-connect-splunk
 2. Verify that Java8 JRE or JDK is installed.
-3. Run `bash build.sh`. The build script will download all dependencies and build the Splunk Kafka Connector.
+3. Run `bash build.sh`. The build script will download all dependencies and build Splunk Connect for Kafka.
 
-Note: The resulting "kafka-connect-splunk-*.tar.gz" package is self-contained. Bundled within it are the Kafka Connect framework, all 3rd party libraries, and the Splunk Kafka Connector.
+Note: The resulting "splunk-kafka-connect*.tar.gz" package is self-contained. Bundled within it are the Kafka Connect framework, all 3rd party libraries, and Splunk Connect for Kafka.
 
 ## Quick Start
 
 1. [Start](https://kafka.apache.org/quickstart) your Kafka Cluster and confirm it is running.
 2. If this is a new install, create a test topic (eg: `perf`). Inject events into the topic. This can be done using [Kafka data-gen-app](https://github.com/dtregonning/kafka-data-gen) or the Kafka bundle [kafka-console-producer](https://kafka.apache.org/quickstart#quickstart_send).
-3. Untar the package created from the build script: `tar xzvf kafka-connect-splunk-*.tar.gz` (Default target location is /tmp/kafka-connect-splunk-build/kafka-connect-splunk).
-4. Navigate to kafka-connect-splunk directory `cd kafka-connect-splunk`.
+3. Untar the package created from the build script: `tar xzvf splunk-kafka-connect-*.tar.gz` (Default target location is /tmp/splunk-kafka-connect-build/kafka-connect-splunk).
+4. Navigate to splunk-kafka-connect directory `cd splunk-kafka-connect`.
 5. Adjust values for `bootstrap.servers` and `plugin.path` inside `config/connect-distributed-quickstart.properties` to fit your environment. Default values should work for experimentation.
 6. Run `./bin/connect-distributed.sh config/connect-distributed-quickstart.properties` to start Kafka Connect.
 7. Run the following command to create connector tasks. Adjust `topics` to set the topic, and `splunk.hec.token` to set your HEC token.
````
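For orientation, the connector-creation call that step 7 refers to is a standard Kafka Connect REST request. The sketch below is illustrative only and is not part of this commit: the connector name, topic, HEC URI, and token are placeholders, while the configuration keys are the ones documented later in this README.

```
# Illustrative example; adjust the Connect host, topic, HEC URI, and token for your environment.
curl http://localhost:8083/connectors -X POST -H "Content-Type: application/json" -d '{
  "name": "splunk-sink-example",
  "config": {
    "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
    "tasks.max": "3",
    "topics": "perf",
    "splunk.hec.uri": "https://hec1.splunk.com:8088",
    "splunk.hec.token": "<YOUR-HEC-TOKEN>",
    "splunk.hec.ack.enabled": "true",
    "splunk.hec.raw": "false"
  }
}'
```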
````diff
@@ -88,21 +88,21 @@ Note: The resulting "kafka-connect-splunk-*.tar.gz" package is self-contained. B
 
 
 ## Deployment
-Splunk Kafka Connector can run in containers, virtual machines or on physical machines.
+Splunk Connect for Kafka can run in containers, virtual machines or on physical machines.
 You can leverage any automation tools for deployment.
 
 Use the following connector deployment options:
-* Splunk Kafka Connector in a dedicated Kafka Connect Cluster (recommended)
-* Splunk Kafka Connector in an existing Kafka Connect Cluster
+* Splunk Connect for Kafka in a dedicated Kafka Connect Cluster (recommended)
+* Splunk Connect for Kafka in an existing Kafka Connect Cluster
 
 ### Connector in a dedicated Kafka Connect Cluster
-Running the Splunk Kafka Connector in a dedicated Kafka Connect Cluster is recommended. Isolating the Splunk connector from other Kafka connectors results in significant performance benefits in high throughput environments.
+Running Splunk Connect for Kafka in a dedicated Kafka Connect Cluster is recommended. Isolating the Splunk connector from other Kafka connectors results in significant performance benefits in high throughput environments.
 
-1. Untar the **kafka-connect-splunk-*.tar.gz** package and navigate to the **kafka-connect-splunk** directory.
+1. Untar the **splunk-kafka-connect-*.tar.gz** package and navigate to the **splunk-kafka-connect** directory.
 
 ```
-tar xzvf kafka-connect-splunk-*.tar.gz
-cd kafka-connect-splunk
+tar xzvf splunk-kafka-connect-*.tar.gz
+cd splunk-kafka-connect
 ```
 
 2. Update config/connect-distributed.properties to match your environment.
@@ -132,7 +132,7 @@ Running the Splunk Kafka Connector in a dedicated Kafka Connect Cluster is recom
 status.storage.partitions=5
 ```
 
-4. Deploy/Copy the **kafka-connect-splunk** directory to all target hosts (virtual machines, physical machines or containers).
+4. Deploy/Copy the **splunk-kafka-connect** directory to all target hosts (virtual machines, physical machines or containers).
 5. Start Kafka Connect on all target hosts using the below commands:
 
 ```
````
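The start commands for step 5 are cut off by the hunk boundary above. As a hedged sketch only, not the commit's content, starting a distributed worker on each host generally amounts to exporting the heap settings recommended in the Troubleshooting section and launching the stock Kafka Connect script against the properties file from step 2:

```
# Hedged sketch; heap values follow the Troubleshooting note, adjust for your hosts.
export KAFKA_HEAP_OPTS="-Xmx6G -Xms2G"
./bin/connect-distributed.sh config/connect-distributed.properties
```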
````diff
@@ -144,7 +144,7 @@ Running the Splunk Kafka Connector in a dedicated Kafka Connect Cluster is recom
 
 ### Connector in an existing Kafka Connect Cluster
 
-1. Navigate to Splunkbase and download the latest version of [Splunk Kafka Connect](https://splunkbase.splunk.com/app/3862/).
+1. Navigate to Splunkbase and download the latest version of [Splunk Connect for Kafka](https://splunkbase.splunk.com/app/3862/).
 
 2. Copy downloaded file onto every host running Kafka Connect into the directory that contains your other connectors or create a folder to store them in. (ex. `/opt/connectors/splunk-kafka-connect`)
 
@@ -189,7 +189,7 @@ Please create or modify a Kafka Connect worker properties file to contain these
 5. Validate your connector deployment by running the following command curl `http://<KAFKA_CONNECT_HOST>:8083/connector-plugins`. Response should have an entry named `com.splunk.kafka.connect.SplunkSinkConnector`.
 
 ## Security
-The Kafka Connect Splunk Sink supports the following security mechanisms:
+Splunk Connect for Kafka supports the following security mechanisms:
 * `SSL`
 * `SASL/GSSAPI (Kerberos)` - starting at version 0.9.0.0
 * `SASL/PLAIN` - starting at version 0.10.0.0
````
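For the mechanisms listed above, the relevant knobs live in the Kafka Connect worker properties rather than in the connector configuration. The fragment below is an assumption-laden illustration using standard Kafka client property names; it is not taken from this commit, and the paths, password, and Kerberos service name are placeholders.

```
# Illustrative worker settings for SASL_SSL with Kerberos; property names are standard
# Kafka client settings, values are placeholders to adapt to your environment.
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks
ssl.truststore.password=<TRUSTSTORE-PASSWORD>
```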
````diff
@@ -367,7 +367,7 @@ After Kafka Connect is brought up on every host, all of the Kafka Connect instan
 Even in a load balanced environment, a REST call can be executed against one of the cluster instances, and rest of the instances will pick up the task automatically.
 
 ### Configuration schema structure
-Use the below schema to configure Splunk Kafka Connector
+Use the below schema to configure Splunk Connect for Kafka
 
 ```
 {
@@ -406,7 +406,7 @@ Use the below schema to configure Splunk Kafka Connector
 
 * `name` - Connector name. A consumer group with this name will be created with tasks to be distributed evenly across the connector cluster nodes.
 * `connector.class` - The Java class used to perform connector jobs. Keep the default value **com.splunk.kafka.connect.SplunkSinkConnector** unless you modify the connector.
-* `tasks.max` - The number of tasks generated to handle data collection jobs in parallel. The tasks will be spread evenly across all Splunk Kafka Connector nodes.
+* `tasks.max` - The number of tasks generated to handle data collection jobs in parallel. The tasks will be spread evenly across all Splunk Connect for Kafka nodes.
 * `splunk.hec.uri` - Splunk HEC URIs. Either a list of FQDNs or IPs of all Splunk indexers, separated with a ",", or a load balancer. The connector will load balance to indexers using round robin. Splunk Connector will round robin to this list of indexers.
 ```https://hec1.splunk.com:8088,https://hec2.splunk.com:8088,https://hec3.splunk.com:8088```
 * `splunk.hec.token` - [Splunk Http Event Collector token] (http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Data/UsetheHTTPEventCollector#About_Event_Collector_tokens).
@@ -428,8 +428,8 @@ Use the below schema to configure Splunk Kafka Connector
 
 ### Acknowledgement Parameters
 #### Use Ack
-* `splunk.hec.ack.enabled` - Valid settings are `true` or `false`. When set to `true` the Splunk Kafka Connector will poll event ACKs for POST events before check-pointing the Kafka offsets. This is used to prevent data loss, as this setting implements guaranteed delivery. By default, this setting is set to `true`.
-> Note: If this setting is set to `true`, verify that the corresponding HEC token is also enabled with index acknowledgements, otherwise the data injection will fail, due to duplicate data. When set to `false`, the Splunk Kafka Connector will only POST events to your Splunk platform instance. After it receives a HTTP 200 OK response, it assumes the events are indexed by Splunk. Note: In cases where the Splunk platform crashes, there may be some data loss.
+* `splunk.hec.ack.enabled` - Valid settings are `true` or `false`. When set to `true` Splunk Connect for Kafka will poll event ACKs for POST events before check-pointing the Kafka offsets. This is used to prevent data loss, as this setting implements guaranteed delivery. By default, this setting is set to `true`.
+> Note: If this setting is set to `true`, verify that the corresponding HEC token is also enabled with index acknowledgements, otherwise the data injection will fail, due to duplicate data. When set to `false`, Splunk Connect for Kafka will only POST events to your Splunk platform instance. After it receives a HTTP 200 OK response, it assumes the events are indexed by Splunk. Note: In cases where the Splunk platform crashes, there may be some data loss.
 * `splunk.hec.ack.poll.interval` - This setting is only applicable when `splunk.hec.ack.enabled` is set to `true`. Internally it controls the event ACKs polling interval. By default, this setting is 10 seconds.
 * `splunk.hec.ack.poll.threads` - This setting is used for performance tuning and is only applicable when `splunk.hec.ack.enabled` is set to `true`. It controls how many threads should be spawned to poll event ACKs. By default, it is set to `1`.
 > Note: For large Splunk indexer clusters (For example, 100 indexers) you need to increase this number. Recommended increase to speed up ACK polling is 4 threads.
````
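Put together, the acknowledgement settings above land in the connector's `config` block. This fragment is illustrative rather than part of the commit; the values are simply the documented defaults, with the poll-thread count raised to the 4 threads suggested for large indexer clusters.

```
"splunk.hec.ack.enabled": "true",
"splunk.hec.ack.poll.interval": "10",
"splunk.hec.ack.poll.threads": "4"
```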
````diff
@@ -440,7 +440,7 @@ Use the below schema to configure Splunk Kafka Connector
 
 ##### /raw endpoint only
 * `splunk.hec.raw.line.breaker` - Only applicable to /raw HEC endpoint. The setting is used to specify a custom line breaker to help Splunk separate the events correctly.
-> Note: For example, you can specify "#####" as a special line breaker. Internally, the Splunk Kafka Connector will append this line breaker to every Kafka record to form a clear event boundary. The connector performs data injection in batch mode. On the Splunk platform side, you can configure **props.conf** to set up line breaker for the sourcetypes. Then the Splunk software will correctly break events for data flowing through /raw HEC endpoint. For questions on how and when to specify line breaker, go to the FAQ section. By default, this setting is empty.
+> Note: For example, you can specify "#####" as a special line breaker. Internally, Splunk Connect for Kafka will append this line breaker to every Kafka record to form a clear event boundary. The connector performs data injection in batch mode. On the Splunk platform side, you can configure **props.conf** to set up line breaker for the sourcetypes. Then the Splunk software will correctly break events for data flowing through /raw HEC endpoint. For questions on how and when to specify line breaker, go to the FAQ section. By default, this setting is empty.
 
 ##### /event endpoint only
 * `splunk.hec.json.event.enrichment` - Only applicable to /event HEC endpoint. This setting is used to enrich raw data with extra metadata fields. It contains a list of key value pairs separated by ",". The configured enrichment metadata will be indexed along with raw event data by Splunk software. Note: Data enrichment for /event HEC endpoint is only available in Splunk Enterprise 6.5 and above. By default, this setting is empty. See ([Documentation](http://dev.splunk.com/view/event-collector/SP-CAAAE8Y#indexedfield)) for more information.
````
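To make the line-breaker note concrete: if the connector is configured with `"splunk.hec.raw.line.breaker": "#####"`, the matching Splunk-side stanza would look roughly like the sketch below. `LINE_BREAKER` and `SHOULD_LINEMERGE` are standard props.conf settings, but the sourcetype name is a hypothetical placeholder, not something defined by this commit.

```
# props.conf on the Splunk side; the sourcetype name is a placeholder.
[kafka:raw-example]
LINE_BREAKER = (#####)
SHOULD_LINEMERGE = false
```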
````diff
@@ -584,7 +584,7 @@ A common architecture will include a load balancer in front of your Splunk platf
 
 ## Benchmark Results
 
-A single Splunk Kafka Connector can reach maximum indexed throughput of **32 MB/second** with the following testbed and raw HEC endpoint in use:
+A single instance of Splunk Connect for Kafka can reach maximum indexed throughput of **32 MB/second** with the following testbed and raw HEC endpoint in use:
 
 Hardware specifications:
 
@@ -597,7 +597,7 @@ Hardware specifications:
 
 ## Scaling out your environment
 
-Before scaling the Splunk Kafka Connector tier, ensure the bottleneck is in the connector tier and not in another component.
+Before scaling the Splunk Connect for Kafka tier, ensure the bottleneck is in the connector tier and not in another component.
 
 Scaling out options:
 
@@ -609,20 +609,20 @@
 
 ## Data loss and latency monitoring
 
-When creating a Splunk Kafka Connector using the REST API, `"splunk.hec.track.data": "true"` can be configured to allow data loss tracking and data collection latency monitoring.
+When creating an instance of Splunk Connect for Kafka using the REST API, `"splunk.hec.track.data": "true"` can be configured to allow data loss tracking and data collection latency monitoring.
 This is accomplished by enriching the raw data with **offset, timestamp, partition, topic** metadata.
 
 ### Data Loss Tracking
-The Splunk Kafka Connector uses offset to track data loss since offsets in a Kafka topic partition are sequential. If a gap is observed in the Splunk software, there is data loss.
+Splunk Connect for Kafka uses offset to track data loss since offsets in a Kafka topic partition are sequential. If a gap is observed in the Splunk software, there is data loss.
 
 ### Data Latency Tracking
-The Splunk Kafka Connector uses the timestamp of the record to track the time elapsed between the time a Kafka record was created and the time the record was indexed in Splunk.
+Splunk Connect for Kafka uses the timestamp of the record to track the time elapsed between the time a Kafka record was created and the time the record was indexed in Splunk.
 
 > Note: This setting will only work in conjunction with /event HEC endpoint (`"splunk.hec.raw" : "false"`)
 
 ### Malformed data
 
-If the raw data of the Kafka records is a JSON object but is not able to be marshaled, or if the raw data is in bytes but it is not UTF-8 encodable, the Splunk Kafka Connector considers these records malformed. It will log the exception with Kafka specific information (topic, partition, offset) for these records within the console, as well as the malformed records information will be indexed in Splunk. Users can search "type=malformed" within Splunk to return any malformed Kafka records encountered.
+If the raw data of the Kafka records is a JSON object but is not able to be marshaled, or if the raw data is in bytes but it is not UTF-8 encodable, Splunk Connect for Kafka considers these records malformed. It will log the exception with Kafka specific information (topic, partition, offset) for these records within the console, as well as the malformed records information will be indexed in Splunk. Users can search "type=malformed" within Splunk to return any malformed Kafka records encountered.
 
 ## FAQ
 
````
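As a hedged illustration of enabling this monitoring, not taken from the commit itself, the relevant pair of settings in the connector's `config` block would be the following; the second line matters because the note above says tracking only works with the /event endpoint. Malformed records can then be found in Splunk with a search on `type=malformed`, as described above.

```
"splunk.hec.track.data": "true",
"splunk.hec.raw": "false"
```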
````diff
@@ -650,12 +650,12 @@ If the raw data of the Kafka records is a JSON object but is not able to be mars
 
 4. How many tasks should I configure?
 
-Do not create more tasks than the number of partitions. Generally speaking, creating 2 * CPU tasks per Splunk Kafka Connector is a safe estimate.
-> Note: For example, assume there are 5 Kafka Connects running the Splunk Kafka Connector. Each host is 8 CPUs with 16 GB memory. And there are 200 partitions to collect data from. `max.tasks` will be: `max.tasks` = 2 * CPUs/host * Kafka Connect instances = 2 * 8 * 5 = 80 tasks. Alternatively, if there are only 60 partitions to consume from, then just set max.tasks to 60. Otherwise, the remaining 20 will be pending, doing nothing.
+Do not create more tasks than the number of partitions. Generally speaking, creating 2 * CPU tasks per instance of Splunk Connect for Kafka is a safe estimate.
+> Note: For example, assume there are 5 Kafka Connects running Splunk Connect for Kafka. Each host is 8 CPUs with 16 GB memory. And there are 200 partitions to collect data from. `max.tasks` will be: `max.tasks` = 2 * CPUs/host * Kafka Connect instances = 2 * 8 * 5 = 80 tasks. Alternatively, if there are only 60 partitions to consume from, then just set max.tasks to 60. Otherwise, the remaining 20 will be pending, doing nothing.
 
 5. How many Kafka Connect instances should I deploy?
 
-This is highly dependent on how much volume per day the Splunk Kafka Connector needs to index in Splunk. In general an 8 CPU, 16 GB memory machine, can potentially achieve 50 - 60 MB/s throughput from Kafka into Splunk if Splunk is sized correctly.
+This is highly dependent on how much volume per day Splunk Connect for Kafka needs to index in Splunk. In general an 8 CPU, 16 GB memory machine, can potentially achieve 50 - 60 MB/s throughput from Kafka into Splunk if Splunk is sized correctly.
 
 6. How can I track data loss and data collection latency?
 
````
````diff
@@ -676,9 +676,9 @@ If the raw data of the Kafka records is a JSON object but is not able to be mars
 
 ## Troubleshooting
 
-1. Append the **log4j.logger.com.splunk=DEBUG** to **config/connect-log4j.properties** file to enable more verbose logging for Splunk Kafka Connector.
+1. Append the **log4j.logger.com.splunk=DEBUG** to **config/connect-log4j.properties** file to enable more verbose logging for Splunk Connect for Kafka.
 2. Kafka connect encounters an "out of memory" error. Remember to export environment variable **KAFKA\_HEAP\_OPTS="-Xmx6G -Xms2G"**. Refer to the [Deployment](#deployment) section for more information.
-3. Can't see any Connector information on third party UI. For example, Splunk Kafka Connector is not shown on Confluent Control Center. Make sure cross origin access is enabled for Kafka Connect. Append the following two lines to connect configuration, e.g. `connect-distributed.properties` or `connect-distributed-quickstart.properties` and then restart Kafka Connect.
+3. Can't see any Connector information on third party UI. For example, Splunk Connect for Kafka is not shown on the Confluent Control Center. Make sure cross origin access is enabled for Kafka Connect. Append the following two lines to connect configuration, e.g. `connect-distributed.properties` or `connect-distributed-quickstart.properties` and then restart Kafka Connect.
 
 ```
 access.control.allow.origin=*
````
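For convenience, items 1 and 2 above can be applied from a shell before restarting Kafka Connect. This is a sketch assuming you are inside the extracted splunk-kafka-connect directory; the property line and heap values are the ones quoted in the list above.

```
# Enable verbose connector logging and set the recommended heap, then restart Kafka Connect.
echo "log4j.logger.com.splunk=DEBUG" >> config/connect-log4j.properties
export KAFKA_HEAP_OPTS="-Xmx6G -Xms2G"
```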
