[Feature][Connector-V2][Syslog] Add Syslog source connector with RFC 3164 over TCP support#10900
[Feature][Connector-V2][Syslog] Add Syslog source connector with RFC 3164 over TCP support#10900MartinLam12 wants to merge 1 commit into
Conversation
…3164 over TCP support Implements a new source connector that listens on a TCP port and receives syslog messages formatted according to RFC 3164 (BSD syslog). Each message is parsed into structured columns: facility, severity, timestamp, hostname, app_name, proc_id, and message. - connector-syslog module with SyslogSource, SyslogSourceReader, SyslogSourceFactory - RFC 3164 parser with facility/severity extraction via PRI field - Registers connector in plugin-mapping.properties and seatunnel-dist - Unit tests covering standard messages, PID field, single-digit day, and error cases - English and Chinese documentation Closes apache#2649 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Could you please enable CI following the instructions? |
dybyte
left a comment
There was a problem hiding this comment.
Could you please add a minimal e2e test for this connector?
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the contribution. I reviewed the full current head from the runtime path instead of only the parser code. The RFC 3164 parsing itself is a nice start, but I found two blocking issues before this connector is ready to merge.
What problem this PR solves
- User pain point
Users may want SeaTunnel to receive RFC 3164 syslog messages directly over TCP instead of tailing files or relying on another collector. - Fix approach
This PR adds a passive TCP syslog source, parser tests, and connector docs. - One sentence
The parser direction is fine, but the current source still advertises a bounded/batch contract that the runtime cannot honor, and it does not yet have executable lifecycle coverage for a real network source.
Simple example: if a batch job starts this connector and waits for the input to finish, there is no predefined end-of-input. The source is just a listening socket. Today the code marks it as BOUNDED in batch mode and stops after the first client disconnects, which is not a real bounded-source contract.
1. Code change review
1.1 Core logic analysis
Runtime chain I checked:
job starts
-> SyslogSource.getBoundedness()
-> returns BOUNDED when job.mode = BATCH
-> SyslogSourceReader.open()
-> create listening ServerSocket
-> SyslogSourceReader.pollNext()
-> accept client connection
-> processConnection(...)
-> if bounded -> signalNoMoreElement() and return after the first connection
Key findings:
- This is a passive listener source, not a finite snapshot source.
- In batch mode there is no real completion boundary to know when all syslog input has arrived.
- The current implementation ends the source after the first accepted connection finishes.
- The docs currently advertise both batch and stream support, so users are told a runtime contract that the code cannot actually guarantee.
1.2 Compatibility impact
This introduces a user-visible contract mismatch rather than an API break. The current batch claim would mislead users into relying on behavior that is not actually implemented.
1.3 Performance / side effects
The main risk is correctness and job lifecycle semantics, not raw CPU or memory. In batch mode the job can either wait forever for the first client or terminate after just one connection while more senders still exist.
1.4 Error handling and logging
Issue 1: the connector reports BOUNDED in batch mode even though the runtime is a passive TCP listener with no real end-of-input contract
- Location:
seatunnel-connectors-v2/connector-syslog/src/main/java/org/apache/seatunnel/connectors/seatunnel/syslog/source/SyslogSource.java:78-82;seatunnel-connectors-v2/connector-syslog/src/main/java/org/apache/seatunnel/connectors/seatunnel/syslog/source/SyslogSourceReader.java:95-103;docs/en/connectors/source/Syslog.md:11-18 - Problem description:
SyslogSource.getBoundedness()switches toBOUNDEDwhenjob.mode = BATCH. ButSyslogSourceReaderis implemented as a long-lived listening socket. There is no finite split, no byte/record boundary, and no protocol marker that tells the source when all syslog producers are done. The current bounded path just accepts one client connection, processes it, then callssignalNoMoreElement()and returns. - Potential risk:
This is a blocking correctness issue on the normal runtime path. A batch job can hang waiting for the first connection forever, or stop after the first sender disconnects while later senders still have data. - Best improvement suggestion:
Option A: make this connector streaming-only for now and remove the batch claim from both runtime metadata and docs.
Option B: if batch support is really required, define an explicit bounded ingestion contract first (for example, a finite source input artifact rather than a passive socket listener). - Severity: High
Issue 2: the current test coverage only validates string parsing; it does not execute the real socket lifecycle of a network source
- Location:
seatunnel-connectors-v2/connector-syslog/src/test/java/org/apache/seatunnel/connectors/seatunnel/syslog/source/SyslogSourceReaderTest.java:25-102 - Problem description:
The new tests are all parser-level unit tests. They never open aServerSocket, never verify accept/read/close behavior, and never cover the boundedness/lifecycle branch that is central to this PR. - Potential risk:
The most important behavior for a network source — connection handling and end-of-input semantics — is still unvalidated. - Best improvement suggestion:
Add at least one minimal integration-style test that starts the reader, pushes a few syslog lines through a real socket, and verifies the lifecycle behavior you want to support. - Severity: High
2. Code quality evaluation
2.1 Code style
The parser implementation is straightforward and readable. The blockers are semantic/runtime issues rather than style issues.
2.2 Test coverage and stability
The current parser assertions are deterministic and do not introduce a flaky-test pattern. Stability rating: Stable.
The coverage gap is that the real network lifecycle is not tested at all yet.
2.3 Documentation
docs/en/connectors/source/Syslog.md currently marks both batch and stream as supported. If this connector stays listener-based, docs/zh needs the same correction as well.
3. Architecture
3.1 Solution elegance
Adding a passive socket source is a useful direction, but the connector contract should match what a listener can realistically guarantee.
3.2 Maintainability
The runtime is simple, which is good. Making it "batch" by stopping after the first connection adds surprising lifecycle behavior that will be hard to reason about later.
3.3 Extensibility
Once the connector is honest about being streaming-only, it will be much easier to extend safely.
3.4 Historical compatibility
No historical-upgrade blocker here; the issue is the brand-new contract being inaccurate on first introduction.
4. Issue summary
| No. | Issue | Location | Severity |
|---|---|---|---|
| Issue 1 | batch / bounded support is advertised but the listener runtime has no real end-of-input contract | SyslogSource.java:78-82; SyslogSourceReader.java:95-103; Syslog.md:11-18 |
High |
| Issue 2 | tests only cover parsing, not the real socket lifecycle | SyslogSourceReaderTest.java:25-102 |
High |
5. Merge conclusion
Conclusion: can merge after fixes
- Blocking items
- Issue 1: make the connector contract honest by removing the current fake bounded/batch behavior or redesigning batch semantics explicitly.
- Issue 2: add executable validation for the real socket lifecycle, not only parser assertions.
- Suggested but non-blocking follow-ups
- The current GitHub
Buildisaction_requiredbecause the workflow run was not detected on the fork. Please enable Actions / retrigger CI as suggested by the check-run message.
Overall, the parser foundation is promising, but from the runtime perspective this is not merge-ready yet. A passive syslog listener should not claim a bounded batch contract unless the source can actually define and verify the end of input.
Purpose
Closes #2649
Adds a new
Syslogsource connector that receives syslog messages over TCP and parses them according to RFC 3164 (BSD syslog format).What's Included
seatunnel-connectors-v2/plugin-mapping.propertiesandseatunnel-distOutput Schema
Each syslog message is parsed into 7 columns:
Scope (Phase 1)
UDP and RFC 5424 support are deferred to a future phase.
Example Config
Checklist
./mvnw spotless:apply)./mvnw -q -DskipTests verify)plugin-mapping.propertiesupdatedseatunnel-dist/pom.xmlupdateddocs/enanddocs/zh🤖 Generated with Claude Code