Skip to content

[Docs][Core] Add Javadoc to MultiTableWriterRunnable#10583

Open
LiJie20190102 wants to merge 2 commits intoapache:devfrom
LiJie20190102:doc_multi_table_writer_runnable
Open

[Docs][Core] Add Javadoc to MultiTableWriterRunnable#10583
LiJie20190102 wants to merge 2 commits intoapache:devfrom
LiJie20190102:doc_multi_table_writer_runnable

Conversation

@LiJie20190102
Copy link
Contributor

Purpose of this pull request

Implement issue #10538

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@github-actions github-actions bot added the api label Mar 9, 2026
@DanielCarter-stack
Copy link

Issue 1: Incomplete JavaDoc (missing @return and @throws tags)

Location: seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/multitablesink/MultiTableWriterRunnable.java:108-118

/**
 * Returns first unhandled exception from the run loop; null means the worker is healthy;
 * checked by MultiTableSinkWriter.subSinkErrorCheck()
 */
public Throwable getThrowable() {
    return throwable;
}

/**
 * Diagnostic field: the tableId of the row currently being written; may be null or stale
 * between rows
 */
public String getCurrentTableId() {
    return currentTableId;
}

Related context:

  • Caller: MultiTableSinkWriter.java:136 - writerRunnable.getThrowable()
  • Caller: MultiTableSinkWriter.java:139 - writerRunnable.getCurrentTableId()

Problem description:
JavaDoc lacks standard @return tags. Although the comment explains the return value meaning, it does not conform to Java standard documentation conventions. Additionally, the JavaDoc for the run() method does not explain possible exceptions (although the code catches exceptions, the documentation should explain this).

Potential risks:

  • Tool-generated API documentation may have incorrect formatting
  • IDE hover hints may be incomplete
  • Does not comply with the project's JavaDoc coding standards

Scope of impact:

  • Direct impact: API documentation generation for the MultiTableWriterRunnable class
  • Indirect impact: Developer experience dependent on JavaDoc tools
  • Impact scope: Single class (documentation generation)

Severity: MINOR

Improvement suggestions:

/**
 * Returns first unhandled exception from the run loop.
 * 
 * <p>A null return value means the worker is healthy. This method is regularly called by
 * {@link MultiTableSinkWriter#subSinkErrorCheck()} to detect failures in the async worker.
 *
 * @return the first unhandled exception, or {@code null} if the worker is healthy
 */
public Throwable getThrowable() {
    return throwable;
}

/**
 * Returns the table identifier of the row currently being processed.
 * 
 * <p>This is a diagnostic field that may be useful for debugging. The value may be
 * {@code null} when no row is being processed, or may be stale between rows.
 *
 * @return the current table identifier, or {@code null} if not available
 */
public String getCurrentTableId() {
    return currentTableId;
}

/**
 * Runs the async worker loop, continuously draining the queue and writing rows.
 * 
 * <p>This method runs an infinite loop that:
 * <ul>
 *   <li>Polls from the queue with a 100ms timeout</li>
 *   <li>Skips control rows with zero arity (schema evolution events)</li>
 *   <li>Dispatches data rows to the appropriate {@link SinkWriter}</li>
 *   <li>Stops on {@link InterruptedException} or any {@link Throwable}</li>
 * </ul>
 * 
 * <p>The method terminates when the thread is interrupted or an unhandled exception occurs.
 * Errors are captured in the {@code throwable} field for later inspection.
 */
@Override
public void run() {
    // ... existing code ...
}

Rationale:

  • Adding @return tags conforms to Java standard documentation conventions
  • Adding <p> paragraphs and <ul> lists improves readability (referencing the JavaDoc style of MultiTableSinkWriter)
  • More detailed @return descriptions help IDEs generate better hints
  • Adding more detailed descriptions to the run() method to explain loop termination conditions

Issue 2: Class-level JavaDoc lacks detailed explanation of concurrency semantics

Location: seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/multitablesink/MultiTableWriterRunnable.java:29-32

/**
 * MultiTableWriterRunnable is the async worker that drains a {@link BlockingQueue} of {@link
 * SeaTunnelRow} and dispatches rows to per-table {@link SinkWriter} instances.
 */

Related context:

  • Caller: MultiTableSinkWriter.java:105 - creates runnable
  • Synchronization point: MultiTableSinkWriter.java:159,230,257,302,333 - 5 synchronized locations

Problem description:
Class-level JavaDoc is too concise, lacking explanation of key design decisions such as concurrency coordination mechanisms, lifecycle management, and error handling strategies. In contrast, the class-level JavaDoc for MultiTableSinkWriter (commit 0dc4118) has very detailed architectural explanations.

Potential risks:

  • New contributors may not understand why synchronized(this) is used
  • May not understand the coordination mechanism with MultiTableSinkWriter.snapshotState()
  • May not understand the source and purpose of control signals (arity == 0)

Scope of impact:

  • Direct impact: Code comprehensibility
  • Indirect impact: Contributor onboarding speed, code review efficiency
  • Impact scope: Single class (documentation quality)

Severity: MINOR

Improvement suggestions:

/**
 * An async worker thread that drains a {@link BlockingQueue} of {@link SeaTunnelRow} and 
 * dispatches rows to per-table {@link SinkWriter} instances.
 *
 * <h3>Concurrency Model</h3>
 *
 * <p>Each {@code MultiTableWriterRunnable} owns a single {@link BlockingQueue} and runs in a 
 * dedicated thread. The thread continuously polls the queue (with 100ms timeout) and writes 
 * rows to the appropriate sub-writer. Multiple instances are created by 
 * {@link MultiTableSinkWriter} to parallelize writes across different queues.
 *
 * <h3>Checkpoint Coordination</h3>
 *
 * <p>To ensure consistent snapshots, the {@code write()} operation is protected by 
 * {@code synchronized(this)}. This allows {@link MultiTableSinkWriter} to coordinate 
 * checkpoints by acquiring the same lock before calling 
 * {@link SinkWriter#snapshotState(long)}. The coordination points are:
 * <ul>
 *   <li>{@code snapshotState()}: drains queues, then locks each runnable</li>
 *   <li>{@code prepareCommit()}: drains queues, then locks each runnable</li>
 *   <li>{@code applySchemaChange()}: locks the specific runnable handling the table</li>
 *   <li>{@code close()}: drains queues, shuts down executor, then locks each runnable</li>
 * </ul>
 *
 * <h3>Control Signals</h3>
 *
 * <p>Rows with {@code getArity() == 0} are treated as control signals for schema evolution 
 * and coordination events. These are created by upstream components (e.g., 
 * {@code SchemaOperator}) and skipped during normal write processing.
 *
 * <h3>Error Handling</h3>
 *
 * <p>Exceptions are captured in the {@code throwable} field rather than propagated, 
 * allowing the thread to terminate gracefully. The parent {@link MultiTableSinkWriter} 
 * periodically checks this field via {@link #getThrowable()} to detect failures.
 *
 * @see MultiTableSinkWriter
 * @see SinkWriter
 */

Rationale:

  • Reference the detailed JavaDoc style of MultiTableSinkWriter
  • Add HTML tags (<h3>, <ul>, <li>) to organize structure
  • Explain concurrency coordination mechanisms in detail (this is the core design of this class)
  • Explain the source of control signals (referencing SchemaOperator)
  • Add @see tags to link related classes

Issue 3: Change to .asf.yaml is unrelated to PR theme

Location: .asf.yaml:37-38

-    - chl-wxp
     - LiJie20190102
+    - chl-wxp

Problem description:
The PR title is "[Docs][Core] Add Javadoc to MultiTableWriterRunnable", but the change to .asf.yaml (collaborator list sorting adjustment) is unrelated to documentation improvements. This violates the best practice of "one PR does one thing".

Potential risks:

  • Confuses the scope of PR review
  • If the PR is reverted, the collaborator changes will also be reverted
  • Violates Apache project change management best practices

Scope of impact:

  • Direct impact: .asf.yaml file
  • Indirect impact: PR review process, code review attention
  • Impact scope: Project configuration

Severity: MINOR

Improvement suggestions:
The change to .asf.yaml should be moved to a separate PR, or this change should be removed from the current PR.

Rationale:

  • Follow the "single responsibility principle"
  • Keep PR review focused on documentation improvements
  • Simplify rollback history


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants