
Feature/jdbc log s3 upload #85


Open · wants to merge 8 commits into main

Conversation

kentoyoshida (Contributor) commented:

Summary

This PR adds automatic JDBC log upload to S3 when communication errors occur, with support for both IAM role and explicit AWS credentials authentication.

Features Added

1. JdbcLogUploader Class

  • Automatic JDBC log upload to S3 when communication errors occur
  • Support for both IAM role and explicit AWS credentials authentication
  • Proper resource management with AutoCloseable
  • Cross-platform support using the system temp directory (a minimal class sketch follows below)
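
The class itself is part of this PR; the sketch below only illustrates the general shape such an uploader takes with AWS SDK v2. The constructor signature, field names, and upload method are assumptions for illustration, not the PR's exact code.

```java
import java.io.File;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Minimal sketch of an AutoCloseable S3 log uploader (illustrative, not the PR's code).
public class JdbcLogUploader implements AutoCloseable {
  private final S3Client s3;
  private final String bucket;
  private final String prefix;

  public JdbcLogUploader(String bucket, String prefix, String region) {
    this.s3 = S3Client.builder().region(Region.of(region)).build();
    this.bucket = bucket;
    this.prefix = prefix;
  }

  public void upload(File logFile) {
    // An empty prefix uploads to the bucket root.
    String key = prefix.isEmpty() ? logFile.getName() : prefix + "/" + logFile.getName();
    s3.putObject(
        PutObjectRequest.builder().bucket(bucket).key(key).build(),
        RequestBody.fromFile(logFile));
  }

  @Override
  public void close() {
    s3.close(); // releases the underlying HTTP client
  }
}
```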

2. Configuration Options

  • upload_jdbc_log_to_s3: Enable/disable JDBC log upload (default: false)
  • s3_bucket: S3 bucket name (required when upload enabled)
  • s3_prefix: S3 key prefix (optional - uploads to bucket root if empty)
  • s3_region: AWS region (required when upload enabled)
  • s3_access_key_id: AWS access key ID (optional - falls back to the default AWS credentials provider chain, e.g. an IAM role, if not specified)
  • s3_secret_access_key: AWS secret access key (optional - falls back to the default AWS credentials provider chain if not specified; see the task-interface sketch below)
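
As a concrete sketch of how these options can be declared on an Embulk task interface, using the Optional<String> plus @ConfigDefault("null") pattern described in the commit notes below. The interface name and the use of the embulk-util-config annotations are assumptions about the plugin's setup:

```java
import java.util.Optional;
import org.embulk.util.config.Config;
import org.embulk.util.config.ConfigDefault;
import org.embulk.util.config.Task;

// Illustrative task interface; the real options live on the plugin's own task definition.
public interface S3UploadTask extends Task {
  @Config("upload_jdbc_log_to_s3")
  @ConfigDefault("false")
  boolean getUploadJdbcLogToS3();

  @Config("s3_bucket")
  @ConfigDefault("null")
  Optional<String> getS3Bucket();

  @Config("s3_prefix")
  @ConfigDefault("null")
  Optional<String> getS3Prefix();

  @Config("s3_region")
  @ConfigDefault("null")
  Optional<String> getS3Region();

  @Config("s3_access_key_id")
  @ConfigDefault("null")
  Optional<String> getS3AccessKeyId();

  @Config("s3_secret_access_key")
  @ConfigDefault("null")
  Optional<String> getS3SecretAccessKey();
}
```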

3. Timestamp Feature

  • Automatic timestamp addition to uploaded filenames
  • Format: yyyyMMdd_HHmmss
  • Example: snowflake_jdbc0.log.0 -> snowflake_jdbc0.log_20250630_021500.0 (see the sketch below)
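
Only the yyyyMMdd_HHmmss format and the example filenames come from the PR; the helper below is a sketch of how such a timestamp can be inserted before the final extension:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogFileNames {
  private static final DateTimeFormatter TIMESTAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

  // e.g. "snowflake_jdbc.log" -> "snowflake_jdbc_20250630_021500.log"
  static String withTimestamp(String fileName) {
    String stamp = LocalDateTime.now().format(TIMESTAMP);
    int dot = fileName.lastIndexOf('.');
    return dot < 0
        ? fileName + "_" + stamp
        : fileName.substring(0, dot) + "_" + stamp + fileName.substring(dot);
  }
}
```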

4. Authentication Methods

  • IAM Role (Recommended): Uses DefaultCredentialsProvider when credentials are not specified
  • Explicit Credentials: Uses StaticCredentialsProvider when both access key and secret are provided (a sketch of the selection logic follows below)
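
A sketch of that selection rule with AWS SDK v2, assuming the credentials arrive as Optional<String> (the class and method names are illustrative):

```java
import java.util.Optional;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;

final class S3Credentials {
  // Explicit keys when both are present; otherwise the default chain (env vars, profiles, IAM role).
  static AwsCredentialsProvider resolve(Optional<String> accessKeyId, Optional<String> secretAccessKey) {
    if (accessKeyId.isPresent() && secretAccessKey.isPresent()) {
      return StaticCredentialsProvider.create(
          AwsBasicCredentials.create(accessKeyId.get(), secretAccessKey.get()));
    }
    return DefaultCredentialsProvider.create();
  }
}
```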

5. Error Handling

  • S3 upload failures are logged as warnings but don't interrupt main processing
  • Missing S3 configuration results in warning and skipped upload
  • Original SQL exceptions are preserved and re-thrown (the sketch below shows the overall flow)
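
Putting the pieces together, a sketch of the intended flow; the helper name, parameters, and the placeholder message constant are assumptions (the real constant and the surrounding transaction code live in SnowflakeOutputPlugin):

```java
import java.io.File;
import java.sql.SQLException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class UploadOnCommunicationError {
  private static final Logger logger = LoggerFactory.getLogger(UploadOnCommunicationError.class);
  // Placeholder value; the plugin defines the actual message text it matches against.
  private static final String ENCOUNTERED_COMMUNICATION_ERROR_MESSAGE = "communication error";

  static void handle(SQLException e, boolean uploadEnabled, File jdbcLogFile,
                     String bucket, String prefix, String region) throws SQLException {
    String message = e.getMessage();
    if (uploadEnabled && message != null
        && message.contains(ENCOUNTERED_COMMUNICATION_ERROR_MESSAGE)) {
      // Best-effort upload; uses the JdbcLogUploader sketched earlier.
      try (JdbcLogUploader uploader = new JdbcLogUploader(bucket, prefix, region)) {
        uploader.upload(jdbcLogFile);
      } catch (RuntimeException uploadError) {
        // Upload failures are logged as warnings only and never replace the original error.
        logger.warn("Failed to upload JDBC log to S3", uploadError);
      }
    }
    throw e; // the original SQLException is preserved and re-thrown
  }
}
```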

Configuration Example

out:
  type: snowflake
  host: your-account.snowflakecomputing.com
  user: your_user
  password: your_password
  warehouse: your_warehouse
  database: your_database
  schema: your_schema
  table: your_table
  mode: insert
  
  # JDBC log upload configuration
  upload_jdbc_log_to_s3: true
  s3_bucket: your-log-bucket
  s3_prefix: snowflake-jdbc-logs  # Optional
  s3_region: us-east-1
  
  # Optional: Explicit AWS credentials (uses IAM role if not specified)
  s3_access_key_id: YOUR_ACCESS_KEY_ID
  s3_secret_access_key: YOUR_SECRET_ACCESS_KEY

Testing

  • Test mode available: Set `upload_jdbc_log_to_s3: true` to simulate communication errors
  • Comprehensive error handling and logging
  • Resource cleanup with try-with-resources

Documentation

  • Updated README with detailed configuration options
  • Added authentication method explanations
  • Included behavior descriptions and examples

Commits

- Add JdbcLogUploader class for automatic JDBC log upload to S3
- Support both IAM role and explicit AWS credentials authentication
- Add configuration options for S3 upload (bucket, prefix, region, credentials)
- Enable JDBC tracing when log upload is enabled
- Upload logs only when communication errors occur
- Implement proper resource management with AutoCloseable
- Add cross-platform support using system temp directory
- Update README with configuration examples and usage instructions
- Add AWS SDK v2 dependencies with conflict exclusions

This feature helps with debugging Snowflake connection issues by automatically uploading JDBC driver logs to S3 when errors occur.

- Change s3_bucket, s3_prefix, s3_region, s3_access_key_id, s3_secret_access_key from String to Optional<String>
- Remove @ConfigDefault("null") annotations as they are not needed for Optional types
- Update usage in transaction method to handle Optional values properly
- Update README.md to clarify that S3 credentials are optional and use the default AWS credentials provider chain

- Add @ConfigDefault("null") annotations to s3_bucket, s3_prefix, s3_region, s3_access_key_id, s3_secret_access_key
- This is required for Embulk's config system to properly handle Optional fields with null defaults
- Fixes the 'Field is required but not set' error when the S3 configuration is not provided

- Add automatic timestamp formatting (yyyyMMdd_HHmmss) to uploaded filenames
- Example: snowflake_jdbc.log -> snowflake_jdbc_20250630_021500.log
- Update README with timestamp feature documentation
- Clarify s3_prefix as an optional parameter in the documentation
- Improve S3 path handling for empty prefix cases

- Apply spotless formatting to Java files
- Fix line breaks and spacing issues
- Ensure code style compliance

@yas-okadatech left a comment


LGTM!

- **s3_prefix**: S3 key prefix for JDBC log upload (string, optional - if empty, files are uploaded to bucket root)
- **s3_region**: AWS region for S3 bucket (string, required when upload_jdbc_log_to_s3 is true)
- **s3_access_key_id**: AWS access key ID for S3 access (string, optional - uses default AWS credentials provider chain if not specified)
- **s3_secret_access_key**: AWS secret access key for S3 access (string, optional - uses default AWS credentials provider chain if not specified)

ref: https://github.com/embulk/embulk-output-s3

(nits) s3_prefix could be s3_path_prefix like embulk-output-s3

String message = e.getMessage();
if (message != null
    && message.contains(ENCOUNTERED_COMMUNICATION_ERROR_MESSAGE)
    && t.getUploadJdbcLogToS3() == true) {

`t.getUploadJdbcLogToS3() == true` can be just `t.getUploadJdbcLogToS3()` (without `== true`)

@yas-okadatech requested a review from Copilot on June 30, 2025 at 05:07

Copilot AI left a comment


Pull Request Overview

This PR adds automatic JDBC log upload to S3 when communication errors occur in the Snowflake output plugin, supporting both IAM role and explicit AWS credentials.

  • Introduces JdbcLogUploader for handling S3 uploads with timestamped filenames.
  • Extends SnowflakeOutputPlugin with new config options and error-handling logic to trigger uploads.
  • Updates dependencies to include AWS SDK modules and adds documentation in README.md.

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/main/java/org/embulk/output/s3/JdbcLogUploader.java | New class to upload JDBC logs to S3 with timestamping. |
| src/main/java/org/embulk/output/SnowflakeOutputPlugin.java | Added config properties, exception handling, and upload logic. |
| gradle/dependency-locks/embulkPluginRuntime.lockfile | Locked new AWS SDK and related dependencies. |
| README.md | Documented new JDBC log upload feature and configuration. |
Comments suppressed due to low confidence (2)

src/main/java/org/embulk/output/SnowflakeOutputPlugin.java:5

  • Add imports for java.util.Arrays and java.util.Comparator to support the stream operations on logFiles, otherwise the code will not compile.
import java.io.File;

Comment on lines +259 to +262
if (e instanceof SQLException) {
  String message = e.getMessage();
  if (message != null
      && message.contains(ENCOUNTERED_COMMUNICATION_ERROR_MESSAGE)

Copilot AI Jun 30, 2025


[nitpick] Relying on message text can be brittle; consider checking the SQL error code or vendor-specific code instead of string matching to detect communication errors more reliably.

Suggested change:

-  if (e instanceof SQLException) {
-    String message = e.getMessage();
-    if (message != null
-        && message.contains(ENCOUNTERED_COMMUNICATION_ERROR_MESSAGE)
+  if (e instanceof SnowflakeSQLException) {
+    SnowflakeSQLException snowflakeException = (SnowflakeSQLException) e;
+    if (snowflakeException.getErrorCode() == ENCOUNTERED_COMMUNICATION_ERROR_CODE


kentoyoshida and others added 3 commits June 30, 2025 12:10
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>