ncwhh/dataflow-jdbc-connection-pool

Limiting JDBC Connections in Dataflow DoFn

This example demonstrates how to avoid creating too many database connections in a Google Cloud Dataflow job by using a singleton JDBC connection pool per worker.

When scaling Dataflow, each worker may create multiple DoFn instances. If each DoFn opens its own JDBC connection, the database can quickly become overloaded. This project shows how to share a limited pool of JDBC connections across all DoFn instances on the same worker.

🔑 Key Idea

  • Use a Java singleton to ensure that only one connection pool exists per worker VM.
  • Each DoFn borrows connections from this shared pool.
  • Upper bound on total connections:
    maxNumWorkers × maximumPoolSize

This keeps database load bounded and predictable, even as your Dataflow job scales out.
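The budget calculation above is simple enough to check directly. A minimal sketch with hypothetical values (the class and values below are illustrative, not part of the repo):

```java
// Worked example of the connection ceiling: a job capped at 10 workers,
// each worker's pool capped at 5 connections, can never open more than
// 50 database connections in total.
public class ConnectionBudget {
    public static int maxConnections(int maxNumWorkers, int maximumPoolSize) {
        return maxNumWorkers * maximumPoolSize;
    }

    public static void main(String[] args) {
        // maxNumWorkers comes from the Dataflow --maxNumWorkers option,
        // maximumPoolSize from the pool configuration (e.g. HikariCP).
        System.out.println(maxConnections(10, 5)); // prints 50
    }
}
```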

🛠️ How It Works

  1. Singleton Connection Pool
    Implemented in JdbcClient.java.
    A static singleton manages JDBC connections, initialized once per worker.

  2. DoFn Usage
    Demonstrated in JdbcDoFn.java.
    The DoFn retrieves connections from the pool instead of opening new ones.

  3. Connection Limits
    The maximum number of concurrent DB connections is controlled by:

    • maxNumWorkers (Dataflow setting)
    • maximumPoolSize (connection pool config)
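The three steps above can be sketched with a JDK-only stand-in. The real JdbcClient.java manages actual JDBC connections (typically via a pool such as HikariCP); here, plain strings stand in for connections so the example runs without a database, and the class and method names are illustrative rather than the repo's API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Singleton connection pool, one instance per worker JVM. Class
// initialization is JVM-wide and thread-safe, so every DoFn instance
// on the same worker shares the same pool.
public final class WorkerPool {
    private static final int MAXIMUM_POOL_SIZE = 5; // per-worker cap

    private static final WorkerPool INSTANCE = new WorkerPool();

    private final BlockingQueue<String> connections =
            new ArrayBlockingQueue<>(MAXIMUM_POOL_SIZE);

    private WorkerPool() {
        for (int i = 0; i < MAXIMUM_POOL_SIZE; i++) {
            // Real code would open JDBC connections here.
            connections.add("connection-" + i);
        }
    }

    public static WorkerPool getInstance() {
        return INSTANCE;
    }

    // A DoFn borrows a connection in its @ProcessElement method;
    // this blocks once all MAXIMUM_POOL_SIZE connections are in use.
    public String borrow() {
        try {
            return connections.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting for a connection", e);
        }
    }

    // The DoFn must return the connection when done (finally block).
    public void release(String connection) {
        connections.add(connection);
    }
}
```

A DoFn would call `WorkerPool.getInstance().borrow()`, run its query, and `release()` in a finally block. With a real pool like HikariCP, borrow and release map onto `DataSource.getConnection()` and `Connection.close()`.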

⚠️ When to Use This Pattern

✅ Use it when:

  • Your pipeline performs lookups or element-wise operations on a database (e.g., enriching records with reference data).
  • You need to scale safely without overwhelming the DB.
  • You cannot use bulk export/import tools for your use case.

❌ Avoid it when:

  • You need to read/write entire tables — in that case, prefer JdbcIO, which is designed for bulk operations.

👉 This pattern does not replace JdbcIO.
It complements it, and is specifically meant for per-element database access in a DoFn where direct JDBC calls are unavoidable.

ℹ️ Limitations

Depending on how Dataflow distributes work across workers, the connection limit may not be fully utilized: not every worker necessarily executes the part of the pipeline that requires a JDBC connection.
