This repository was archived by the owner on May 17, 2024. It is now read-only.
docs/new-database-driver-guide.rst
Lines changed: 39 additions & 16 deletions
@@ -24,11 +24,24 @@ Then, users can install the dependencies needed for your database driver, with `
This way, data-diff can support a wide variety of drivers, without requiring our users to install libraries that they won't use.

-2. Implement database module
+2. Implement a database module
----------------------------

New database modules belong in the ``data_diff/databases`` directory.

+The module consists of:
+1. Dialect (class responsible for normalizing/casting fields, e.g. numbers and timestamps)
+2. Database class that handles connecting to the DB, querying (if the default doesn't work), closing connections, etc.
+
+Choosing a base class, based on threading model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can choose to inherit from either ``base.Database`` or ``base.ThreadedDatabase``.
+
+Usually, databases with cursor-based connections, like MySQL or PostgreSQL, only allow one thread per connection. To support multithreading, we implement them by inheriting from ``ThreadedDatabase``, which holds a pool of worker threads and creates a new connection per thread.
+
+Cloud databases, such as Snowflake and BigQuery, usually open a new connection per request and support simultaneous queries from any number of threads. In other words, they already support multithreading, so we can implement them by inheriting directly from ``Database``.
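The difference between the two threading models can be sketched as follows. This is a minimal illustration only, not data-diff's actual code: ``FakeConnection``, ``ThreadedDatabaseSketch`` and ``DirectDatabaseSketch`` are hypothetical names, and the real ``base.Database`` and ``base.ThreadedDatabase`` classes may be structured differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a cursor-based driver connection."""

    def __init__(self):
        # Remember which thread opened the connection.
        self.owner = threading.get_ident()

    def execute(self, sql):
        # Cursor-based drivers often must not be shared across threads.
        assert threading.get_ident() == self.owner, "connection used from another thread"
        return f"result of {sql!r}"


class ThreadedDatabaseSketch:
    """Pool of worker threads; each worker lazily opens its own connection."""

    def __init__(self, thread_count=2):
        self._local = threading.local()
        self._pool = ThreadPoolExecutor(max_workers=thread_count)
        self._lock = threading.Lock()
        self.connections_opened = 0

    def _connection(self):
        # threading.local gives every worker thread a private "conn" slot.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._local.conn = FakeConnection()
            with self._lock:
                self.connections_opened += 1
        return conn

    def query(self, sql):
        # Run the query on a worker thread, using that worker's own connection.
        return self._pool.submit(lambda: self._connection().execute(sql)).result()

    def close(self):
        self._pool.shutdown()


class DirectDatabaseSketch:
    """For backends that accept concurrent queries: no pool is needed."""

    def query(self, sql):
        # A new connection per request, used on the calling thread.
        return FakeConnection().execute(sql)
```

In this sketch the pooled variant never opens more connections than it has worker threads, while the direct variant leaves concurrency entirely to the backend.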
+
Import on demand
~~~~~~~~~~~~~~~~~

@@ -50,16 +63,6 @@ Instead, they should be imported and initialized within a function. Example:

We use the ``import_helper()`` decorator to provide a uniform and informative error. The string argument should be the name of the package, as written in ``pyproject.toml``.
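To illustrate the idea, here is a rough sketch of how a decorator of this kind might work. It is an assumption-laden example, not the project's ``import_helper()`` implementation: ``import_helper_sketch``, ``import_exampledb``, the wording of the error message, and the package name ``exampledb`` are all hypothetical.

```python
from functools import wraps


def import_helper_sketch(package):
    """Hypothetical decorator: re-raise a missing-module error with a hint."""

    def decorator(func):
        @wraps(func)
        def wrapper():
            try:
                return func()
            except ModuleNotFoundError as e:
                # Same exception type, but with a uniform, more helpful message.
                raise ModuleNotFoundError(
                    f"{e}. Install the optional package '{package}' to use this driver."
                ) from e

        return wrapper

    return decorator


@import_helper_sketch("exampledb")
def import_exampledb():
    # Imported inside the function, so the package is only required
    # when this particular driver is actually used.
    import exampledb_driver_that_is_not_installed  # hypothetical module name

    return exampledb_driver_that_is_not_installed
```

Calling ``import_exampledb()`` without the module installed raises ``ModuleNotFoundError`` with the added hint, while merely importing the file that defines it does not.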

-Choosing a base class, based on threading model
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can choose to inherit from either ``base.Database`` or ``base.ThreadedDatabase``.
-
-Usually, databases with cursor-based connections, like MySQL or PostgreSQL, only allow one thread per connection. To support multithreading, we implement them by inheriting from ``ThreadedDatabase``, which holds a pool of worker threads and creates a new connection per thread.
-
-Cloud databases, such as Snowflake and BigQuery, usually open a new connection per request and support simultaneous queries from any number of threads. In other words, they already support multithreading, so we can implement them by inheriting directly from ``Database``.