This repository was archived by the owner on May 17, 2024. It is now read-only.
Update docs and readme to align with Docs v3 #471
Closed
12 commits
f7c607e  Create usage_analytics.md  (williebsweet)
44d9abb  Create common_use_cases.md  (williebsweet)
7e3250e  reorder and rename sections  (williebsweet)
df85997  remove hashdiff overview  (williebsweet)
6b9e56e  update with docs v3 links  (williebsweet)
9bc7d6e  add python examples  (williebsweet)
1f48a66  add python examples  (williebsweet)
d464d38  update what is data diff  (williebsweet)
df9263a  update doc links  (williebsweet)
ae75b71  remove use cases section  (williebsweet)
d3b1276  remove locally optimistic slack  (williebsweet)
d42059a  fix spelling / grammar errors  (williebsweet)
@@ -0,0 +1,14 @@
# Common Use Cases

## joindiff
- **Inspect differences between branches**. Make sure your code results in only expected changes.
- **Validate stability of critical downstream tables**. When refactoring a data pipeline, confirm that your changes to upstream models have not affected the critical downstream models that users and systems depend on.
- **Conduct better code reviews**. No matter how thoughtfully you review the code, run a diff to ensure that you don't accidentally approve an error.

## hashdiff
- **Verify data migrations**. Verify that all data was copied when doing a critical data migration, for example when migrating from Heroku PostgreSQL to Amazon RDS.
- **Verify data pipelines**. Confirm that data moved from a relational database to a warehouse or data lake with Fivetran, Airbyte, Debezium, or some other pipeline arrived intact.
- **Maintain data integrity SLOs**. You can create and monitor an SLO of, for example, 99.999% data integrity, and alert your team when data is missing.
- **Debug complex data pipelines**. Data can get lost in pipelines that span half a dozen systems. data-diff helps you efficiently track down where a row got lost without needing to inspect each intermediate datastore individually.
- **Detect hard deletes for an `updated_at`-based pipeline**. If you're copying data to your warehouse based on an `updated_at`-style column, data-diff can find any hard deletes that you may have missed.
- **Make your replication self-healing**. You can use data-diff to self-heal by using the diff output to write or update rows in the target database.
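
The hard-delete and self-healing cases above both come down to classifying the rows a diff reports. The following is a minimal sketch of that step, not part of data-diff itself: `plan_repairs` is a hypothetical helper, and it assumes the diff is a sequence of `(sign, row)` pairs in which `"-"` marks a row present only in the source table, `"+"` marks a row present only in the target table, and the first column is the key. It only builds a plan; writing to the target database is left out.

```python
from collections import defaultdict


def plan_repairs(diff_rows, key_index=0):
    """Group (sign, row) pairs by key and decide how to fix the target.

    Hypothetical helper: assumes "-" means "only in source" and
    "+" means "only in target".
    """
    by_key = defaultdict(dict)
    for sign, row in diff_rows:
        by_key[row[key_index]][sign] = row

    plan = {"insert": [], "update": [], "delete": []}
    for sides in by_key.values():
        if "-" in sides and "+" in sides:
            # Same key on both sides with different values: source wins.
            plan["update"].append(sides["-"])
        elif "-" in sides:
            # Row never reached the target.
            plan["insert"].append(sides["-"])
        else:
            # Row exists only in the target, e.g. hard-deleted upstream.
            plan["delete"].append(sides["+"])
    return plan


# Hand-written sample standing in for real diff output.
sample_diff = [
    ("-", ("1", "alice")),
    ("-", ("2", "bob")),
    ("+", ("2", "bobby")),
    ("+", ("3", "carol")),
]
plan = plan_repairs(sample_diff)
print(plan)
```

Keeping the planning step separate from the database writes makes it possible to review or log the plan before anything in the target is changed.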
@@ -0,0 +1,44 @@
Python API Examples
-------------------

**Example 1: Diff tables in MySQL and PostgreSQL**

.. code-block:: python

    # Optional: Set logging to display the progress of the diff
    import logging
    logging.basicConfig(level=logging.INFO)

    from data_diff import connect_to_table, diff_tables

    # Arguments: connection string, table name, key column
    table1 = connect_to_table("postgresql:///", "table_name", "id")
    table2 = connect_to_table("mysql:///", "table_name", "id")

    # Each differing row is a (sign, columns) pair
    for different_row in diff_tables(table1, table2):
        plus_or_minus, columns = different_row
        print(plus_or_minus, columns)


**Example 2: Connect to Snowflake using dictionary configuration**

.. code-block:: python

    from data_diff import connect_to_table

    SNOWFLAKE_CONN_INFO = {
        "driver": "snowflake",
        "user": "erez",
        "account": "whatever",
        "database": "TESTS",
        "warehouse": "COMPUTE_WH",
        "role": "ACCOUNTADMIN",
        "schema": "PUBLIC",
        "key": "snowflake_rsa_key.p8",
    }

    snowflake_table = connect_to_table(SNOWFLAKE_CONN_INFO, "table_name")  # Uses id by default

Run ``help(connect_to_table)`` and ``help(diff_tables)`` or read our API reference to learn more about the different options:

- connect_to_table_

- diff_tables_

.. _connect_to_table: https://data-diff.readthedocs.io/en/latest/python-api.html#data_diff.connect_to_table
.. _diff_tables: https://data-diff.readthedocs.io/en/latest/python-api.html#data_diff.diff_tables
@@ -0,0 +1,22 @@
# Usage Analytics & Data Privacy

data-diff collects anonymous usage data to help our team improve the tool and to focus development efforts where our users need them most.

We capture two events: one when the data-diff run starts, and one when it finishes. No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:

- Operating system and Python version
- Types of databases used (postgresql, mysql, etc.)
- Sizes of tables diffed, run time, and diff row count (numbers only)
- Error message, if any, truncated to the first 20 characters
- A persistent UUID to identify the session, stored in `~/.datadiff.toml`

To disable, use one of the following methods:

* **CLI**: use the `--no-tracking` flag.
* **Config file**: set `no_tracking = true` (for example, under `[run.default]`).
* **Python API**:
  ```python
  import data_diff
  # Invoke the following before making any API calls
  data_diff.disable_tracking()
  ```