This repository was archived by the owner on May 17, 2024. It is now read-only.

Commit 19fc965

Merge pull request #493 from leoebfolsom/fix-readme-link
fix up links in readme
2 parents 4cf56db + eecf119 commit 19fc965

3 files changed: 49 additions, 15 deletions


README.md

Lines changed: 13 additions & 15 deletions
@@ -5,11 +5,11 @@
55
# **data-diff**
66

77
## What is `data-diff`?
8-
data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable. Even at massive scale.
8+
data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables.
99

1010
## Documentation
1111

12-
[**🗎 Documentation website**](https://docs.datafold.com/os_diff/about) - our detailed documentation has everything you need to start diffing.
12+
[**🗎 Documentation**](https://docs.datafold.com/guides/os_data_diff) - our detailed documentation has everything you need to start diffing.
1313

1414
### Databases we support
1515

@@ -27,7 +27,7 @@ data-diff is a **free, open-source tool** that enables data professionals to det
2727
- DuckDB >=0.6
2828
- SQLite (coming soon)
2929

30-
For their corresponding connection strings, check out our [detailed table](https://docs.datafold.com/os_diff/databases_we_support).
30+
For their corresponding connection strings, check out our [detailed table](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md).
3131

3232
#### Looking for a database not on the list?
3333
If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list.
@@ -92,7 +92,7 @@ Once you've installed `data-diff`, you can run it from the command line.
9292
data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
9393
```
9494

95-
Be sure to read [the docs](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line) for detailed instructions how to build one of these commands depending on your database setup.
95+
Be sure to read [the docs](https://docs.datafold.com/reference/open_source/cli) for detailed instructions on how to build one of these commands depending on your database setup.
9696

9797
#### Code Example: Diff Tables Between Databases
9898
Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres.
@@ -110,8 +110,6 @@ data-diff \
110110

111111
#### Code Example: Diff Tables Within a Database
112112

113-
Here's a code example from [the video](https://www.loom.com/share/682e4b7d74e84eb4824b983311f0a3b2), where we compare data between two Snowflake tables within one database.
114-
115113
```
116114
data-diff \
117115
"snowflake://<username>:<password>@<account>/<DATABASE>/<SCHEMA_1>?warehouse=<WAREHOUSE>&role=<ROLE>" <TABLE_1> \
@@ -130,22 +128,19 @@ In both code examples, I've used `<>` carrots to represent values that **should
130128

131129
We know that in some cases, the data-diff command can become long and dense. And maybe you're new to the command line.
132130

133-
* We're here to help [on slack](https://locallyoptimistic.slack.com/archives/C03HUNGQV0S) if you have ANY questions as you use `data-diff` in your workflow.
131+
* We're here to help [on slack](https://getdbt.slack.com/archives/C03D25A92UU) if you have ANY questions as you use `data-diff` in your workflow.
134132
* You can also post a question in [GitHub Discussions](https://github.com/datafold/data-diff/discussions).
135133

136-
137-
To get a Slack invite - [click here](https://locallyoptimistic.com/community/)
138-
139134
## How to Use
140135

141-
* [How to use from the shell (or: command-line)](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line)
142-
* [How to use from Python](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_python)
143-
* [How to use with TOML configuration file](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_toml)
144-
* [Usage Analytics & Data Privacy](https://docs.datafold.com/os_diff/usage_analytics_data_privacy)
136+
* [Examples with dbt, joindiff, and hashdiff](https://docs.datafold.com/reference/open_source/cli#examples)
137+
* [Examples with Python](https://data-diff.readthedocs.io/en/latest/python-api.html)
138+
* [How to use with TOML configuration file](https://docs.datafold.com/reference/open_source/cli#toml-config-file)
145139

146140
## How to Contribute
147141
* Feel free to open an issue or contribute to the project by working on an existing issue.
148142
* Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started.
143+
* To add a new database driver, check out [docs](https://github.com/datafold/data-diff/blob/master/docs/new-database-driver-guide.rst).
149144

150145
Big thanks to everyone who contributed so far:
151146

@@ -155,7 +150,10 @@ Big thanks to everyone who contributed so far:
155150

156151
## Technical Explanation
157152

158-
Check out this [technical explanation](https://docs.datafold.com/os_diff/technical_explanation) of how data-diff works.
153+
Check out this [technical explanation](https://github.com/datafold/data-diff/blob/master/docs/technical-explanation.md) of how data-diff works.
154+
155+
## Analytics
156+
* [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md)
159157

160158
## License
161159

docs/common_use_cases.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
# Common Use Cases
2+
3+
## joindiff
4+
- **Inspect differences between branches**. Make sure your code results in only expected changes.
5+
- **Validate stability of critical downstream tables**. When refactoring a data pipeline, rest assured that the changes you make to upstream models have not impacted critical downstream models that users and systems depend on.
6+
- **Conduct better code reviews**. No matter how thoughtfully you review the code, run a diff to ensure that you don't accidentally approve an error.
7+
8+
## hashdiff
9+
- **Verify data migrations**. Verify that all data was copied when doing a critical data migration. For example, migrating from Heroku PostgreSQL to Amazon RDS.
10+
- **Verify data pipelines**. Confirm that data moved from a relational database to a warehouse or data lake with Fivetran, Airbyte, Debezium, or some other pipeline arrived intact.
11+
- **Maintain data integrity SLOs**. You can create and monitor your SLO of e.g. 99.999% data integrity, and alert your team when data is missing.
12+
- **Debug complex data pipelines**. Data can get lost in pipelines that may span a half-dozen systems. data-diff helps you efficiently track down where a row got lost without needing to individually inspect intermediate datastores.
13+
- **Detect hard deletes for an `updated_at`-based pipeline**. If you're copying data to your warehouse based on an `updated_at`-style column, data-diff can find any hard-deletes that you may have missed.
14+
- **Make your replication self-healing**. You can use data-diff to self-heal by using the diff output to write/update rows in the target database.
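
The hard-delete and self-healing use cases above can be sketched in a few lines of Python. This is a minimal illustration, not data-diff's own implementation: it assumes the diff is available as an iterable of `(sign, row)` tuples, that `-` marks a row as it exists in the source table and `+` marks a row as it exists in the target table, and that the first element of each row is the primary key. The helper name `plan_repair` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple  # assumption: row[0] is the primary key


def plan_repair(diff: Iterable[Tuple[str, Row]]) -> Tuple[List[Row], List[Hashable]]:
    """Turn a diff into a repair plan for the target table.

    Returns (upserts, deletes):
      * upserts: source rows that are missing or different in the target
      * deletes: keys that exist only in the target, i.e. rows that were
        hard-deleted in the source
    """
    source_rows: Dict[Hashable, Row] = {}
    target_keys = set()

    for sign, row in diff:
        key = row[0]
        if sign == "-":    # assumption: row as it exists in the source
            source_rows[key] = row
        elif sign == "+":  # assumption: row as it exists in the target
            target_keys.add(key)
        else:
            raise ValueError(f"unexpected diff sign: {sign!r}")

    upserts = [source_rows[key] for key in sorted(source_rows)]
    deletes = sorted(target_keys - set(source_rows))
    return upserts, deletes


# Hypothetical diff: row 1 is missing in the target, row 2 differs,
# and row 3 was hard-deleted in the source but still exists in the target.
example_diff = [
    ("-", (1, "alice")),
    ("-", (2, "bob")),
    ("+", (2, "bobby")),
    ("+", (3, "carol")),
]
upserts, deletes = plan_repair(example_diff)
print("upsert into target:", upserts)
print("delete from target:", deletes)
```

Writing the plan back to the target database is left out on purpose, since the statements depend on the database and driver in use.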

docs/usage_analytics.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
# Usage Analytics & Data Privacy
2+
3+
data-diff collects anonymous usage data to help our team improve the tool and to apply development efforts to where our users need them most.
4+
5+
We capture two events: one when the data-diff run starts, and one when it is finished. No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:
6+
7+
- Operating System and Python version
8+
- Types of databases used (postgresql, mysql, etc.)
9+
- Sizes of tables diffed, run time, and diff row count (numbers only)
10+
- Error message, if any, truncated to the first 20 characters.
11+
- A persistent UUID to identify the session, stored in `~/.datadiff.toml`
12+
13+
To disable, use one of the following methods:
14+
15+
* **CLI**: use the `--no-tracking` flag.
16+
* **Config file**: set `no_tracking = true` (for example, under `[run.default]`)
17+
* **Python API**:
18+
```python
19+
import data_diff
20+
# Invoke the following before making any API calls
21+
data_diff.disable_tracking()
22+
```
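
For scripted runs, the CLI option can be applied when the command is assembled. The sketch below only builds the argument list from the documented `data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]` form and the `--no-tracking` flag; the helper name `build_command`, the connection URIs, and the table names are placeholders invented for illustration.

```python
import shlex
from typing import List


def build_command(
    db1_uri: str,
    table1: str,
    db2_uri: str,
    table2: str,
    no_tracking: bool = True,
) -> List[str]:
    """Assemble a data-diff invocation as an argument list."""
    argv = ["data-diff", db1_uri, table1, db2_uri, table2]
    if no_tracking:
        # Opt out of anonymous usage analytics for this run
        argv.append("--no-tracking")
    return argv


# Placeholder URIs and table names; substitute your own.
argv = build_command(
    "postgresql://user:secret@localhost/source_db", "orders",
    "postgresql://user:secret@localhost/target_db", "orders",
)
print(shlex.join(argv))  # shell-quoted form, suitable for logging
```

Passing the list to `subprocess.run(argv)` avoids shell quoting problems when URIs contain characters such as `&` or `?`.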
