datafold · williebsweet · Mar 27, 2023 · Mar 27, 2023 · Apr 3, 2023 · Apr 3, 2023
diff --git a/README.md b/README.md
@@ -5,11 +5,11 @@
 # **data-diff**
 
 ## What is `data-diff`?
-data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable. Even at massive scale.
+data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables.
 
 ## Documentation
 
-[**🗎 Documentation website**](https://docs.datafold.com/os_diff/about) - our detailed documentation has everything you need to start diffing.
+[**🗎 Documentation website**](https://docs.datafold.com/guides/os_data_diff) - our detailed documentation has everything you need to start diffing.
 
 ### Databases we support
 
@@ -27,31 +27,11 @@ data-diff is a **free, open-source tool** that enables data professionals to det
 - DuckDB >=0.6
 - SQLite (coming soon)
 
-For their corresponding connection strings, check out our [detailed table](https://docs.datafold.com/os_diff/databases_we_support).
+For their corresponding connection strings, check out our [detailed table](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md).
 
 #### Looking for a database not on the list?
 If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list.
 
-## Use cases
-
-### Diff Tables Between Databases
-#### Quickly identify issues when moving data between databases
-
-<p align="center">
-  <img alt="diff2" src="https://user-images.githubusercontent.com/1799931/196754998-a88c0a52-8751-443d-b052-26c03d99d9e5.png" />
-</p>
-
-### Diff Tables Within a Database
-#### Improve code reviews by identifying data problems you don't have tests for
-<p align="center">
-  <a href=https://www.loom.com/share/682e4b7d74e84eb4824b983311f0a3b2 target="_blank">
-    <img alt="Intro to Diff" src="https://user-images.githubusercontent.com/1799931/196576582-d3535395-12ef-40fd-bbbb-e205ccae1159.png" width="50%" height="50%" />
-  </a>
-</p>
-
-&nbsp;
-&nbsp;
-
 ## Get started
 
 ### Installation
@@ -92,7 +72,7 @@ Once you've installed `data-diff`, you can run it from the command line.
 data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
 ```
 
-Be sure to read [the docs](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line) for detailed instructions how to build one of these commands depending on your database setup.
+Be sure to read [the docs](https://docs.datafold.com/reference/open_source/cli) for detailed instructions on how to build one of these commands depending on your database setup.
 
 #### Code Example: Diff Tables Between Databases
 Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres.
@@ -130,22 +110,18 @@ In both code examples, I've used `<>` carrots to represent values that **should
 
 We know that in some cases, the data-diff command can become long and dense. And maybe you're new to the command line.
 
-* We're here to help [on slack](https://locallyoptimistic.slack.com/archives/C03HUNGQV0S) if you have ANY questions as you use `data-diff` in your workflow.
-* You can also post a question in [GitHub Discussions](https://github.com/datafold/data-diff/discussions).
-
-
-To get a Slack invite - [click here](https://locallyoptimistic.com/community/)
+* We're here to help! Post a question in [GitHub Discussions](https://github.com/datafold/data-diff/discussions).
 
 ## How to Use
 
-* [How to use from the shell (or: command-line)](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_command_line)
-* [How to use from Python](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_python)
-* [How to use with TOML configuration file](https://docs.datafold.com/os_diff/how_to_use/how_to_use_with_toml)
-* [Usage Analytics & Data Privacy](https://docs.datafold.com/os_diff/usage_analytics_data_privacy)
+* [Examples with dbt, joindiff, and hashdiff](https://docs.datafold.com/reference/open_source/cli#examples)
+* [Examples with Python](https://data-diff.readthedocs.io/en/latest/python-api.html)
+* [How to use with TOML configuration file](https://docs.datafold.com/reference/open_source/cli#toml-config-file)
 
 ## How to Contribute
 * Feel free to open an issue or contribute to the project by working on an existing issue.
 * Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started.
+* To add a new database driver, check out the [new database driver guide](https://github.com/datafold/data-diff/blob/master/docs/new-database-driver-guide.rst).
 
 Big thanks to everyone who contributed so far:
 
@@ -155,7 +131,10 @@ Big thanks to everyone who contributed so far:
 
 ## Technical Explanation
 
-Check out this [technical explanation](https://docs.datafold.com/os_diff/technical_explanation) of how data-diff works.
+Check out this [technical explanation](https://github.com/datafold/data-diff/blob/master/docs/technical-explanation.md) of how data-diff works.
+
+## Analytics
+- [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md)
 
 ## License
 

diff --git a/docs/common_use_cases.md b/docs/common_use_cases.md
@@ -0,0 +1,14 @@
+# Common Use Cases
+
+## joindiff
+- **Inspect differences between branches**. Make sure your code results in only expected changes.
+- **Validate stability of critical downstream tables**. When refactoring a data pipeline, rest assured that the changes you make to upstream models have not impacted critical downstream models that users and systems depend on.
+- **Conduct better code reviews**. No matter how thoughtfully you review the code, run a diff to ensure that you don't accidentally approve an error.
+
+## hashdiff
+- **Verify data migrations**. Verify that all data was copied when doing a critical data migration, for example when migrating from Heroku PostgreSQL to Amazon RDS.
+- **Verify data pipelines**. Check that data moved from a relational database to a warehouse or data lake with Fivetran, Airbyte, Debezium, or some other pipeline arrived intact.
+- **Maintain data integrity SLOs**. You can create and monitor your SLO of e.g. 99.999% data integrity, and alert your team when data is missing.
+- **Debug complex data pipelines**. Data can get lost in pipelines that may span a half-dozen systems. data-diff helps you efficiently track down where a row got lost without needing to individually inspect intermediate datastores.
+- **Detect hard deletes for an `updated_at`-based pipeline**. If you're copying data to your warehouse based on an `updated_at`-style column, data-diff can find any hard deletes that you may have missed.
+- **Make your replication self-healing**. You can use data-diff to self-heal by using the diff output to write/update rows in the target database.
diff --git a/docs/index.rst b/docs/index.rst
@@ -4,22 +4,12 @@
    :hidden:
 
    python-api
+   python_examples
 
 data-diff
 ---------
 
-**Data-diff** is a command-line tool and Python library to efficiently diff
-rows across two different databases.
-
-⇄  Verifies across many different databases (e.g. *PostgreSQL* -> *Snowflake*) !
-
-🔍 Outputs diff of rows in detail
-
-🚨 Simple CLI/API to create monitoring and alerts
-
-🔥 Verify 25M+ rows in <10s, and 1B+ rows in ~5min.
-
-♾️  Works for tables with 10s of billions of rows
+**Data-diff** is a command-line tool and Python library for comparing tables in and across databases.
 
 For more information, `See our README <https://github.com/datafold/data-diff#readme>`_
 
@@ -32,4 +22,4 @@ Resources
 - :doc:`python-api`
 - The rest of the `documentation`_
 
-.. _documentation: https://docs.datafold.com/os_diff/about/
+.. _documentation: https://docs.datafold.com/guides/os_data_diff
diff --git a/docs/python_examples.rst b/docs/python_examples.rst
@@ -0,0 +1,48 @@
+Python API Examples
+-------------------
+
+**Example 1: Diff tables in MySQL and PostgreSQL**
+
+.. code-block:: python
+
+    # Optional: Set logging to display the progress of the diff
+    import logging
+    logging.basicConfig(level=logging.INFO)
+
+    from data_diff import connect_to_table, diff_tables
+
+    table1 = connect_to_table("postgresql:///", "table_name", "id")
+    table2 = connect_to_table("mysql:///", "table_name", "id")
+
+    for different_row in diff_tables(table1, table2):
+        plus_or_minus, columns = different_row
+        print(plus_or_minus, columns)
+
+
+**Example 2: Connect to Snowflake using dictionary configuration**
+
+.. code-block:: python
+
+    from data_diff import connect_to_table
+
+    SNOWFLAKE_CONN_INFO = {
+        "driver": "snowflake",
+        "user": "erez",
+        "account": "whatever",
+        "database": "TESTS",
+        "warehouse": "COMPUTE_WH",
+        "role": "ACCOUNTADMIN",
+        "schema": "PUBLIC",
+        "key": "snowflake_rsa_key.p8",
+    }
+
+    snowflake_table = connect_to_table(SNOWFLAKE_CONN_INFO, "table_name")  # Uses id by default
+
+Run ``help(connect_to_table)`` and ``help(diff_tables)`` or read our API reference to learn more about the different options:
+
+- connect_to_table_
+
+- diff_tables_
+
+.. _connect_to_table: https://data-diff.readthedocs.io/en/latest/python-api.html#data_diff.connect_to_table
+.. _diff_tables: https://data-diff.readthedocs.io/en/latest/python-api.html#data_diff.diff_tables
diff --git a/docs/usage_analytics.md b/docs/usage_analytics.md
@@ -0,0 +1,22 @@
+# Usage Analytics & Data Privacy
+
+data-diff collects anonymous usage data to help our team improve the tool and to apply development efforts to where our users need them most.
+
+We capture two events: one when the data-diff run starts, and one when it is finished. No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:
+
+- Operating System and Python version
+- Types of databases used (postgresql, mysql, etc.)
+- Sizes of tables diffed, run time, and diff row count (numbers only)
+- Error message, if any, truncated to the first 20 characters
+- A persistent UUID to identify the session, stored in `~/.datadiff.toml`
+
+To disable, use one of the following methods:
+
+* **CLI**: use the `--no-tracking` flag.
+* **Config file**: set `no_tracking = true` (for example, under `[run.default]`).
+* **Python API**:
+    ```python
+    import data_diff
+    # Invoke the following before making any API calls
+    data_diff.disable_tracking()
+    ```