Skip to content
This repository was archived by the owner on May 17, 2024. It is now read-only.

readme: recommend just 1m rows for development #20

Merged
merged 1 commit into from
May 12, 2022
Merged

Conversation

sirupsen
Copy link
Contributor

It seems a bit more reasonable to just import 1M rows for development to not make it take 10min or so to get started, even longer on other databases. 1M is plenty for getting started

README.md Outdated

```shell-session
poetry run python3 -m unittest
```
wget https://files.grouplens.org/datasets/movielens/ml-25m.zip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link to 1m rows ;)

https://files.grouplens.org/datasets/movielens/ml-1m.zip

There's also 100k if you want it even faster. (see https://files.grouplens.org/datasets/movielens )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahahahah alright that's smarter

README.md Outdated
preql -f dev/prepare_db.pql mysql://mysql:Password1@127.0.0.1:3306/mysql

# These typically would run remotely
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Categorize them by classical / cloud dbs . then it's obvious

README.md Outdated

```shell-session
preql -f dev/prepare_db.pql postgres://postgres:Password1@127.0.0.1:5432/postgres
# You can compare in just one if you like, you don't need to seed them all.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment here is out of place. But we can write at the top "These are optional, based on what dbs you have. Just one of them is enough to run everything."

Also they need mysql to run tests. Unless we switch the default to postgres.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, but you don't need to populate MySQL to run tests (and we should keep it that way)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point

README.md Outdated
preql -f dev/prepare_db.psq snowflake://<uri>
preql -f dev/prepare_db.psq mssql://<uri>
preql -f dev/prepare_db_bigquery.pql bigquery:///<project> # Bigquery has its own
```

**6. Run data-diff against seeded database**
**5. Run `data-diff` against seeded database**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it's better to leave it as text, rather than code, as it's a name. Maybe put it in bold instead?

README.md Outdated
@@ -180,7 +188,10 @@ Diff-Split: +250156 -0
```

# How to publish to PyPI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have instructions on publishing to PyPI? We are the only ones who will do it. (and probably 1 specific person who's in charge of it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had asked Chiel for it since I've never done it before! If you think this is completely out of place to put this in a README, even if it's just for us, we can remove it. That's my bad

Copy link
Contributor

@erezsh erezsh May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove it. It will confuse users, if nothing else.

It's fine if we create PUBLISH.md if we just want to be perfectly open, I just don't think it belongs in the README.

@sirupsen
Copy link
Contributor Author

@erezsh tried addressing your comments, good to go? ✅

@sirupsen sirupsen merged commit a40c59c into master May 12, 2022
@sirupsen sirupsen deleted the dev-1m-rows branch May 12, 2022 13:04
nolar pushed a commit that referenced this pull request Apr 6, 2023
nolar pushed a commit that referenced this pull request Apr 6, 2023
PR #20 with added infrastructure (TimestampTZ repr)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants