-
Notifications
You must be signed in to change notification settings - Fork 288
readme: recommend just 1m rows for development #20
Conversation
README.md
Outdated
|
||
```shell-session | ||
poetry run python3 -m unittest | ||
``` | ||
wget https://files.grouplens.org/datasets/movielens/ml-25m.zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to 1m rows ;)
https://files.grouplens.org/datasets/movielens/ml-1m.zip
There's also 100k if you want it even faster. (see https://files.grouplens.org/datasets/movielens )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahahahah alright that's smarter
README.md
Outdated
preql -f dev/prepare_db.pql mysql://mysql:Password1@127.0.0.1:3306/mysql | ||
|
||
# These typically would run remotely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Categorize them by classical / cloud dbs . then it's obvious
README.md
Outdated
|
||
```shell-session | ||
preql -f dev/prepare_db.pql postgres://postgres:Password1@127.0.0.1:5432/postgres | ||
# You can compare in just one if you like, you don't need to seed them all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment here is out of place. But we can write at the top "These are optional, based on what dbs you have. Just one of them is enough to run everything."
Also they need mysql to run tests. Unless we switch the default to postgres.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, but you don't need to populate MySQL to run tests (and we should keep it that way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point
README.md
Outdated
preql -f dev/prepare_db.psq snowflake://<uri> | ||
preql -f dev/prepare_db.psq mssql://<uri> | ||
preql -f dev/prepare_db_bigquery.pql bigquery:///<project> # Bigquery has its own | ||
``` | ||
|
||
**6. Run data-diff against seeded database** | ||
**5. Run `data-diff` against seeded database** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think it's better to leave it as text, rather than code, as it's a name. Maybe put it in bold instead?
README.md
Outdated
@@ -180,7 +188,10 @@ Diff-Split: +250156 -0 | |||
``` | |||
|
|||
# How to publish to PyPI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have instructions on publishing to PyPI? We are the only ones who will do it. (and probably 1 specific person who's in charge of it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had asked Chiel for it since I've never done it before! If you think this is completely out of place to put this in a README, even if it's just for us, we can remove it. That's my bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove it. It will confuse users, if nothing else.
It's fine if we create PUBLISH.md if we just want to be perfectly open, I just don't think it belongs in the README.
@erezsh tried addressing your comments, good to go? ✅ |
PR #20 with added infrastructure (TimestampTZ repr)
It seems a bit more reasonable to just import 1M rows for development to not make it take 10min or so to get started, even longer on other databases. 1M is plenty for getting started