Closed
Labels
enhancement (New feature or request)
Description
Many data scientists and machine learning engineers are more comfortable with Pandas than with raw SQL. What do folks think about adding a new dolt pd command to the CLI? In my proposal, it would work like this:
$ dolt pd -h
NAME
dolt pd - Load SQL table into an interactive Pandas dataframe
SYNOPSIS
dolt pd [-v <version>] <table> [<repo>]
OPTIONS
<table>
The existing table to read from.
<repo>
Path to local Dolt repository (default: current directory).
-v <version>
Specific version of the table to read using Dolt's AS_OF clause (default: latest).
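To make the proposed synopsis concrete, here is a rough sketch of how the arguments could be parsed if the command were prototyped as a Python wrapper. This is only an illustration: the `build_parser` helper and the `version` attribute name are hypothetical, and the real command would presumably live in the dolt binary itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the proposed `dolt pd` synopsis."""
    parser = argparse.ArgumentParser(
        prog="dolt pd",
        description="Load SQL table into an interactive Pandas dataframe",
    )
    # -v <version>: optional; None stands for "latest" in this sketch
    parser.add_argument(
        "-v",
        dest="version",
        metavar="<version>",
        default=None,
        help="Specific version of the table to read (default: latest).",
    )
    # <table>: required positional argument
    parser.add_argument(
        "table", metavar="<table>", help="The existing table to read from."
    )
    # <repo>: optional positional argument, defaults to the current directory
    parser.add_argument(
        "repo",
        metavar="<repo>",
        nargs="?",
        default=".",
        help="Path to local Dolt repository (default: current directory).",
    )
    return parser


# Parse the example invocation from this proposal
args = build_parser().parse_args(["-v", "branch", "my_table", "/some/dolt/clone"])
print(args.version, args.table, args.repo)
```

Leaving out `-v` and `<repo>` falls back to the defaults described in the help text (latest version, current directory).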
$ dolt pd -v branch my_table /some/dolt/clone
# This chunk is automatically executed to save users the boilerplate
# Raise an error if the required Python packages are not installed (or distribute them with the dolt binary)
>>> import doltcli
>>> from doltpy.cli.read import read_pandas
>>> db = doltcli.Dolt('/some/dolt/clone')
>>> df = read_pandas(db, 'my_table', as_of='branch')
# Then you're free to do various adhoc EDA/ETL tasks
>>> df.head()
>>> print(df.some_column.value_counts())
It'd be nice to default to IPython if it's available in the current environment.
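One possible way to do the IPython fallback, sketched in Python under the assumption that the command hands a prepared namespace (for example `db` and `df`) to an interactive shell. The helper names `pick_shell` and `start_shell` are made up for illustration.

```python
import code
import importlib.util


def pick_shell() -> str:
    """Return "ipython" if IPython is importable in this environment, else "python"."""
    if importlib.util.find_spec("IPython") is not None:
        return "ipython"
    return "python"


def start_shell(namespace: dict) -> None:
    """Open an interactive session with `namespace` preloaded (hypothetical helper)."""
    if pick_shell() == "ipython":
        # Imported lazily so the fallback still works without IPython installed
        from IPython import start_ipython

        start_ipython(argv=[], user_ns=namespace)
    else:
        # Standard-library REPL as the fallback
        code.interact(local=namespace)


print("Would launch:", pick_shell())
```

Calling `start_shell({"db": db, "df": df})` after the boilerplate above would then drop the user into whichever shell is available.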
anklebreaker