Pandas "shell" as alternative to dolt sql #3364

@addisonklinke

Many data scientists and machine learning engineers are more comfortable with Pandas than with raw SQL. What do folks think about adding a new dolt pd command to the CLI? In my proposal, it would work like this:

$ dolt pd -h
NAME
	dolt pd - Load SQL table into an interactive Pandas dataframe

SYNOPSIS
	dolt pd [-v <version>] <table> [<repo>]

OPTIONS
	<table>
	  The existing table to read from.
	<repo>
	  Path to local Dolt repository (default: current directory).
	-v <version>
	  Specific version of the table to read using Dolt's AS OF clause (default: latest).

$ dolt pd -v branch my_table /some/dolt/clone

# This chunk is automatically executed to save users the boilerplate
# Raise an error if the required Python packages are not installed (or distribute them with the dolt binary)
>>> import doltcli
>>> from doltpy.cli.read import read_pandas
>>> db = doltcli.Dolt('/some/dolt/clone')
>>> df = read_pandas(db, 'my_table', as_of='branch')

# Then you're free to do various ad hoc EDA/ETL tasks
>>> df.head()
>>> print(df.some_column.value_counts())

It'd be nice to default to IPython if it's available in the current environment, and fall back to the standard Python REPL otherwise.
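
To make the idea more concrete, here is a rough sketch of what a Python-side launcher for this could look like. It is only an illustration under a few assumptions: the function names (build_parser, load_dataframe, start_shell) are hypothetical, the read_pandas call simply copies the signature used in the snippet above, and how the dolt binary would actually invoke Python is left open.

```python
import argparse
import code
import importlib.util


def build_parser() -> argparse.ArgumentParser:
    """Mirror the proposed synopsis: dolt pd [-v <version>] <table> [<repo>]."""
    parser = argparse.ArgumentParser(prog="dolt pd")
    parser.add_argument("-v", dest="version", default=None,
                        help="version of the table to read (default: latest)")
    parser.add_argument("table", help="existing table to read from")
    parser.add_argument("repo", nargs="?", default=".",
                        help="path to local Dolt repository (default: current directory)")
    return parser


def load_dataframe(repo, table, version=None):
    """Run the boilerplate from the proposal, failing clearly if packages are missing."""
    try:
        # Imported lazily so a missing package gives a readable error
        import doltcli
        from doltpy.cli.read import read_pandas
    except ImportError as exc:
        raise SystemExit(f"dolt pd requires the doltcli and doltpy packages: {exc}")
    db = doltcli.Dolt(repo)
    # Assumption: same call as in the issue's snippet, with as_of=None meaning latest
    return read_pandas(db, table, as_of=version)


def have_ipython() -> bool:
    """Check for IPython without importing it."""
    return importlib.util.find_spec("IPython") is not None


def start_shell(namespace: dict) -> None:
    """Prefer IPython when available, otherwise use the standard REPL."""
    if have_ipython():
        import IPython
        IPython.start_ipython(argv=[], user_ns=namespace)
    else:
        code.interact(local=namespace)


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    df = load_dataframe(args.repo, args.table, args.version)
    start_shell({"df": df})
```

With something like this, calling main(["-v", "branch", "my_table", "/some/dolt/clone"]) would correspond to the example invocation above and drop the user into a shell with df already defined.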

Labels: enhancement (New feature or request)
