This repository was archived by the owner on Sep 3, 2022. It is now read-only.

Needed: DataLab integration with Google BigTable, Google DataProc (Spark) #41

@joshreuben456

We use Jupyter notebooks to access BigTable data like so:

from google.cloud import bigtable
from google.cloud import happybase

# project_id, instance_id, and table_name are set elsewhere in the notebook
client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
connection = happybase.Connection(instance=instance)
table = connection.table(table_name)

# scan() is a method and must be called with parentheses
for key, row in table.scan():
    ...

(we then convert the scanned rows into Pandas DataFrames)
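
A minimal sketch of that conversion step, assuming flat rows whose keys, column names, and cell values are UTF-8 bytes (as happybase returns them); the resulting column layout is hypothetical:

import pandas as pd

# Collect the scan results and flatten each row dict into a record.
# happybase returns row keys, column names, and values as bytes.
records = []
for key, row in table.scan():
    record = {"row_key": key.decode("utf-8")}
    record.update({col.decode("utf-8"): val.decode("utf-8")
                   for col, val in row.items()})
    records.append(record)

df = pd.DataFrame(records)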

Regarding DataLab and DataProc integration: Jupyter-Spark integration (see http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/) is well established in data science, so how can we leverage DataLab notebooks over Spark jobs running on DataProc (e.g. stepwise PySpark job definitions, visualizing job results)?
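
One possible approach, sketched below with the google-cloud-dataproc client library, is to submit a PySpark job to a Dataproc cluster directly from a notebook cell; the region, cluster name, and GCS path are placeholder assumptions:

from google.cloud import dataproc_v1

# Placeholder values for illustration
region = "us-central1"
cluster_name = "my-dataproc-cluster"

# The client must point at the regional Dataproc endpoint
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/my_job.py"},
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
result = operation.result()  # blocks until the job completes
print(result.status.state)

Job output written to GCS (or BigQuery) can then be read back into the notebook and visualized with the usual Pandas tooling.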

Also, how do we leverage IPython Parallel (https://ipyparallel.readthedocs.io/en/latest/) and the Jupyter Cluster notebook extensions in DataLab?
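
For reference, a minimal IPython Parallel sketch, assuming a controller and engines have already been started (e.g. with ipcluster start -n 4):

import ipyparallel as ipp

# Connect to the running controller and get a view over all engines
rc = ipp.Client()
dview = rc[:]

# Distribute a computation across the engines
results = dview.map_sync(lambda x: x ** 2, range(16))
print(results)

The open question is whether DataLab's environment exposes (or could expose) the ipcluster machinery and the Clusters tab that stock Jupyter provides.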
