This repository was archived by the owner on Sep 3, 2022. It is now read-only.
Needed: DataLab integration with Google BigTable, Google DataProc (Spark) #41
Open
Description
We use Jupyter notebooks to access BigTable data like so:
from google.cloud import bigtable
from google.cloud import happybase

client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
connection = happybase.Connection(instance=instance)
table = connection.table(table_name)
for key, row in table.scan():
    ...  # process each row
(we then convert the scanned rows into Pandas DataFrames)
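For concreteness, a minimal sketch of that conversion step: the HappyBase scan yields row values as raw bytes keyed by b'family:qualifier', so the utf-8 decoding below is an assumption about our data, not something the API guarantees:

import pandas as pd

records = {}
for key, row in table.scan():
    # row maps b'family:qualifier' -> raw bytes; utf-8 decoding is an assumption
    records[key.decode("utf-8")] = {
        col.decode("utf-8"): val.decode("utf-8") for col, val in row.items()
    }

df = pd.DataFrame.from_dict(records, orient="index")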
Regarding DataLab and DataProc integration: Jupyter/Spark integration (http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/) is well established in data science, so how can we leverage DataLab notebooks over Spark jobs running on DataProc (e.g. stepwise PySpark job definitions, visualising job results)? A sketch of the kind of workflow we mean follows.
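A minimal sketch of that workflow, assuming the notebook kernel runs on (or can reach) the DataProc master node with PySpark available; the GCS path and column name are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datalab-dataproc-sketch").getOrCreate()

# Step 1: read input from GCS (path is hypothetical)
df = spark.read.csv("gs://my-bucket/events.csv", header=True, inferSchema=True)

# Step 2: a transformation defined and inspected cell by cell
counts = df.groupBy("event_type").count()

# Step 3: pull a small result back to the notebook for visualisation
counts.toPandas().plot(kind="bar", x="event_type", y="count")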
Also, how do we leverage IPython Parallel (https://ipyparallel.readthedocs.io/en/latest/) and the Jupyter Clusters notebook extension in DataLab? A sketch of the usage pattern we have in mind follows.
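A minimal sketch of the IPython Parallel pattern we would want inside DataLab, assuming a controller and engines are already running and reachable from the kernel (e.g. started with ipcluster start -n 4):

import ipyparallel as ipp

# Connect to the running controller (assumes default connection files)
rc = ipp.Client()
view = rc[:]  # a DirectView over all engines

# Fan a function out across the engines and gather the results
results = view.map_sync(lambda x: x ** 2, range(16))
print(results)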