Persist pandas DataFrame objects to HBase and read them back later.
Prerequisites
- HBase Thrift server running at 127.0.0.1:9090
- HBase table sample_table created with column family cf
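If the table does not exist yet, it can be created through the same happybase connection used in the examples. The sketch below is only an illustration: ensure_table is a hypothetical helper, not part of pdhbase. It relies on the happybase Connection methods tables() and create_table(), and assumes the connection is already open.

```python
def ensure_table(connection, table_name, families=("cf",)):
    """Create table_name with the given column families if it is missing.

    Hypothetical helper (not part of pdhbase). `connection` is expected to
    behave like happybase.Connection. Returns True if the table was created.
    """
    # happybase may return table names as bytes, so normalise to str.
    existing = {
        name.decode() if isinstance(name, bytes) else name
        for name in connection.tables()
    }
    if table_name in existing:
        return False
    # Each column family maps to a dict of options; empty means defaults.
    connection.create_table(table_name, {family: dict() for family in families})
    return True
```

With a live Thrift server, a call such as ensure_table(happybase.Connection('127.0.0.1'), 'sample_table') would satisfy the second prerequisite.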
Known Issues:
- Works only with DataFrames that have integer indices.
- DataFrames to be persisted must not have ':' in their column names.
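Both limitations can be checked before writing. The following is a minimal sketch, assuming a reasonably recent pandas; check_persistable is a hypothetical helper and not part of pdhbase, and it only tests the two constraints listed above.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def check_persistable(df):
    """Return a list of reasons why df may not be safe to persist.

    Hypothetical helper (not part of pdhbase). An empty list means the
    DataFrame passes both checks from the Known Issues list.
    """
    problems = []
    # Known issue 1: only integer indices are supported.
    if not is_integer_dtype(df.index):
        problems.append("index is not integer-typed")
    # Known issue 2: column names must not contain ':'.
    bad = [str(col) for col in df.columns if ":" in str(col)]
    if bad:
        problems.append("column names contain ':': " + ", ".join(bad))
    return problems
```

Calling check_persistable(df) before pdh.to_hbase(...) and refusing to write when the list is non-empty is one way to fail early instead of partway through a write.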
Establish an HBase connection using happybase and write the DataFrame.
import happybase
import numpy as np
import pandas as pd
import pdhbase as pdh
connection = None
try:
    connection = happybase.Connection('127.0.0.1')
    connection.open()
    df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
    df['f'] = 'hello world'
    pdh.to_hbase(df, connection, 'sample_table', 'df_key', cf='cf')
finally:
    if connection:
        connection.close()
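The try/finally pattern used in these examples can also be expressed with contextlib.closing from the standard library, which calls close() on exit. This is a sketch of an alternative, not something pdhbase provides: with_connection is a hypothetical helper, and it assumes only that the connection object has a close() method, as happybase.Connection does.

```python
from contextlib import closing


def with_connection(make_connection, action):
    """Open a connection, run action(connection), and always close it.

    Hypothetical helper (not part of pdhbase). `make_connection` is a
    zero-argument callable, for example
    lambda: happybase.Connection('127.0.0.1').
    """
    # closing() guarantees connection.close() runs even if action raises.
    with closing(make_connection()) as connection:
        return action(connection)
```

For instance, with_connection(lambda: happybase.Connection('127.0.0.1'), lambda c: pdh.to_hbase(df, c, 'sample_table', 'df_key', cf='cf')) would perform the same write and close the connection afterwards.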
Establish an HBase connection using happybase and read the DataFrame.
import happybase
import numpy as np
import pandas as pd
import pdhbase as pdh
connection = None
try:
    connection = happybase.Connection('127.0.0.1')
    connection.open()
    df = pdh.read_hbase(connection, 'sample_table', 'df_key', cf='cf')
    print(df)
finally:
    if connection:
        connection.close()