into(Spark/HDFS, SQL DBs) #31
FWIW, Spark can use the JDBC connector to do computation. It looks like this:

```python
sc = SparkContext(...)
sql = HiveContext(...)
df = sql.load("jdbc", url="jdbc:postgresql:dbserver", dbtable="schema.tablename")
```

I don't think this would be that tricky. The function would look something like this:

```python
@append.register(SQLContext, sa.Table)
def sql_to_sparksql(ctx, tb, **kwargs):
    url = connection_string_from_engine(tb.bind)  # <- not implemented, but I think easy
    dbtable = [tb.name]
    if tb.schema is not None:
        dbtable.insert(0, tb.schema)
    return ctx.load('jdbc', url=url, dbtable='.'.join(dbtable))
```
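The `connection_string_from_engine` helper above isn't implemented. Here's a minimal sketch of what it might look like, under the simplifying assumptions that we only care about Postgres and that we build the JDBC URL from a SQLAlchemy-style URL string (the function name and scope here are illustrative, not an existing API):

```python
from urllib.parse import urlsplit

def jdbc_url_from_sqlalchemy_url(sa_url):
    """Hypothetical helper: convert a SQLAlchemy-style Postgres URL
    (postgresql://user:pass@host:port/db) into a JDBC URL.
    Credentials are dropped; JDBC usually passes those separately."""
    parts = urlsplit(sa_url)
    netloc = parts.hostname or 'localhost'
    if parts.port:
        netloc += ':%d' % parts.port
    # parts.path already includes the leading '/', e.g. '/mydb'
    return 'jdbc:postgresql://%s%s' % (netloc, parts.path)

print(jdbc_url_from_sqlalchemy_url('postgresql://user:pw@dbserver:5432/mydb'))
# jdbc:postgresql://dbserver:5432/mydb
```

A real implementation would need to branch on the dialect (see the note below about vendor-specific JDBC URL formats).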
Here are the Spark docs on using JDBC: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#jdbc-to-other-databases
If anyone is interested in implementing this, you'll have to … because, according to the Travis docs, the default install is Postgres 9.1.
The Postgres connection string syntax can be found here: https://jdbc.postgresql.org/documentation/91/connect.html. One really annoying thing is that JDBC connection strings aren't standardized and can differ between vendors; here's Oracle's: http://docs.oracle.com/cd/B28359_01/java.111/b31224/urls.htm#BEIJFHHB
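To illustrate the vendor differences, here are the two URL shapes side by side as a small (illustrative, not exhaustive) template table; note that Oracle's thin-driver form uses `host:port:SID` rather than a path:

```python
# Illustrative only: JDBC URL shapes differ per vendor (see the docs linked above).
JDBC_TEMPLATES = {
    'postgresql': 'jdbc:postgresql://{host}:{port}/{database}',
    'oracle': 'jdbc:oracle:thin:@{host}:{port}:{sid}',  # thin driver, SID form
}

pg = JDBC_TEMPLATES['postgresql'].format(host='dbserver', port=5432, database='mydb')
ora = JDBC_TEMPLATES['oracle'].format(host='dbserver', port=1521, sid='orcl')
print(pg)   # jdbc:postgresql://dbserver:5432/mydb
print(ora)  # jdbc:oracle:thin:@dbserver:1521:orcl
```

Any general solution would need a per-dialect mapping like this rather than a single URL builder.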
migration from blaze
blaze/blaze#582
Reposting for ease of use:
from @chdoig
As Spark is becoming a popular backend, it's a common use case for people to want to transfer their datasets in DBs to a Spark/HDFS cluster.
It would be nice to have an easy interface for end-users to transfer their tables in DBs to a Cluster.
into(Spark/HDFS, SQL DBs)
A lot of people are now talking about Tachyon; it may be worth taking a look:
http://tachyon-project.org/
http://ampcamp.berkeley.edu/big-data-mini-course/tachyon.html
This might be related to @quasiben's work on SparkSQL. Maybe a barrier to people starting to use SparkSQL is figuring out how to make that transfer, since I'm not able to find how you make that connection from existing SQL DBs:
http://spark.apache.org/docs/latest/sql-programming-guide.html
cc: @mrocklin