
How do I write code that talks to two different datasets with two different sets of credentials? #659

Closed
@jgeewax

Description


Let's say I have:

  • /creds1.json
  • /creds2.json
  • dataset1
  • dataset2
  • gcloud.datastore

How do I pull down Person:1 from dataset1, retrieve the 'name' property, and write it back to Log:2 in dataset2?

Here's my best guess so far:

from gcloud import credentials
from gcloud import datastore

creds1 = credentials.get_for_service_account_json('/creds1.json')
creds2 = credentials.get_for_service_account_json('/creds2.json')

connection1 = datastore.Connection(credentials=creds1)
connection2 = datastore.Connection(credentials=creds2)

person1_key = datastore.Key('Person', 1, dataset_id='dataset1')
log2_key = datastore.Key('Log', 2, dataset_id='dataset2')

person1 = datastore.get(person1_key, connection=connection1, dataset_id='dataset1')
log2 = datastore.get(log2_key, connection=connection2, dataset_id='dataset2')

log2['data'] = person1['name']
datastore.put(log2, connection=connection2, dataset_id='dataset2')

I only got that by digging through tons of code. It made me sad.

What I want to write:

from gcloud.datastore import get_connection

dataset1 = get_connection(credentials_json='/creds1.json').get_dataset('dataset1')
dataset2 = get_connection(credentials_json='/creds2.json').get_dataset('dataset2')

person1 = dataset1.get('Person', 1)
log2 = dataset2.get('Log', 2)

log2['data'] = person1['name']
log2.put()

It seems that somewhere along the way, we lost the "hierarchy" of high-level concepts (Datastore -> Connection -> Dataset -> Entity) so that things don't seem to know who their "parent" is in the tree on the way up.

This means calls like dataset.get() are awkward, because each one has to be handed its connection and credentials. It seems we've tried to work around this by storing defaults, but that blows up when you have more than one set of credentials...

Maybe I'm totally misunderstanding?


I'm thinking that it'd be cool if we could allow three things:

  1. datastore.<method> that accepts all the parameters so you can be absurdly specific (here is the connection, here is the dataset_id, etc.)
  2. datastore.<method> that has lots of Nones as default parameters, where we "go get the default" if you left it as None (i.e., connection=None -> get_default_connection())
  3. Datastore -> Connection -> Dataset -> Entity drill down that "pre-fills" these things going up the chain. That is, I can say:
connection = datastore.get_connection(...)
dataset = connection.get_dataset(...)
entity = dataset.get(...)

This means that the datastore module, Connection, and Dataset would all likely have the same methods, just with fewer things you can specify, because some of those fields are already set when you "ask for" the next level down (i.e., a Dataset knows its dataset_id, so there is no option to provide that).


Example

datastore.py:

def get(key, connection=None, dataset_id=None):
  connection = connection or get_default_connection()
  dataset_id = dataset_id or get_default_dataset_id()
  # ... look up `key` in `dataset_id` using `connection` and return the entity ...

# Means I can do: datastore.get(Key('Person', 1), dataset_id='dataset1')

dataset.py:

class Dataset(object):

  def __init__(self, dataset_id, connection=None):
    self.dataset_id = dataset_id
    self.connection = connection

  def get(self, key):
    return datastore.get(key, dataset_id=self.dataset_id, connection=self.connection)

/cc @pcostell @dhermes @tseaver

Labels: api: datastore (Issues related to the Datastore API.); type: question (Request for information or clarification. Not an issue.)
