Make pagination simpler? #640

jgeewax · 2015-06-06T15:49:48Z

(Going to use Datastore as the example)

In the docs from Datastore.runQuery

var query = dataset.createQuery('Lion');

// Retrieve 5 companies.
dataset.runQuery(query, function(err, entities, endCursor, apiResponse) {
  // Use `endCursor` as the starting cursor for your next query.
  var nextQuery = query.start(endCursor);
  var callback = function(err, entities, endCursor, apiResponse) {};
  dataset.runQuery(nextQuery, callback);
});

Seems like in datastore there's no way to set the page size, however even so, it seems that paging is super manual here, which bums me out.

What if we could make it so that...

var query = dataset.createQuery('Lion');
dataset.runQuery(query, function(err, entities, nextPage) {
  // entities is the current page.
  nextPage(); // Call this same callback function, with the next page of entities?
});

Or even more explicitly, let the user tell you "just get them all please":

var query = dataset.createQuery('Lion').setPagination(false);
dataset.runQuery(query, function(err, entities) {
  // entities is the current page.
});

?

Let me know if I'm a crazy person here... @ryanseys @stephenplusplus We're having a similar discussion about pagination being explicit versus implicit on googleapis/google-cloud-python#895

The text was updated successfully, but these errors were encountered:

stephenplusplus · 2015-06-06T15:59:54Z

Definitely not a crazy person. I would like this to be simpler as well.

runQuery takes a query which can have a limit:

var query = dataset.createQuery('Lion').limit(10);
dataset.runQuery(query, function(err, results, nextQuery) {});

To get all of the results, runQuery can actually be run as a stream:

dataset.runQuery(query).on('data', function(entity) {
  // entity.key = ...
  // entity.data = ...
});

I don't like returning a function in the callback. The modular approach we have now is imo the best way to go. We return a prepared query that the user sends back to runQuery if they want more results.

However, I think something like setPagination(false) would be awesome.

jgeewax · 2015-06-06T16:18:10Z

I didn't notice the .on('data', ...) method in the docs under runQuery -- maybe we should have a separate issue to make that more prominent...... Does that let you short-circuit out when you're done? Ie, if my query matched 500 results, but I wanted to just find the first one and then stop looking, can I do that?

Either way, it doesn't fix the issue of "I know this is going to be a boatload of results. I want to page through them. Please don't make me sad while doing it."

runQuery takes a query which can have a limit

That's an overall limit, not a per-page limit. There's no way (AFAIK) to say "I want up to 500 results, in chunks of 10 at a time please". We just get chunks of N back (I don't know how big N is...)

The modular approach we have now is imo the best way to go.

That makes me sad. Let's look at this again, assuming I can have unlimited results coming back, I want to page through them all, but I might want to stop once I've found the one I want:

(Disclaimer -- I have no idea if this even works -- but the fact that I'm having a bit of trouble getting this to do what I want is... telling. Either I'm not so hot in Javascript, or the pattern is a bit tricky.)

var query = dataset.createQuery('Person');

var callback = function(err, entities, endCursor, apiResponse) {
  for (var i in entities) {
    var entity = entities[i];
    if (entity['name'] == 'Stephen') {
      console.log('Found him!');
      return;
    }
  }

  // Whoops -- didn't find him yet.. let's go again.
  var nextQuery = query.start(endCursor);
  dataset.runQuery(nextQuery, callback);
}

dataset.runQuery(query, callback);

I can't define the callback in a simple way in-line, can I ?

I'd totally much rather

var query = dataset.createQuery('Person');
dataset.runQuery(query, function(err, entities, nextPage) {
  for (var i in entities) {
    var entity = entities[i];
    if (entity['name'] == 'Stephen') {
      console.log('Found him!');
      return;  // I'm done -- no more need to page through these results.
    }
  }
  nextPage();
});

Where nextPage calls the same function again with the next page of results -- I don't need to jump through any hoops or even care about the cursor. I never care about the cursor -- I just want the next page please.

Couldn't we also make nextPage have a property called startCursor so that people could just use the nextPage.startCursor property if they wanted to do this the other way? But nextPage would be callable if they "just want the callback to run again with the next page" ?

stephenplusplus · 2015-06-06T16:29:06Z

I can't address all of the issues at the moment, but just wanted to show some code.

How it works currently:

var query = dataset.createQuery('Person');

var callback = function(err, entities, nextQuery) {
  for (var i in entities) {
    var entity = entities[i];
    if (entity['name'] == 'Stephen') {
      console.log('Found him!');
      return;
    }
  }

  // Whoops -- didn't find him yet.. let's go again.
  dataset.runQuery(nextQuery, callback);
}

dataset.runQuery(query, callback);

I don't mind either way. At one point, I suggested the nextPage() concept and was shut down. Don't remember why.

P.S. good luck finding me!

jgeewax · 2015-06-06T16:30:52Z

Yea that seems super repetitive and makes me sad. I think we should reopen the discussion of nextPage....

ryanseys · 2015-06-08T04:13:45Z

So rather than:

var query = dataset.createQuery('Person');
var callback = function(err, entities, nextQuery) {
  // check if found, if so then return
  // ...
  // Else whoops -- didn't find him yet.. let's go again.
  dataset.runQuery(nextQuery, callback);
}

dataset.runQuery(query, callback);

We have:

var query = dataset.createQuery('Person');
var callback = function(err, entities, nextPage) {
  // check if found, if so then return
  // ...
  // Else whoops -- didn't find him yet.. let's go again.
  nextPage();
}

dataset.runQuery(query, callback);

That doesn't seem to make things much better, and now I'm unsure what happens once nextPage() returns because with no callback it looks like a synchronous call... I like the explicit passing of callback because it feels less magical, and as you can see in the first example above, it's really not all that more complicated.

jgeewax · 2015-06-08T11:40:45Z

Wait, I think you're missing something. You can leave the second option's function anonymous and in-line it without repeating yourself...:

var query = dataset.createQuery('Person');
dataset.runQuery(query, function(err, entities, nextPage) {
  // check if found, if so then return
  // ...
  // Else whoops -- didn't find him yet.. let's go again.
  nextPage();
});

Which is much nicer than the previous IMO.

I'm unsure what happens once nextPage() returns because with no callback it looks like a synchronous call

nextPage just calls whatever method you passed with the next page of entities passed to entities. It's no more synchronous than calling dataset.runQuery(nextQuery, callback)...

I like the explicit passing of callback because it feels less magical, and as you can see in the first example above, it's really not all that more complicated.

It feels awfully repetitive to me, and requires me to change my pattern if I want to deal with more than one page. That is: if you want a single page of results (which we don't define how big a "page" is), in-line the callback function. If you want more than that... well, you have to pull your callback function out, name it, and then make sure you call the next query at the end of it, with itself as the callback.

Note that our documentation only shows inlined callbacks -- aka, all of our documentation shows people how to deal with either 1 or 2 pages only -- never "page until you're done".

If we were to in-line this to "page until you're done", taking the example from the runQuery page again:

var query = dataset.createQuery('Lion');

// Retrieve 5 companies.
transaction.runQuery(query, function(err, entities, endCursor, apiResponse) {
  console.log("First page");
  var callback = function(err, entities, endCursor, apiResponse) {
    console.log("Next n pages...");
    var nextQuery = query.start(endCursor);
    transaction.runQuery(nextQuery, callback);
  };
  // Use `endCursor` as the starting cursor for your next query.
  var nextQuery = query.start(endCursor);
  transaction.runQuery(nextQuery, callback);
});

Which looks stupidly repetitive just to page through the results... It feels to me like we can do much better while still providing the level of control we want.

stephenplusplus · 2015-06-08T13:30:57Z

It's not:

var query = dataset.createQuery('Lion');

// Retrieve 5 companies.
transaction.runQuery(query, function(err, entities, endCursor, apiResponse) {
  console.log("First page");
  var callback = function(err, entities, endCursor, apiResponse) {
    console.log("Next n pages...");
    var nextQuery = query.start(endCursor);
    transaction.runQuery(nextQuery, callback);
  };
  // Use `endCursor` as the starting cursor for your next query.
  var nextQuery = query.start(endCursor);
  transaction.runQuery(nextQuery, callback);
});

It's:

var query = dataset.createQuery('Lion');

// Retrieve 5 companies.
var callback = function(err, entities, nextQuery, apiResponse) {
  if (nextQuery) {
    transaction.runQuery(nextQuery, callback);
  }
};

transaction.runQuery(query, callback);

We return an entire query that is ready to be passed to runQuery.

I'm with Ryan that a function call that doesn't take a callback is sync. That's just how every function call in JavaScript is, so it's in a JS programmer's memory. I think we could say "Well, not for us!" and do it our own way, but I don't know if that's even better in this case. Inline function vs named function isn't that much of a simplification win in my opinion.

I think we can find a happy middle point by allowing an option on the query to get all of the results before executing the callback. For the user who doesn't care, and doesn't want to deal with streams, that seems perfect.

Back to pagination, I was once told Datastore basically decided on the fly how many results it could return based on available resources at the time of request. So, we couldn't chunk results in a defined manner.

jgeewax · 2015-06-08T13:38:16Z

We return an entire query that is ready to be passed to runQuery.

If that's true, then our docs are broken :( See http://googlecloudplatform.github.io/gcloud-node/#/docs/master/datastore/dataset?method=runQuery

Inline function vs named function isn't that much of a simplification win in my opinion.

For me it's huge -- it means rearranging code, coming up with a name for the callback, and changing the flow of things. Instead of "ok runQuery -> here's the callback" it's exactly backwards "Here's a callback, now runQuery with it".

One thing that might be useful would be to provide some more context, so I don't have to pre-define and name my callback:

var query = dataset.createQuery('Lion');

transaction.runQuery(query, function(err, entities, nextQuery, apiResponse) {
  if (nextQuery) {
    transaction.runQuery(nextQuery, this.callback); // or just `this` ?
  }
});

stephenplusplus · 2015-06-08T13:45:57Z

Our library shouldn't try to simplify the language, that's when things are magical and confusing. Naming a function that's going to be re-used makes sense. If they want it to run once, inline and set "runAllNextQueriesForMe: true".

But no, our docs aren't broken. Just my head. I thought for sure we returned nextQuery, but now I found this: #288 - If I'm remembering right, we can't say for sure if there are more results available until you call the API again and find out. So, returning nextQuery that may return 0 results was confusing, so it was decided to return the endcursor to let the user build the query themselves if they choose.

jgeewax · 2015-06-08T13:55:24Z

Naming a function that's going to be re-used makes sense.

My argument is that it's not being reused in the way that I would name myMagicFilter or humanize -- it's still anonymous, just an anonymous function that we are making users call multiple times because our API doesn't let them choose the page size...

When we're not allowing anyone to authoritatively get all the results and call my anonymous function for those results and stop running when I want to stop (.on('data', ...) doesn't let me stop, right?), we need to provide a way to accommodate people (IMO).

But no, our docs aren't broken. Just my head.

Cool - no problem.

@idosela just suggested something interesting. What if we provided a runQueryUntil method? The implication being that "this method will run until you return false, which tells us to stop".

So the code might look like:

var query = dataset.createQuery('Person');
transaction.runQueryUntil(query, function(err, entities, apiResponse) {
  if (someMatchOn(entities)) {
    console.log('Found him!');
    return false;
  }
});

or

var query = dataset.createQuery('Person');
transaction.runQueryUntil(query, function(err, entities, done, apiResponse) {
  if (someMatchOn(entities)) {
    console.log('Found him!');
    done();
  }
});

Maybe we could introduce this into the .on() method?

var query = dataset.createQuery('Person');
transaction.runQuery(query).on('data', function(entity) {
  if (entity.name == 'Stephen') {
    return false;
  }
});

If we introduce that -- I'd recommend that the on() method become the recommended way to deal with data (rather than forcing people to construct next-page queries and paginate manually, define their callback, etc.

jgeewax · 2015-06-08T13:59:31Z

Re the nextQuery being passed... I don't think that endCursor fixes the issue and don't understand the logic here. We can never know if things are done... so why not at least make it convenient? (I'm voting for nextQuery being passed, and always passed, which may end up being run and returning zero results...)

stephenplusplus · 2015-06-08T14:26:40Z

When running in stream mode, you can end it with this.end() from an .on('data', callback) callback or:

var queryStream = dataset.runQuery(query);
// ... later
queryStream.end();

That's all we do when the results are fully retrieved: https://github.com/GoogleCloudPlatform/gcloud-node/blob/d2883f6b64c9d9d372e6d4da683654664aa39a08/lib/datastore/request.js#L559

I'm voting for nextQuery being passed, and always passed, which may end up being run and returning zero results...

I'm on board. nextQuery is just a query after all-- the results are unknown until you run it.

RE: runQueryUntil, thanks for the idea @idosela! We can just promote the stream mode for this case. Even for the user without a lot of stream experience, this is dead simple to understand:

dataset.runQuery(query).on('data', function (entity) {
  if (entity.something) {
    // got it!
    this.end();
  }
});

We can definitely update our docs to show this type of flow.

My argument is that it's not being reused in the way that I would name myMagicFilter or humanize -- it's still anonymous, just an anonymous function that we are making users call multiple times because our API doesn't let them choose the page size...

Even if they could choose the page size, they would still have to control if and when to get the next page's results, right?

And I don't think we agree on the anonymous usage here. It's named so that they can call it again, if they choose. Here's what we're looking at:

// 1.
var onResultSet = function(err, results, nextQuery) {
  if (results[0].name === 'JJ') {
    console.log('Found JJ!');
    return;
  }

  dataset.runQuery(nextQuery, onResultSet);
};

var query = dataset.createQuery('Person');
dataset.runQuery(query, onResultSet);

// 2.
var query = dataset.createQuery('Person');
dataset.runQuery(query, function(err, results, continueRunningQuery) {
  if (results[0].name === 'JJ') {
    console.log('Found JJ!');
    return;
  }

  continueRunningQuery();
});

But as I said before, I'm fine with either. As a user, I would prefer to write in the first style. The second one is truly more confusing for me. I wouldn't expect a function to re-run when I execute some other variable. That feels like we're trying to out-smart the language, and sure, we can try, but for a developer who writes in that language, they get confused by this. They know how to name functions and call them again.

If we go with the second option, a good variable name in place of nextQuery for the docs would be necessary here (I used continueRunningQuery above, but that's a little lengthy).

jgeewax · 2015-06-08T15:00:00Z

I am 100% on board for the following:

dataset.runQuery(query).on('data', function (entity) {
  if (entity.something) {
    // got it!
    this.end();
  }
});

It is clear, concise, and makes pagination not really an issue.

And I don't think we agree on the anonymous usage here.

The point I was making was that option 2 above means I can define the logic in-line, no need to choose a name. If I was to use something in multiple places (as in I want to use it in multiple places), then I should be required to name it. I don't want to use it multiple times here -- I'm forced to because of an implementation detail in our API. That IMO means I shouldn't have to name it...

stephenplusplus · 2015-06-08T15:18:09Z

Worth noting, we use the nextQuery pattern quite a bit in the library, e.g. storage#getBuckets, bucket#getFiles, bigquery.table#getRows.

The most sane and modular solution to this problem is the one where we hand back a prepared query. The user can run that query and provide an entirely different callback if they wanted / queue it to run later / whatever.

But with both of our preferred solutions to this problem, the user can do the same exact things. Who can help make the final call?

jgeewax · 2015-06-08T16:31:13Z

Who can help make the final call?

Are you referring to the call between sending back endCursor or nextQuery ? If so, my vote is for nextQuery -- but the implication is (just like the cursor) you can't just check whether it's null without the final request. When entities comes back empty, nextQuery should come back as null...

stephenplusplus · 2015-06-08T16:33:23Z

Oh sorry, I meant as far as providing a function to get the next results vs a Query object with the cursor set for the user to continue paging.

Everything in your last comment I'm +1 on.

jgeewax · 2015-06-08T16:45:37Z

I think if we provide (and recommend) that the way you page through results is with .on('data', ...) -- and that you can short-circuit out with this.end(), then we can keep the old way with the callback.

🤖 I have created a release \*beep\* \*boop\* --- ## [6.2.0](https://www.github.com/googleapis/nodejs-translate/compare/v6.1.0...v6.2.0) (2021-04-07) ### Features * added v3beta1 proto for online and batch document translation ([#639](https://www.github.com/googleapis/nodejs-translate/issues/639)) ([513c21a](https://www.github.com/googleapis/nodejs-translate/commit/513c21aabf2ece278708bca56a94bde33e80f072)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

jgeewax added enhancement core labels Jun 6, 2015

stephenplusplus mentioned this issue Jun 25, 2015

streamrouter: implement across library #692

Merged

stephenplusplus closed this as completed in #692 Jul 15, 2015

JustinBeckwith assigned stephenplusplus Feb 1, 2021

sofisl mentioned this issue Oct 11, 2022

migrate code from googleapis/nodejs-language #3395

Merged

4 tasks

sofisl pushed a commit that referenced this issue Oct 11, 2022

test(nodejs): remove 15 add 16 (#1322) (#640)

1228ab1

sofisl pushed a commit that referenced this issue Oct 13, 2022

test(nodejs): remove 15 add 16 (#1322) (#640)

f838332

sofisl mentioned this issue Nov 10, 2022

migrate code from googleapis/nodejs-translate #3520

Merged

4 tasks

sofisl pushed a commit that referenced this issue Nov 17, 2022

chore(samples): create instance from template samples (#640)

14952a5

sofisl mentioned this issue Nov 17, 2022

migrate code from googleapis/nodejs-compute #3666

Merged

4 tasks

sofisl pushed a commit that referenced this issue Jan 24, 2023

chore: run CI on Node.js 12, fix GCS file location (#640)

4baa91e

sofisl pushed a commit that referenced this issue Jan 25, 2023

chore: run CI on Node.js 12, fix GCS file location (#640)

62e8035

takeratta mentioned this issue Jul 6, 2023

[Snyk] Security upgrade @google-cloud/logging from 1.2.0 to 6.0.0 takeratta/google-cloud-node#4

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make pagination simpler? #640

Make pagination simpler? #640

jgeewax commented Jun 6, 2015

stephenplusplus commented Jun 6, 2015

jgeewax commented Jun 6, 2015

stephenplusplus commented Jun 6, 2015

jgeewax commented Jun 6, 2015

ryanseys commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

Make pagination simpler? #640

Make pagination simpler? #640

Comments

jgeewax commented Jun 6, 2015

stephenplusplus commented Jun 6, 2015

jgeewax commented Jun 6, 2015

stephenplusplus commented Jun 6, 2015

jgeewax commented Jun 6, 2015

ryanseys commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015

stephenplusplus commented Jun 8, 2015

jgeewax commented Jun 8, 2015