Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-Pagination #496

Merged
merged 13 commits into from
Sep 18, 2018
Merged

Auto-Pagination #496

merged 13 commits into from
Sep 18, 2018

Conversation

rattrayalex-stripe
Copy link
Contributor

@rattrayalex-stripe rattrayalex-stripe commented Sep 11, 2018

Addresses #225

Work remaining:

  • Add tests for a variety of edge-cases, including:
    • Actually reaching the end of a list (where has_more: false).
    • Errors being thrown in a variety of places being handled correctly.
    • Attempts to use more than one API at the same time (eg; having a .then on the .list() and its .autoPagingEach(), passing a callback and attempting to use the result in an iterator, etc).
  • Try to verify that I added autoPageable to all the places that I should have (I just grepped list:).
  • Try to come up with a .autoPagingToArray helper of some kind.

cc @ob-stripe @jlomas-stripe @brandur-stripe – this is ready for preliminary eyes if you're interested. I included some refactor work, so viewing commit-by-commit may aid your sanity.

@jlomas-stripe
Copy link
Contributor

Try to come up with a .autoPagingToArray helper of some kind.

What does that mean? Is this to have some kind of 'output' from the whole thing? If so, I think it's probably not worth doing, tbh; if someone wants to collect 'em all, they can do so, but per comments in 225, I think it's a bit risky for us to keep track of this - and I don't think any of the other libraries do either?

@rattrayalex-stripe
Copy link
Contributor Author

Yeah, I feel that. Pretty on the fence myself.

I'm basically picturing a lot of users writing their own wrappers to do exactly this, though, which makes me cringe a bit, especially given that it's a lot less straightforward/pleasant than in any of the other languages.

Plus, for-loops and iteratively pushing into an array have (somewhat rightfully) fallen out of style in the JS community, in favor of .filter/.map-style code, which without a helper is unachievable here.

As a first pass implementation, I have:

      requestPromise.autoPagingToArray = function(onDone) {
        var promise = new Promise(function(resolve) {
          var items = [];
          requestPromise.autoPagingEach(function(item) {
            items.push(item);
          }).then(function() {
            resolve(items);
          });
        });
        return self.wrapTimeout(promise, onDone);
      }

... though I'm leaning towards a mandatory {max: number} arg, perhaps with a cap of 10,000 or something, to limit the downside of runaway memory use / api abuse.

Frankly, looking at the above code, I think we'd ~need to either include a copy/pastable version of that in our docs, or just include the helper directly.

@rattrayalex-stripe
Copy link
Contributor Author

rattrayalex-stripe commented Sep 11, 2018

Okay, added an implementation of autoPagingToArray for easier discussion.

Using the tests as an example, comparing the ~simplest case of autoPagingEach:

var customerIds = [];
function onCustomer(customer) {
customerIds.push(customer.id);
if (customerIds.length >= LIMIT) {
return false;
}
}
function onDone(err) {
resolve(customerIds);
}
stripe.customers.list({limit: 3}).autoPagingEach(onCustomer, onDone);

to the equivalent with autoPagingToArray:
stripe.customers.list({limit: 3}).autoPagingToArray({max: LIMIT})
.then(function (customers) {
return customers.map(function(customer) { return customer.id; });
})
.then(resolve)

And also looking at the 10-line boilerplate wrapper we'd need to recommend for casual use, I'm personally leaning towards including this in some form. My impression is that none of our other languages have such non-idiomatic/ugly code required to for happy-path usecases.

EDIT: for example, node requires two callbacks (or a callback and a promise) to handle each item and then do stuff when it's done. In every other language, you can just write code after an idiomatic for-loop, or in ruby's case just chain after auto_paging_each.

I'm assuming here that a large percentage of paginated list requests against the Stripe API have fewer than 10k results (eg; ~small users poking around at ~small collections), and that 10k results could easily fit within memory & rate limits. I'm also assuming that users ~frequently open up a node console to poke around at things in an exploratory fashion.

Definitely open to thoughts on this, and bikeshedding re; the API. These are just my hunches.

Refactor out StripeResource#wrapTimeout as utils.callbackifyPromise
@rattrayalex-stripe rattrayalex-stripe force-pushed the ralex/auto-pagination branch 2 times, most recently from 4eb125c to 204aaee Compare September 11, 2018 23:36
@rattrayalex-stripe rattrayalex-stripe force-pushed the ralex/auto-pagination branch 2 times, most recently from c5eb2d6 to 4f410af Compare September 12, 2018 20:08
@rattrayalex-stripe
Copy link
Contributor Author

r? @brandur-stripe @jlomas-stripe
@ob-stripe & @remi-stripe , eager for your 👀 as well!

@jlomas-stripe
Copy link
Contributor

Hey @rattrayalex-stripe! I'll try to take a peek at this today or tomorrow. Thanks so much for taking this on! \o/

@jlomas-stripe
Copy link
Contributor

jlomas-stripe commented Sep 13, 2018

@rattrayalex-stripe I think the test for "lets you call next() to iterate and next(false) to break" (and its siblings) should call next(false) before you reach LIMIT to confirm you can exit; you'll also probably want to have a test to confirm that once it gets to LIMIT that next() won't be called (and done() will).

@rattrayalex-stripe
Copy link
Contributor Author

sweet, thanks @jlomas-stripe !

re; LIMIT, that's a great catch – thanks! Will clean that up.

Separately, re; moving testUtils/ to be a sibling of test, that was because mocha's --recursive mode will try to parse everything in test/, which breaks on most node versions when you have for await syntax in there. --recursive doesn't have any way to exclude files; the recommended backup is to use a glob, but I found that changing --recursive in mocha.opts to test/**/*.spec.js meant that npm run mocha -- test/myTest.spec.js didn't work anymore; it would always run all the tests, since the globs were additive. 😞

I could have moved the .node*.js files into a sibling and left testUtils.js in the same place, but felt it'd be a bit confusing to have both. Don't have strong feels though. Thoughts?

@rattrayalex-stripe
Copy link
Contributor Author

By the way, @romain-stripe had suggested limit: instead of max: for autoPagingToArray. I figured we should first decide whether we really want autoPagingToArray, but also wanted to raise that bikeshed for discussion (I'm ambivalent).

@jlomas-stripe
Copy link
Contributor

Thanks mocha

limit: ~makes sense, though it has another meaning inside list(), so that may be confusing....?

Copy link
Contributor

@brandur-stripe brandur-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one Alex!

I'd probably +1 limit over max on autoPagingToArray — even though it's a bit overloaded in being the same name as on list endpoints, the function is similar enough that it's pretty easy to understand. If they were different, it would be a little surprising.

lib/utils.js Outdated
@@ -227,6 +227,19 @@ var utils = module.exports = {
}
return false;
},

callbackifyPromise: function(promise, callback) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the old name for this (wrapTimeout) better? It basically seems to take a callback and put it in a timeout block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, it happens to use a timeout to achieve a secondary goal of letting a promise instead call a callback, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, agreed, but it seems like the timeout piece is kind of important here. callbackifyPromise reads like a promise utility function or something to me. Maybe callbackifyPromiseWithTimeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, will do on Monday

@@ -14,6 +14,7 @@ module.exports = StripeResource.extend({
list: stripeMethod({
method: 'GET',
path: 'accounts',
autoPageable: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that we still need to add this parameter to a few dozen other list calls elsewhere right?

The autoPageable parameter seems a little unusual to me given that (1) only lists are pageable, and (2) all lists should be pageable. What do you think about naming this something like methodType: 'list' or something along those lines instead?

Copy link
Contributor Author

@rattrayalex-stripe rattrayalex-stripe Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looks like I missed methods like listTransactions, etc – looks like there's 13 of those.

I like methodType: 'list' a little better too, let's go with that. My thinking was "what if there's a list that isn't pageable" but that's pretty silly, since we try hard to avoid that (and I don't think there are current instances of such behavior).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var makeRequest = require('./makeRequest');
var utils = require('./utils');

function getItemCallback(args) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that we could make an effort to organize this file a little bit more? Either:

  1. Functions listed alphabetically.
  2. Functions listed by category with exported on the top (so you drop into the file and see the most important ones first) and unexported below, and alphabetically within each category.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I like (2)!

It's a bit of a smell in JS to call functions before they're defined, imo (it's not possible with arrow functions like const fn = () =>, only function declarations like we have here) but given that we're likely to stay with ES5 source code for some time, I'm happy to make a silver lining out of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw Error('The `onItem` callback function passed to autoPagingEach must accept at most two arguments; got ' + onItem);
}

// API compat; turn this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a little more flavor to this comment? What API is this trying to be compatible with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else if (listResult.has_more) {
// Reset counter, request next page, and recurse.
i = 0;
listPromise = requestNextPage(listResult);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any potential harm to going to an arbitrary stack depth like this? Like if we iterated a thousand pages, do we have a thousand functions sitting on the stack and the potential to overflow? I know some languages do tail call optimization for this kind of situation, and it looks like newer versions of JS have it, but not sure if we should expect it to kick in here or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting. I wasn't sure about this but it looks like recursive promises are handled by the promise implementation, not the call stack:

That recursive call is asynchronous and so it does not abuse the call stack by definition.

Which I got from this post, ironically about how the promise implementation leaks memory (fortunately, at rates which don't seem problematic – I think it's not until you get well into the millions at a minimum).

So I think we're good here; thanks for checking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, thanks for looking into that.

});
}

function autoPagingEach(self, requestArgs, spec, firstPagePromise) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are new methods and fairly non-obvious to figure out how to use based on the function signature, could we try adding some basic documentation blocks for autoPagingEach and autoPagingToArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; for users or maintainers? I assume users?

@rattrayalex-stripe
Copy link
Contributor Author

rattrayalex-stripe commented Sep 14, 2018

Okay, most feedback addressed (except for docs which is locally WIP).

Actually, in writing docs for this (which I started in docstrings but would like to move to the readme), I'm leaning towards moving from this:

for await (const customer of stripe.customers.list().autoPagingEach()) {}

to this:

for await (const customer of stripe.customers.list()) {}

which will enable getting rid of a hack or two and should be cleaner for users. I have a WIP on that but will need to finish on Monday.

Thoughts?

@jlomas-stripe
Copy link
Contributor

@rattrayalex-stripe How would that work, exactly? Is it, like ... list() just becomes the auto-paging thing?

@brandur-stripe
Copy link
Contributor

@rattrayalex-stripe How would that work, exactly? Is it, like ... list() just becomes the auto-paging thing?

Was wondering the same thing! :) The invocation certainly looks incredibly clean if we can get there.

@rattrayalex-stripe
Copy link
Contributor Author

rattrayalex-stripe commented Sep 14, 2018

yeah, instead of return utils.callbackifyPromiseWithTimeout(promise, callback) we'll do something like:

const ret = utils.callbackifyPromiseWithTimeout(promise, callback);
if (isList) {
  const asyncIterationNext = factoryThing();
  ret.autoPagingEach = foo(asyncIterationNext)
  ret.autoPagingToArray = bar(asyncIterationNext);
  ret.next = asyncIterationNext;
  ret.return = () => ({});
  ret[Symbol.asyncIterator] = () => ret;
}
return ret;

... which I'll probably wrap into a mutative helper packed in autoPagination.js

@rattrayalex-stripe rattrayalex-stripe changed the title Auto-Pagination [WIP] Auto-Pagination Sep 17, 2018
@rattrayalex-stripe
Copy link
Contributor Author

Alright, I think this is ready for another look!

r? @brandur-stripe @jlomas-stripe

README.md Outdated
@@ -216,6 +216,87 @@ stripe.setAppInfo({

This information is passed along when the library makes calls to the Stripe API.

### Auto-pagination

As of stripe-node 6.11.0, you may auto-paginate list api methods.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I assume I should bump the version separately from this PR; is it safe to assume this'll go out as 6.11.0?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably. We can always fix it later too!

@rattrayalex-stripe
Copy link
Contributor Author

FWIW, I ran a back-of-napkin stripe.charges.list().autoPagingToArray({limit: 10000}) and it completed in ~10 minutes (643306ms) with a resulting object around 20MB. That's not coming very close to anyone's memory limit so I'd feel pretty comfortable raising it to a million or something, but it's also slow enough that if you're doing something that big, you're probably better off parallelizing.

README.md Outdated
@@ -216,6 +216,87 @@ stripe.setAppInfo({

This information is passed along when the library makes calls to the Stripe API.

### Auto-pagination

As of stripe-node 6.11.0, you may auto-paginate list api methods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you capitalize to "API" given this is user facing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, actually I think I'll just remove the word since it's used differently than the next sentence: you may auto-paginate list methods. sgty?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! SG.

README.md Outdated
### Auto-pagination

As of stripe-node 6.11.0, you may auto-paginate list api methods.
We provide a few different API's for this to aid with a variety of node versions and styles.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry the nit, but I think we usually use "APIs" (see examples from Google). There's a little disagreement, but without an apostrophe is usually considered the right way to pluralize an acronym.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, yes, thanks!

README.md Outdated

#### Async iterators (`for-await-of`)

If you are in a node environment that has support for [async iteration](https://github.com/tc39/proposal-async-iteration#the-async-iteration-statement-for-await-of),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pluralize "Node" here like it is below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you meant capitalize 😄 but yes, good catch!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, oops, yes.

@brandur-stripe
Copy link
Contributor

Awesome. Left a few minor comments on README wording, but LGTM.

@jlomas-stripe
Copy link
Contributor

if you're doing something that big, you're probably better off parallelizing.

I wonder if it's worth including some notes either in the code or the documentation about this, and when you might use one approach over the other?

@rattrayalex-stripe
Copy link
Contributor Author

Awesome, thanks for the eagle-eyes @brandur-stripe ! Updated. Does the content feel good?

@jlomas-stripe that's a very good point... I think we should really add docs at /docs/api#pagination to show off the pattern of spinning up multiple month-at-a-time paginators in parallel. I think @remi-stripe has shown this pattern off a few times. Can you think of a clarification to the this readme too? I'm coming up dry 😕

@brandur-stripe
Copy link
Contributor

Awesome, thanks for the eagle-eyes @brandur-stripe ! Updated. Does the content feel good?

Yep, looks great! We can always iterate if we find something missing/unclear later.

@rattrayalex-stripe
Copy link
Contributor Author

Great, thanks!

Sounds like this is approved; marking as such. Will merge & release this afternoon.

@rattrayalex-stripe rattrayalex-stripe merged commit 0e10d1c into master Sep 18, 2018
@rattrayalex-stripe rattrayalex-stripe deleted the ralex/auto-pagination branch May 3, 2019 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants