Push-based model for consuming (realtime) GBFS data #630

Open

Description

What is the issue and why is it an issue?

Using poll-based consumption (the current situation) for real-time data has several challenges.

  • The specification states that real-time feeds should be updated as often as possible.
    • This means they should have a ttl (time-to-live) value of 0. Taken literally, consumers would have to poll continuously to stay up to date.
    • This is, of course, not possible. Consumers will necessarily poll on a finite interval. Choosing the appropriate interval will depend on various factors, mostly related to available computing and bandwidth resources, as well as the total number of feeds that the consumer needs to poll.

Consumer side

There is an inherent conflict within this decision process: Consumers don’t want to poll too infrequently, because that increases the likelihood that data will be stale and that incorrect information is shown to users.

At the same time, polling too frequently wastes resources whenever the data has not been refreshed between requests. Consumers may also run into rate-limiting policies from producers (I have first-hand experience with this).

In the end, we have to decide between over-fetching and stale data, and it will never be better than a mere compromise.
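To put rough numbers on this compromise, here is a back-of-the-envelope sketch in Python. It assumes a simplified model that is not part of the specification: the producer refreshes a feed at a fixed interval, and the consumer polls at a fixed interval with a random phase offset. The function name and the example intervals are purely illustrative.

```python
def polling_tradeoff(poll_interval_s: float, update_interval_s: float) -> dict:
    """Estimate wasted polls and added staleness under a fixed-interval model.

    Assumption (not from the GBFS spec): the producer publishes a new version
    every `update_interval_s` seconds and the consumer polls every
    `poll_interval_s` seconds, with no alignment between the two clocks.
    """
    polls_per_hour = 3600 / poll_interval_s
    updates_per_hour = 3600 / update_interval_s
    # A poll is only "useful" if a new version appeared since the last poll.
    useful_polls = min(polls_per_hour, updates_per_hour)
    return {
        "polls_per_hour": polls_per_hour,
        "wasted_fraction": 1 - useful_polls / polls_per_hour,
        # With a random phase, a new version waits on average half a poll
        # interval (and at most one full interval) before it is fetched.
        "mean_added_delay_s": poll_interval_s / 2,
        "worst_added_delay_s": poll_interval_s,
    }


if __name__ == "__main__":
    for interval in (5, 30, 60):
        print(interval, polling_tradeoff(interval, update_interval_s=30))
```

Under this model, polling faster than the producer refreshes only buys lower delay at the cost of a growing share of redundant requests, while polling slower avoids redundant requests but adds delay and skips intermediate versions.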

Producer side

Frequent polling of large payloads consumes resources and pushes producers to introduce complexity such as caching and CDNs. Having consumers poll at an interval close to 0 seconds is resource-intensive and costly for the data producer, yet producers also risk losing revenue if consumers poll too infrequently.

We must further consider that large payloads often contain only minor changes relative to the totality of the information, causing additional waste, as unchanged data still has to be transferred and processed.

Cloud computing contributes to greenhouse gas emissions on a massive scale. Allocated resources are generally underutilised and unnecessary computing is extremely wasteful on the financial side, as well as damaging on the environmental side.

Potential solutions

I would like to open a community discussion on how to solve this challenge by generic and scalable means. Individual arrangements between consumers and producers are not sustainable, and finding a common solution will benefit the community as a whole and help the standard grow.

I don’t want to constrain the solutions from the outset, but I think potential solutions fall into the following 3 broad categories:

  1. Continue to use a polling-based model but encourage better use of cache headers and not-modified responses.
  2. Use a push-based model without an intermediary, with technologies like WebSocket or Server-Sent Events.
  3. Use a push-based model with an intermediary message broker, with technologies like AMQP, Pub/Sub, Kafka, MQTT, etc.
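As an illustration of the first category, here is a minimal Python sketch of a consumer that uses standard HTTP conditional requests (`ETag` and `If-None-Match`, answered with `304 Not Modified`). The `ConditionalPoller` class, the injected `fetch` callable, and the fake server are hypothetical names introduced for this example. It also assumes that the producer actually emits an `ETag` header, which is not guaranteed today.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (status code, response headers, body); headers are assumed to be
# normalised to the exact key "ETag" by whatever HTTP client is plugged in.
Response = Tuple[int, Dict[str, str], bytes]


@dataclass
class ConditionalPoller:
    fetch: Callable[[str, Dict[str, str]], Response]
    etag: Optional[str] = None
    body: Optional[bytes] = None

    def poll(self, url: str) -> bool:
        """Return True if a new body was downloaded, False on 304."""
        headers: Dict[str, str] = {}
        if self.etag is not None:
            headers["If-None-Match"] = self.etag
        status, resp_headers, body = self.fetch(url, headers)
        if status == 304:
            return False  # cached copy is still current, nothing transferred
        if status == 200:
            self.etag = resp_headers.get("ETag")
            self.body = body
            return True
        raise RuntimeError(f"unexpected status {status}")


def fake_fetch(url: str, headers: Dict[str, str]) -> Response:
    """Stand-in for a producer that serves a single, unchanging version."""
    if headers.get("If-None-Match") == '"v1"':
        return 304, {}, b""
    return 200, {"ETag": '"v1"'}, b'{"last_updated": 1, "ttl": 0, "data": {}}'


if __name__ == "__main__":
    poller = ConditionalPoller(fetch=fake_fetch)
    print(poller.poll("https://example.invalid/feed.json"))  # first request
    print(poller.poll("https://example.invalid/feed.json"))  # revalidation
```

This keeps the request count unchanged but avoids re-sending the payload when nothing has changed, so it addresses bandwidth more than staleness.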

Personally, I think the second category strikes the right trade-off between added complexity and added value. Server-Sent Events in particular seem promising as a potential extension of existing endpoints. It should also be noted that options 1 and 2 can co-exist. That is, producers can continue to support (and improve) the polling-based method for real-time feeds while also supporting a push-based model.
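To give a feel for how lightweight Server-Sent Events are on the consumer side, here is a minimal Python sketch of a parser for the `text/event-stream` wire format. It handles only the `event` and `data` fields plus comment lines, and ignores `id` and `retry`. The event name `feed_update` and the JSON payload are hypothetical; nothing in GBFS defines them today.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from an iterable of stream lines."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        # "id" and "retry" are deliberately ignored in this sketch.


if __name__ == "__main__":
    stream = [
        ": keep-alive\n",
        "event: feed_update\n",  # hypothetical event name
        'data: {"last_updated": 1}\n',
        "\n",
    ]
    for event_type, payload in parse_sse(stream):
        print(event_type, payload)
```

Because the stream is plain HTTP, a producer could in principle expose it next to the existing JSON endpoints, leaving polling consumers unaffected.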

Still, there is another axis to consider: for any given update, how large is the delta? There is potentially a very large upside to precomputing and shipping only what has actually changed, rather than always transferring everything. On the other hand, it requires us to introduce new semantics to communicate the contents of the delta to consumers, e.g. what has been added, what has changed, and what has been removed.
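As a strawman for what such delta semantics could look like, here is a small Python sketch that computes and applies an added/changed/removed delta between two snapshots of a list of records. The delta shape, the function names, and the use of `vehicle_id` as the record key are assumptions made for illustration, not a proposal for the final format.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def compute_delta(old: List[Record], new: List[Record],
                  key: str = "vehicle_id") -> Dict[str, list]:
    """Describe how `new` differs from `old`, matching records by `key`."""
    old_by_id = {item[key]: item for item in old}
    new_by_id = {item[key]: item for item in new}
    return {
        "added": [v for k, v in new_by_id.items() if k not in old_by_id],
        "changed": [v for k, v in new_by_id.items()
                    if k in old_by_id and old_by_id[k] != v],
        # Removed records only need their identifier.
        "removed": [k for k in old_by_id if k not in new_by_id],
    }


def apply_delta(current: List[Record], delta: Dict[str, list],
                key: str = "vehicle_id") -> List[Record]:
    """Rebuild the new snapshot from the previous one plus a delta."""
    by_id = {item[key]: item for item in current}
    for removed_id in delta["removed"]:
        by_id.pop(removed_id, None)
    for item in delta["added"] + delta["changed"]:
        by_id[item[key]] = item  # changed records are replaced wholesale
    return list(by_id.values())


if __name__ == "__main__":
    old = [{"vehicle_id": "a", "lat": 1.0}, {"vehicle_id": "b", "lat": 2.0}]
    new = [{"vehicle_id": "b", "lat": 2.5}, {"vehicle_id": "c", "lat": 3.0}]
    delta = compute_delta(old, new)
    print(delta)
    print(apply_delta(old, delta))
```

Sending whole replaced records in `changed` keeps the consumer logic simple. Field-level patches would be smaller but need more semantics, which is exactly the trade-off described above.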

I’m looking forward to hearing what the community has to say about this. I will use your feedback to work on a proposal for a standard way to deal with the problems outlined here.

Is your potential solution a breaking change?

  • Yes
  • No
  • Unsure