Data Streaming #437

Open
dylanmcreynolds opened this issue May 12, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@dylanmcreynolds
Contributor

We recently had an interesting group conversation about streaming data. We discussed infrastructure that might be useful for a wide variety of use cases. I'm writing this in the Tiled repo because a) I want to put it somewhere and b) there may be a tie-in to Tiled. If that turns out not to be the case, I'll happily close this issue and put it somewhere else, but let's start here for now.

Several facilities have expressed interest in streaming data from an instrument to a web browser.

There are a number of technologies potentially in use at beamlines for streaming data. The data can be small scalars or large arrays. Examples of these technologies include ZMQ, Kafka, and PVA Stream. On the consumer side, it would be handy to provide streaming data over a uniform transport protocol and in a well-known data structure. A central service that takes data from a variety of streaming protocols on one side and translates it to browser-friendly technologies would be a very useful tool to have. Additionally, providing authN/authZ infrastructure could allow this service to be used in remote access scenarios. While browsers are the easiest way to think about web-based technologies, such a service could also give non-browser clients access to streaming data over the web.
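
To make the "well-known data structure" idea concrete, here is a minimal sketch of a common envelope that per-protocol translators could emit. Everything in it is an assumption for illustration: the `Envelope` fields, the translator names, and the input record layouts are hypothetical and are not taken from Kafka, PVA, or Tiled.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class Envelope:
    """Hypothetical uniform structure handed to web clients."""
    source: str       # which upstream protocol produced the item
    name: str         # signal, topic, or channel name
    timestamp: float  # seconds since the epoch
    data: Any         # a scalar or a (nested) list standing in for an array


def from_kafka_like(record: Dict[str, Any]) -> Envelope:
    # Assumed record layout: {"topic": ..., "ts": ..., "value": ...}
    return Envelope("kafka", record["topic"], record["ts"], record["value"])


def from_pva_like(update: Dict[str, Any]) -> Envelope:
    # Assumed update layout: {"pv": ..., "time": ..., "val": ...}
    return Envelope("pva", update["pv"], update["time"], update["val"])


def to_browser_message(envelope: Envelope) -> str:
    # JSON text is something any browser WebSocket client can parse.
    return json.dumps(asdict(envelope), sort_keys=True)
```

With something like this, a consumer would only need to understand one message shape regardless of which upstream protocol produced the data.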

A number of technologies exist to push events from the server to the browser (Server-Sent Events, WebSockets). WebSockets seems like a good place to start: it is widely adopted, and the firewalling issues that once existed for WebSockets seem to have decreased over time. WebSockets also supports two-way communication, which I could see being useful in the distant future.

Tiled already provides many things that I think we would want:

  • authentication infrastructure
  • authorization infrastructure
  • the facility to provide metadata to the client
  • well-documented structures and a facility to convey them to client

I would like to start a discussion about the viability of creating such a service based on Tiled. Such a project could either be built into Tiled or live in its own standalone repo as a service that imports Tiled as a dependency.

@dylanmcreynolds dylanmcreynolds added the enhancement New feature or request label May 12, 2023
@danielballan
Member

This is compelling. I think it is worth trying to build websockets support into core. We describe Tiled as offering:

  • Slicing ("structured access" generally)
  • Granular access control
  • Transcoding

The last two would certainly apply to streaming data. Slicing applies too, though I suspect the semantics will be different for streaming vs random access in some important ways.

Perhaps a given node in Tiled could support random access via HTTP and streaming access via web sockets (when applicable). This could include:

  • Streaming documents from a Bluesky run, coming from Kafka
  • Streaming updates from a CA or PVA channel

From a client POV the transition between streaming access and random access from storage at rest would be as smooth as can be: same authenticated session, same access control (authorization) rules, even a similar URL. Just a different protocol.

Months ago, I was initially resistant to this suggestion because I wanted to stay focused on getting the non-streaming use case rock solid. But now I am convinced.

@danielballan
Member

Saying a little more on what this could look like:

  • An additional optional method, perhaps stream(), on Tiled adapters which is a generator returning items in the stream. Items would be small numpy arrays or DataFrames, corresponding to the structure_family of the Adapter.
  • A new route, maybe /stream/{path} with websockets, which calls the new method on the adapter, transcodes each item into the requested format, and yields it.
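
A rough sketch of how those two pieces might fit together, using only the standard library. This is not Tiled's API: `CounterAdapter`, `SERIALIZERS`, and `stream_messages` are hypothetical names, plain lists stand in for small numpy arrays, and a real route would send each message over a websocket rather than yield it from a generator.

```python
import json
from typing import Callable, Dict, Iterator, List


class CounterAdapter:
    """Hypothetical adapter whose optional stream() method yields small items."""

    structure_family = "array"

    def __init__(self, n_items: int, chunk_size: int) -> None:
        self._n_items = n_items
        self._chunk_size = chunk_size

    def stream(self) -> Iterator[List[float]]:
        # Each item stands in for a small array read from a live source.
        for i in range(self._n_items):
            start = i * self._chunk_size
            yield [float(start + j) for j in range(self._chunk_size)]


# Assumed registry mapping a requested format to a transcoding function.
SERIALIZERS: Dict[str, Callable[[List[float]], bytes]] = {
    "application/json": lambda item: json.dumps(item).encode("utf-8"),
}


def stream_messages(adapter: CounterAdapter, media_type: str) -> Iterator[bytes]:
    """What a /stream/{path} handler could do: call stream(), transcode, yield."""
    serialize = SERIALIZERS[media_type]
    for item in adapter.stream():
        yield serialize(item)
```

In this sketch the adapter knows nothing about the wire format; transcoding happens in the route, which matches the split described in the two bullets above.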

If our initial use case is streaming from a CA or PVA (EPICS) subscription, the Adapter should be developed outside of Tiled core.

@danielballan
Member

danielballan commented May 12, 2023

Note that it is possible to develop all of this outside of Tiled core using the include_routers mechanism, by which Adapters can add custom routes to the server. If we like it, it seems likely we would move the /stream route into core.


Edit: Here is a self-contained example demonstrating the plugin mechanism for adding custom routes. It doesn't do any streaming-related stuff, just custom routes in general. https://gist.github.com/danielballan/476359c7743251582a8302f4794bb8ab

@dylanmcreynolds
Contributor Author

This simple adapter interface will allow us to create and test an adapter while the development in Tiled starts up, which is great.

@danielballan
Member

Adding a diagram here that was sketched in a discussion about this with @whs92 and colleagues:

[Image: diagram, not preserved in this copy]

@dylanmcreynolds
Contributor Author

I'm a little confused by something in the diagram above. You have multiple routes between pymca and tiled. Is the two-way HTTP connection for something other than the ws connection?

@danielballan
Member

Yes. (And I am very happy for feedback on how to draw it better.)

GET /api/v1/metadata/{path}  # exists as soon as RunStart comes out
GET /api/v1/table/full/{path}  # poll for data during scan, or get all data after
ws /api/v1/stream/{path}  # stream data (while scan is live only)
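
As a sketch of how a client might derive all three endpoints from one base URL and node path: the helper name `endpoints` and the example host are hypothetical, and the `/stream` route is the one proposed in this thread rather than something assumed to exist.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit


def endpoints(base_url: str, path: str) -> Dict[str, str]:
    """Build the HTTP and websocket URLs for one node (hypothetical helper)."""
    parts = urlsplit(base_url)
    # Same host and path; only the protocol differs for the stream.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    root = parts.path.rstrip("/") + "/api/v1"
    node = path.strip("/")

    def build(scheme: str, route: str) -> str:
        return urlunsplit((scheme, parts.netloc, f"{root}/{route}/{node}", "", ""))

    return {
        "metadata": build(parts.scheme, "metadata"),  # available once RunStart is out
        "table": build(parts.scheme, "table/full"),   # poll during the scan, or read after
        "stream": build(ws_scheme, "stream"),         # live only, proposed route
    }
```

For example, `endpoints("https://tiled.example.org", "raw/scan123")["stream"]` gives `wss://tiled.example.org/api/v1/stream/raw/scan123`, while the other two entries stay on `https`.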

2 participants