Support a fallback structure family for "opaque bytes" #434
Comments
One possible name for this proposed new structure family is |
Like any other node in Tiled, an It's worth considering the trade-offs of promoting certain special fields in
The only one we will always know is |
We have discussed various bars that data can clear:
Tiled currently insists that you start at (2). When @jmaruland and I first visited NIST, John Henry made the case that we should actually start at (1) --- that we should accept files that we cannot open. That leaves a lot of Tiled's capability on the table. Interfaces like these rely on Tiled being able to open the file and provide it in a known format: https://tiled-demo.blueskyproject.io/ui/browse/generated/short_table Given "unstructured" data, Tiled would have to fall back to showing only a "Download" button and leave it to the client/user to figure out how to open the data. |
I'll add a few more into the mix for consideration...because, well, naming things is hard. :)
I hadn't intended to complicate a simple keyword choice, but for some reason felt compelled to do so anyway. :) |
I think you've convinced me that Of those options I think I like
|
I think the current proposal to beat is: structure_family: unknown
structure:
mimetype: "..." # e.g. "application/octet-stream", "text/plain;charset=utf-8"
length: ... # number of bytes Maybe |
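To make the proposal above concrete, here is a minimal Python sketch of what that structure might look like. The class name `UnknownStructure` and the helper `describe` are hypothetical and are not taken from the Tiled codebase; only the `structure_family`, `mimetype`, and `length` fields come from the proposal.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UnknownStructure:
    """Hypothetical structure for a node whose bytes the server cannot interpret."""

    mimetype: str  # e.g. "application/octet-stream"
    length: int  # number of bytes

    def __post_init__(self):
        # A byte count can never be negative.
        if self.length < 0:
            raise ValueError("length must be non-negative")


def describe(payload: bytes, mimetype: str = "application/octet-stream") -> dict:
    """Build a JSON-encodable description shaped like the proposal (illustrative only)."""
    structure = UnknownStructure(mimetype=mimetype, length=len(payload))
    return {"structure_family": "unknown", "structure": asdict(structure)}
```

Defaulting to "application/octet-stream" reflects the case where nothing more specific is known about the payload.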
Agreed. I don’t have strong feelings re: |
Slight preference for |
I had no idea that |
|
That makes sense. Stream has its own baggage of expectations, but "application/octet-stream" is so commonly used that it's hard to argue against. |
We use
Unlike TIFF or PNG or Arrow, the context necessary to interpret the C-ordered buffers (their data type and shape) is not inlined into the payload itself---it's in the For category of use cases addressed by this GH issue, we may actually know a specific MIME type. Use cases include things like Word documents, MATLAB scripts, and PDFs, probably associated with some more structured scientific data. Tiled will not be able to transcode or slice into these nodes, but it can give the client a good hint by saying, "The person who gave me this said it was So my initial reaction is that adding a MIME type like |
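As a rough illustration of how a client might use a MIME type hint passed along by the server, here is a sketch using only the Python standard library. The function names `parse_mimetype` and `looks_like_text` are hypothetical; nothing here is part of Tiled's client API.

```python
from email.message import Message


def parse_mimetype(value: str):
    """Split a MIME type string into (type/subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns the type itself as the first pair, so skip it.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


def looks_like_text(value: str) -> bool:
    """Guess whether a client could reasonably decode the payload as text."""
    content_type, _ = parse_mimetype(value)
    return content_type.startswith("text/")
```

A client could fall back to offering a plain download whenever `looks_like_text` returns False and no better handler is registered for the type.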
OK, I stand corrected. 😆 |
This is getting silly, but the more I think about it, the more I think I like plain old |
That's pretty compelling. |
Simplicity |
It seems like we have a winner. Should we proceed with using |
Let's do it. #450 is a good reference for which parts of the codebase need to be touched to add a new StructureFamily. Some design things to nail down before we write code.
|
Is there any reason that an adapter can't define |
I think we're on the same page. Compare to this array example, which has a
This proposal is that the |
I'd be interested in drafting a PR for this, along with some follow up discussions. It would be great to have a companion for this. @jmaruland are you interested in working on this together? |
@padraic-shafer Yes, I would love to. I worked on a very similar issue a while ago when we were trying to move away from JSONSchema models to Pydantic models. It will be fun to revisit this topic. |
Fantastic! I'll find a time later this week for us to discuss where to start, and how to proceed. |
Follow-up thoughts here:
And there is space for a
|
This is another idea that arose during the NIST visit.
Tiled's data model constrains everything to be one of its recognized structure families (array, dataframe, sparse, node) or JSON-encodable metadata sitting alongside one of those types. There will be cases where there is binary (not JSON-encodable) information that is relevant and that some client programs will know what to do with.
Our line on this so far has been, "If you have files, use a static file server or Globus or another file-based solution, and link to that from the metadata in Tiled." And for cases where you have a lot of unstructured data (a directory of PowerPoint documents or PDFs) I think that's the right call. But John Henry at NIST articulated a compelling argument that it is useful to enable Tiled to carry binary data in-line in some cases.
I think this would take the shape of a new structure family, perhaps opaque_bytes. Its structure would simply be a length, and it would be sliceable by byte range. Tiled would not be able to transcode it, only send it in its original representation as bytes. Any context necessary to interpret the bytes would have to be known a priori to the client. The (JSON-encodable) metadata attached to the opaque_bytes node may provide helpful information in this regard for a client, but it would be "opaque" to Tiled itself.
We are all in agreement that if you have mostly unstructured / opaque data, then Tiled is not adding value and you should just use a static file server. But if you have a little unstructured / opaque data and you want to place it logically alongside structured data, there is an argument that Tiled should enable this.
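The byte-range slicing described in the issue text could look roughly like the following. This is a sketch under assumptions: the function name `read_byte_range` and its half-open `[start, stop)` convention are illustrative choices, not an existing Tiled interface.

```python
from typing import Optional


def read_byte_range(payload: bytes, start: int = 0, stop: Optional[int] = None) -> bytes:
    """Return payload[start:stop] without interpreting the bytes in any way."""
    length = len(payload)
    if stop is None:
        stop = length
    # Reject ranges that fall outside the payload rather than silently clamping.
    if not (0 <= start <= stop <= length):
        raise ValueError(f"invalid byte range [{start}, {stop}) for length {length}")
    return payload[start:stop]
```

The only thing the server needs to know to support this is the length, which matches the idea that the structure is "simply a length".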