Proposed format change: Add files to IndexMetadata
#5456
Replies: 3 comments 1 reply
-
|
This seems like a reasonable idea to me. I suppose the primary downside is a very minor increase in the manifest size? Since index files are typically not per-fragment I think the increase would be fairly negligible. |
-
|
+1! We recently added base_id to the index metadata, so all the index files listed here will just inherit that base. I don't think we need a per-file base yet, and we can always add that later. https://github.com/lance-format/lance/blob/main/protos/table.proto#L282 |
-
|
On second thought, @wjones127: so far, all the index file names are predefined as part of the spec. For example, we expect a file with an exact name. It is basically redundant to have the IndexFile repeat the file path. If the goal is just to have stats of the files for faster access, would it make more sense to just have additional stats fields in each index, which already knows what files are available? |
-
Background
Currently, the table format doesn't know about any specific files that are part of an index. An index is given a directory, with its UUID as the name, and it is expected to put all of its files in there.
It's up to the plugin to define how to open that index. To clean up the index, we just list the directory and delete all objects within it.
Proposal
We can add a message to the IndexMetadata:
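This gives us the exact size of the files within the index, which provides two benefits: index readers can skip the HEAD request otherwise needed to learn a file's size, and index stats can report the on-disk size of an index without listing its directory.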
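As a rough illustration of how a reader might use such a list, here is a minimal Python sketch. It is not the proposed protobuf message or Lance's actual API: the IndexFile class, its path and size_bytes fields, and the head and list_sizes callables (stand-ins for an object-store HEAD request and a directory listing) are all assumptions made for the example. An empty list is treated as missing information, matching the fallback described under Backwards compatibility.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class IndexFile:
    """Hypothetical per-file entry; the field names are assumptions."""
    path: str        # path of the file inside the index's UUID directory
    size_bytes: int  # exact size of the file in bytes


def file_size(
    index_files: Sequence[IndexFile],
    path: str,
    head: Callable[[str], int],
) -> int:
    """Return the recorded size of `path`, or fall back to a HEAD-style call.

    `head` stands in for an object-store HEAD request that returns a size.
    It is only invoked when the metadata has no entry for `path`, which
    includes the case where `index_files` is empty (older metadata).
    """
    for f in index_files:
        if f.path == path:
            return f.size_bytes
    return head(path)


def index_size_on_disk(
    index_files: Sequence[IndexFile],
    list_sizes: Callable[[], Iterable[int]],
) -> int:
    """Return the total on-disk size of one index.

    `list_sizes` stands in for listing the index directory and yielding
    the size of each object. It is only used when `index_files` is empty.
    """
    if index_files:
        return sum(f.size_bytes for f in index_files)
    return sum(list_sizes())
```

With a populated list, neither callable is invoked, so both lookups are served from the manifest alone. With an empty list, the behavior is the same as today.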
Backwards compatibility
If index_files is empty, we can interpret this information as missing. Index readers can call HEAD as usual to get the size of files. Index stats can use a list operation as a fallback to get on-disk size of indices.
Index file paths
All index files are still required to be under a UUID directory. They can be at different base paths. So the full path of an index file would be in the format: