feat(java): support get lance schema with field ids by majin1102 · Pull Request #4207 · lance-format/lance

majin1102 · 2025-07-11T10:25:28Z

#4119 would depend this PR to get field ids. Or we can't even construct a unit-test for replaceFieldMetadata

I struggled to consiter whether we need to build a flat LanceField in Java(compared with python thin LanceField). consider this case: I want to config all string type as compressed by zstd.

Thin LanceField would:

get the type from arrow schema
get the field id from lance schema
replace field config

The flat LanceField could be more friendly to eliminate the case that we need to get type from arrow shema and field id from LanceField.

I also noticed there has been a pk config in rust lance field. I could raise another PR to add it into JAVA LanceField if it is stable

majin1102 · 2025-07-11T10:28:01Z

Please take a look when you have time @jackye1995 @yanghua

jackye1995 · 2025-07-11T16:46:27Z

java/core/src/main/java/com/lancedb/lance/schema/LanceField.java

+import java.util.Map;
+import java.util.stream.Collectors;
+
+public class LanceField {


I think we want to be exactly the same as the rust model:

pub struct Field { pub name: String, pub id: i32, // TODO: Find way to move these next three fields to private pub parent_id: i32, pub logical_type: LogicalType, pub metadata: HashMap<String, String>, pub encoding: Option<Encoding>, pub nullable: bool, pub children: Vec<Field>, /// Dictionary value array if this field is dictionary. pub dictionary: Option<Dictionary>, pub storage_class: StorageClass, pub unenforced_primary_key: bool, }

this is missing some fields, and fields like dictionary is not matching exactly.

Sure, I originally thought keeping the same with arrow fields with the additional info like field id could be right but conservative. Let me look deep into the model details.

BTW, how do we know the exactly primary key sequence. like:
primary key(A, B), we need to know A before B right? by field id or what others? @jackye1995

Hi @westonpace.

I didn't see encoding, dictonary, unenforced_primary_key putting in the metadata when converting to an arrow field.

I also noticed you said only field id is needed for lance schema.

What do you think of the upper three aligning with JAVA lance shema, are they unnecessary for schema readers? (especially dictionary, there may be an Array in it. I wish you could elaberase more about this)

Dictionary and encoding are not important (only used for legacy 0.1 format). unenforced_primary_key could be important. We should probably include this.

(especially dictionary, there may be an Array in it. I wish you could elaberase more about this)

In the 0.1 format we implemented dictionary at the table level. We would store indices in the file and use the dictionary from the manifest to decode.

This field is unused in 2.0 (the current default) and 2.1 (the next default) and will be phased out at some point if we drop support for 0.1.

primary key(A, B), we need to know A before B right? by field id or what others? @jackye1995

I think primary key is expressed as a set, no ordering is needed. Because either A, B or B, A uniquely identifies a row.

primary key(A, B), we need to know A before B right? by field id or what others? @jackye1995

I think primary key is expressed as a set, no ordering is needed. Because either A, B or B, A uniquely identifies a row.

Yes, makes sense.....this is datalake, not database...

jackye1995 · 2025-07-11T16:49:30Z

java/core/src/main/java/com/lancedb/lance/schema/LanceField.java

+  private final int id;
+  private final String name;
+  private final boolean nullable;
+  private final ArrowType type;


similar comment, I think we need a logicalType and a class LogicalType, and logical type should have a field name (corresponding to the unnamed string in lance_core::datatypes::LogicalType) to record the string type name.

😰 I'd really like to get rid of lance_core::datatypes::LogicalType from Rust. It's a relic I don't think we need anymore. Originally, we had a slightly different type system (because we couldn't support all Arrow data types), but at this point we implement most of the Arrow type system. I think it's much simpler to just use Arrow data types and error when we hit a type we don't expect.

😰 I'd really like to get rid of lance_core::datatypes::LogicalType from Rust. It's a relic I don't think we need anymore. Originally, we had a slightly different type system (because we couldn't support all Arrow data types), but at this point we implement most of the Arrow type system. I think it's much simpler to just use Arrow data types and error when we hit a type we don't expect.

Yeah, I found map and union types are not supported yet. Are they in the plan? @westonpace
I also believe keeping ArrowType could be more friendly to engines and users so that they don't need to deal with two almost same type systems. What do you think? @jackye1995

Yeah, I found map and union types are not supported yet. Are they in the plan?

I think we will add them when someone needs them. Map will be easy. Union is a little more tricky but we can support.

I think it's much simpler to just use Arrow data types and error when we hit a type we don't expect.

I see, that makes sense. I thought we want to reserve the right to extend the Arrow types with that setup, but if we want to be fully Arrow then let's just use Arrow type.

If we want to extend Arrow we can use Arrow's builtin extension type mechanism.

jackye1995

looks good to me

feat(java): support get lance schema with field ids

ab9b9ca

github-actions bot added enhancement New feature or request java labels Jul 11, 2025

resolve cargo fmt

ce02e3d

eddyxu requested review from Xuanwo, jackye1995 and westonpace July 11, 2025 16:30

jackye1995 reviewed Jul 11, 2025

View reviewed changes

resolve cargo fmt

ce38e16

majin1102 mentioned this pull request Jul 2, 2025

Epic: Improving integration with Java ecosystem #3950

Closed

majin.nathan and others added 3 commits July 15, 2025 16:24

Add more information in java lance schema model

69577f1

Fix cargo fmt

209137f

Fix cargo fmt

a554289

jackye1995 approved these changes Jul 15, 2025

View reviewed changes

jackye1995 merged commit 2b2d2c2 into lance-format:main Jul 15, 2025
8 checks passed

jackye1995 mentioned this pull request Jul 16, 2025

[MemTable] Make unenforced primary key columns ordered #4239

Open

majin1102 deleted the lance_schema branch September 10, 2025 07:08

majin1102 mentioned this pull request Dec 17, 2025

fix(java): support FixedSizeList for java LanceField #5509

Merged

Conversation

majin1102 commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

majin1102 commented Jul 11, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

majin1102 Jul 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

majin1102 Jul 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

majin1102 Jul 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jackye1995 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

majin1102 commented Jul 11, 2025 •

edited

Loading

majin1102 Jul 12, 2025 •

edited

Loading

majin1102 Jul 12, 2025 •

edited

Loading

majin1102 Jul 12, 2025 •

edited

Loading