Skip to content

Commit cfcff49

Browse files
authored
GH-35352: [Java] Fix issues with "semi complex" types. (#35353)
### Rationale for this change "semi complex" types like `TimeStamp*TZ`, `Duration`, and `FixedSizeBinary` were missing implementations in `UnionListWriter`, `UnionVector`, `UnionReader` and other associated classes. This patch adds these missing methods so that these types can now be written to things like `ListVector`s, whereas before it would throw an exception because the methods were just not implemented. For example, without this patch, one of the new tests added would fail: ``` TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter. ``` There are also fixes for get and set methods for holders for the respective `*Vector`s classes for these types: - The get methods did not set fields like `TimeStampMilliTZHolder.timezone`, `DurationHolder.unit`, `FixedSizeBinaryHolder.byteWidth`. - The set methods did not all validate that those fields matched what the vector's `ArrowType` was set to. For example `TimeStampMilliTZHolder.timezone` should match `ArrowType.Timestamp.timezone` on the vector and should throw if it doesn't. Otherwise users would never get a signal that there is anything wrong with their code writing these holders with mismatching values. This patch additionally marks some of the existing interfaces for writing these "semi complex" types as deprecated, because they do not actually provide enough context to properly write these parameterized `ArrowTypes`. Instead, users should use the write methods that take `*Holders` that do provide enough context for the writers to properly write these types. Also note that the equivalent write methods for `Decimal` have already also been marked as deprecated, for the same reasoning. ### What changes are included in this PR? - Various template changes. See this gist for a diff of the generated files: https://gist.github.com/henrymai/03b0f5a4165cd9822aa797c89bbd74e9 - Note that the diff for the ComplexWriter.java generated ones is not included since that change is easy to see from just the template. - Additional tests to verify that this is fixed. ### Are these changes tested? Yes, added unit tests and they pass. ### Are there any user-facing changes? Yes, users will now be able to write to types like TimeStamp*TZ, Duration, and FixedSizeBinary on ListVector, MapVector, StructVector due to added implementations for these types on UnionListVector, UnionWriter. This also marks the non-holder write interfaces for these types as `@ Deprecated`. The interfaces themselves have not been removed, so this part is not a breaking change. Also as mentioned above, this follows in the footsteps of what `Decimal*` has already done, where they marked the non holder write interfaces as `@ Deprecated`. Additionally another user visible change is that when reading these types into a holder all of the fields are now populated. For example `TimeStamp*TZHolder.timezone` is now populated when it was not before. **This PR includes breaking changes to public APIs.** We now throw an exception when some of the holder fields that correspond to the Vector's ArrowType parameters do not match. This is a breaking change because previously this would silently accept the writes but was actually incorrect, since the element written does not correspond to the right ArrowType variant due to mismatching parameters (like ArrowType.Timestamp.unit, ArrowType.Duration.unit, and ArrowType.FixedSizeBinary.byteWidth). * Closes: #35352 Authored-by: Henry Mai <henrymai@users.noreply.github.com> Signed-off-by: David Li <li.davidm96@gmail.com>
1 parent 6d3fe6b commit cfcff49

19 files changed

+909
-149
lines changed

java/vector/src/main/codegen/templates/AbstractPromotableFieldWriter.java

Lines changed: 72 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,10 @@
2525

2626
<#include "/@includes/vv_imports.ftl" />
2727

28+
<#function is_timestamp_tz type>
29+
<#return type?starts_with("TimeStamp") && type?ends_with("TZ")>
30+
</#function>
31+
2832
/*
2933
* A FieldWriter which delegates calls to another FieldWriter. The delegate FieldWriter can be promoted to a new type
3034
* when necessary. Classes that extend this class are responsible for handling promotion.
@@ -105,17 +109,7 @@ public void endEntry() {
105109

106110
<#list vv.types as type><#list type.minor as minor><#assign name = minor.class?cap_first />
107111
<#assign fields = minor.fields!type.fields />
108-
<#if minor.class != "Decimal" && minor.class != "Decimal256">
109-
@Override
110-
public void write(${name}Holder holder) {
111-
getWriter(MinorType.${name?upper_case}).write(holder);
112-
}
113-
114-
public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
115-
getWriter(MinorType.${name?upper_case}).write${minor.class}(<#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
116-
}
117-
118-
<#elseif minor.class == "Decimal">
112+
<#if minor.class == "Decimal">
119113
@Override
120114
public void write(DecimalHolder holder) {
121115
getWriter(MinorType.DECIMAL).write(holder);
@@ -156,8 +150,75 @@ public void writeBigEndianBytesToDecimal256(byte[] value, ArrowType arrowType) {
156150
public void writeBigEndianBytesToDecimal256(byte[] value) {
157151
getWriter(MinorType.DECIMAL256).writeBigEndianBytesToDecimal256(value);
158152
}
153+
<#elseif is_timestamp_tz(minor.class)>
154+
@Override
155+
public void write(${name}Holder holder) {
156+
ArrowType.Timestamp arrowTypeWithoutTz = (ArrowType.Timestamp) MinorType.${name?upper_case?remove_ending("TZ")}.getType();
157+
// Take the holder.timezone similar to how PromotableWriter.java:write(DecimalHolder) takes the scale from the holder.
158+
ArrowType.Timestamp arrowType = new ArrowType.Timestamp(arrowTypeWithoutTz.getUnit(), holder.timezone);
159+
getWriter(MinorType.${name?upper_case}, arrowType).write(holder);
160+
}
159161
162+
/**
163+
* @deprecated
164+
* The holder version should be used instead otherwise the timezone will default to UTC.
165+
* @see #write(${name}Holder)
166+
*/
167+
@Deprecated
168+
@Override
169+
public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
170+
ArrowType.Timestamp arrowTypeWithoutTz = (ArrowType.Timestamp) MinorType.${name?upper_case?remove_ending("TZ")}.getType();
171+
// Assumes UTC if no timezone is provided
172+
ArrowType.Timestamp arrowType = new ArrowType.Timestamp(arrowTypeWithoutTz.getUnit(), "UTC");
173+
getWriter(MinorType.${name?upper_case}, arrowType).write${minor.class}(<#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
174+
}
175+
<#elseif minor.class == "Duration">
176+
@Override
177+
public void write(${name}Holder holder) {
178+
ArrowType.Duration arrowType = new ArrowType.Duration(holder.unit);
179+
getWriter(MinorType.${name?upper_case}, arrowType).write(holder);
180+
}
160181
182+
/**
183+
* @deprecated
184+
* If you experience errors with using this version of the method, switch to the holder version.
185+
* The errors occur when using an untyped or unioned PromotableWriter, because this version of the
186+
* method does not have enough information to infer the ArrowType.
187+
* @see #write(${name}Holder)
188+
*/
189+
@Deprecated
190+
@Override
191+
public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
192+
getWriter(MinorType.${name?upper_case}).write${minor.class}(<#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
193+
}
194+
<#elseif minor.class == "FixedSizeBinary">
195+
@Override
196+
public void write(${name}Holder holder) {
197+
ArrowType.FixedSizeBinary arrowType = new ArrowType.FixedSizeBinary(holder.byteWidth);
198+
getWriter(MinorType.${name?upper_case}, arrowType).write(holder);
199+
}
200+
201+
/**
202+
* @deprecated
203+
* If you experience errors with using this version of the method, switch to the holder version.
204+
* The errors occur when using an untyped or unioned PromotableWriter, because this version of the
205+
* method does not have enough information to infer the ArrowType.
206+
* @see #write(${name}Holder)
207+
*/
208+
@Deprecated
209+
@Override
210+
public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
211+
getWriter(MinorType.${name?upper_case}).write${minor.class}(<#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
212+
}
213+
<#else>
214+
@Override
215+
public void write(${name}Holder holder) {
216+
getWriter(MinorType.${name?upper_case}).write(holder);
217+
}
218+
219+
public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
220+
getWriter(MinorType.${name?upper_case}).write${minor.class}(<#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
221+
}
161222
</#if>
162223
163224
</#list></#list>

java/vector/src/main/codegen/templates/ComplexWriters.java

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,10 @@
3232
3333
<#include "/@includes/vv_imports.ftl" />
3434
35+
<#function is_timestamp_tz type>
36+
<#return type?starts_with("TimeStamp") && type?ends_with("TZ")>
37+
</#function>
38+
3539
/*
3640
* This class is generated using FreeMarker on the ${.template_name} template.
3741
*/
@@ -191,7 +195,15 @@ public void writeNull() {
191195
public interface ${eName}Writer extends BaseWriter {
192196
public void write(${minor.class}Holder h);
193197
194-
<#if minor.class?starts_with("Decimal")>@Deprecated</#if>
198+
<#if minor.class?starts_with("Decimal") || is_timestamp_tz(minor.class) || minor.class == "Duration" || minor.class == "FixedSizeBinary">
199+
/**
200+
* @deprecated
201+
* The holder version should be used instead because the plain value version does not contain enough information
202+
* to fully specify this field type.
203+
* @see #write(${minor.class}Holder)
204+
*/
205+
@Deprecated
206+
</#if>
195207
public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>);
196208
<#if minor.class?starts_with("Decimal")>
197209

@@ -201,6 +213,13 @@ public interface ${eName}Writer extends BaseWriter {
201213

202214
public void writeBigEndianBytesTo${minor.class}(byte[] value, ArrowType arrowType);
203215

216+
/**
217+
* @deprecated
218+
* Use either the version that additionally takes in an ArrowType or use the holder version.
219+
* This version does not contain enough information to fully specify this field type.
220+
* @see #writeBigEndianBytesTo${minor.class}(byte[], ArrowType)
221+
* @see #write(${minor.class}Holder)
222+
*/
204223
@Deprecated
205224
public void writeBigEndianBytesTo${minor.class}(byte[] value);
206225
</#if>

java/vector/src/main/codegen/templates/StructWriters.java

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,10 @@
3838
import org.apache.arrow.vector.complex.reader.FieldReader;
3939
import org.apache.arrow.vector.complex.writer.FieldWriter;
4040
41+
<#function is_timestamp_tz type>
42+
<#return type?starts_with("TimeStamp") && type?ends_with("TZ")>
43+
</#function>
44+
4145
/*
4246
* This class is generated using FreeMarker and the ${.template_name} template.
4347
*/
@@ -314,7 +318,22 @@ public void end() {
314318
} else {
315319
if (writer instanceof PromotableWriter) {
316320
// ensure writers are initialized
321+
<#if minor.class?starts_with("Decimal")>
317322
((PromotableWriter)writer).getWriter(MinorType.${upperName}<#if minor.class?starts_with("Decimal")>, new ${minor.arrowType}(precision, scale, ${vectName}Vector.TYPE_WIDTH * 8)</#if>);
323+
<#elseif is_timestamp_tz(minor.class) || minor.class == "Duration" || minor.class == "FixedSizeBinary">
324+
<#if minor.arrowTypeConstructorParams??>
325+
<#assign constructorParams = minor.arrowTypeConstructorParams />
326+
<#else>
327+
<#assign constructorParams = [] />
328+
<#list minor.typeParams?reverse as typeParam>
329+
<#assign constructorParams = constructorParams + [ typeParam.name ] />
330+
</#list>
331+
</#if>
332+
ArrowType arrowType = new ${minor.arrowType}(${constructorParams?join(", ")});
333+
((PromotableWriter)writer).getWriter(MinorType.${upperName}, arrowType);
334+
<#else>
335+
((PromotableWriter)writer).getWriter(MinorType.${upperName});
336+
</#if>
318337
}
319338
}
320339
return writer;

java/vector/src/main/codegen/templates/UnionListWriter.java

Lines changed: 42 additions & 86 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,10 @@
3737
import static org.apache.arrow.memory.util.LargeMemoryUtil.checkedCastToInt;
3838
<#include "/@includes/vv_imports.ftl" />
3939
40+
<#function is_timestamp_tz type>
41+
<#return type?starts_with("TimeStamp") && type?ends_with("TZ")>
42+
</#function>
43+
4044
/*
4145
* This class is generated using freemarker and the ${.template_name} template.
4246
*/
@@ -103,55 +107,31 @@ public void setPosition(int index) {
103107
super.setPosition(index);
104108
}
105109
106-
<#list vv.types as type><#list type.minor as minor><#assign name = minor.class?cap_first />
107-
<#assign fields = minor.fields!type.fields />
108-
<#assign uncappedName = name?uncap_first/>
109-
<#if uncappedName == "int" ><#assign uncappedName = "integer" /></#if>
110-
<#if !minor.typeParams?? >
111-
110+
<#list vv.types as type><#list type.minor as minor>
111+
<#assign lowerName = minor.class?uncap_first />
112+
<#if lowerName == "int" ><#assign lowerName = "integer" /></#if>
113+
<#assign upperName = minor.class?upper_case />
114+
<#assign capName = minor.class?cap_first />
115+
<#assign vectName = capName />
112116
@Override
113-
public ${name}Writer ${uncappedName}() {
117+
public ${minor.class}Writer ${lowerName}() {
114118
return this;
115119
}
116120
121+
<#if minor.typeParams?? >
117122
@Override
118-
public ${name}Writer ${uncappedName}(String name) {
119-
structName = name;
120-
return writer.${uncappedName}(name);
123+
public ${minor.class}Writer ${lowerName}(String name<#list minor.typeParams as typeParam>, ${typeParam.type} ${typeParam.name}</#list>) {
124+
return writer.${lowerName}(name<#list minor.typeParams as typeParam>, ${typeParam.name}</#list>);
121125
}
122126
</#if>
123-
</#list></#list>
124-
125-
@Override
126-
public DecimalWriter decimal() {
127-
return this;
128-
}
129-
130-
@Override
131-
public DecimalWriter decimal(String name, int scale, int precision) {
132-
return writer.decimal(name, scale, precision);
133-
}
134-
135-
@Override
136-
public DecimalWriter decimal(String name) {
137-
return writer.decimal(name);
138-
}
139-
140-
@Override
141-
public Decimal256Writer decimal256() {
142-
return this;
143-
}
144127
145128
@Override
146-
public Decimal256Writer decimal256(String name, int scale, int precision) {
147-
return writer.decimal256(name, scale, precision);
148-
}
149-
150-
@Override
151-
public Decimal256Writer decimal256(String name) {
152-
return writer.decimal256(name);
129+
public ${minor.class}Writer ${lowerName}(String name) {
130+
structName = name;
131+
return writer.${lowerName}(name);
153132
}
154133
134+
</#list></#list>
155135
156136
@Override
157137
public StructWriter struct() {
@@ -240,18 +220,6 @@ public void end() {
240220
inStruct = false;
241221
}
242222
243-
@Override
244-
public void write(DecimalHolder holder) {
245-
writer.write(holder);
246-
writer.setPosition(writer.idx()+1);
247-
}
248-
249-
@Override
250-
public void write(Decimal256Holder holder) {
251-
writer.write(holder);
252-
writer.setPosition(writer.idx()+1);
253-
}
254-
255223
@Override
256224
public void writeNull() {
257225
if (!listStarted){
@@ -261,65 +229,53 @@ public void writeNull() {
261229
}
262230
}
263231
264-
public void writeDecimal(long start, ArrowBuf buffer, ArrowType arrowType) {
265-
writer.writeDecimal(start, buffer, arrowType);
266-
writer.setPosition(writer.idx()+1);
267-
}
268-
269-
public void writeDecimal(long start, ArrowBuf buffer) {
270-
writer.writeDecimal(start, buffer);
232+
<#list vv.types as type>
233+
<#list type.minor as minor>
234+
<#assign name = minor.class?cap_first />
235+
<#assign fields = minor.fields!type.fields />
236+
<#assign uncappedName = name?uncap_first/>
237+
@Override
238+
public void write${name}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
239+
writer.write${name}(<#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
271240
writer.setPosition(writer.idx()+1);
272241
}
273242
274-
public void writeDecimal(BigDecimal value) {
275-
writer.writeDecimal(value);
243+
<#if is_timestamp_tz(minor.class) || minor.class == "Duration" || minor.class == "FixedSizeBinary">
244+
@Override
245+
public void write(${name}Holder holder) {
246+
writer.write(holder);
276247
writer.setPosition(writer.idx()+1);
277248
}
278249

279-
public void writeBigEndianBytesToDecimal(byte[] value, ArrowType arrowType){
280-
writer.writeBigEndianBytesToDecimal(value, arrowType);
281-
writer.setPosition(writer.idx() + 1);
282-
}
283-
284-
public void writeDecimal256(long start, ArrowBuf buffer, ArrowType arrowType) {
285-
writer.writeDecimal256(start, buffer, arrowType);
250+
<#elseif minor.class?starts_with("Decimal")>
251+
public void write${name}(long start, ArrowBuf buffer, ArrowType arrowType) {
252+
writer.write${name}(start, buffer, arrowType);
286253
writer.setPosition(writer.idx()+1);
287254
}
288255

289-
public void writeDecimal256(long start, ArrowBuf buffer) {
290-
writer.writeDecimal256(start, buffer);
256+
@Override
257+
public void write(${name}Holder holder) {
258+
writer.write(holder);
291259
writer.setPosition(writer.idx()+1);
292260
}
293261

294-
public void writeDecimal256(BigDecimal value) {
295-
writer.writeDecimal256(value);
262+
public void write${name}(BigDecimal value) {
263+
writer.write${name}(value);
296264
writer.setPosition(writer.idx()+1);
297265
}
298266

299-
public void writeBigEndianBytesToDecimal256(byte[] value, ArrowType arrowType){
300-
writer.writeBigEndianBytesToDecimal256(value, arrowType);
267+
public void writeBigEndianBytesTo${name}(byte[] value, ArrowType arrowType){
268+
writer.writeBigEndianBytesTo${name}(value, arrowType);
301269
writer.setPosition(writer.idx() + 1);
302270
}
303-
304-
305-
<#list vv.types as type>
306-
<#list type.minor as minor>
307-
<#assign name = minor.class?cap_first />
308-
<#assign fields = minor.fields!type.fields />
309-
<#assign uncappedName = name?uncap_first/>
310-
<#if !minor.typeParams?? >
271+
<#else>
311272
@Override
312-
public void write${name}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
313-
writer.write${name}(<#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
314-
writer.setPosition(writer.idx()+1);
315-
}
316-
317273
public void write(${name}Holder holder) {
318274
writer.write${name}(<#list fields as field>holder.${field.name}<#if field_has_next>, </#if></#list>);
319275
writer.setPosition(writer.idx()+1);
320276
}
277+
</#if>
321278

322-
</#if>
323279
</#list>
324280
</#list>
325281
}

0 commit comments

Comments
 (0)