Commit 0f8a080
## What's Changed
Fix `ListVector`/`LargeListVector` IPC serialization when `valueCount`
is 0.
### Problem
When `valueCount == 0`, `setReaderAndWriterIndex()` set
`offsetBuffer.writerIndex(0)`, so `readableBytes() == 0`. The IPC
serializer uses `readableBytes()` to determine the buffer size, so 0 bytes
were written to the IPC stream for the offset buffer. This crashes IPC
readers in other libraries because the Arrow spec requires the offset
buffer to have at least one entry, `[0]`.
@viirya:
> The offset buffers are allocated properly. But during IPC
serialization, they are ignored.
> ```
> public long readableBytes() {
>   return writerIndex - readerIndex;
> }
> ```
> So when ListVector.setReaderAndWriterIndex() sets writerIndex(0) and
readerIndex(0), readableBytes() returns 0 - 0 = 0.
>
> Then when MessageSerializer.writeBatchBuffers() calls
WriteChannel.write(buffer), it writes 0 bytes.
>
> So the flow is:
>
> 1. valueCount=0 → ListVector.setReaderAndWriterIndex() sets offsetBuffer.writerIndex(0)
> 2. VectorUnloader.getFieldBuffers() returns the buffer with writerIndex=0
> 3. MessageSerializer.writeBatchBuffers() writes the buffer
> 4. WriteChannel.write(buffer) checks buffer.readableBytes(), which is 0
> 5. 0 bytes are written to the IPC stream
> 6. PyArrow reads the batch with the missing buffer → crash when other libraries try to read it
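
The flow in the quoted comment can be modeled without Arrow on the classpath. The following is a minimal sketch, not Arrow's code: `IndexedBuffer`, `writeTo`, and `bytesSerialized` are hypothetical stand-ins, and the writer is assumed to copy exactly `readableBytes()` bytes, as the comment describes for `WriteChannel.write(buffer)`.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for an Arrow buffer: only the two indices matter here.
final class IndexedBuffer {
    private final byte[] data;
    private int readerIndex;
    private int writerIndex;

    IndexedBuffer(int capacity) {
        this.data = new byte[capacity]; // zero-filled, so offset[0] == 0
    }

    void setIndices(int readerIndex, int writerIndex) {
        this.readerIndex = readerIndex;
        this.writerIndex = writerIndex;
    }

    // Same arithmetic as the readableBytes() quoted above.
    int readableBytes() {
        return writerIndex - readerIndex;
    }

    // Assumed writer behaviour: copy only the readable region to the stream.
    void writeTo(ByteArrayOutputStream out) {
        out.write(data, readerIndex, readableBytes());
    }
}

final class EmptyListFlowDemo {
    static final int OFFSET_WIDTH = 4; // 32-bit offsets, as in ListVector

    // Number of bytes that reach the stream for a given offset-buffer writerIndex.
    static int bytesSerialized(int writerIndex) {
        IndexedBuffer offsets = new IndexedBuffer(64); // memory is allocated either way
        offsets.setIndices(0, writerIndex);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets.writeTo(out);
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println("writerIndex = 0            -> " + bytesSerialized(0) + " bytes");
        System.out.println("writerIndex = OFFSET_WIDTH -> " + bytesSerialized(OFFSET_WIDTH) + " bytes");
    }
}
```

The buffer is allocated in both cases; only the writer index decides whether `offset[0]` ends up in the stream.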
### Fix
Simplify `setReaderAndWriterIndex()` to always use
`(valueCount + 1) * OFFSET_WIDTH` for the offset buffer's `writerIndex`.
When `valueCount == 0`, this correctly sets `writerIndex` to
`OFFSET_WIDTH`, ensuring `offset[0]` is included in serialization.
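
As a rough illustration of the arithmetic, the sketch below compares the writer index before and after the change. It is not the patched method itself: `OffsetWriterIndex`, `before`, and `after` are hypothetical names, and the non-empty branch of `before` is an assumption inferred from the word "always" in the description above. The offset widths of 4 and 8 bytes correspond to the 32-bit offsets of `ListVector` and the 64-bit offsets of `LargeListVector`.

```java
final class OffsetWriterIndex {
    static final int LIST_OFFSET_WIDTH = 4;       // ListVector: 32-bit offsets
    static final int LARGE_LIST_OFFSET_WIDTH = 8; // LargeListVector: 64-bit offsets

    // Behaviour described under "Problem": an empty vector got writerIndex 0.
    // The non-empty branch is an assumption, not taken from the diff.
    static long before(int valueCount, int offsetWidth) {
        return valueCount == 0 ? 0 : (long) (valueCount + 1) * offsetWidth;
    }

    // Behaviour described under "Fix": one formula for every valueCount.
    static long after(int valueCount, int offsetWidth) {
        return (long) (valueCount + 1) * offsetWidth;
    }

    public static void main(String[] args) {
        for (int valueCount : new int[] {0, 1, 3}) {
            System.out.println("valueCount=" + valueCount
                + " list: " + before(valueCount, LIST_OFFSET_WIDTH)
                + " -> " + after(valueCount, LIST_OFFSET_WIDTH)
                + ", large list: " + before(valueCount, LARGE_LIST_OFFSET_WIDTH)
                + " -> " + after(valueCount, LARGE_LIST_OFFSET_WIDTH));
        }
    }
}
```

Only the `valueCount == 0` row differs between the two columns, which matches the scope of the fix: a list vector with `n` values always carries `n + 1` offsets.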
### Testing
Added tests for nested empty lists verifying that the offset buffer has
the correct `readableBytes()`.
Closes #343.
---------
Co-authored-by: Yicong Huang <yicong.huang+data@databricks.com>
1 parent ccaac9a commit 0f8a080
File tree
4 files changed (+50 -4) under `vector/src`:
- `main/java/org/apache/arrow/vector/complex`
- `test/java/org/apache/arrow/vector`

Lines changed per file:
- 5 additions & 2 deletions (hunk at lines 309-322)
- 5 additions & 2 deletions (hunk at lines 267-280)
- 20 additions & 0 deletions (hunk at lines 1100-1125)
- 20 additions & 0 deletions (hunk at lines 1379-1404)