Releases: lance-format/lance
v6.0.0-beta.1
What's Changed
New Features 🎉
- feat(dictionary-namespace): support table related operation by @zhangyue19921010 in #6308
- feat: clean up transaction files on failed commits by @wjones127 in #6319
- refactor: use exact base-scoped store bindings by @Xuanwo in #6422
- feat: wire batch_size_bytes to Python and public Rust API by @westonpace in #6428
- feat(index): support float16 and float64 in IVF_FLAT by @BubbleCal in #6476
- feat: batch chopping fallback for filtered read by @westonpace in #6482
- feat: add ANN proto codecs and extract table_identifier module by @LuQQiu in #6503
- feat: add configurable blob v2 pack file size by @hamersaw in #6508
Bug Fixes 🐛
- fix: warn and clamp LANCE_INITIAL_UPLOAD_SIZE instead of panicking by @LuciferYang in #6389
- fix: keep delete-by-source fast path with scalar indexes by @Xuanwo in #6435
- fix: include column_metadatas and column_infos in CachedFileMetadata::DeepSizeOf by @jiaoew1991 in #6480
- fix(index): preserve fts prewarm position codec by @BubbleCal in #6485
- fix: handle FlatBin quantization in optimize_vector_indices_v2 by @jackye1995 in #6488
- fix: use logical OR instead of bitwise OR in conflict resolver by @dentiny in #6492
- fix: bump jieba-rs to 0.9.0 to fix build-no-lock CI by @westonpace in #6518
- fix: blob projection schema compatibility by @Xuanwo in #6521
- fix(namespace): serialize manifest mutations by @Xuanwo in #6525
- fix: missing bumpversion entry for lance-tokenizer by @Xuanwo in #6526
Performance Improvements 🚀
- perf: intern DataFile fields/column_indices to reduce manifest memory by @beinan in #6477
- perf: intern RowDatasetVersionMeta inline bytes to reduce manifest memory by @beinan in #6499
- perf: submit I/O requests eagerly in FullZipScheduler by @hushengquan in #6513
- perf: add SIMD-accelerated u8 L2 and cosine distance kernels by @justinrmiller in #6517
Full Changelog: release-root/6.0.0-beta.N...v6.0.0-beta.1
v5.1.0-beta.3
What's Changed
Bug Fixes 🐛
- fix: include column_metadatas and column_infos in CachedFileMetadata::DeepSizeOf by @jiaoew1991 in #6480
- fix(index): preserve fts prewarm position codec by @BubbleCal in #6485
- fix: handle FlatBin quantization in optimize_vector_indices_v2 by @jackye1995 in #6488
Full Changelog: v5.1.0-beta.2...v5.1.0-beta.3
v5.1.0-beta.2
What's Changed
New Features 🎉
- feat: clean up transaction files on failed commits by @wjones127 in #6319
- feat: wire batch_size_bytes to Python and public Rust API by @westonpace in #6428
Full Changelog: v5.1.0-beta.1...v5.1.0-beta.2
v5.1.0-beta.1
What's Changed
New Features 🎉
- feat(index): support float16 and float64 in IVF_FLAT by @BubbleCal in #6476
Bug Fixes 🐛
- fix: warn and clamp LANCE_INITIAL_UPLOAD_SIZE instead of panicking by @LuciferYang in #6389
Full Changelog: release-root/5.1.0-beta.N...v5.1.0-beta.1
v5.0.0-rc.1
What's Changed
Breaking Changes 🛠
- refactor!: cleanup namespace related APIs by @jackye1995 in #6186
- feat!: add progress monitoring via callbacks for distributed merge by @vivek-bharathan in #6210
- refactor!: remove staging from distributed vector indexing by @Xuanwo in #6269
- refactor: move DatasetIndexExt out of lance-index by @Xuanwo in #6280
- feat!: support sampling selected fragments by @Xuanwo in #6294
- refactor!: align distributed index build around segments by @Xuanwo in #6313
New Features 🎉
- feat: io_uring based file reader by @westonpace in #5777
- feat: add an arrow-stats crate with the ability to calculate basic stats on arrays by @westonpace in #5967
- feat(java): add non-blocking AsyncScanner with CompletableFuture API by @beinan in #6102
- feat: change default file format version to 2.1 by @westonpace in #6115
- feat(namespace): add count_table_rows, insert_into_table, query_table by @XuQianJin-Stars in #6132
- feat(DirectoryNamespace): support index and transaction related operations by @zhangyue19921010 in #6196
- feat: pluggable index cache via CacheBackend trait by @wjones127 in #6222
- feat: introduce CacheCodec for serializing index cache entries by @wjones127 in #6223
- feat: bounding source fragments for compaction execution by @hamersaw in #6232
- fix: filter out detached versions when scanning manifests by @jackye1995 in #6245
- feat: allow setting transaction properties in various operations by @jackye1995 in #6246
- feat: add OpenDAL Azdls backend for abfss:// with use_opendal flag by @burlacio in #6256
- feat: add aimd throttled object store by @westonpace in #6266
- feat: clarify logical indices and physical index segments by @Xuanwo in #6270
- feat: support stop-word gaps in phrase queries by @BubbleCal in #6277
- feat: move rate limiting to the object store by @westonpace in #6293
- feat: support non-shared centroid vector index builds by @Xuanwo in #6296
- feat: add configurable FTS index prewarm options by @BubbleCal in #6298
- feat: add a fast dataset version ID API by @BubbleCal in #6303
- feat(python): add storage_options to IvfModel and PqModel save/load by @hushengquan in #6312
- feat: add write progress callback to InsertBuilder by @wjones127 in #6318
- feat: expose stable row ids by @ivscheianu in #6325
- feat(java): add allowExternalBlobOutsideBases to WriteParams by @beinan in #6330
- feat: support hamming distance in HNSW by @BubbleCal in #6336
- feat: make MAX_MINIBLOCK_VALUES configurable via env var by @westonpace in #6340
- feat: support bf16 from pytorch dataset by @eddyxu in #6342
- feat: support DataFusion Expr in DeleteBuilder by @wjones127 in #6343
- feat: support ingest mode for external blob writes by @Xuanwo in #6356
- feat: support distributed IVF_RQ segment builds by @Xuanwo in #6359
- feat: expose branch_identifier in python and java bindings by @majin1102 in #6360
- feat: object store decides scheduler type by @westonpace in #6373
- feat: support vector query pruning by index segments by @Xuanwo in #6376
- feat: add batch_size_bytes to encoding decode stream by @westonpace in #6388
- feat: thread data_size through decode pipeline by @westonpace in #6391
- feat: support index build progress callbacks in Python bindings by @vivek-bharathan in #6394
- feat: refine logical vector index into an IVF view by @Xuanwo in #6400
- feat: optimize one segmented vector segment per run by @Xuanwo in #6402
- feat: validate DataFile column_indices at commit time by @westonpace in #6414
- feat(java): add Dataset.getZonemapStats() API by @beinan in #6421
- feat: support IVF partitions multi-split by @BubbleCal in #6423
- feat(python): add blob-aware to_pandas by @BubbleCal in #6424
- feat: add DataFile.create helper for building DataFile metadata by @westonpace in #6427
Bug Fixes 🐛
- fix: handle list-level NULLs in NOT filters by @fenfeng9 in #6044
- fix: like queries with a prefix should be accelerated by btree and zonemap by @jackye1995 in #6188
- fix: respect the old data filter on inverted index by @westonpace in #6216
- fix: 2.1/2.2 panic when a list column had small values and many empty values by @westonpace in #6234
- fix: resolve_latest_location converts errors to not_found unconditionally by @wkalt in #6248
- fix: return errors for unsupported fixed-size-list child types by @myandpr in #6253
- fix: adding namespace support to java SDK CommitBuilder from dataset by @hamersaw in #6257
- fix: pass dataset_options to SafeLanceDataset in worker processes by @eddyxu in #6278
- fix(python): preserve index details in python index metadata by @Xuanwo in #6279
- fix: support hamming distance in IndicesBuilder by @jmhsieh in #6295
- fix(namespace): support nested types in convert_json_arrow_type by @jiaoew1991 in #6300
- fix: restore namespace build after DatasetIndexExt move by @Xuanwo in #6302
- fix: multiple improvements for gh workflows by @esteban in #6306
- fix: use StorageOptions::new() in cloud providers to pick up env vars by @westonpace in #6316
- refactor: remove row-id ordering from vector index merge by @Xuanwo in #6332
- fix: avoid full scan for nullable vector fragment sampling by @Xuanwo in #6341
- fix: respect fragment filters during distributed worker training by @Xuanwo in #6358
- fix: ensure durable writes actually wait for WAL flush by @hamersaw in #6368
- fix: return error instead of OOM on corrupt binary page offsets by @westonpace in #6392
- fix: preserve multipart part ordering in throttle wrapper by @westonpace in #6393
- fix: remove legacy tempfile save of IVF centroids in GPU training path by @hushengquan in #6396
- fix: implement RowIdMeta.asdict() and from_dict() for stable row ID serialization by @pengw0048 in #6405
- fix(namespace): remove trailing empty query in table uri by @jackye1995 in #6415
- fix: skip legacy validation for Overwrite operations by @hamersaw in #6418
- fix: implement next page token support in list namespaces and tables by @ayao227 in #6419
- fix: avoid corrupting sub-schema merge_insert pages on v2.2 by @Xuanwo in #6420
- fix(merge_insert): use sentinel column for NULL-safe source row detection by @pratik0316 in #6439
Documentation 📚
- docs: shorten core major release vote window by @Xuanwo in #6154
- docs: add conflict handling and FRI guidance by @westonpace in #6304
- docs(python): prefer uv for local environment setup by @Xuanwo in #6335
- docs: clarify helper function guidance in AGENTS.md by @Xuanwo in #6357
- docs: document DataFile column_indices changes in 2.1 format by @westonpace in #6416
*...
v5.0.0-beta.6
What's Changed
New Features 🎉
- feat: introduce CacheCodec for serializing index cache entries by @wjones127 in #6223
- feat(java): add allowExternalBlobOutsideBases to WriteParams by @beinan in #6330
- feat: add batch_size_bytes to encoding decode stream by @westonpace in #6388
- feat: validate DataFile column_indices at commit time by @westonpace in #6414
- feat(java): add Dataset.getZonemapStats() API by @beinan in #6421
- feat: support IVF partitions multi-split by @BubbleCal in #6423
- feat(python): add blob-aware to_pandas by @BubbleCal in #6424
- feat: add DataFile.create helper for building DataFile metadata by @westonpace in #6427
Bug Fixes 🐛
- fix: return error instead of OOM on corrupt binary page offsets by @westonpace in #6392
- fix: implement RowIdMeta.asdict() and from_dict() for stable row ID serialization by @pengw0048 in #6405
- fix(namespace): remove trailing empty query in table uri by @jackye1995 in #6415
- fix: skip legacy validation for Overwrite operations by @hamersaw in #6418
- fix: implement next page token support in list namespaces and tables by @ayao227 in #6419
- fix: avoid corrupting sub-schema merge_insert pages on v2.2 by @Xuanwo in #6420
- fix(merge_insert): use sentinel column for NULL-safe source row detection by @pratik0316 in #6439
Documentation 📚
- docs: document DataFile column_indices changes in 2.1 format by @westonpace in #6416
- docs: add community sync to docs by @ccmao1130 in #6434
Full Changelog: v5.0.0-beta.5...v5.0.0-beta.6
v5.0.0-beta.5
What's Changed
New Features 🎉
- feat(namespace): add count_table_rows, insert_into_table, query_table by @XuQianJin-Stars in #6132
- feat: thread data_size through decode pipeline by @westonpace in #6391
- feat: support index build progress callbacks in Python bindings by @vivek-bharathan in #6394
- feat: refine logical vector index into an IVF view by @Xuanwo in #6400
- feat: optimize one segmented vector segment per run by @Xuanwo in #6402
Bug Fixes 🐛
- fix: remove legacy tempfile save of IVF centroids in GPU training path by @hushengquan in #6396
Full Changelog: v5.0.0-beta.4...v5.0.0-beta.5
v5.0.0-beta.4
What's Changed
Breaking Changes 🛠
- feat!: add progress monitoring via callbacks for distributed merge by @vivek-bharathan in #6210
New Features 🎉
- feat: support ingest mode for external blob writes by @Xuanwo in #6356
- feat: support vector query pruning by index segments by @Xuanwo in #6376
Bug Fixes 🐛
- fix: ensure durable writes actually wait for WAL flush by @hamersaw in #6368
- fix: preserve multipart part ordering in throttle wrapper by @westonpace in #6393
Performance Improvements 🚀
- perf: fix O(N²) version column scan with deletion vectors by @pengw0048 in #6375
Full Changelog: v5.0.0-beta.3...v5.0.0-beta.4
v5.0.0-beta.3
What's Changed
Breaking Changes 🛠
- refactor!: cleanup namespace related APIs by @jackye1995 in #6186
- refactor!: align distributed index build around segments by @Xuanwo in #6313
New Features 🎉
- feat: io_uring based file reader by @westonpace in #5777
- feat: add an arrow-stats crate with the ability to calculate basic stats on arrays by @westonpace in #5967
- feat: change default file format version to 2.1 by @westonpace in #6115
- feat: expose stable row ids by @ivscheianu in #6325
- feat: make MAX_MINIBLOCK_VALUES configurable via env var by @westonpace in #6340
- feat: support bf16 from pytorch dataset by @eddyxu in #6342
- feat: support DataFusion Expr in DeleteBuilder by @wjones127 in #6343
- feat: support distributed IVF_RQ segment builds by @Xuanwo in #6359
- feat: expose branch_identifier in python and java bindings by @majin1102 in #6360
- feat: object store decides scheduler type by @westonpace in #6373
Bug Fixes 🐛
- fix(python): preserve index details in python index metadata by @Xuanwo in #6279
- fix: use StorageOptions::new() in cloud providers to pick up env vars by @westonpace in #6316
- refactor: remove row-id ordering from vector index merge by @Xuanwo in #6332
- fix: avoid full scan for nullable vector fragment sampling by @Xuanwo in #6341
- fix: respect fragment filters during distributed worker training by @Xuanwo in #6358
Performance Improvements 🚀
- perf: remove O(n²) performance regression in take() with duplicate indices by @YSBF in #6351
- perf: reduce bitmap index build to 1 bitmap in RAM by @wkalt in #6371
Full Changelog: v5.0.0-beta.2...v5.0.0-beta.3
v5.0.0-beta.2
What's Changed
New Features 🎉
- feat: pluggable index cache via CacheBackend trait by @wjones127 in #6222
- feat: add write progress callback to InsertBuilder by @wjones127 in #6318
- feat: support hamming distance in HNSW by @BubbleCal in #6336
Documentation 📚
- docs: add conflict handling and FRI guidance by @westonpace in #6304
- docs(python): prefer uv for local environment setup by @Xuanwo in #6335
Full Changelog: v5.0.0-beta.1...v5.0.0-beta.2