Releases: lance-format/lance

v6.0.0-beta.1

16 Apr 02:15

v6.0.0-beta.1 Pre-release

What's Changed

Breaking Changes 🛠

  • refactor!: vendor the tokenizer stack into lance by @Xuanwo in #6512

New Features 🎉

  • feat(dictionary-namespace): support table related operation by @zhangyue19921010 in #6308
  • feat: clean up transaction files on failed commits by @wjones127 in #6319
  • refactor: use exact base-scoped store bindings by @Xuanwo in #6422
  • feat: wire batch_size_bytes to Python and public Rust API by @westonpace in #6428
  • feat(index): support float16 and float64 in IVF_FLAT by @BubbleCal in #6476
  • feat: batch chopping fallback for filtered read by @westonpace in #6482
  • feat: add ANN proto codecs and extract table_identifier module by @LuQQiu in #6503
  • feat: add configurable blob v2 pack file size by @hamersaw in #6508

Bug Fixes 🐛

  • fix: warn and clamp LANCE_INITIAL_UPLOAD_SIZE instead of panicking by @LuciferYang in #6389
  • fix: keep delete-by-source fast path with scalar indexes by @Xuanwo in #6435
  • fix: include column_metadatas and column_infos in CachedFileMetadata::DeepSizeOf by @jiaoew1991 in #6480
  • fix(index): preserve fts prewarm position codec by @BubbleCal in #6485
  • fix: handle FlatBin quantization in optimize_vector_indices_v2 by @jackye1995 in #6488
  • fix: use logical OR instead of bitwise OR in conflict resolver by @dentiny in #6492
  • fix: bump jieba-rs to 0.9.0 to fix build-no-lock CI by @westonpace in #6518
  • fix: blob projection schema compatibility by @Xuanwo in #6521
  • fix(namespace): serialize manifest mutations by @Xuanwo in #6525
  • fix: missing bumpversion entry for lance-tokenizer by @Xuanwo in #6526
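
The warn-and-clamp behavior named in #6389 can be illustrated with a minimal Python sketch. This is not Lance's implementation: the helper name `initial_upload_size`, the default, and the 5 MiB / 5 GiB bounds are assumptions chosen for the example; only the environment variable name comes from the release notes.

```python
import os
import warnings

# Assumed bounds for illustration only; Lance's real limits may differ.
MIN_UPLOAD_SIZE = 5 * 1024 * 1024          # 5 MiB
MAX_UPLOAD_SIZE = 5 * 1024 * 1024 * 1024   # 5 GiB


def initial_upload_size(env=None, default=MIN_UPLOAD_SIZE):
    """Hypothetical helper: read LANCE_INITIAL_UPLOAD_SIZE without raising."""
    env = os.environ if env is None else env
    raw = env.get("LANCE_INITIAL_UPLOAD_SIZE")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Unparseable input: warn and fall back instead of crashing.
        warnings.warn(
            f"LANCE_INITIAL_UPLOAD_SIZE={raw!r} is not an integer; using {default}"
        )
        return default
    # Out-of-range input: warn and clamp into the allowed interval.
    clamped = max(MIN_UPLOAD_SIZE, min(value, MAX_UPLOAD_SIZE))
    if clamped != value:
        warnings.warn(
            f"LANCE_INITIAL_UPLOAD_SIZE={value} is out of range; clamped to {clamped}"
        )
    return clamped
```

Passing a plain dict as `env` keeps the sketch easy to exercise without touching the real process environment.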

Documentation 📚

  • docs: tighten python environment workflow guidance by @Xuanwo in #6520

Performance Improvements 🚀

  • perf: intern DataFile fields/column_indices to reduce manifest memory by @beinan in #6477
  • perf: intern RowDatasetVersionMeta inline bytes to reduce manifest memory by @beinan in #6499
  • perf: submit I/O requests eagerly in FullZipScheduler by @hushengquan in #6513
  • perf: add SIMD-accelerated u8 L2 and cosine distance kernels by @justinrmiller in #6517

Full Changelog: release-root/6.0.0-beta.N...v6.0.0-beta.1

v5.1.0-beta.3

11 Apr 22:32

v5.1.0-beta.3 Pre-release

What's Changed

Bug Fixes 🐛

  • fix: include column_metadatas and column_infos in CachedFileMetadata::DeepSizeOf by @jiaoew1991 in #6480
  • fix(index): preserve fts prewarm position codec by @BubbleCal in #6485
  • fix: handle FlatBin quantization in optimize_vector_indices_v2 by @jackye1995 in #6488

Full Changelog: v5.1.0-beta.2...v5.1.0-beta.3

v5.1.0-beta.2

11 Apr 03:56

v5.1.0-beta.2 Pre-release

What's Changed

New Features 🎉

  • feat: clean up transaction files on failed commits by @wjones127 in #6319
  • feat: wire batch_size_bytes to Python and public Rust API by @westonpace in #6428

Full Changelog: v5.1.0-beta.1...v5.1.0-beta.2

v5.1.0-beta.1

10 Apr 14:53

v5.1.0-beta.1 Pre-release

What's Changed

New Features 🎉

  • feat(index): support float16 and float64 in IVF_FLAT by @BubbleCal in #6476

Bug Fixes 🐛

  • fix: warn and clamp LANCE_INITIAL_UPLOAD_SIZE instead of panicking by @LuciferYang in #6389

Full Changelog: release-root/5.1.0-beta.N...v5.1.0-beta.1

v5.0.0-rc.1

09 Apr 16:05

v5.0.0-rc.1 Pre-release

What's Changed

Breaking Changes 🛠

  • refactor!: cleanup namespace related APIs by @jackye1995 in #6186
  • feat!: add progress monitoring via callbacks for distributed merge by @vivek-bharathan in #6210
  • refactor!: remove staging from distributed vector indexing by @Xuanwo in #6269
  • refactor: move DatasetIndexExt out of lance-index by @Xuanwo in #6280
  • feat!: support sampling selected fragments by @Xuanwo in #6294
  • refactor!: align distributed index build around segments by @Xuanwo in #6313

New Features 🎉

  • feat: io_uring based file reader by @westonpace in #5777
  • feat: add an arrow-stats crate with the ability to calculate basic stats on arrays by @westonpace in #5967
  • feat(java): add non-blocking AsyncScanner with CompletableFuture API by @beinan in #6102
  • feat: change default file format version to 2.1 by @westonpace in #6115
  • feat(namespace): add count_table_rows, insert_into_table, query_table by @XuQianJin-Stars in #6132
  • feat(DirectoryNamespace): support index and transaction related operations by @zhangyue19921010 in #6196
  • feat: pluggable index cache via CacheBackend trait by @wjones127 in #6222
  • feat: introduce CacheCodec for serializing index cache entries by @wjones127 in #6223
  • feat: bounding source fragments for compaction execution by @hamersaw in #6232
  • fix: filter out detached versions when scanning manifests by @jackye1995 in #6245
  • feat: allow setting transaction properties in various operations by @jackye1995 in #6246
  • feat: add OpenDAL Azdls backend for abfss:// with use_opendal flag by @burlacio in #6256
  • feat: add aimd throttled object store by @westonpace in #6266
  • feat: clarify logical indices and physical index segments by @Xuanwo in #6270
  • feat: support stop-word gaps in phrase queries by @BubbleCal in #6277
  • feat: move rate limiting to the object store by @westonpace in #6293
  • feat: support non-shared centroid vector index builds by @Xuanwo in #6296
  • feat: add configurable FTS index prewarm options by @BubbleCal in #6298
  • feat: add a fast dataset version ID API by @BubbleCal in #6303
  • feat(python): add storage_options to IvfModel and PqModel save/load by @hushengquan in #6312
  • feat: add write progress callback to InsertBuilder by @wjones127 in #6318
  • feat: expose stable row ids by @ivscheianu in #6325
  • feat(java): add allowExternalBlobOutsideBases to WriteParams by @beinan in #6330
  • feat: support hamming distance in HNSW by @BubbleCal in #6336
  • feat: make MAX_MINIBLOCK_VALUES configurable via env var by @westonpace in #6340
  • feat: support bf16 from pytorch dataset by @eddyxu in #6342
  • feat: support DataFusion Expr in DeleteBuilder by @wjones127 in #6343
  • feat: support ingest mode for external blob writes by @Xuanwo in #6356
  • feat: support distributed IVF_RQ segment builds by @Xuanwo in #6359
  • feat: expose branch_identifier in python and java bindings by @majin1102 in #6360
  • feat: object store decides scheduler type by @westonpace in #6373
  • feat: support vector query pruning by index segments by @Xuanwo in #6376
  • feat: add batch_size_bytes to encoding decode stream by @westonpace in #6388
  • feat: thread data_size through decode pipeline by @westonpace in #6391
  • feat: support index build progress callbacks in Python bindings by @vivek-bharathan in #6394
  • feat: refine logical vector index into an IVF view by @Xuanwo in #6400
  • feat: optimize one segmented vector segment per run by @Xuanwo in #6402
  • feat: validate DataFile column_indices at commit time by @westonpace in #6414
  • feat(java): add Dataset.getZonemapStats() API by @beinan in #6421
  • feat: support IVF partitions multi-split by @BubbleCal in #6423
  • feat(python): add blob-aware to_pandas by @BubbleCal in #6424
  • feat: add DataFile.create helper for building DataFile metadata by @westonpace in #6427
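
For readers unfamiliar with the term in #6266, AIMD (additive increase, multiplicative decrease) is a general rate-control technique. The sketch below shows the idea in Python under stated assumptions: the class name `AimdLimiter`, its parameters, and its default values are hypothetical and are not taken from Lance's object store code.

```python
class AimdLimiter:
    """Hypothetical AIMD controller for an allowed request rate."""

    def __init__(self, initial=100.0, minimum=1.0, maximum=1000.0,
                 increase=10.0, decrease=0.5):
        self.rate = initial        # current allowed requests per second
        self.minimum = minimum     # floor so the rate never reaches zero
        self.maximum = maximum     # ceiling on the allowed rate
        self.increase = increase   # additive step applied after a success
        self.decrease = decrease   # multiplicative factor applied on throttling

    def on_success(self):
        # Additive increase: probe slowly for more capacity.
        self.rate = min(self.maximum, self.rate + self.increase)

    def on_throttled(self):
        # Multiplicative decrease: back off quickly when the store pushes back.
        self.rate = max(self.minimum, self.rate * self.decrease)
```

A caller would invoke `on_success()` after each request that completes normally and `on_throttled()` after a throttling response, then pace new requests according to `rate`.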

Bug Fixes 🐛

  • fix: handle list-level NULLs in NOT filters by @fenfeng9 in #6044
  • fix: like queries with a prefix should be accelerated by btree and zonemap by @jackye1995 in #6188
  • fix: respect the old data filter on inverted index by @westonpace in #6216
  • fix: 2.1/2.2 panic when a list column had small values and many empty values by @westonpace in #6234
  • fix: resolve_latest_location converts errors to not_found unconditionally by @wkalt in #6248
  • fix: return errors for unsupported fixed-size-list child types by @myandpr in #6253
  • fix: adding namespace support to java SDK CommitBuilder from dataset by @hamersaw in #6257
  • fix: pass dataset_options to SafeLanceDataset in worker processes by @eddyxu in #6278
  • fix(python): preserve index details in python index metadata by @Xuanwo in #6279
  • fix: support hamming distance in IndicesBuilder by @jmhsieh in #6295
  • fix(namespace): support nested types in convert_json_arrow_type by @jiaoew1991 in #6300
  • fix: restore namespace build after DatasetIndexExt move by @Xuanwo in #6302
  • fix: multiple improvements for gh workflows by @esteban in #6306
  • fix: use StorageOptions::new() in cloud providers to pick up env vars by @westonpace in #6316
  • refactor: remove row-id ordering from vector index merge by @Xuanwo in #6332
  • fix: avoid full scan for nullable vector fragment sampling by @Xuanwo in #6341
  • fix: respect fragment filters during distributed worker training by @Xuanwo in #6358
  • fix: ensure durable writes actually wait for WAL flush by @hamersaw in #6368
  • fix: return error instead of OOM on corrupt binary page offsets by @westonpace in #6392
  • fix: preserve multipart part ordering in throttle wrapper by @westonpace in #6393
  • fix: remove legacy tempfile save of IVF centroids in GPU training path by @hushengquan in #6396
  • fix: implement RowIdMeta.asdict() and from_dict() for stable row ID serialization by @pengw0048 in #6405
  • fix(namespace): remove trailing empty query in table uri by @jackye1995 in #6415
  • fix: skip legacy validation for Overwrite operations by @hamersaw in #6418
  • fix: implement next page token support in list namespaces and tables by @ayao227 in #6419
  • fix: avoid corrupting sub-schema merge_insert pages on v2.2 by @Xuanwo in #6420
  • fix(merge_insert): use sentinel column for NULL-safe source row detection by @pratik0316 in #6439

Documentation 📚

  • docs: shorten core major release vote window by @Xuanwo in #6154
  • docs: add conflict handling and FRI guidance by @westonpace in #6304
  • docs(python): prefer uv for local environment setup by @Xuanwo in #6335
  • docs: clarify helper function guidance in AGENTS.md by @Xuanwo in #6357
  • docs: document DataFile column_indices changes in 2.1 format by @westonpace in #6416
    *...

v5.0.0-beta.6

09 Apr 06:30

v5.0.0-beta.6 Pre-release

What's Changed

New Features 🎉

  • feat: introduce CacheCodec for serializing index cache entries by @wjones127 in #6223
  • feat(java): add allowExternalBlobOutsideBases to WriteParams by @beinan in #6330
  • feat: add batch_size_bytes to encoding decode stream by @westonpace in #6388
  • feat: validate DataFile column_indices at commit time by @westonpace in #6414
  • feat(java): add Dataset.getZonemapStats() API by @beinan in #6421
  • feat: support IVF partitions multi-split by @BubbleCal in #6423
  • feat(python): add blob-aware to_pandas by @BubbleCal in #6424
  • feat: add DataFile.create helper for building DataFile metadata by @westonpace in #6427

Bug Fixes 🐛

  • fix: return error instead of OOM on corrupt binary page offsets by @westonpace in #6392
  • fix: implement RowIdMeta.asdict() and from_dict() for stable row ID serialization by @pengw0048 in #6405
  • fix(namespace): remove trailing empty query in table uri by @jackye1995 in #6415
  • fix: skip legacy validation for Overwrite operations by @hamersaw in #6418
  • fix: implement next page token support in list namespaces and tables by @ayao227 in #6419
  • fix: avoid corrupting sub-schema merge_insert pages on v2.2 by @Xuanwo in #6420
  • fix(merge_insert): use sentinel column for NULL-safe source row detection by @pratik0316 in #6439

Full Changelog: v5.0.0-beta.5...v5.0.0-beta.6

v5.0.0-beta.5

04 Apr 07:28

v5.0.0-beta.5 Pre-release

What's Changed

Bug Fixes 🐛

  • fix: remove legacy tempfile save of IVF centroids in GPU training path by @hushengquan in #6396

Other Changes

  • refactor: rename "region" to "shard" in mem_wal implementation by @hamersaw in #6367

Full Changelog: v5.0.0-beta.4...v5.0.0-beta.5

v5.0.0-beta.4

02 Apr 21:37

v5.0.0-beta.4 Pre-release

What's Changed

New Features 🎉

  • feat: support ingest mode for external blob writes by @Xuanwo in #6356
  • feat: support vector query pruning by index segments by @Xuanwo in #6376

Bug Fixes 🐛

  • fix: ensure durable writes actually wait for WAL flush by @hamersaw in #6368
  • fix: preserve multipart part ordering in throttle wrapper by @westonpace in #6393

Performance Improvements 🚀

  • perf: fix O(N²) version column scan with deletion vectors by @pengw0048 in #6375

Full Changelog: v5.0.0-beta.3...v5.0.0-beta.4

v5.0.0-beta.3

02 Apr 08:54

v5.0.0-beta.3 Pre-release

What's Changed

Breaking Changes 🛠

  • refactor!: cleanup namespace related APIs by @jackye1995 in #6186
  • refactor!: align distributed index build around segments by @Xuanwo in #6313

Bug Fixes 🐛

  • fix(python): preserve index details in python index metadata by @Xuanwo in #6279
  • fix: use StorageOptions::new() in cloud providers to pick up env vars by @westonpace in #6316
  • refactor: remove row-id ordering from vector index merge by @Xuanwo in #6332
  • fix: avoid full scan for nullable vector fragment sampling by @Xuanwo in #6341
  • fix: respect fragment filters during distributed worker training by @Xuanwo in #6358

Documentation 📚

  • docs: clarify helper function guidance in AGENTS.md by @Xuanwo in #6357

Performance Improvements 🚀

  • perf: remove O(n²) performance regression in take() with duplicate indices by @YSBF in #6351
  • perf: reduce bitmap index build to 1 bitmap in RAM by @wkalt in #6371
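
To show why duplicate indices can make a take() quadratic and one common way to avoid it, here is a minimal Python sketch. It is an illustration only and does not claim to reflect the change in #6351: the function name `take_with_duplicates` and the `fetch_rows` callback are hypothetical.

```python
def take_with_duplicates(fetch_rows, indices):
    """Return rows for `indices`, fetching each distinct index only once.

    `fetch_rows` is a hypothetical callback that accepts sorted, unique
    indices and returns the matching rows in the same order.
    """
    unique = sorted(set(indices))
    rows = fetch_rows(unique)
    # A dict gives O(1) lookups, so expanding the result is linear in
    # len(indices) instead of rescanning `unique` for every requested index.
    position = {idx: i for i, idx in enumerate(unique)}
    return [rows[position[idx]] for idx in indices]


# Usage with an in-memory list standing in for a dataset.
data = ["a", "b", "c", "d"]
result = take_with_duplicates(lambda idxs: [data[i] for i in idxs], [3, 1, 3, 0, 1])
```

The result preserves the caller's requested order, including repeats, while the underlying fetch sees each index once.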

Full Changelog: v5.0.0-beta.2...v5.0.0-beta.3

v5.0.0-beta.2

30 Mar 19:12

v5.0.0-beta.2 Pre-release

What's Changed

Documentation 📚

  • docs: add conflict handling and FRI guidance by @westonpace in #6304
  • docs(python): prefer uv for local environment setup by @Xuanwo in #6335

Full Changelog: v5.0.0-beta.1...v5.0.0-beta.2