Releases: pathwaycom/pathway

v0.15.3

07 Nov 07:12

Added

  • pw.io.mongodb.write connector for writing Pathway tables to MongoDB.
  • pw.io.s3.read now supports downloading objects from an S3 bucket in parallel.

Changed

  • pw.io.fs.read performance has been improved for directories containing a large number of files.

v0.15.2

25 Oct 04:07

Added

  • pw.io.deltalake.read now supports custom S3 Delta Lakes with HTTP endpoints.
  • pw.io.deltalake.read now supports specifying both a custom endpoint and a custom region for Delta Lakes via pw.io.s3.AwsS3Settings.

Changed

  • Indices in pathway.stdlib.indexing.nearest_neighbors can now also work with NumPy arrays. Previously, they only accepted list[float]. Working with NumPy arrays improves memory efficiency.
  • pw.io.s3.read has been optimized to minimize new object requests whenever possible.
  • It is now possible to set the cache size limit in pw.udfs.DiskCache.
  • State persistence now uses a single backend for both metadata and stream storage. The pw.persistence.Config.simple_config method is therefore deprecated. Now you can use the pw.persistence.Config constructor with the same parameters that were previously used in simple_config.

Fixed

  • pw.io.bigquery.write connector now correctly handles pw.Json columns.

v0.15.1

04 Oct 10:06

Fixed

  • pw.temporal.session and pw.temporal.asof_join now correctly work with multiple entries that have the same time.
  • Fixed an issue in pw.stdlib.indexing where filters would cause runtime errors while using HybridIndexFactory.

v0.15.0

12 Sep 07:21

Added

  • Experimental: pw.xpacks.llm.document_store.DocumentStore to process and index documents.
  • pw.xpacks.llm.servers.DocumentStoreServer used to expose a REST server for retrieving documents from pw.xpacks.llm.document_store.DocumentStore.
  • pw.xpacks.stdlib.indexing.HybridIndex used for querying multiple indices and combining their results.
  • pw.io.airbyte.read now also supports streams that only operate in full_refresh mode.

Changed

  • Running servers for answering queries is extracted from pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer into pw.xpacks.llm.servers.QARestServer and pw.xpacks.llm.servers.QASummaryRestServer.
  • BREAKING: query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now produce an empty list instead of None if no match is found.
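
For downstream code that used to test query results against None, the breaking change above means an emptiness check is now needed. A minimal sketch in plain Python, assuming the matches arrive as an ordinary Python sequence; the helper name has_matches is hypothetical and not part of Pathway:

```python
from typing import Optional, Sequence


def has_matches(matches: Optional[Sequence]) -> bool:
    """Return True when a query result holds at least one match.

    Hypothetical helper, not a Pathway API. It treats None (old behavior)
    and an empty list (new behavior) the same way, so it can be used
    while migrating code across the breaking change.
    """
    return bool(matches)


# Old-style check that silently stops working after the change:
#     if matches is None: ...
# Version-agnostic check:
#     if not has_matches(matches): ...
```

A check written this way behaves identically before and after the upgrade, which can make the migration easier to roll out gradually.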

v0.14.3

22 Aug 07:57

Fixed

  • pw.io.deltalake.read and pw.io.deltalake.write now correctly work with lakes hosted in S3 over min.io, Wasabi and Digital Ocean.

Added

  • The Pathway CLI command spawn can now execute code directly from a specified GitHub repository.
  • A new CLI command, spawn-from-env, has been added. This command runs the Pathway CLI spawn command using arguments provided in the PATHWAY_SPAWN_ARGS environment variable.
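
The relationship between spawn-from-env and spawn can be pictured roughly as follows. This is only an illustrative sketch, not the Pathway CLI's actual implementation: it assumes PATHWAY_SPAWN_ARGS holds the arguments as a single shell-style string, and the function name build_spawn_command is hypothetical.

```python
import os
import shlex
from typing import List, Mapping, Optional


def build_spawn_command(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the argv of a `pathway spawn` call from PATHWAY_SPAWN_ARGS.

    Hypothetical helper: assumes the variable contains shell-style
    arguments, which are split with shlex so that quoted values survive.
    """
    env = os.environ if env is None else env
    raw_args = env.get("PATHWAY_SPAWN_ARGS", "")
    return ["pathway", "spawn", *shlex.split(raw_args)]


if __name__ == "__main__":
    # Example value only; set the variable to whatever you would pass to spawn.
    demo_env = {"PATHWAY_SPAWN_ARGS": "--threads 2 python 'my app.py'"}
    print(build_spawn_command(demo_env))
```

Keeping the arguments in an environment variable is convenient in containerized deployments, where the command line is fixed in the image but the arguments vary per environment.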

v0.14.2

06 Aug 17:05

Fixed

  • Switched pw.xpacks.llm.embedders.GeminiEmbedder to be sync to resolve compatibility issues with Google Colab runs.
  • Pinned surya-ocr module version for stability.

v0.14.1

05 Aug 10:32

Added

  • pw.xpacks.llm.embedders.GeminiEmbedder which is a wrapper for Google Gemini Embedding services.

v0.14.0

25 Jul 20:50

Fixed

  • pw.debug.table_to_pandas now exports int | None columns correctly.

Changed

  • pw.io.airbyte.read can now be used with Airbyte connectors implemented in Python without requiring Docker.
  • BREAKING: UDFs now verify the type of returned values at runtime. If a returned value can be cast to the expected type, the value is cast. If the value does not match the expected type and can't be cast, an error is raised.
  • BREAKING: pw.reducers.ndarray reducer now requires the input column to have type float, int, or Array.
  • pw.xpacks.llm.parsers.OpenParse can now extract and parse images and diagrams from PDFs. This can be enabled by setting parse_images. processing_pipeline can also be set to customize the post-processing of document elements.
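
The runtime check on UDF return values described in the first BREAKING entry above follows a "match, else cast, else fail" pattern. The sketch below illustrates that pattern in plain Python only; it is not Pathway's implementation, Pathway decides what is castable according to its own type system, and the names checked_return, half and label are hypothetical.

```python
from functools import wraps


def checked_return(expected_type):
    """Decorator sketch: verify a function's return value at call time.

    Hypothetical illustration, not a Pathway API. The value is returned
    as-is if it already has the expected type, cast if the cast succeeds,
    and rejected with a TypeError otherwise.
    """

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            value = func(*args, **kwargs)
            if isinstance(value, expected_type):
                return value
            try:
                return expected_type(value)
            except (TypeError, ValueError) as exc:
                raise TypeError(
                    f"{func.__name__} returned {value!r}, "
                    f"which cannot be cast to {expected_type.__name__}"
                ) from exc

        return wrapper

    return decorator


@checked_return(float)
def half(x: int) -> float:
    # Integer division returns an int, which the wrapper casts to float.
    return x // 2


@checked_return(int)
def label(x: int) -> int:
    # Returns a string that cannot be converted to int, so the call fails.
    return "n/a"
```

Here half(4) yields a float, while calling label(1) raises a TypeError, mirroring the two outcomes the entry describes.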

v0.13.2

08 Jul 20:53

Added

  • pw.io.deltalake.read now supports S3 data sources.
  • pw.xpacks.llm.parsers.ImageParser which allows parsing images with vision LMs.
  • pw.xpacks.llm.parsers.SlideParser that enables parsing PDF and PPTX slides with vision LMs.
  • pw.xpacks.llm.parsers.question_answering.RAGClient, Python client for Pathway hosted RAG apps.
  • pw.xpacks.llm.parsers.question_answering.DeckRetriever, a RAG app that enables searching through slide decks with visual-heavy elements.

Fixed

  • pw.xpacks.llm.vector_store.VectorStoreServer now uses new indexes.

Changed

  • pw.xpacks.llm.parsers.OpenParse now supports any vision language model, including local and proprietary models, via LiteLLM.

v0.13.1

27 Jun 10:31

Added

  • pw.io.kafka.read now accepts an autogenerate_key flag. This flag determines the primary key generation policy to apply when reading raw data from the source. You can either use the key from the Kafka message or have Pathway autogenerate one.
  • pw.io.deltalake.read input connector that fetches changes from a Delta Lake into a Pathway table.
  • pw.xpacks.llm.parsers.OpenParse which allows parsing tables and images in PDFs.

Fixed

  • All S3 input connectors (including S3, Min.io, Digital Ocean, and Wasabi) now automatically retry network operations if a failure occurs.
  • The issue where the connection to the S3 source fails after partially ingesting an object has been resolved by downloading the object in full first.
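
The automatic retry behavior mentioned in the fixes above can be pictured as a retry loop with backoff. This is a generic sketch in plain Python, not the connectors' actual code: the function name with_retries, the default attempt count and the exponential delay are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Run `operation`, retrying on the given exceptions.

    Hypothetical helper: waits base_delay * 2**k seconds after the k-th
    failure (k starting at 0) and re-raises once `attempts` calls failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Downloading an object in full before ingesting it, as the last fix describes, pairs naturally with such a loop: a failed download can be retried as a whole without leaving a partially ingested object behind.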