merge lantern_extras and lantern repos #338

var77 · 2024-10-07T14:42:47Z

modify CI scripts to work with the new project structure

* Added lantern-cli binary and cli option for embedding generation
* Update CI/CD to build CLI package
* Fix CI env var name
* Improve error handling and logging, update README
* Make image downloading parallel, update README
* Add data using clone
* Add more status logs
* Fix error messages for image downloads
* Update README
* Fix output for bge models
* Get CLS embeddings from bert models
* Refactor and cleanup build/package script
* Add schema support, make pk field generic
* Update README, bump version
* Get approximate count of rows
* Add README.md and LICENSE into release package
* Update README
* Change lantern-cli name to contain architecture and platform
* Add schema in table size estimation, make input column value optional

* Add lantern-cli daemon mode
* Add memory optimizations and comment code
* Add Dockerfile for cli
* Add logger, add task id when logging
* Fix db exporter
* Fix logger.warn
* Update README, add logger in create index
* Add dockerfile for CUDA
* Add print_raw method for logger
* Update onnxruntime version in README
* Update GPU dockerfile
* Update README for docker usage
* Add update listener, continuous embedding generation, startup checks
* Fix startup query
* Optimize tokenizer
* Implement stop handler, optimize code, make fault tolerant
* Remove unnecessary clones on references
* Set data path based on OS
* Add restarts on connection loss
* Implement row locking and batching for client table inserts
* Rename client to client_jobs
* Check write permission on target table, add streaming option in embeddings
* Send pending jobs only to update channel
* Fix model column name typo
* Fix startup job init
* Fix init_finished_at update on startup
* Add comment
* Change gh runner ubuntu version
* Update versions and README
* Fix updates for rows with non-numeric ids
* Update BERT large model to v1.5
* Add more BERT models
* Make tokenizer and inputs for bert model dynamic


var77 and others added 30 commits September 20, 2023 09:04

Change multithreading with channels and portal

a1c6815

Fix case sensitivity for table name

c01e9b7

Add details on pgrx installation

a871568

Turn onnxruntime into a dynamic dl-opened dependency

e57d2a3

Change function argument into const ref

a94240d

Take a non-exclusive lock in the common case on the model params

8c911a4

Add BSL license (#22)

9942102

Nit: Fix formatting (#25)

2254d19

Add docker publish action for CLI (#28)

d505ca7

* Add docker publish action for CLI
* Optimize dockerfiles

Add tests for lantern_embeddings and lantern_embeddings core (#27)

91b1e17

Better handling for sql query formatting (#29)

c78a27b

* Better handle sql quoting, update README for docker run
* Bump cli version

Varik/skip column creation (#30)

62ff52a

* Skip column creation if specified, check permissions from grants, improve error logging for client task
* Add schema name to table permission check

Varik/fix client identifier names (#31)

5a20fd7

* Fix handling names for client identifier
* Update cli version

Fix typo in trigger name (#32)

51e4af5

Fix notification channel for client (#33)

dd32004

Add dst_column to trigger and function names (#34)

613b103

Explicitly specify ort version (#36)

279f72a

Create lock table in lantern schema hash trigger names (#37)

fd804df

* Create lock table in lantern schema
* Hash client trigger function and trigger names so they won't exceed the character limit
* Bump cli version

Create client function and trigger names with job_id instead of hash (#38)

018f804

Collect update jobs non-blocking (#39)

2e9573c

* Make row locking non-blocking
* Remove unnecessary deref

Add progress and usage tracking (#40)

ef4e299

Add model speed measurement script (#41)

e13d286

* Add command to measure model generation speed
* Fix GPU memory leak, merge process_text and process_image into one function
* Add min/max/avg speeds
* Update min time for test
* Update version

Fix data_producer deadlock, remove unwraps (#42)

7d53268

* Fix data_producer deadlock, remove unwraps
* Bump versions

Add db connect timeout (#43)

4e86a66

Index autotune and import (#44)

8b0858e

* Index autotune WIP
* Add import functionality for lantern index
* add index autotune to cli
* Update README

Optimize batch size for models (#46)

084b28e

Check memory usage to avoid OOM Errors (#45)

d82daa0

* Check memory usage before running model. references #26
* Fix checks for GPU #26
* Add info message #26
* Print more informative error messages
* Bump version
* Refactor naming
* Fix return type
* Bump versions

Index Autotune Improvements (#47)

9d334e3

* Fix find_bes_variant, add comments, add existing result reuse functionality
* Refactor daemon, add autotune to daemon
* Make float to 2 decimal points, change internal schema name
* Bump version

var77 added 30 commits September 16, 2024 17:49

write tests for external indexing status server

8d54349

send error message length in case of error

9f6495b

send protocol version on first message

1ca8f5a

fix memory leak in external indexing by not shadowing the variable

9277fd2

push containers with cpu-native build for common intel CPUs

8d52242

show indexing speed in logs

84feb25

update cli to 0.3.24

c458731

pass usearch defines to enable simsimd and native f16 for optimized builds

b6df5fe

add more target architectures

19c51fa

add quantization support for external indexing

cee7ee9

parse vector elements based on element_bits passed from header

7fba4a9

use usearch add_raw to avoid double type conversion and accelerate for f16 type

45091e5

upgrade usearch, remove cpu-optimized build as new version of usearch selects optimizations based on CPU in runtime, print hardware acceleration being used, update compiler versions in dockerfiles to clang-18

32a70e6

revert usearch update, bring back the x64 optimized builds

2c1847e

update usearch to use upstream version

ce8995d

remove sapphirerapids build from CI

1606459

replace isahc with reqwest, as isahc creates thread-pool on import which makes postmaster error if loaded in shared_preload libs

be94557

add background worker for daemon in lantern_extras extension, fix tests for daemon

3a196af

use GUC variables for openai and cohere tokens for default values on daemon bgworker jobs

7aad65d

convert status server to actix server

da3c483

increase timeout for file download

fb6d7bd

update cli version

94e2d56

add data path to daemon to store models in postgres data dir when run in bgworker

664f1a7

merge lantern_extras and lantern repos, modify CI scripts to work with the new project structure

aabbefc

fix script path

ea9300e

fix bash script for ci

906bef1

fix name pattern for download-artifact

0b8c9af

fix test_extras workflow

e7cb17c

fix extras path for install

544d386

merge lantern and lantern_extras repos

07538c1
