Setting Up Vector Search in Uli Community
Aatman Vaidya edited this page Sep 24, 2025 · 4 revisions
Notes on the Python and Elixir (Phoenix) integration, the helper scripts, and how to set up vector search end to end.
Important
Make sure the PR that adds the Python setup to the Dockerfile (https://github.com/tattle-made/Uli/pull/812) is merged before following the instructions below.
- Python location: Python code lives in lib/python/ with modules like text_vec.py, video_vec.py, and clustering.py. A project-local virtualenv is created by the build using uv - see more here.
- Interop library: We use the Export library (which wraps ErlPort) to start a Python interpreter and call Python functions from Elixir.
- Model download: A GenServer, UliCommunity.MediaProcessing.TextVecRepVyakyarth, downloads the ML model and loads it into RAM.
- Config: The Python executable and path are read from config/* via Application.compile_env(:uli_community, [:python, :python_path]). The Dockerfile installs Python 3.10 and ships the virtualenv + HF cache dirs so this works inside containers.
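As a rough illustration of the interop described above, here is a minimal sketch of the shape a Python function called from Elixir through ErlPort might take. The function name `embed_text`, the `dim` parameter, and the hash-based vector are all hypothetical placeholders; the real modules in lib/python/ load an ML model instead. The one detail taken from ErlPort itself is that Elixir binaries arrive in Python 3 as `bytes`, so they need decoding before use.

```python
import hashlib
import math


def embed_text(raw, dim=8):
    """Hypothetical stand-in for a text embedding function.

    Not the project's real model: it derives a small unit-length vector
    from a SHA-256 digest so the sketch runs without any ML dependency.
    """
    # ErlPort delivers Elixir binaries as Python bytes, so decode first.
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `dim` digest bytes (dim <= 32) to floats in [-1, 1].
    vec = [(b / 127.5) - 1.0 for b in digest[:dim]]
    # Normalise to unit length, as embedding pipelines commonly do.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```

Returning a plain list of floats keeps the result easy to map back to an Elixir list on the other side of the port.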
- Exec into the running container (or pod) and open an IEx remote shell: bin/uli_community remote
- Enqueue embedding jobs to extract vectors for unique unprocessed slurs. Run in IEx: Scripts.ExtractCrowdsourcedSlurEmbedding.enqueue_unprocessed_texts_batch()
  - What it does: queries the crowdsourced_slurs table left-joined with the text_vec_store_vyakyarth table to find items without embeddings (deduplicated by lowercased, trimmed label), then enqueues them in batches of 128 on the Oban queue :text_index.
- Cluster the stored embeddings. Run in IEx: Scripts.ClusterTextVecStore.run()
  - What it does: groups all the slurs into clusters, which helps to see which slurs are similar to each other.
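The deduplication and batching in the enqueue step can be sketched as follows. This is an illustrative Python version, not the Elixir script itself: the helper names `dedupe_labels` and `batches` are hypothetical, and keeping the first-seen spelling of each label is an assumption. Only the lowercase-and-trim normalisation and the batch size of 128 come from the description above.

```python
def dedupe_labels(labels):
    """Keep the first occurrence of each label, comparing on a
    lowercased, whitespace-trimmed key (assumed normalisation)."""
    seen = set()
    unique = []
    for label in labels:
        key = label.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(label)
    return unique


def batches(items, size=128):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With 300 unique labels, for example, this produces three batches of 128, 128, and 44 items, each of which would correspond to one enqueued job.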
Vector search is now set up, and you can start using it from the UI.
- Scripts.SeedCrowdsourcedSlurData210525.run()
  - Inserts seed data from priv/crowdsourced-21-14-2025/slur_metadata.json into domain tables.
  - Run: Scripts.SeedCrowdsourcedSlurData210525.run()
- Scripts.ExtractCrowdsourcedSlurEmbedding.enqueue_unprocessed_texts_batch()
  - Enqueues Oban jobs to compute embeddings for slurs missing entries in text_vec_store_vyakyarth (deduplicated by normalized label).
  - Run: Scripts.ExtractCrowdsourcedSlurEmbedding.enqueue_unprocessed_texts_batch()
- Scripts.ClusterTextVecStore.run()
  - Performs clustering over all stored embeddings by delegating to Python clustering.get_clusters, then persists cluster IDs back to text_vec_store_vyakyarth.
  - Run: Scripts.ClusterTextVecStore.run()
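To give a feel for what "clustering over stored embeddings" produces, here is a minimal sketch that assigns a cluster ID to each vector. It is not the algorithm behind clustering.get_clusters, which this page does not describe: the function `greedy_clusters`, the cosine-similarity rule, and the 0.8 threshold are all assumptions chosen to keep the example dependency-free.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_clusters(vectors, threshold=0.8):
    """Assign each vector to the first cluster whose seed vector is
    at least `threshold` similar; otherwise start a new cluster."""
    seeds = []        # one representative vector per cluster
    cluster_ids = []  # cluster ID for each input vector, in order
    for vec in vectors:
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                cluster_ids.append(cid)
                break
        else:
            seeds.append(vec)
            cluster_ids.append(len(seeds) - 1)
    return cluster_ids
```

The returned list lines up with the input order, so each ID can be written back next to the row its embedding came from, which mirrors the "persist cluster IDs" step described above.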
