This project implements a semantic search system that retrieves quotes based on meaning rather than exact keywords. By embedding quotes and user queries into a high-dimensional vector space, the system finds the closest matches using FAISS similarity search.
Traditional keyword search fails when wording changes. For example, “Follow your dreams” and “Chase your goals” may not match directly. This project uses large language model (LLM) embeddings to capture semantic meaning and return relevant quotes, even if phrased differently.
- Embeddings → Quotes are encoded into 1024-dimensional vectors using GIST-large-Embedding-v0.
- FAISS Indexing → Vectors are stored in a FAISS index for fast nearest-neighbor search.
- Metadata Store → Quotes and authors are linked via a 'dbm + pickle' key–value store.
- Querying → User input is embedded and compared against the index; the top-3 most similar quotes are returned in real time.
You can build a new index and try it with the sample quotes file for quick testing, or download the prebuilt FAISS index and the Quote Metadata Database for full use with the link below (~500k quotes):
Once the Requirements are Installed, Run:
python3 make_index.py # May take a long time depending on GPU
Then Run:
python3 find_quote.py # For running tests
Loading model...
Loading FAISS index...
Using key-value store of type dbm.dumb
Paraphrase your quote: Coding is the future
148380: 'Computer Coding is a life skill for this generation.'
Tamara Zentic MS
291906: 'If you control the code, you control the world. This is the future that awaits us.'
Marc Goodman, Future Crimes: Everything Is Connected, Everyone Is Vulnerable, and What We Can Do About It
199487: 'Coding is other type of magic!'
Deyth Banger
Time taken: 0.283 seconds
This project requires a CUDA-enabled GPU. The code is set to run on device="cuda".
- Tested with CUDA 12.1 and
torch==2.5.1+cu121. - Make sure your system has a compatible GPU driver and CUDA runtime installed.
- If you want to run on CPU instead, change
device="cuda"todevice="cpu"in the code.