LanceDB Snippet Manager is a customizable library for efficient storage, retrieval, and searching of code snippets using LanceDB, a vector database for AI applications. It leverages advanced embedding techniques for intelligent snippet management and retrieval, with flexibility to adapt to various use cases.
- Customizable embedding models and configurations
- Flexible storage and indexing of code snippets
- Generate embeddings for efficient similarity search
- Advanced search capabilities using vector similarity and hybrid search
- Support for multiple programming languages
- Reranking support for improved search results
- Easily extensible for specific use cases
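To make the "vector similarity" feature concrete, here is a minimal sketch of how ranking by cosine similarity works in principle. It uses hand-made three-dimensional vectors rather than real embeddings, and the names `cosine_similarity`, `rank_by_similarity`, and `TOY_VECTORS` are illustrative assumptions, not part of this package or of LanceDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, snippet_vecs, limit=3):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hand-made vectors standing in for real embeddings (assumption for illustration).
TOY_VECTORS = {
    "hello.py": [0.9, 0.1, 0.0],
    "sort.rs": [0.1, 0.9, 0.2],
    "server.go": [0.0, 0.2, 0.9],
}

ranked = rank_by_similarity([1.0, 0.0, 0.0], TOY_VECTORS, limit=2)
print(ranked)
```

In a real setup the vectors would come from an embedding model and the nearest-neighbour lookup would be handled by the database, not by a linear scan like this one.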
To install the LanceDB Snippet Manager, run the following command:
pip install git+https://github.com/mkpro118/lancedb-snippets.git
This will install the latest version of the package directly from the GitHub repository.
- snippets.lancedb.config.py: Defines configuration classes (DBConfig, ST_Config) for the database and embedding models
- db_connection.py: Manages LanceDB connections (SnippetDBConnection) and table operations
- factory.py: Creates the schema (SnippetSchemaFactory) for storing snippets in LanceDB
- generator.py: Generates responses (SnippetGenerator) based on snippet searches
- table.py: Handles operations on individual snippet tables (SnippetTable)
- languages.py: Defines supported programming languages (Language enum) and language detection
- snippets.py: Defines the Snippet class and related operations
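As an illustration of what language detection can look like, the following sketch guesses a language from a file extension. It is an assumption about one plausible approach, not the code in languages.py: `ToyLanguage`, `EXTENSIONS`, and `detect_language` are hypothetical names, and only the `PY` member and the "Python" label appear elsewhere in this README.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class ToyLanguage(Enum):
    """Hypothetical stand-in for the library's Language enum."""

    PY = "Python"
    JS = "JavaScript"
    RS = "Rust"


# Assumed mapping from file extension to language; not taken from the library.
EXTENSIONS = {
    ".py": ToyLanguage.PY,
    ".js": ToyLanguage.JS,
    ".rs": ToyLanguage.RS,
}


def detect_language(filename: str) -> Optional[ToyLanguage]:
    """Guess the language from the file extension, or None if unknown."""
    return EXTENSIONS.get(Path(filename).suffix.lower())


print(detect_language("hello.py"))
```

Supporting another language in a design like this would mean adding one enum member and one mapping entry.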
Here's a basic example of how to use the LanceDB Snippet Manager:
from snippets.config import ST_Config
from snippets.db_connection import SnippetDBConnection
from snippets.snippets import Snippet
from snippets.languages import Language
# Initialize the database connection
config = ST_Config()
db_connection = SnippetDBConnection.from_uri(config, ".snippet-db")
# Create or get a table
table = db_connection.get_or_create_table(config, "my_snippets")
# Create a snippet
snippet = Snippet(
    text="print('Hello, World!')",
    language=Language.PY,
    filename="hello.py"
)
# Add the snippet to the table
table.add_snippets(snippet)
# Search for snippets
results = table.search("print hello", language="Python")
# Display results
# iterrows() yields (index, row) pairs, so unpack the pair
for _, row in results.iterrows():
    print(f"Language: {row['language']}")
    print(f"Filename: {row['filename']}")
    print(f"Snippet: {row['text']}")
    print("---")
The library is designed to be highly customizable:
- Embedding Models: You can use different embedding models by customizing the DBConfig class in config.py. The default uses the BAAI/bge-small-en-v1.5 model, but you can easily switch to other models supported by the sentence-transformers library.
- Database Configuration: The SnippetDBConnection class in db_connection.py allows you to customize the database connection, including the use of in-memory databases or persistent storage.
- Language Support: The Language enum in languages.py can be extended to support additional programming languages as needed.
- Schema Customization: The SnippetSchemaFactory in factory.py allows you to customize the schema for storing snippets, enabling you to add additional metadata fields if required.
- Search Customization: The search method in table.py supports customizable search parameters and reranking options.
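To give a feel for what reranking in a hybrid search does, here is a small sketch of reciprocal rank fusion, one common way to merge a vector-search ranking with a keyword-search ranking. Whether this package uses that method is not stated in this README, so treat it as an assumption; `reciprocal_rank_fusion`, the constant `k=60`, and the filenames are illustrative only.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each item scores 1 / (k + rank) for every list it appears in,
    so items ranked highly by several searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up result lists from two hypothetical searches over the same table.
vector_hits = ["hello.py", "greet.js", "sort.rs"]
keyword_hits = ["greet.js", "print.rb", "hello.py"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Items found by both searches (here greet.js and hello.py) end up ahead of items found by only one.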
Here's an example that uses a custom embedding model and custom search parameters:
from snippets.config import ST_Config
from snippets.db_connection import SnippetDBConnection
from snippets.snippets import Snippet
from snippets.languages import Language
# Initialize with custom configuration
config = ST_Config(model="all-MiniLM-L6-v2") # Using a different embedding model
db_connection = SnippetDBConnection.from_uri(config, ".snippet-db")
# Create or get a table
table = db_connection.get_or_create_table(config, "my_snippets")
# Create and add a snippet
snippet = Snippet(
    text="print('Hello, World!')",
    language=Language.PY,
    filename="hello.py"
)
table.add_snippets(snippet)
# Search for snippets with custom parameters
results = table.search("print hello", language="Python", limit=10)
# iterrows() yields (index, row) pairs, so unpack the pair
for _, row in results.iterrows():
    print(f"Language: {row['language']}")
    print(f"Filename: {row['filename']}")
    print(f"Snippet: {row['text']}")
    print("---")
For more advanced usage and customization options, please refer to the individual module documentation:
- config.py: Customize embedding models and database configurations
- db_connection.py: Customize database connections and table management
- factory.py: Customize snippet schema
- generator.py: Extend snippet response generation
- table.py: Customize search and indexing operations
- languages.py: Add support for additional programming languages
- snippets.py: Extend the Snippet class with additional functionality
- LanceDB
- NumPy
- Pandas
- Sentence Transformers (this package depends on torch, which is a heavy dependency)
- Tantivy
- Wikipedia-API
For a full list of dependencies, please refer to the requirements.txt file.
This project uses:
- mypy for static type checking
- black for code formatting
- isort for import sorting
Configuration for these tools can be found in pyproject.toml and mypy.ini.
Contributions are welcome! Whether it's extending language support, adding new embedding models, or improving search algorithms, feel free to submit a Pull Request.
This project is licensed under the MIT License.