LightMem is a lightweight and efficient memory management framework designed for Large Language Models and AI Agents. It provides simple yet powerful mechanisms for memory storage, retrieval, and update, helping you quickly build intelligent applications with long-term memory capabilities.
- **Lightweight & Efficient**: Minimalist design with minimal resource consumption and fast response times
- **Easy to Use**: Simple API design - integrate into your application with just a few lines of code
- **Flexible & Extensible**: Modular architecture supporting custom storage engines and retrieval strategies
- **Broad Compatibility**: Support for mainstream LLMs (OpenAI, Qwen, DeepSeek, etc.)
- [2025-10-12]: The LightMem project is officially open-sourced!
LightMem is continuously evolving! Here's what's coming:
- Offline Pre-computation of KV Cache for Update (Lossless)
- Online Pre-computation of KV Cache Before Q&A (Lossy)
- MCP (Memory Control Policy)
- Integration of Common Models and Feature Enhancement
- Coordinated Use of Context and Long-Term Memory Storage
- Key Features
- News
- Todo List
- Installation
- Quick Start
- Architecture
- Examples
- Configuration
- Contributors
- Related Projects
```bash
# Clone the repository
git clone https://github.com/zjunlp/LightMem.git
cd LightMem

# Create virtual environment
conda create -n lightmem python=3.10 -y
conda activate lightmem

# Install dependencies
unset ALL_PROXY
pip install -e .
```

```bash
pip install lightmem  # Coming soon
```

```bash
# Run the example experiment script
cd experiments
python run_lightmem_qwen.py
```

LightMem adopts a modular design, breaking down the memory management process into several pluggable components. The core directory structure exposed to users is outlined below, allowing for easy customization and extension:
```
LightMem/
├── src/lightmem/            # Main package
│   ├── __init__.py          # Package initialization
│   ├── configs/             # Configuration files
│   ├── factory/             # Factory methods
│   ├── memory/              # Core memory management
│   └── memory_toolkits/     # Memory toolkits
├── experiments/             # Experiment scripts
├── datasets/                # Dataset files
└── examples/                # Examples
```

The following table lists the backend values currently recognized by each configuration module. Use the `model_name` field (or the corresponding config object) to select one of these backends; a short selection sketch follows the table.
| Module (config) | Supported backends | 
|---|---|
| PreCompressorConfig | llmlingua-2, entropy_compress |
| TopicSegmenterConfig | llmlingua-2 | 
| MemoryManagerConfig | openai, deepseek |
| TextEmbedderConfig | huggingface | 
| MMEmbedderConfig | huggingface | 
| EmbeddingRetrieverConfig | qdrant | 
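As a quick illustration of how a backend is selected, here is a hedged partial config sketch (mirroring the Quick Start example further below): each module is chosen via its `model_name`, with backend-specific options under `configs`. The `deepseek` and model-path values shown are illustrative assumptions, not confirmed defaults.

```python
# Partial configuration sketch: each module's backend is selected by `model_name`,
# drawn from the table above. The deepseek `configs` values are illustrative
# assumptions; check MemoryManagerConfig for the exact accepted fields.
partial_config = {
    "memory_manager": {
        "model_name": "deepseek",               # or "openai"
        "configs": {
            "model": "deepseek-chat",           # assumed model identifier
            "api_key": "YOUR_API_KEY",
        },
    },
    "text_embedder": {
        "model_name": "huggingface",            # only backend recognized here
        "configs": {"model": "/path/to/all-MiniLM-L6-v2", "embedding_dims": 384},
    },
    "embedding_retriever": {
        "model_name": "qdrant",                 # only backend recognized here
        "configs": {"collection_name": "demo", "embedding_model_dims": 384},
    },
}
```

The full Quick Start configuration below wires these modules together with pre-compression, topic segmentation, and logging.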
```python
import os
import datetime

from lightmem.memory.lightmem import LightMemory
from lightmem.configs.base import BaseMemoryConfigs
LOGS_ROOT = "./logs"
RUN_TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
RUN_LOG_DIR = os.path.join(LOGS_ROOT, RUN_TIMESTAMP)
os.makedirs(RUN_LOG_DIR, exist_ok=True)
API_KEY='YOUR_API_KEY'
API_BASE_URL=''
LLM_MODEL=''
EMBEDDING_MODEL_PATH='/your/path/to/models/all-MiniLM-L6-v2'
LLMLINGUA_MODEL_PATH='/your/path/to/models/llmlingua-2-bert-base-multilingual-cased-meetingbank'
config_dict = {
    "pre_compress": True,
    "pre_compressor": {
        "model_name": "llmlingua-2",
        "configs": {
            "llmlingua_config": {
                "model_name": LLMLINGUA_MODEL_PATH,
                "device_map": "cuda",
                "use_llmlingua2": True,
            },
        }
    },
    "topic_segment": True,
    "precomp_topic_shared": True,
    "topic_segmenter": {
        "model_name": "llmlingua-2",
    },
    "messages_use": "user_only",
    "metadata_generate": True,
    "text_summary": True,
    "memory_manager": {
        "model_name": "openai",
        "configs": {
            "model": LLM_MODEL,
            "api_key": API_KEY,
            "max_tokens": 16000,
            "openai_base_url": API_BASE_URL
        }
    },
    "extract_threshold": 0.1,
    "index_strategy": "embedding",
    "text_embedder": {
        "model_name": "huggingface",
        "configs": {
            "model": EMBEDDING_MODEL_PATH,
            "embedding_dims": 384,
            "model_kwargs": {"device": "cuda"},
        },
    },
    "retrieve_strategy": "embedding",
    "embedding_retriever": {
        "model_name": "qdrant",
        "configs": {
            "collection_name": "my_long_term_chat",
            "embedding_model_dims": 384,
            "path": "./my_long_term_chat", 
        }
    },
    "update": "offline",
    "logging": {
        "level": "DEBUG",
        "file_enabled": True,
        "log_dir": RUN_LOG_DIR,
    }
}
lightmem = LightMemory.from_config(config_dict)
```

### Add Memory

```python
session = {
    "timestamp": "2025-01-10",
    "turns": [
        [
            {"role": "user", "content": "My favorite ice cream flavor is pistachio, and my dog's name is Rex."},
            {"role": "assistant", "content": "Got it. Pistachio is a great choice."},
        ],
    ],
}
for turn_messages in session["turns"]:
    timestamp = session["timestamp"]
    for msg in turn_messages:
        msg["time_stamp"] = timestamp
        
    store_result = lightmem.add_memory(
        messages=turn_messages,
        force_segment=True,
        force_extract=True
    )

# Offline update: build the update queue for all entries, then apply it with a score threshold
lightmem.construct_update_queue_all_entries()
lightmem.offline_update_all_entries(score_threshold=0.8)

# Retrieve the memories most relevant to a question
question = "What is the name of my dog?"
related_memories = lightmem.retrieve(question, limit=5)
print(related_memories)
```

For transparency and reproducibility, we have shared the results of our experiments on Google Drive. This includes model outputs, evaluation logs, and predictions used in our study.
Access the data here: Google Drive - Experimental Results
Please feel free to download, explore, and use these resources for research or reference purposes.
All behaviors of LightMem are controlled via the BaseMemoryConfigs configuration class. Users can customize aspects like pre-processing, memory extraction, retrieval strategy, and update mechanisms by providing a custom configuration.
| Option | Default | Usage (allowed values and behavior) | 
|---|---|---|
| pre_compress | False | True / False. If True, input messages are pre-compressed using the `pre_compressor` configuration before being stored. This reduces storage and indexing cost but may remove fine-grained details. If False, messages are stored without pre-compression. |
| pre_compressor | None | dict / object. Configuration for the pre-compression component (`PreCompressorConfig`) with fields like `model_name` (e.g., `llmlingua-2`, `entropy_compress`) and `configs` (model-specific parameters). Effective only when `pre_compress=True`. |
| topic_segment | False | True / False. Enables topic-based segmentation of long conversations. When True, long conversations are split into topic segments and each segment can be indexed/stored independently (requires topic_segmenter). When False, messages are stored sequentially. | 
| precomp_topic_shared | False | True / False. If True, pre-compression and topic segmentation can share intermediate results to avoid redundant processing. May improve performance but requires careful configuration to avoid cross-topic leakage. | 
| topic_segmenter | None | dict / object. Configuration for topic segmentation (`TopicSegmenterConfig`), including `model_name` and `configs` (segment length, overlap, etc.). Used when `topic_segment=True`. |
| messages_use | 'user_only' | 'user_only' / 'assistant_only' / 'hybrid'. Controls which messages are used to generate metadata and summaries: `user_only` uses user inputs, `assistant_only` uses assistant responses, `hybrid` uses both. Choosing `hybrid` increases processing but yields richer context. |
| metadata_generate | True | True / False. If True, metadata such as keywords and entities are extracted and stored to support attribute-based and filtered retrieval. If False, no metadata extraction occurs. | 
| text_summary | True | True / False. If True, a text summary is generated and stored alongside the original text (reduces retrieval cost and speeds review). If False, only the original text is stored. Summary quality depends on memory_manager. | 
| memory_manager | MemoryManagerConfig() | dict / object. Controls the model used to generate summaries and metadata (`MemoryManagerConfig`), e.g., `model_name` (openai, deepseek) and `configs`. Changing this affects summary style, length, and cost. |
| extract_threshold | 0.5 | float (0.0 - 1.0). Threshold used to decide whether content is important enough to be extracted as metadata or highlight. Higher values (e.g., 0.8) mean more conservative extraction; lower values (e.g., 0.2) extract more items (may increase noise). | 
| index_strategy | None | 'embedding'/'context'/'hybrid'/None. Determines how memories are indexed: 'embedding' uses vector-based indexing (requires embedders/retriever) for semantic search; 'context' uses text-based/contextual retrieval (requires context_retriever) for keyword/document similarity; and 'hybrid' combines context filtering and vector reranking for robustness and higher accuracy. | 
| text_embedder | None | dict / object. Configuration for the text embedding model (`TextEmbedderConfig`) with `model_name` (e.g., `huggingface`) and `configs` (batch size, device, embedding dim). Required when `index_strategy` or `retrieve_strategy` includes 'embedding'. |
| multimodal_embedder | None | dict / object. Configuration for the multimodal/image embedder (`MMEmbedderConfig`). Used for non-text modalities. |
| history_db_path | os.path.join(lightmem_dir, "history.db") | str. Path to persist conversation history and lightweight state. Useful to restore state across restarts. | 
| retrieve_strategy | 'embedding' | 'embedding' / 'context' / 'hybrid'. Strategy used at query time to fetch relevant memories. Pick based on data and query type: semantic queries -> 'embedding'; keyword/structured queries -> 'context'; mixed -> 'hybrid'. |
| context_retriever | None | dict / object. Configuration for the context-based retriever (`ContextRetrieverConfig`), e.g., `model_name='BM25'` and `configs` like `top_k`. Used when `retrieve_strategy` includes 'context'. |
| embedding_retriever | None | dict / object. Vector store configuration (`EmbeddingRetrieverConfig`), e.g., `model_name='qdrant'` and connection/index params. Used when `retrieve_strategy` includes 'embedding'. |
| update | 'offline' | 'online' / 'offline'. 'online': update memories immediately after each interaction (low latency for fresh memories but higher operational cost). 'offline': batch or scheduled updates to save cost and aggregate changes. |
| kv_cache | False | True / False. If True, attempt to precompute and persist model KV caches to accelerate repeated LLM calls (requires support from the LLM runtime and may increase storage). Uses `kv_cache_path` to store the cache. |
| kv_cache_path | os.path.join(lightmem_dir, "kv_cache.db") | str. File path for KV cache storage when kv_cache=True. | 
| graph_mem | False | True / False. When True, some memories will be organized as a graph (nodes and relationships) to support complex relation queries and reasoning. Requires additional graph processing/storage. | 
| version | 'v1.1' | str. Configuration/API version. Only change if you know compatibility implications. | 
| logging | None | dict / object. Logging configuration, e.g., `level`, `file_enabled`, and `log_dir`. |
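To make the options above concrete, the sketch below shows a minimal context-only configuration (no embedding index): `index_strategy` and `retrieve_strategy` are set to 'context' and a BM25 `context_retriever` is supplied. The exact keys accepted by `ContextRetrieverConfig` (e.g., `top_k`) and the model identifier are assumptions taken from the table and should be verified against the config classes.

```python
from lightmem.memory.lightmem import LightMemory

# Minimal context-based setup sketch: no pre-compression, no embeddings.
# Assumed: ContextRetrieverConfig accepts model_name="BM25" with a `top_k` option,
# as described in the table above; verify against src/lightmem/configs before use.
context_config = {
    "pre_compress": False,
    "topic_segment": False,
    "messages_use": "hybrid",        # use both user and assistant messages
    "metadata_generate": True,
    "text_summary": True,
    "memory_manager": {
        "model_name": "openai",
        "configs": {"model": "gpt-4o-mini", "api_key": "YOUR_API_KEY"},  # assumed model id
    },
    "index_strategy": "context",
    "retrieve_strategy": "context",
    "context_retriever": {
        "model_name": "BM25",
        "configs": {"top_k": 5},     # assumed option name from the table
    },
    "update": "offline",
}

memory = LightMemory.from_config(context_config)
```

This variant trades semantic search for cheaper keyword-style retrieval; switch `index_strategy`/`retrieve_strategy` to 'embedding' or 'hybrid' and supply `text_embedder`/`embedding_retriever` when semantic queries matter.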
JizhanFang, Xinle-Deng, Xubqpanda, HaomingX, James-TYQ, evy568, Norah-Feathertail

