Add PostgreSQL support for long-term memory storage #2892

Open · tibbon wants to merge 1 commit into main from feature/postgres-long-term-memory
Conversation

tibbon commented May 23, 2025

This PR adds PostgreSQL integration for CrewAI's long-term memory system. I added this because I'm using CrewAI in AWS Lambda and can't easily share an on-disk SQLite3 database between runs for long-term memory persistence. With this, you can use RDS/PostgreSQL. You still need mem0 for Entity storage.

This is my first contribution here, but I've tried to ensure good test coverage, tried everything out locally, added documentation, and ensured that everything passed the CI tests and linting. I'm not much of a Python programmer, and I'm happy to get feedback if my code isn't properly idiomatic. I'll amend it as needed.

  • Create LTMPostgresStorage class supporting PostgreSQL 16+ database backend
  • Implement connection pooling with psycopg for improved performance
  • Add storage factory pattern via LTMStorageFactory for backend selection
  • Support environment variable configuration for simplified deployment
  • Add context manager pattern for improved resource management
  • Implement robust validation and security measures:
    • Input validation for all database parameters
    • Protection against SQL injection
    • Secure connection string handling
    • Comprehensive error types for better debugging
  • Add CI-compatible tests with mocks
  • Create detailed documentation with examples and best practices
  • Add optional dependency for psycopg[pool] in pyproject.toml for connection pooling
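
A minimal usage sketch of the above, based on the parameter names in this PR's `LongTermMemory.__init__` (the `"postgres"` value for `storage_type` and the connection details are illustrative assumptions, not confirmed API):

```python
from crewai.memory import LongTermMemory

# Sketch only: parameter names follow this PR's LongTermMemory signature;
# the "postgres" storage_type value and connection string are assumptions.
long_term_memory = LongTermMemory(
    storage_type="postgres",
    postgres_connection_string="postgresql://user:password@db.example.com:5432/crewai",
    postgres_schema="public",
    postgres_table_name="long_term_memories",
    postgres_min_pool_size=1,
    postgres_max_pool_size=5,
    postgres_use_connection_pool=True,
)
```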

tibbon force-pushed the feature/postgres-long-term-memory branch from fc7fe6d to fb099b8 on May 23, 2025 at 02:14
tibbon (Author) commented May 23, 2025

@joaomdmoura can your review crew do this one too?

tibbon (Author) commented May 28, 2025

@gvieira Do you have any feedback on this one?

Comment on lines +56 to +58
```python
memory=True,  # Enable memory system
long_term_memory=long_term_memory,  # Use PostgreSQL for long-term memory
entity_memory=EntityMemory()  # Required for automatic memory saving
```
Contributor:
For your use case, I'd recommend encouraging users to set only long_term_memory instead of configuring all the individual memory attributes.
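
Roughly like this (sketch; `my_agents`, `my_tasks`, and the Postgres-backed `long_term_memory` are assumed to be defined elsewhere):

```python
from crewai import Crew

# Sketch of the suggested minimal configuration: set only long_term_memory
# and let the framework default the remaining memory types.
crew = Crew(
    agents=my_agents,  # assumed defined elsewhere
    tasks=my_tasks,    # assumed defined elsewhere
    memory=True,
    long_term_memory=long_term_memory,
)
```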


```markdown
### Automatic Memory Saving

**Requires both `long_term_memory` AND `entity_memory` configured**
```
Contributor:
Is there any reason to make `entity_memory` required?

```python
)

# Save manually
crew._long_term_memory.save(memory_item)
```
Contributor:
As this is a private attribute, we advise against relying on it directly. We can think about adding a public accessor, but not a private one.
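
For illustration, a public accessor could look something like this (hypothetical sketch; no such property exists in CrewAI today):

```python
# Hypothetical sketch only: this property is not part of CrewAI's API.
class Crew:
    @property
    def long_term_memory_storage(self):
        """Read-only public access to the long-term memory storage backend."""
        if self._long_term_memory is None:
            return None
        return self._long_term_memory.storage
```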


```python
# Get the storage from your memory instance
postgres_storage = crew.memory.storage
```
Contributor:
Are you assuming here that `memory` is the long-term memory? It might be any other memory type. If you want to get the long-term memory's storage, I'd recommend exposing it in our API instead.

```python
    result = crew.kickoff()
finally:
    # Always clean up resources, even if an error occurs
    memory.cleanup()  # You can also use memory.close() for backward compatibility
```
Contributor:
I think you mean `reset`, don't you?

Comment on lines +18 to +29
```python
def __init__(
    self,
    storage=None,
    storage_type: str = "sqlite",
    path: Optional[str] = None,
    postgres_connection_string: Optional[str] = None,
    postgres_schema: Optional[str] = None,
    postgres_table_name: Optional[str] = None,
    postgres_min_pool_size: Optional[int] = None,
    postgres_max_pool_size: Optional[int] = None,
    postgres_use_connection_pool: Optional[bool] = None,
):
```
Contributor:
I like that `storage_type` can be used as a factory parameter. However, I think this class should remain storage-type agnostic. We could rely on `**kwargs` instead of mapping all supported attributes for each storage type here. IMO, the factory (e.g. the `PostgresStorageFactory`) should be responsible for handling those mappings, not this class.
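
Something along these lines (sketch of the suggestion, not the PR's current code; `LTMStorageFactory.create` and its import path are assumed names):

```python
from typing import Any, Optional

# Assumed import path for this PR's factory; "create" is an assumed method name.
from crewai.memory.storage.ltm_storage_factory import LTMStorageFactory


class LongTermMemory:
    def __init__(
        self,
        storage=None,
        storage_type: str = "sqlite",
        path: Optional[str] = None,
        **storage_kwargs: Any,
    ):
        # Backend-specific options flow through **storage_kwargs; the factory,
        # not this class, owns the mapping for each storage type.
        self.storage = storage or LTMStorageFactory.create(
            storage_type=storage_type, path=path, **storage_kwargs
        )
```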

Comment on lines +71 to +78
```python
# Ensure quality is in metadata (from item.quality if available)
if "quality" not in metadata and item.quality is not None:
    metadata["quality"] = item.quality

# Check if quality is available
if "quality" not in metadata:
    raise ValueError("Memory quality must be provided either in item.quality or item.metadata['quality']")
```

Contributor:
Are we sending this info as metadata? Is it exclusive to the Postgres LTM storage?
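
For context, a sketch of how quality travels on the item (field values are illustrative; the import path reflects CrewAI's source layout as I understand it):

```python
from crewai.memory.long_term.long_term_memory_item import LongTermMemoryItem

# Setting quality directly on the item is the primary path; the diff above
# copies it into metadata only when it is missing there.
item = LongTermMemoryItem(
    agent="analyst",
    task="Summarize the quarterly report",
    expected_output="A concise summary",
    datetime="2025-05-23T02:14:00",
    quality=0.9,
    metadata={},
)
```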

```python
    self.storage.reset()

def cleanup(self) -> None:
```
Contributor:
I misunderstood what this method was doing. At first, I thought it was meant to reset created entries, but I realized it's actually for closing any open connections.

That said, I do have some concerns about the current API design; it feels a bit confusing. Here are a few thoughts off the top of my head:

  • If users want to use long-term memory with Postgres, we should always encourage them to use context managers (see the sketch below).
  • Let's consider removing this cleanup method, as its name could easily be confused with something like reset. If we decide to keep it, we should rename it to something clearer, maybe just close.
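
For example (sketch; assumes `LongTermMemory` gains `__enter__`/`__exit__` via this PR's context-manager support, and that `my_agents`/`my_tasks` are defined elsewhere):

```python
from crewai import Crew
from crewai.memory import LongTermMemory

# Sketch of the preferred context-manager usage; connection details illustrative.
with LongTermMemory(
    storage_type="postgres",  # assumed value, as in the earlier sketch
    postgres_connection_string="postgresql://user:password@db.example.com:5432/crewai",
) as memory:
    crew = Crew(agents=my_agents, tasks=my_tasks, memory=True, long_term_memory=memory)
    result = crew.kickoff()
# Connections are closed on exit; no explicit cleanup()/close() call needed.
```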

Contributor:
I'm open to discussing the API usage if you want.

lucasgomide (Contributor) left a comment:
@tibbon I love your suggestion!

I just dropped some comments; I haven't finished the review yet. I'm looking forward to the next steps!
