Skip to content

feat: add batch delete documents support to API, SDK, CLI, and tests #151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mdehsan873
Copy link

@mdehsan873 mdehsan873 commented May 11, 2025

  • Added /documents/batch_delete API endpoint for deleting multiple documents at once
  • Implemented service method to handle deletion and error resilience
  • Updated SDK (sync + async) to expose batch_delete_documents()
  • Added CLI command: morphik batch-delete <doc_ids...>
  • Added test cases in both async and sync SDKs to verify batch deletion
  • Closes Add batch delete documents endpoint #78

- Added /documents/batch_delete API endpoint for deleting multiple documents at once
- Implemented service method to handle deletion and error resilience
- Updated SDK (sync + async) to expose batch_delete_documents()
- Added CLI command: `morphik batch-delete <doc_ids...>`
- Added test cases in both async and sync SDKs to verify batch deletion
Copy link

jazzberry-ai bot commented May 11, 2025

Bug Report

Name Severity Example test case Description
Missing CLI command Low Attempt to use the morphik batch-delete command. The patch description mentions a CLI command for batch deletion, but there's no code related to the CLI in the provided diff.
No limit on batch size Medium Submit a request to delete a very large number of documents. There's no limit on the number of documents that can be deleted in a single batch. A malicious user could submit a request to delete a very large number of documents, potentially causing performance problems or even a denial-of-service.
Incorrect telemetry metadata High Use the batch_delete_documents API endpoint. The document_delete_metadata function in core/services/telemetry.py is not designed to handle batch deletion. It will likely raise a KeyError or only capture the first document ID in the list, resulting in inaccurate telemetry data.
Generic error message in API Low Call batch delete API with a bad document id The API endpoint returns a generic "Batch deletion failed" error message. It would be more helpful to provide more specific information about the error.

Comments? Email us.

@CLAassistant
Copy link

CLAassistant commented May 11, 2025

CLA assistant check
All committers have signed the CLA.

…t deletion

- Added limit check for batch size (MAX_BATCH_DELETE) to prevent abuse
- Integrated telemetry metadata tracking for batch_delete_documents API
- Enhanced error handling with specific logs and user-safe error messages
Copy link

jazzberry-ai bot commented May 11, 2025

Bug Report

Name Severity Example test case Description
Incomplete Error Reporting in Batch Document Deletion Medium Create two documents, attempt to batch delete one that will fail. The batch document deletion function does not explicitly inform the user about documents that failed to delete.

Comments? Email us.

@mdehsan873
Copy link
Author

mdehsan873 commented May 11, 2025

@ArnavAgrawal03 or @Adityav369 Can you please review this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add batch delete documents endpoint
2 participants