Skip to content

Conversation

soumilsuri
Copy link
Contributor

No description provided.

} catch (Exception ignore) {
// leave context unset for non-numeric db names
}
processDatabase(dbName, nowSeconds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid processDatabase call if context id wasn't set(catch)


private void ensureCollectionExists(MongoDatabase db, String collectionName) {
boolean exists = false;
try (MongoCursor<String> it = db.listCollectionNames().cursor()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 things -

  1. Check if there's a better way to do this other than iterating on all collections, if not then it's fine
  2. Do not create a new collection if it doesn't exist for a particular database. Skip any processing in this case

Object val = doc.get("archivalDays");
if (val instanceof Number) {
long days = ((Number) val).longValue();
if (days == 30L || days == 60L || days == 90L) return days;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, just take the return the value of days. Add a sanity check like -

if days < 30 || days > 90:
return default_val

also make these as constants

Comment on lines 98 to 100
if (!(retentionDays == 30L || retentionDays == 60L || retentionDays == 90L)) {
retentionDays = DEFAULT_RETENTION_DAYS;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need of this, handle in fetchretentiondays, view the comment

int totalMoved = 0;

while (true) {
List<Document> batch = new ArrayList<>(BATCH_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do an exercise here. Let me know, i can help you with it. Insert around 1 million documents, where around almost all of them are older. Measure how much time it takes to delete. Can help you with the data insertion script


int totalMoved = 0;

while (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a log for how long it take for each iteration in while loop

Comment on lines 219 to 242
try (MongoCursor<Document> cursor = source
.find()
.sort(Sorts.ascending("detectedAt"))
.limit(batch)
.cursor()) {
while (cursor.hasNext()) {
oldestDocs.add(cursor.next());
}
}

if (oldestDocs.isEmpty()) break;

final List<Document> docsSnapshot = new ArrayList<>(oldestDocs);
this.scheduler.submit(() -> {
try {
dest.insertMany(docsSnapshot);
} catch (Exception e) {
logger.errorAndAddToDb("Async archive insert failed in db " + dbName + ": " + e.getMessage(), LoggerMaker.LogDb.RUNTIME);
}
});

Set<Object> ids = new HashSet<>();
for (Document d : oldestDocs) {
ids.add(d.get("_id"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why all this repetitive code. Ideally we should be calling the same functions which were adding to temp collection and deleting the docs in malicious_events


{
int val = updatedConfig.getArchivalDays();
if (val == 30 || val == 60 || val == 90) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add < and > checks as commented earlier

}
{
int val = updatedConfig.getArchivalDays();
if (val == 30 || val == 60 || val == 90) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment as above

long deleted = deleteByIds(source, ids, dbName);

totalDeleted += deleted;
toRemove -= deleted;
Copy link
Contributor

@ayushaga14 ayushaga14 Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove toRemove variable. Let's put a check where we break out of while loop if more than 1_00_000 docs have been deleted, or no documents are left to be deleted

// leave context unset for non-numeric db names
}
if (accId != null) {
processDatabase(dbName, nowSeconds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename processDatabase to a more sutiable name, which indicates what kinda operation is happening inside the function. for ex - deleteDocumentsFromDatabase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants