Merged
76 changes: 76 additions & 0 deletions src/scripts/cleanUpEntityData.js
@@ -0,0 +1,76 @@
const MongoClient = require('mongodb').MongoClient
require('dotenv').config()

async function deleteEntitiesInBatches(mongoUrl) {
  const batchSize = 1000 // Number of documents removed per deleteMany call
  const entityTypes = ['state', 'district', 'block', 'cluster', 'school']

  // Validate MongoDB URL
  if (!mongoUrl) {
    console.error('Error: MongoDB URL must be provided as a command-line argument or in .env as MONGODB_URL')
    process.exit(1)
  }

  let client

  try {
    // Connect to MongoDB
    client = await MongoClient.connect(mongoUrl, { useNewUrlParser: true, useUnifiedTopology: true })
    console.log('Connected to MongoDB')

    const db = client.db() // Use default database from URL
    const collection = db.collection('entities')

    // Count total matching documents
    const totalDocs = await collection.countDocuments({ entityType: { $in: entityTypes } })
    console.log(`Total documents to delete: ${totalDocs}`)

    if (totalDocs === 0) {
      console.log('No documents found matching the criteria. Exiting.')
      return
    }

    // Delete in batches
    let deletedCount = 0
    while (deletedCount < totalDocs) {
      // Find a batch of document IDs to delete
      const batchDocs = await collection
        .find({ entityType: { $in: entityTypes } })
        .limit(batchSize)
        .project({ _id: 1 })
        .toArray()

      if (batchDocs.length === 0) {
        break // No more documents to delete
      }

      // Extract IDs for deletion
      const batchIds = batchDocs.map((doc) => doc._id)

      // Delete documents by IDs
      const batchResult = await collection.deleteMany({ _id: { $in: batchIds } })
      const batchDeleted = batchResult.deletedCount
      deletedCount += batchDeleted
      console.log(`Deleted ${batchDeleted} documents in this batch. Total deleted: ${deletedCount}`)
    }

    console.log(`Deletion complete. Total documents deleted: ${deletedCount}`)
  } catch (error) {
    console.error('Error during deletion:', error.message)
    // Set the exit code instead of calling process.exit(1) here:
    // process.exit() terminates immediately and would skip the finally block,
    // leaving the connection unclosed.
    process.exitCode = 1
  } finally {
    if (client) {
      await client.close()
      console.log('MongoDB connection closed')
    }
  }
}

// Get MongoDB URL from command-line argument or environment variable
const mongoUrl = process.argv[2] || process.env.MONGODB_URL

// Run the script
deleteEntitiesInBatches(mongoUrl).catch((error) => {
  console.error('Script failed:', error.message)
  process.exit(1)
})
33 changes: 33 additions & 0 deletions src/scripts/readme
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# MongoDB Entity Data Cleanup Script

This Node.js script deletes all documents from the MongoDB `entities` collection where `entityType` is one of `state`, `district`, `block`, `cluster`, or `school`. It processes deletions in batches for efficiency and supports MongoDB 4.x with the MongoDB Node.js driver v3.x. The script accepts a MongoDB URL as a command-line argument or falls back to an environment variable.

## Prerequisites

- **Node.js**: Version 14 or later.
- **MongoDB Server**: Version 4.x (e.g., 4.0 or 4.2).
- **NPM Packages**: `mongodb@3.6.12`, `dotenv`.
- **MongoDB URL**: A valid connection string (e.g., `mongodb://localhost:27017/elevate-entity`).
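Because the script calls `client.db()` with no name, the database it cleans is the one named in the connection string. The sketch below shows one way to check which database a URL points at before running the cleanup. `defaultDatabaseName` is a hypothetical helper, not part of the script or the MongoDB driver, and it assumes a single-host URL that Node's built-in `URL` class can parse (multi-host connection strings are not handled).

```javascript
// Hypothetical helper (not part of cleanUpEntityData.js or the MongoDB driver).
// Assumption: a single-host connection string that the WHATWG URL parser accepts.
function defaultDatabaseName(connectionString) {
  const { pathname } = new URL(connectionString)
  // Drop the leading "/" and decode any percent-encoded characters.
  const name = decodeURIComponent(pathname.replace(/^\//, ''))
  return name === '' ? null : name
}

console.log(defaultDatabaseName('mongodb://localhost:27017/elevate-entity'))
console.log(defaultDatabaseName('mongodb://localhost:27017'))
```

A `null` result means the URL names no database, in which case the driver would pick its own default rather than the intended one, so it is probably worth stopping before any deletion runs.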

## Usage

The script can be executed **outside** or **inside** a Docker container. It deletes documents in batches of 1000 (set by the `batchSize` constant in the script) and logs progress after each batch.
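The batching pattern (fetch a limited set of matching IDs, delete exactly those IDs, repeat until none remain) can be illustrated without a database. The sketch below uses a hypothetical in-memory stand-in for a collection. `makeFakeCollection`, `findIds`, `deleteByIds`, and the `teacher` entity type are illustrative assumptions, not MongoDB driver APIs or real data.

```javascript
// Hypothetical in-memory stand-in for a collection; NOT the MongoDB driver API.
function makeFakeCollection(docs) {
  let store = [...docs]
  return {
    // Return up to `limit` ids whose entityType is in `types`.
    findIds(types, limit) {
      return store
        .filter((doc) => types.includes(doc.entityType))
        .slice(0, limit)
        .map((doc) => doc._id)
    },
    // Remove documents with the given ids; report how many were removed.
    deleteByIds(ids) {
      const before = store.length
      store = store.filter((doc) => !ids.includes(doc._id))
      return before - store.length
    },
    size() {
      return store.length
    },
  }
}

// Same shape as the script's loop: find a batch of ids, delete them, repeat.
function deleteInBatches(collection, types, batchSize) {
  let deleted = 0
  const batches = []
  while (true) {
    const ids = collection.findIds(types, batchSize)
    if (ids.length === 0) break // nothing left to delete
    const removed = collection.deleteByIds(ids)
    if (removed === 0) break // guard against looping when nothing is removed
    deleted += removed
    batches.push(removed)
  }
  return { deleted, batches }
}

const fake = makeFakeCollection([
  { _id: 1, entityType: 'state' },
  { _id: 2, entityType: 'school' },
  { _id: 3, entityType: 'teacher' }, // made-up type that must survive the cleanup
  { _id: 4, entityType: 'block' },
  { _id: 5, entityType: 'district' },
  { _id: 6, entityType: 'cluster' },
])
const summary = deleteInBatches(fake, ['state', 'district', 'block', 'cluster', 'school'], 2)
console.log(summary, 'remaining:', fake.size())
```

Deleting by explicit ID list keeps each `deleteMany` call bounded in size, which is the reason the script looks up IDs first instead of issuing one unbounded delete.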

### Option 1: Run Outside Docker

1. Save the script as `cleanUpEntityData.js` (provided in the repository or separately).

2. Run the script with a MongoDB URL as a command-line argument:
```bash
node cleanUpEntityData.js mongodb://localhost:27017/prod-saas-elevate-entity
```

### Option 2: Run Inside Docker

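1. Save the script as `cleanUpEntityData.js` inside the entity service Docker container.

2. Run the script inside the container. With no argument, it reads the MongoDB URL from the `MONGODB_URL` environment variable (for example, from `.env`):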

```bash
node cleanUpEntityData.js
```
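
The two options differ only in where the URL comes from: a command-line argument takes precedence, and `MONGODB_URL` is the fallback. A minimal sketch of that lookup order follows. `resolveMongoUrl` is a hypothetical helper written for illustration, and the URLs passed to it are placeholders.

```javascript
// Hypothetical helper mirroring the script's lookup order:
// first the CLI argument (argv[2]), then the MONGODB_URL environment variable.
function resolveMongoUrl(argv, env) {
  return argv[2] || env.MONGODB_URL
}

// Placeholder values for illustration only.
const fromArg = resolveMongoUrl(
  ['node', 'cleanUpEntityData.js', 'mongodb://localhost:27017/from-arg'],
  { MONGODB_URL: 'mongodb://localhost:27017/from-env' }
)
const fromEnv = resolveMongoUrl(
  ['node', 'cleanUpEntityData.js'],
  { MONGODB_URL: 'mongodb://localhost:27017/from-env' }
)
const missing = resolveMongoUrl(['node', 'cleanUpEntityData.js'], {})

console.log(fromArg, fromEnv, missing)
```

When neither source is set, the result is `undefined`, which is the case the script rejects with an error before connecting.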