18 changes: 18 additions & 0 deletions pages/querying/best-practices.mdx
@@ -858,4 +858,22 @@ WHERE p.age >= 20 AND p.age <= 25
RETURN p;
```

## Efficient deletes

Usually, the following query is used to delete all nodes and relationships in a
graph:

```cypher
MATCH (n)
DETACH DELETE n;
```

The above query matches nodes and then deletes the relationships attached to
them. In **larger graphs** (>1M graph objects), that operation can consume a
lot of memory due to the accumulation of `Delta` objects, which store changes
to the graph objects. To avoid this and efficiently delete all graph objects,
first delete all relationships and then all nodes [in
batches](/querying/read-and-modify-data#batching-deletes), as in the sketch
below.
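
A minimal sketch of the batched approach, assuming a batch size of 100,000
(tune it to the amount of RAM you have available). First delete all
relationships:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)-[r]->(m)
DELETE r;
```

Then delete all nodes:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)
DETACH DELETE n;
```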


<CommunityLinks/>
3 changes: 2 additions & 1 deletion pages/querying/clauses.mdx
@@ -28,4 +28,5 @@ using the following clauses:
* [`UNWIND`](/querying/clauses/unwind) - unwinds a list of values as individual rows
* [`WHERE`](/querying/clauses/where) - filters the matched data
* [`WITH`](/querying/clauses/with) - combines multiple reads and writes
* [`PERIODIC COMMIT`] - initial commit
* [`USING PERIODIC COMMIT num_rows`](/querying/read-and-modify-data#periodic-commit) - batches the query for periodic execution
* [`CALL {subquery} IN TRANSACTIONS OF num_rows ROWS`](/querying/read-and-modify-data#call-subqueries-in-transactions) - batches the subquery for periodic execution
37 changes: 7 additions & 30 deletions pages/querying/clauses/delete.md
@@ -109,42 +109,19 @@ DETACH DELETE p;

## 5. Deleting everything

To delete all nodes and relationships in a graph, use the following query:
To delete all nodes and relationships in a **smaller graph** (<1M graph
objects), use the following query:

```cypher
MATCH (n)
DETACH DELETE n;
```

Output:

```nocopy
Empty set (0.001 sec)
```

### How to lower memory consumption

Matching nodes and then deleting relationships attached to them can consume a lot of memory in larger datasets (>1M). This is due to the accumulation of `Deltas`, which store changes to the graph objects. To avoid this and efficiently drop the database, first delete all relationships and then all nodes. To delete the relationships, execute the query below repeatedly until the number of deleted relationships is 100,000.

```cypher
MATCH ()-[r]->()
WITH r
LIMIT 100000
DELETE r
RETURN count(r) AS num_deleted;
```

After deleting all relationships, run the following query repeatedly until the number of deleted nodes is 100,000 to delete all nodes:

```cypher
MATCH (n)
WITH n
LIMIT 100000
DELETE n
RETURN count(n) AS num_deleted;
```

If the deletion still consumes too much memory, consider lowering the batch size limit.
Matching nodes and then deleting the relationships attached to them can
consume a lot of memory in **larger graphs** (>1M graph objects). This is due
to the accumulation of `Delta` objects, which store changes to the graph
objects. To avoid this and efficiently delete all graph objects, first delete
all relationships and then all nodes [in
batches](/querying/read-and-modify-data#batching-deletes), as sketched below.
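
For example, a minimal sketch of the batched deletion, with an assumed batch
size of 100,000 to tune to your available RAM. First the relationships:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)-[r]->(m)
DELETE r;
```

Then the nodes:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)
DETACH DELETE n;
```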

## Dataset queries

171 changes: 108 additions & 63 deletions pages/querying/read-and-modify-data.md
@@ -535,64 +535,74 @@ MATCH (n:WrongLabel) REMOVE n:WrongLabel, n.property;

## Periodic execution

Write queries use additional memory overhead when modifying data, since the query needs
to be reverted to its original state if it fails. The memory overhead is linear to the number
of updates performed during the queries. These updates during query execution are called Delta
objects, which you can refer [to on this page](/fundamentals/storage-memory-usage#in-memory-transactional-storage-mode-default). The number of created Delta
objects grows during the execution of write queries, and that takes additional memory space, which
can be significant if the user is restricted with RAM memory. There is always a thread allocated in Memgraph
for garbage collection (GC) that cleans temporary allocated space during query execution. Although Delta objects are temporary, and are being deleted by the GC at the transaction end, that is sometimes not
enough as there is already a huge amount of deltas present in the system during the end.
In the following example:
The data should be reverted to its original state if the query updating it
fails. To ensure that, write queries need to use additional memory. The memory
overhead is linear in the number of updates performed during the query. These
updates are stored as [`Delta`
objects](/fundamentals/storage-memory-usage#in-memory-transactional-storage-mode-default).

The number of created `Delta` objects grows during execution, and the memory
they occupy is bounded only by the available RAM. That's why `Delta` objects
are temporary and are deleted during garbage collection at the end of the
transaction. There is always a thread allocated in Memgraph for garbage
collection (GC) that cleans temporarily allocated space during query
execution. Still, sometimes `Delta` objects accumulate faster than the GC can
clean them up.

Let's look at an example. The following query deletes all data in the database:

```cypher
MATCH (n) DETACH DELETE n;
```

The amount of created Delta overhead is the total number of nodes and edges in the graph. Every
delta is around 56 bytes in size. For a dataset of 40 million entities, it adds up to 2.25GB, which
is significant. In practice, users experience bad behaviour when the number of allocated Delta objects
is in the magnitude of millions, since they don't expect the total amount of memory to be rising, but either
stay the same when updates are done, or go down if deletes are performed.
The number of created `Delta` objects equals the total number of nodes and
relationships in the graph. Every `Delta` object is around 56 bytes in size, so
for a dataset of 40 million entities the overhead adds up to roughly
40,000,000 × 56 B ≈ 2.24 GB, which is significant. In practice, a memory spike
happens when the number of allocated `Delta` objects is in the magnitude of
millions. Users typically expect memory to return to its previous state once
updates are done, or to go down when deletes are performed, which makes this
spike surprising.

Another example of an unusual spike in memory is during the creation of graph entities in the graph.
Consider the following query:
Another example of an unusual spike in memory is during the creation of graph
objects. Consider the following query:

```cypher
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CREATE (:Node {id: row.id, name: row.name});
```

This command will create multiple Delta for every CSV row ingested:
- 1 delta object for creating a node in the graph
- 1 delta object for adding a label `Node`
- 2 delta objects for adding properties `id` and `name`
This command will create multiple `Delta` objects for every CSV row ingested:
- One `Delta` object for creating a node in the graph
- One `Delta` object for adding a label `Node`
- Two `Delta` objects for adding properties `id` and `name`

The memory overhead could be even larger than the delete example above, since this query is generating four
Delta objects per row.
The memory overhead could be even larger than in the previous example, since
this query is generating four `Delta` objects per row.

For these purposes, periodic execution can be used to batch the query into smaller chunks, which enables
the Delta objects to be also deallocated during query execution. The final benefit for the user is having
a lot smaller amount of Delta objects allocated at a point in time (in the magnitude of thousands, and not
in millions).
To avoid this memory overhead, you can use [periodic
commit](#periodic-commit) or [CALL subqueries in
transactions](#call-subqueries-in-transactions) to batch the query into
smaller chunks, which enables the `Delta` objects to be deallocated during
query execution. As a result, far fewer `Delta` objects are allocated at any
point in time (in the magnitude of thousands, not millions).

### USING PERIODIC COMMIT <num_rows>
### Periodic commit

`USING PERIODIC COMMIT num_rows` is a pre-query directive which one can use to specify how often
will the query be batched. After each batch, the system will attempt to clear the Delta objects
to release additional memory to the system. This will succeed if this query is the only one executing in
the system at the time. The `num_rows` literal is a positive integer which indicates the batch size of processed rows
after which the Delta objects will be cleaned from the system.
To specify how often the query will be batched, use the `USING PERIODIC COMMIT
num_rows` pre-query directive. After every batch, the system will attempt to
clear the `Delta` objects and release the memory they occupy. This succeeds if
the query is the only one executing in the system at the time. The `num_rows`
literal is a positive integer indicating the number of processed rows after
which the `Delta` objects are cleaned up.

Consider again the following `LOAD CSV` example, but with a modification
Consider again the previous `LOAD CSV` example, but with a periodic commit:

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CREATE (:Node {id: row.id, name: row.name});
```

The query execution plan for this query looks like this:
Here is the query execution plan for the above query:

```
+---------------------+
...
+---------------------+
```

The query plan is read from bottom to top. Notice the `PeriodicCommit` operator at the end of the query used to count the number of rows produced through
that operator. After the counter reaches the limit (in this case 1000), the query will be batched and committed,
and the Delta objects will potentially be released, if the query is the only one executing at the time.
The [query plan](/querying/query-plan) is read from bottom to top. Notice the
`PeriodicCommit` operator at the end of the query plan, which counts the
number of rows produced through it. After the counter reaches the limit (in
this case, 1000), the batch is committed, and the `Delta` objects will
potentially be released if the query is the only one executing at the time.

Since in the above paragraph we have said that we create four Delta objects per row in the query, and with a Delta
objects consisting of about 56 bytes, we can expect around 224KB of additional memory overhead during query execution,
rather than 2.24GB if there are 10 million entries in the CSV.
As opposed to the 2.24GB of additional memory overhead during query execution,
the modified query with a periodic commit in batches of 1000 will produce only
around 224KB of additional memory overhead (1,000 rows × 4 `Delta` objects ×
56 B).

### CALL subqueries in transactions

### CALL { subquery } IN TRANSACTIONS OF num_rows ROWS
`CALL { subquery } IN TRANSACTIONS OF num_rows ROWS` is an additional syntax
for periodic execution. The `num_rows` literal is a positive integer
indicating the batch size of the input branch, after which the periodic
execution starts releasing `Delta` objects.

The `CALL { subquery } IN TRANSACTIONS OF num_rows ROWS` is an additional syntax for periodic execution. The
`num_rows` literal is the positive integer which indicates the input branch batch size, after which the periodic execution
will start releasing Delta objects.
Consider this example:

```cypher
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CALL {
WITH row
CREATE (:Node {id: row.id, name: row.name})
} IN TRANSACTIONS OF 1000 ROWS;
```

The syntax is similar to the `USING PERIODIC COMMIT 1000`, but it offers a more fine-grained batching of the query.
The query execution plan for this query is:
The syntax is similar to periodic commit, but it offers more fine-grained
batching of the query. The query execution plan for this query is:

```
+-----------------------+
...
+-----------------------+
```

The `PeriodicSubquery` is the operator that is used in this case for batching the query and counting the number of rows.
Input branch of the `PeriodicSubquery` is the `LoadCsv`. After the number of rows have been consumed from the input branch,
the query is batched and the Delta objects are potentially released.
`PeriodicSubquery` is the operator used in this case for batching the query
and counting the number of rows. The input branch of `PeriodicSubquery` is
`LoadCsv`. After the specified number of rows has been consumed from the input
branch, the batch is committed and the `Delta` objects are potentially
released.

### Batching deletes
At this point, batching of deletes only works with `USING PERIODIC COMMIT` syntax. If provided a `DELETE` operation
with the `CALL {} IN TRANSACTIONS`, user will get an exception during the query planning. Users are advised to perform
periodic delete in a similar manner to the below queries:

For deleting nodes:
The memory spike usually happens when users try to delete a large portion of a
graph that consists of millions of nodes and relationships.

To resolve this, use periodic commit. Currently, batching deletes with CALL
subqueries in transactions is not implemented, and the user will get an
exception during query planning. Perform periodic deletes in a manner similar
to the examples below.

Query to delete all nodes:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n)
DETACH DELETE n;
```

Query to delete nodes labeled with `Node`:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n:Node)
DETACH DELETE n;
```

Query to delete relationships of type `:REL`:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n)-[r:REL]->(m)
DELETE r;
```

For deleting edges:
Query to delete all relationships:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n)-[r]->(m)
DELETE r;
```

### ACID guarantees for periodic execution.
Since the Delta objects are necessary for reverting the query, it is possible to only guarantee the ACID compliance with respect to the batches committed
inside the query. If the query experiences a failure during execution, the query is reverted up to the latest committed batch. Most common failure of queries
is during [write-write conflicts](/help-center/errors/transactions#conflicting-transaction), and it is recommended that no other write operations are performed during periodic execution.
Choose the `num_rows` value that best suits the amount of RAM you have
available, based on the calculations provided earlier (for example, at roughly
56 bytes per `Delta` object, a batch of 100,000 deletes accumulates around
5.6MB of overhead before it is committed). If the deletion still consumes too
much memory, consider lowering the batch size.


### ACID guarantees for periodic execution

Since the `Delta` objects are necessary for reverting the query, ACID
compliance can only be guaranteed with respect to the batches committed inside
the query. If the query experiences a failure during execution, it is reverted
only up to the latest committed batch. The most common query failure is a
[write-write
conflict](/help-center/errors/transactions#conflicting-transaction), so it is
recommended not to perform any other write operations during periodic
execution.
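
To illustrate these semantics, here is a sketch reusing the earlier `LOAD CSV`
example (the file contents and the failure point are hypothetical):

```cypher
// Suppose /temp.csv has 2,500 rows and the query fails on row 2,300.
// The first two batches (rows 1-2,000) are already committed and remain in
// the graph; only the uncommitted rows 2,001-2,300 are rolled back.
USING PERIODIC COMMIT 1000
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CREATE (:Node {id: row.id, name: row.name});
```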