18 changes: 18 additions & 0 deletions pages/querying/best-practices.mdx
@@ -858,4 +858,22 @@ WHERE p.age >= 20 AND p.age <= 25
RETURN p;
```

## Efficient deletes

Usually, the following query is used to delete all nodes and relationships in a
graph:

```cypher
MATCH (n)
DETACH DELETE n;
```

The above query matches nodes and then deletes the relationships attached to
them. In **larger graphs** (>1M graph objects), that operation can consume a
lot of memory due to the accumulation of `Delta` objects, which store changes
to the graph objects. To avoid this and efficiently delete all graph objects,
first delete all relationships and then all nodes [in
batches](/querying/read-and-modify-data#batching-deletes), as in the sketch
below.
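
A minimal sketch of the batched approach, assuming a batch size of 100,000
(tune it to the amount of RAM you have available). First delete all
relationships:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)-[r]->(m)
DELETE r;
```

Then delete all nodes:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)
DETACH DELETE n;
```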


<CommunityLinks/>
3 changes: 2 additions & 1 deletion pages/querying/clauses.mdx
@@ -28,4 +28,5 @@ using the following clauses:
* [`UNWIND`](/querying/clauses/unwind) - unwinds a list of values as individual rows
* [`WHERE`](/querying/clauses/where) - filters the matched data
* [`WITH`](/querying/clauses/with) - combines multiple reads and writes
* [`PERIODIC COMMIT`] - initial commit
* [`USING PERIODIC COMMIT num_rows`](/querying/read-and-modify-data#periodic-commit) - batches the query for periodic execution
* [`CALL {subquery} IN TRANSACTIONS OF num_rows ROWS`](/querying/read-and-modify-data#call-subqueries-in-transactions) - batches the subquery for periodic execution
37 changes: 7 additions & 30 deletions pages/querying/clauses/delete.md
@@ -109,42 +109,19 @@ DETACH DELETE p;

## 5. Deleting everything

To delete all nodes and relationships in a graph, use the following query:
To delete all nodes and relationships in a **smaller graph** (<1M graph
objects), use the following query:

```cypher
MATCH (n)
DETACH DELETE n;
```

Output:

```nocopy
Empty set (0.001 sec)
```

### How to lower memory consumption

Matching nodes and then deleting relationships attached to them can consume a lot of memory in larger datasets (>1M). This is due to the accumulation of `Deltas`, which store changes to the graph objects. To avoid this and efficiently drop the database, first delete all relationships and then all nodes. To delete the relationships, execute the query below repeatedly until the number of deleted relationships is 100,000.

```cypher
MATCH ()-[r]->()
WITH r
LIMIT 100000
DELETE r
RETURN count(r) AS num_deleted;
```

After deleting all relationships, run the following query repeatedly until the number of deleted nodes is 100,000 to delete all nodes:

```cypher
MATCH (n)
WITH n
LIMIT 100000
DELETE n
RETURN count(n) AS num_deleted;
```

If the deletion still consumes too much memory, consider lowering the batch size limit.
Matching nodes and then deleting the relationships attached to them can
consume a lot of memory in **larger graphs** (>1M graph objects). This is due
to the accumulation of `Delta` objects, which store changes to the graph
objects. To avoid this and efficiently delete all graph objects, first delete
all relationships and then all nodes [in
batches](/querying/read-and-modify-data#batching-deletes), as sketched below.
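
For example, a minimal sketch of the batched deletion, with an assumed batch
size of 100,000 to tune to your available RAM. First the relationships:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)-[r]->(m)
DELETE r;
```

Then the nodes:

```cypher
USING PERIODIC COMMIT 100000
MATCH (n)
DETACH DELETE n;
```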

## Dataset queries

171 changes: 108 additions & 63 deletions pages/querying/read-and-modify-data.md
@@ -535,64 +535,74 @@ MATCH (n:WrongLabel) REMOVE n:WrongLabel, n.property;

## Periodic execution

Write queries use additional memory overhead when modifying data, since the query needs
to be reverted to its original state if it fails. The memory overhead is linear to the number
of updates performed during the queries. These updates during query execution are called Delta
objects, which you can refer [to on this page](/fundamentals/storage-memory-usage#in-memory-transactional-storage-mode-default). The number of created Delta
objects grows during the execution of write queries, and that takes additional memory space, which
can be significant if the user is restricted with RAM memory. There is always a thread allocated in Memgraph
for garbage collection (GC) that cleans temporary allocated space during query execution. Although Delta objects are temporary, and are being deleted by the GC at the transaction end, that is sometimes not
enough as there is already a huge amount of deltas present in the system during the end.
In the following example:
The data should be reverted to its original state if the query updating it
fails. To ensure that, write queries need to use additional memory. The memory
overhead is linear in the number of updates performed during the query. These
updates are stored as [`Delta`
objects](/fundamentals/storage-memory-usage#in-memory-transactional-storage-mode-default).

The number of created `Delta` objects grows during execution, and the memory
they occupy is bounded only by the available RAM. That's why `Delta` objects
are temporary and are deleted during garbage collection at the end of the
transaction. There is always a thread allocated in Memgraph for garbage
collection (GC) that cleans temporarily allocated space during query
execution. Still, sometimes `Delta` objects accumulate faster than the GC can
clean them up.

Let's look at an example. The following query deletes all data in the database:

```cypher
MATCH (n) DETACH DELETE n;
```

The amount of created Delta overhead is the total number of nodes and edges in the graph. Every
delta is around 56 bytes in size. For a dataset of 40 million entities, it adds up to 2.25GB, which
is significant. In practice, users experience bad behaviour when the number of allocated Delta objects
is in the magnitude of millions, since they don't expect the total amount of memory to be rising, but either
stay the same when updates are done, or go down if deletes are performed.
The number of created `Delta` objects equals the total number of nodes and
relationships in the graph. Every `Delta` object is around 56 bytes in size, so
for a dataset of 40 million entities the overhead adds up to roughly
40,000,000 × 56 B ≈ 2.24 GB, which is significant. In practice, a memory spike
happens when the number of allocated `Delta` objects is in the magnitude of
millions. Users typically expect memory to return to its previous state once
updates are done, or to go down when deletes are performed, which makes this
spike surprising.

Another example of an unusual spike in memory is during the creation of graph entities in the graph.
Consider the following query:
Another example of an unusual spike in memory is during the creation of graph
objects. Consider the following query:

```cypher
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CREATE (:Node {id: row.id, name: row.name});
```

This command will create multiple Delta for every CSV row ingested:
- 1 delta object for creating a node in the graph
- 1 delta object for adding a label `Node`
- 2 delta objects for adding properties `id` and `name`
This command will create multiple `Delta` objects for every CSV row ingested:
- One `Delta` object for creating a node in the graph
- One `Delta` object for adding a label `Node`
- Two `Delta` objects for adding properties `id` and `name`

The memory overhead could be even larger than the delete example above, since this query is generating four
Delta objects per row.
The memory overhead could be even larger than in the previous example, since
this query is generating four `Delta` objects per row.

For these purposes, periodic execution can be used to batch the query into smaller chunks, which enables
the Delta objects to be also deallocated during query execution. The final benefit for the user is having
a lot smaller amount of Delta objects allocated at a point in time (in the magnitude of thousands, and not
in millions).
To avoid this memory overhead, you can use [periodic
commit](#periodic-commit) or [CALL subqueries in
transactions](#call-subqueries-in-transactions) to batch the query into
smaller chunks, which enables the `Delta` objects to be deallocated during
query execution. As a result, far fewer `Delta` objects are allocated at any
point in time (in the magnitude of thousands, not millions).

### USING PERIODIC COMMIT <num_rows>
### Periodic commit

`USING PERIODIC COMMIT num_rows` is a pre-query directive which one can use to specify how often
will the query be batched. After each batch, the system will attempt to clear the Delta objects
to release additional memory to the system. This will succeed if this query is the only one executing in
the system at the time. The `num_rows` literal is a positive integer which indicates the batch size of processed rows
after which the Delta objects will be cleaned from the system.
To specify how often the query will be batched, use the `USING PERIODIC COMMIT
num_rows` pre-query directive. After every batch, the system will attempt to
clear the `Delta` objects and release the memory they occupy. This succeeds if
the query is the only one executing in the system at the time. The `num_rows`
literal is a positive integer indicating the number of processed rows after
which the `Delta` objects are cleaned up.

Consider again the following `LOAD CSV` example, but with a modification
Consider again the previous `LOAD CSV` example, but with a periodic commit:

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CREATE (:Node {id: row.id, name: row.name});
```

The query execution plan for this query looks like this:
Here is the query execution plan for the above query:

```
+---------------------+
...
+---------------------+
```

The query plan is read from bottom to top. Notice the `PeriodicCommit` operator at the end of the query used to count the number of rows produced through
that operator. After the counter reaches the limit (in this case 1000), the query will be batched and committed,
and the Delta objects will potentially be released, if the query is the only one executing at the time.
The [query plan](/querying/query-plan) is read from bottom to top. Notice the
`PeriodicCommit` operator at the end of the query plan, which counts the
number of rows produced through it. After the counter reaches the limit (in
this case, 1000), the batch is committed, and the `Delta` objects will
potentially be released if the query is the only one executing at the time.

Since in the above paragraph we have said that we create four Delta objects per row in the query, and with a Delta
objects consisting of about 56 bytes, we can expect around 224KB of additional memory overhead during query execution,
rather than 2.24GB if there are 10 million entries in the CSV.
As opposed to the 2.24GB of additional memory overhead during query execution,
the modified query with a periodic commit in batches of 1000 will produce only
around 224KB of additional memory overhead (1,000 rows × 4 `Delta` objects ×
56 B).

### CALL subqueries in transactions

### CALL { subquery } IN TRANSACTIONS OF num_rows ROWS
`CALL { subquery } IN TRANSACTIONS OF num_rows ROWS` is an additional syntax
for periodic execution. The `num_rows` literal is a positive integer
indicating the batch size of the input branch, after which the periodic
execution starts releasing `Delta` objects.

The `CALL { subquery } IN TRANSACTIONS OF num_rows ROWS` is an additional syntax for periodic execution. The
`num_rows` literal is the positive integer which indicates the input branch batch size, after which the periodic execution
will start releasing Delta objects.
Consider this example:

```cypher
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CALL {
WITH row
CREATE (:Node {id: row.id, name: row.name})
} IN TRANSACTIONS OF 1000 ROWS;
```

The syntax is similar to the `USING PERIODIC COMMIT 1000`, but it offers a more fine-grained batching of the query.
The query execution plan for this query is:
The syntax is similar to periodic commit, but it offers more fine-grained
batching of the query. The query execution plan for this query is:

```
+-----------------------+
...
+-----------------------+
```

The `PeriodicSubquery` is the operator that is used in this case for batching the query and counting the number of rows.
Input branch of the `PeriodicSubquery` is the `LoadCsv`. After the number of rows have been consumed from the input branch,
the query is batched and the Delta objects are potentially released.
`PeriodicSubquery` is the operator used in this case for batching the query
and counting the number of rows. The input branch of `PeriodicSubquery` is
`LoadCsv`. After the specified number of rows has been consumed from the input
branch, the batch is committed and the `Delta` objects are potentially
released.

### Batching deletes
At this point, batching of deletes only works with `USING PERIODIC COMMIT` syntax. If provided a `DELETE` operation
with the `CALL {} IN TRANSACTIONS`, user will get an exception during the query planning. Users are advised to perform
periodic delete in a similar manner to the below queries:

For deleting nodes:
The memory spike usually happens when users try to delete a large portion of a
graph that consists of millions of nodes and relationships.

To resolve this, use periodic commit. Currently, batching deletes with CALL
subqueries in transactions is not implemented, and the user will get an
exception during query planning. Perform periodic deletes in a manner similar
to the examples below.

Query to delete all nodes:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n)
DETACH DELETE n;
```

Query to delete nodes labeled with `Node`:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n:Node)
DETACH DELETE n;
```

Query to delete relationships of type `:REL`:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n)-[r:REL]->(m)
DELETE r;
```

For deleting edges:
Query to delete all relationships:
```cypher
USING PERIODIC COMMIT num_rows
MATCH (n)-[r]->(m)
DELETE r;
```

### ACID guarantees for periodic execution.
Since the Delta objects are necessary for reverting the query, it is possible to only guarantee the ACID compliance with respect to the batches committed
inside the query. If the query experiences a failure during execution, the query is reverted up to the latest committed batch. Most common failure of queries
is during [write-write conflicts](/help-center/errors/transactions#conflicting-transaction), and it is recommended that no other write operations are performed during periodic execution.
Choose the `num_rows` value that best suits the amount of RAM you have
available, based on the calculations provided earlier (for example, at roughly
56 bytes per `Delta` object, a batch of 100,000 deletes accumulates around
5.6MB of overhead before it is committed). If the deletion still consumes too
much memory, consider lowering the batch size.


### ACID guarantees for periodic execution

Since the `Delta` objects are necessary for reverting the query, ACID
compliance can only be guaranteed with respect to the batches committed inside
the query. If the query experiences a failure during execution, it is reverted
only up to the latest committed batch. The most common query failure is a
[write-write
conflict](/help-center/errors/transactions#conflicting-transaction), so it is
recommended not to perform any other write operations during periodic
execution.
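
To illustrate these semantics, here is a sketch reusing the earlier `LOAD CSV`
example (the file contents and the failure point are hypothetical):

```cypher
// Suppose /temp.csv has 2,500 rows and the query fails on row 2,300.
// The first two batches (rows 1-2,000) are already committed and remain in
// the graph; only the uncommitted rows 2,001-2,300 are rolled back.
USING PERIODIC COMMIT 1000
LOAD CSV FROM "/temp.csv" WITH HEADER AS row
CREATE (:Node {id: row.id, name: row.name});
```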