18 changes: 9 additions & 9 deletions src/content/docs/cypher/query-clauses/call.md
@@ -27,7 +27,7 @@ The following table lists the built-in schema functions you can use with the `CALL` clause:
| `SHOW_LOADED_EXTENSIONS` | returns all loaded extensions |
| `SHOW_INDEXES` | returns all indexes built in the system |
| `SHOW_PROJECTED_GRAPHS` | returns all existing projected graphs in the system |
| `PROJECTED_GRAPH_INFO` | returns information about the given projected graph |

</div>
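
Each of these functions is invoked through the `CALL` clause. As a quick illustration, a minimal sketch calling one of the functions listed above:

```cypher
CALL show_indexes() RETURN *;
```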

@@ -179,11 +179,11 @@ CALL show_attached_databases() RETURN *;

`SHOW_WARNINGS` returns the contents of the
[Warnings Table](/import#warnings-table-inspecting-skipped-rows). This is a feature
related to [ignoring errors](/import#ignore-erroneous-rows) when running `COPY/LOAD FROM` statements to scan files.
Warnings are reported only if the [`IGNORE_ERRORS`](/import#ignore-erroneous-rows) setting is enabled.
Note that the number of warnings that are stored is limited by the `warning_limit` parameter.
See [configuration](/cypher/configuration#configure-warning-limit) for more details on how to set the warning limit.
Once `warning_limit` warnings have been stored, any new warnings are discarded.
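
For example, the limit can be raised before a large import. A minimal sketch, assuming Kuzu's usual `CALL <option>=<value>` configuration syntax (the value shown is arbitrary):

```cypher
// assumed configuration syntax; see the configuration page linked above
CALL warning_limit=1024;
```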

| Column | Description | Type |
| ------ | ----------- | ---- |
@@ -207,7 +207,7 @@ CALL show_warnings() RETURN *;
```

### CLEAR_WARNINGS
To clear the contents of the [Warnings Table](/import#warnings-table-inspecting-skipped-rows), run the `CLEAR_WARNINGS` function.
This function has no output.

```cypher
CALL clear_warnings();
```
@@ -315,9 +315,9 @@ CALL SHOW_PROJECTED_GRAPHS() RETURN *;
```

### PROJECTED_GRAPH_INFO
To show detailed information about a projected graph, use the `PROJECTED_GRAPH_INFO` function.

There are two types of projected graphs:

### Native projected graph
| Column | Description | Type |
@@ -363,7 +363,7 @@ call PROJECTED_GRAPH_INFO('student') RETURN *;

## YIELD

The `YIELD` clause in Kuzu allows renaming the return columns of a `CALL` function to prevent naming conflicts and improve readability.
Usage:
```cypher
CALL FUNC()
YIELD COLUMN0 AS ALIAS0, COLUMN1 AS ALIAS1
RETURN ALIAS0, ALIAS1;
```
@@ -391,7 +391,7 @@ Another useful scenario is to avoid naming conflicts when two call functions in the same query return columns with the same name:
CALL table_info('person')
YIELD `property id` as person_id, name as person_name, type as person_type, `default expression` as person_default, `primary key` as person_pk
CALL table_info('student')
YIELD `property id` as student_id, name as student_name, type as student_type, `default expression` as student_default, `primary key` as student_pk
RETURN *;
```
32 changes: 16 additions & 16 deletions src/content/docs/cypher/query-clauses/load-from.md
@@ -25,7 +25,7 @@ in the `LOAD FROM` statement using the [`WITH HEADERS`](#bound-variable-names-and-data-types).
### File format detection
`LOAD FROM` determines the file format based on the file extension if the `file_format` option is not given. For instance, files with a `.csv` extension are automatically recognized as CSV format.

If the file format cannot be inferred from the extension, or if you need to override the default sniffing behavior, you can use the `file_format` option.

For example, to load a CSV file that has a `.tsv` extension (for tab-separated data), you must explicitly specify the file format using the `file_format` option, as shown below:
```cypher
// the file name here is illustrative
LOAD FROM "data.tsv" (file_format='csv') RETURN *;
```
@@ -48,18 +48,18 @@ The configurations documented in those pages can also be specified after the `LOAD FROM`
CSV files. For example, you can indicate that the first line should
be interpreted as a header line by setting `(header = true)`, or that the CSV delimiter is '|' by setting `(DELIM="|")`; see the sketch after this note.
Some of these configurations are also by default [automatically detected](/import/csv#csv-configurations) by Kuzu when scanning CSV files.
These configurations determine the names and data types of the
variables that bind to the fields scanned from CSV files.
This page does not document those options in detail. We refer you to [CSV Configurations](/import/csv#csv-configurations) and
[ignore erroneous rows](/import/csv#ignore-erroneous-rows) documentation pages for details.
:::
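
For instance, a sketch combining two of the options described in the note above (the file name is illustrative):

```cypher
LOAD FROM "user.csv" (header = true, DELIM = "|") RETURN *;
```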

The syntax for scanning a CSV file with `LOAD FROM` is similar to that of `COPY FROM` with CSV files.
#### CSV header
If (i) the CSV file has a header line, i.e., a first line that should not be interpreted
as a tuple to be scanned; and (ii) `(header = true)` is set, then the column names in the first line
provide the names of the columns. The data types are always automatically inferred from the CSV file (except of course
if `LOAD WITH HEADERS (...) FROM` is used, in which case the data types provided inside the `(...)` are used as
described [above](#bound-variable-names-and-data-types)).
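
For instance, a sketch that overrides the inferred data types, mirroring the `LOAD WITH HEADERS` usage shown later on this page (the file and column names follow the `user.csv` examples; the types shown are illustrative):

```cypher
LOAD WITH HEADERS (name STRING, age INT64) FROM "user.csv" (header = true)
RETURN name, age;
```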

Suppose `user.csv` is a CSV file with the following contents:
@@ -110,7 +110,7 @@ LOAD FROM "user.csv" (header = false) RETURN *;

Since Parquet files contain schema information in their metadata, Kuzu will always use the available
schema information when loading from Parquet files (except again
if `LOAD WITH HEADERS (...) FROM` is used). Suppose we have a Parquet file `user.parquet` with two columns `f0` and `f1`
and the same content as in the `user.csv` file above. Then the query below will scan the Parquet file and output the following:

```cypher
LOAD FROM "user.parquet" RETURN *;
```
@@ -127,7 +127,7 @@ LOAD FROM "user.parquet" RETURN *;

### Pandas

Kuzu allows zero-copy access to Pandas DataFrames. The variable names and data types of scanned columns
within a Pandas DataFrame will be
inferred from the schema information of the data frame. Here is an example:

@@ -136,7 +136,7 @@ inferred from the schema information of the data frame. Here is an example:
import kuzu
import pandas as pd

db = kuzu.Database("persons")
db = kuzu.Database("example.kuzu")
conn = kuzu.Connection(db)

df = pd.DataFrame({
@@ -162,14 +162,14 @@ Pandas can use either a NumPy or Arrow backend - Kuzu can natively scan from either.

### Polars

Kuzu can also scan Polars DataFrames via the underlying PyArrow layer. The rules for determining the
variable names and data types are identical to those for scanning Pandas data frames. Here is an example:

```python
import kuzu
import polars as pl

db = kuzu.Database("tmp")
db = kuzu.Database("example.kuzu")
conn = kuzu.Connection(db)

df = pl.DataFrame({
@@ -202,7 +202,7 @@ You can scan an existing PyArrow table as follows:
import kuzu
import pyarrow as pa

db = kuzu.Database("tmp")
db = kuzu.Database("example.kuzu")
conn = kuzu.Connection(db)

pa_table = pa.table({
@@ -349,7 +349,7 @@ RETURN *;

### Create nodes from input file

You can pass the contents of `LOAD FROM` to a `CREATE` statement.

```cypher
// Create a node table
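// Hypothetical completion: the original example's statements were elided from this diff.
CREATE NODE TABLE Person (name STRING, age INT64, PRIMARY KEY (name));
// Scan the input file and create one Person node per scanned row
LOAD FROM "user.csv" (header = true)
CREATE (:Person {name: name, age: age});
```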
@@ -431,7 +431,7 @@ casting operation fails.
:::

## Ignore erroneous rows
Errors can happen when scanning the lines or elements of an input file with `LOAD FROM`.
Errors can occur for several reasons: a line in the scanned file may be malformed (e.g., in CSV files),
or a field in the scanned line may fail to cast to its expected data type (e.g., due to an integer overflow).
You can skip erroneous lines when scanning large files by setting [`IGNORE_ERRORS`](/import#ignore-erroneous-rows)
@@ -446,7 +446,7 @@ Bob,2147483650

Suppose we write a `LOAD FROM` statement that tries to read the second column as an INT32.
The second row `(Bob,2147483650)` would be malformed because 2147483650 does not fit into an INT32 and will cause an error.
By setting `IGNORE_ERRORS` to true, we can make `LOAD FROM` skip over this line instead of erroring:
```cypher
LOAD WITH HEADERS (name STRING, age INT32) FROM "user.csv" (ignore_errors = true)
RETURN name, age;
@@ -459,5 +459,5 @@ RETURN name, age;
│ Alice │ 4 │
└────────┴───────┘
```
You can also see the details of any warnings generated by the skipped lines using the [SHOW_WARNINGS](/cypher/query-clauses/call#show_warnings) function.
See the "Ignore erroneous rows" [section](/import#ignore-erroneous-rows) of `COPY FROM` for more details.
2 changes: 1 addition & 1 deletion src/content/docs/cypher/transaction.md
@@ -5,7 +5,7 @@ title: Transactions
Kuzu implements a transaction management subsystem that is atomic and durable, and that supports serializability.
Satisfying these properties makes Kuzu ACID-compliant, as per database terminology.

Every query, data manipulation command, DDL (i.e., new node/rel table schema definitions), or `COPY FROM` command to Kuzu is part of a transaction. Therefore, they exhibit all-or-nothing behavior. After one or more of these commands are executed and committed successfully, you are guaranteed that all changes will persist in their entirety. If the commands do not execute successfully or are rolled back, you are guaranteed that none of the changes will persist.

These conditions hold even if your system crashes at any point during a transaction. That is, once you commit a transaction successfully, all your changes will persist even if there is an error *after* committing. Similarly, if your system crashes before committing or rolling back, then none of your updates will persist.
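
To make this concrete, below is a minimal sketch of a manual transaction; the node table and properties are hypothetical, and `BEGIN TRANSACTION`/`COMMIT`/`ROLLBACK` are the statements Kuzu provides for manual transaction control:

```cypher
BEGIN TRANSACTION;
CREATE (:User {name: 'Adam', age: 30});           // first change (hypothetical User table)
MATCH (u:User {name: 'Karissa'}) SET u.age = 40;  // second change in the same transaction
COMMIT;  // both changes persist together; ROLLBACK here would discard both
```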

@@ -10,13 +10,13 @@ Logical type refers to how data is conceptually organized. This type doesn't handle how data is
actually stored. Different logical types may have the same physical type during storage/query
processing, e.g., both the `INT32` and `DATE` logical types have the `INT32` physical type.

During query compilation, which includes parsing, binding, and planning, the logical type should always be used.

## PhysicalType

Physical type refers to the specific data format as it is physically stored on disk and in memory.

Physical type is useful in storage and query processing. For example, columns are constructed based on
physical types. Comparison operators can only work on physical types. Using the physical type is not
mandatory -- if you need to make a distinction between logical types, then you will
need to fall back to using a logical type.
@@ -8,7 +8,7 @@ We decompose `PhysicalPlan` into `Pipeline`s. A pipeline is a linear sequence of operators.

### Pipeline decomposition

Given a physical plan, we decompose it into pipelines when we encounter a sink operator. A sink operator is an operator that must exhaust its input in order to process correctly, e.g. `HASH_JOIN_BUILD`, `AGGREGATE`, `ORDER BY`, etc. Pipelines have dependencies, meaning that one pipeline may depend on the output of another pipeline. For example, a `HASH_JOIN_PROBE` pipeline must depend on a `HASH_JOIN_BUILD` pipeline.

### Morsel-driven parallelism

4 changes: 2 additions & 2 deletions src/content/docs/developer-guide/database-internal/index.md
@@ -12,7 +12,7 @@ The parser transforms an input string statement into Kuzu's internal AST, called `Statement`.

### Binder

The binder binds a `Statement` into another AST called `BoundStatement`. It also validates the semantic correctness of the input statement and binds the string representation into an internal representation (typically an integer). For example, a table name will be bound to an internal table ID.

In addition, the binder also performs semantic rewrites of `BoundStatement`. Semantic rewrites don't change the semantics of a `BoundStatement`, but rewrite it into a form that is more performant to evaluate. For example, `MATCH (a) WITH a RETURN a.name` will be rewritten as `MATCH (a) RETURN a.name`.

@@ -76,7 +76,7 @@ LogicalPlan
| Planner
BoundStatement
| Binder
Statement
| Parser
String Input
```
@@ -7,7 +7,7 @@ title: Vector types
A value vector is Kuzu's column-oriented in-memory data structure to store a chunk of data of the same data type. The size of `ValueVector` is defined by `DEFAULT_VECTOR_CAPACITY`, which is 2048. This is an empirically selected value with the presumption that the data stored in the value vector will fit into CPU cache.

A value vector has the following core fields:
- `data`: Stores the actual data, which is a trivial byte array managed by a unique pointer.
- `nullMask`: Aligned with `data` and indicates if each entry is `NULL` or not.
- `auxiliaryBuffer`: Keeps track of additional data that does not fit in `data`.

4 changes: 2 additions & 2 deletions src/content/docs/developer-guide/files.mdx
@@ -5,15 +5,15 @@ description: All the files that are created and managed by Kuzu on disk.

## Database files

Kuzu uses a single-file format for persisting the data to disk. That is, the database is stored in a single file rather than a directory of files.
The single-file format is also used by other popular embedded databases such as [DuckDB](https://duckdb.org/docs/stable/internals/storage) and [SQLite](https://sqlite.org/fileformat.html).

When opening a Kuzu database in the "on-disk" read-write mode, Kuzu creates the database file at the specified
path if it doesn't already exist. It can also create other temporary files at runtime. These files are
automatically cleaned up when the database is closed.

The following table lists the different types of files that are created and managed by Kuzu.
The database file is the only file required to open a database. The rest of the files are created
in the same directory as the database file.

| File Type | Example |