18 changes: 9 additions & 9 deletions src/content/docs/cypher/query-clauses/call.md
@@ -27,7 +27,7 @@ The following table lists the built-in schema functions you can use with the `CALL` clause:
| `SHOW_LOADED_EXTENSIONS` | returns all loaded extensions |
| `SHOW_INDEXES` | returns all indexes built in the system |
| `SHOW_PROJECTED_GRAPHS` | returns all existing projected graphs in the system |
| `PROJECTED_GRAPH_INFO` | returns information about the given projected graph |

</div>
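
Each of these functions is invoked through the `CALL` clause. As a quick illustration, a minimal sketch calling one of the functions listed above:

```cypher
CALL show_indexes() RETURN *;
```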

@@ -179,11 +179,11 @@ CALL show_attached_databases() RETURN *;

`SHOW_WARNINGS` returns the contents of the
[Warnings Table](/import#warnings-table-inspecting-skipped-rows). This is a feature
related to [ignoring errors](/import#ignore-erroneous-rows) when running `COPY/LOAD FROM` statements to scan files.
Warnings are reported only if the [`IGNORE_ERRORS`](/import#ignore-erroneous-rows) setting is enabled.
Note that the number of warnings that are stored is limited by the `warning_limit` parameter.
See [configuration](/cypher/configuration#configure-warning-limit) for more details on how to set the warning limit.
Once `warning_limit` warnings have been stored, any new warnings are discarded.
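
For example, the limit can be raised before a large import. A minimal sketch, assuming Kuzu's usual `CALL <option>=<value>` configuration syntax (the value shown is arbitrary):

```cypher
// assumed configuration syntax; see the configuration page linked above
CALL warning_limit=1024;
```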

| Column | Description | Type |
| ------ | ----------- | ---- |
@@ -207,7 +207,7 @@ CALL show_warnings() RETURN *;
```

### CLEAR_WARNINGS
To clear the contents of the [Warnings Table](/import#warnings-table-inspecting-skipped-rows), run the `CLEAR_WARNINGS` function.
This function has no output.

```cypher
CALL clear_warnings();
```
@@ -315,9 +315,9 @@ CALL SHOW_PROJECTED_GRAPHS() RETURN *;
```

### PROJECTED_GRAPH_INFO
To show detailed information about a projected graph, use the `PROJECTED_GRAPH_INFO` function.

There are two types of projected graphs:

### Native projected graph
| Column | Description | Type |
@@ -363,7 +363,7 @@ call PROJECTED_GRAPH_INFO('student') RETURN *;

## YIELD

The `YIELD` clause in Kuzu allows renaming the return columns of a `CALL` function to prevent naming conflicts and improve readability.
Usage:
```cypher
CALL FUNC()
YIELD COLUMN0 AS ALIAS0, COLUMN1 AS ALIAS1
RETURN ALIAS0, ALIAS1;
```
@@ -391,7 +391,7 @@ Another useful scenario is to avoid naming conflicts when two call functions in the same query return columns with the same name:
CALL table_info('person')
YIELD `property id` as person_id, name as person_name, type as person_type, `default expression` as person_default, `primary key` as person_pk
CALL table_info('student')
YIELD `property id` as student_id, name as student_name, type as student_type, `default expression` as student_default, `primary key` as student_pk
RETURN *;
```
32 changes: 16 additions & 16 deletions src/content/docs/cypher/query-clauses/load-from.md
@@ -25,7 +25,7 @@ in the `LOAD FROM` statement using the [`WITH HEADERS`](#bound-variable-names-and-data-types).
### File format detection
`LOAD FROM` determines the file format based on the file extension if the `file_format` option is not given. For instance, files with a `.csv` extension are automatically recognized as CSV format.

If the file format cannot be inferred from the extension, or if you need to override the default sniffing behavior, you can use the `file_format` option.

For example, to load a CSV file that has a `.tsv` extension (for tab-separated data), you must explicitly specify the file format using the `file_format` option, as shown below:
```cypher
// the file name here is illustrative
LOAD FROM "data.tsv" (file_format='csv') RETURN *;
```
@@ -48,18 +48,18 @@ The configurations documented in those pages can also be specified after the `LOAD FROM`
CSV files. For example, you can indicate that the first line should
be interpreted as a header line by setting `(header = true)`, or that the CSV delimiter is '|' by setting `(DELIM="|")`; see the sketch after this note.
Some of these configurations are also by default [automatically detected](/import/csv#csv-configurations) by Kuzu when scanning CSV files.
These configurations determine the names and data types of the
variables that bind to the fields scanned from CSV files.
This page does not document those options in detail. We refer you to [CSV Configurations](/import/csv#csv-configurations) and
[ignore erroneous rows](/import/csv#ignore-erroneous-rows) documentation pages for details.
:::
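
For instance, a sketch combining two of the options described in the note above (the file name is illustrative):

```cypher
LOAD FROM "user.csv" (header = true, DELIM = "|") RETURN *;
```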

The syntax for scanning a CSV file with `LOAD FROM` is similar to that of `COPY FROM` with CSV files.
#### CSV header
If (i) the CSV file has a header line, i.e., a first line that should not be interpreted
as a tuple to be scanned; and (ii) `(header = true)` is set, then the column names in the first line
provide the names of the columns. The data types are always automatically inferred from the CSV file (except of course
if `LOAD WITH HEADERS (...) FROM` is used, in which case the data types provided inside the `(...)` are used as
described [above](#bound-variable-names-and-data-types)).
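
For instance, a sketch that overrides the inferred data types, mirroring the `LOAD WITH HEADERS` usage shown later on this page (the file and column names follow the `user.csv` examples; the types shown are illustrative):

```cypher
LOAD WITH HEADERS (name STRING, age INT64) FROM "user.csv" (header = true)
RETURN name, age;
```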

Suppose `user.csv` is a CSV file with the following contents:
@@ -110,7 +110,7 @@ LOAD FROM "user.csv" (header = false) RETURN *;

Since Parquet files contain schema information in their metadata, Kuzu will always use the available
schema information when loading from Parquet files (except again
if `LOAD WITH HEADERS (...) FROM` is used). Suppose we have a Parquet file `user.parquet` with two columns `f0` and `f1`
and the same content as in the `user.csv` file above. Then the query below will scan the Parquet file and output the following:

```cypher
LOAD FROM "user.parquet" RETURN *;
```
@@ -127,7 +127,7 @@ LOAD FROM "user.parquet" RETURN *;

### Pandas

Kuzu allows zero-copy access to Pandas DataFrames. The variable names and data types of scanned columns
within a Pandas DataFrame will be
inferred from the schema information of the data frame. Here is an example:

@@ -136,7 +136,7 @@ inferred from the schema information of the data frame. Here is an example:
import kuzu
import pandas as pd

db = kuzu.Database("persons")
db = kuzu.Database("example.kuzu")
conn = kuzu.Connection(db)

df = pd.DataFrame({
@@ -162,14 +162,14 @@ Pandas can use either a NumPy or Arrow backend - Kuzu can natively scan from either.

### Polars

Kuzu can also scan Polars DataFrames via the underlying PyArrow layer. The rules for determining the
variable names and data types are identical to those for scanning Pandas data frames. Here is an example:

```python
import kuzu
import polars as pl

db = kuzu.Database("tmp")
db = kuzu.Database("example.kuzu")
conn = kuzu.Connection(db)

df = pl.DataFrame({
@@ -202,7 +202,7 @@ You can scan an existing PyArrow table as follows:
import kuzu
import pyarrow as pa

db = kuzu.Database("tmp")
db = kuzu.Database("example.kuzu")
conn = kuzu.Connection(db)

pa_table = pa.table({
@@ -349,7 +349,7 @@ RETURN *;

### Create nodes from input file

You can pass the contents of `LOAD FROM` to a `CREATE` statement.

```cypher
// Create a node table
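// Hypothetical completion: the original example's statements were elided from this diff.
CREATE NODE TABLE Person (name STRING, age INT64, PRIMARY KEY (name));
// Scan the input file and create one Person node per scanned row
LOAD FROM "user.csv" (header = true)
CREATE (:Person {name: name, age: age});
```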
@@ -431,7 +431,7 @@ casting operation fails.
:::

## Ignore erroneous rows
Errors can happen when scanning the lines or elements of an input file with `LOAD FROM`.
Errors can occur for several reasons: a line in the scanned file may be malformed (e.g., in CSV files),
or a field in the scanned line may fail to cast to its expected data type (e.g., due to an integer overflow).
You can skip erroneous lines when scanning large files by setting [`IGNORE_ERRORS`](/import#ignore-erroneous-rows)
@@ -446,7 +446,7 @@ Bob,2147483650

Suppose we write a `LOAD FROM` statement that tries to read the second column as an INT32.
The second row `(Bob,2147483650)` would be malformed because 2147483650 does not fit into an INT32 and will cause an error.
By setting `IGNORE_ERRORS` to true, we can make `LOAD FROM` skip over this line instead of erroring:
```cypher
LOAD WITH HEADERS (name STRING, age INT32) FROM "user.csv" (ignore_errors = true)
RETURN name, age;
@@ -459,5 +459,5 @@ RETURN name, age;
│ Alice │ 4 │
└────────┴───────┘
```
You can also see the details of any warnings generated by the skipped lines using the [SHOW_WARNINGS](/cypher/query-clauses/call#show_warnings) function.
See the "Ignore erroneous rows" [section](/import#ignore-erroneous-rows) of `COPY FROM` for more details.
2 changes: 1 addition & 1 deletion src/content/docs/cypher/transaction.md
@@ -5,7 +5,7 @@ title: Transactions
Kuzu implements a transaction management subsystem that is atomic and durable, and that supports serializability.
Satisfying these properties makes Kuzu ACID-compliant, as per database terminology.

Every query, data manipulation command, DDL (i.e., new node/rel table schema definitions), or `COPY FROM` command to Kuzu is part of a transaction. Therefore, they exhibit all-or-nothing behavior. After one or more of these commands are executed and committed successfully, you are guaranteed that all changes will persist in their entirety. If the commands do not execute successfully or are rolled back, you are guaranteed that none of the changes will persist.

These conditions hold even if your system crashes at any point during a transaction. That is, once you commit a transaction successfully, all your changes will persist even if there is an error *after* committing. Similarly, if your system crashes before committing or rolling back, then none of your updates will persist.
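
To make this concrete, below is a minimal sketch of a manual transaction; the node table and properties are hypothetical, and `BEGIN TRANSACTION`/`COMMIT`/`ROLLBACK` are the statements Kuzu provides for manual transaction control:

```cypher
BEGIN TRANSACTION;
CREATE (:User {name: 'Adam', age: 30});           // first change (hypothetical User table)
MATCH (u:User {name: 'Karissa'}) SET u.age = 40;  // second change in the same transaction
COMMIT;  // both changes persist together; ROLLBACK here would discard both
```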

@@ -10,13 +10,13 @@ Logical type refers to how data is conceptually organized. This type doesn't handle how data is
actually stored. Different logical types may have the same physical type during storage/query
processing, e.g., both the `INT32` and `DATE` logical types have the `INT32` physical type.

During query compilation, which includes parsing, binding, and planning, the logical type should always be used.

## PhysicalType

Physical type refers to the specific data format as it is physically stored on disk and in memory.

Physical type is useful in storage and query processing. For example, columns are constructed based on
physical types. Comparison operators can only work on physical types. Using the physical type is not
mandatory -- if you need to make a distinction between logical types, then you will
need to fall back to using a logical type.
@@ -8,7 +8,7 @@ We decompose `PhysicalPlan` into `Pipeline`s. A pipeline is a linear sequence of operators.

### Pipeline decomposition

Given a physical plan, we decompose it into pipelines when we encounter a sink operator. A sink operator is an operator that must exhaust its input in order to process correctly, e.g. `HASH_JOIN_BUILD`, `AGGREGATE`, `ORDER BY`, etc. Pipelines have dependencies, meaning that one pipeline may depend on the output of another pipeline. For example, a `HASH_JOIN_PROBE` pipeline must depend on a `HASH_JOIN_BUILD` pipeline.

### Morsel-driven parallelism

4 changes: 2 additions & 2 deletions src/content/docs/developer-guide/database-internal/index.md
@@ -12,7 +12,7 @@ The parser transforms an input string statement into Kuzu's internal AST, called `Statement`.

### Binder

The binder binds a `Statement` into another AST called `BoundStatement`. It also validates the semantic correctness of the input statement and binds the string representation into an internal representation (typically an integer). For example, a table name will be bound to an internal table ID.

In addition, the binder also performs semantic rewrites of `BoundStatement`. Semantic rewrites don't change the semantics of a `BoundStatement`, but rewrite it into a form that is more performant to evaluate. For example, `MATCH (a) WITH a RETURN a.name` will be rewritten as `MATCH (a) RETURN a.name`.

@@ -76,7 +76,7 @@ LogicalPlan
| Planner
BoundStatement
| Binder
Statement
| Parser
String Input
```
@@ -7,7 +7,7 @@ title: Vector types
A value vector is Kuzu's column-oriented in-memory data structure to store a chunk of data of the same data type. The size of `ValueVector` is defined by `DEFAULT_VECTOR_CAPACITY`, which is 2048. This is an empirically selected value with the presumption that the data stored in the value vector will fit into CPU cache.

A value vector has the following core fields:
- `data`: Stores the actual data, which is a trivial byte array managed by a unique pointer.
- `nullMask`: Aligned with `data` and indicates if each entry is `NULL` or not.
- `auxiliaryBuffer`: Keeps track of additional data that does not fit in `data`.

4 changes: 2 additions & 2 deletions src/content/docs/developer-guide/files.mdx
@@ -5,15 +5,15 @@ description: All the files that are created and managed by Kuzu on disk.

## Database files

Kuzu uses a single-file format for persisting the data to disk. That is, the database is stored in a single file rather than a directory of files.
The single-file format is also used by other popular embedded databases such as [DuckDB](https://duckdb.org/docs/stable/internals/storage) and [SQLite](https://sqlite.org/fileformat.html).

When opening a Kuzu database in the "on-disk" read-write mode, Kuzu creates the database file at the specified
path if it doesn't already exist. It can also create other temporary files at runtime. These files are
automatically cleaned up when the database is closed.

The following table lists the different types of files that are created and managed by Kuzu.
The database file is the only file required to open a database. The rest of the files are created
in the same directory as the database file.

| File Type | Example |