This repository was archived by the owner on Oct 10, 2025. It is now read-only.
Merged
96 changes: 42 additions & 54 deletions src/content/docs/client-apis/python.mdx
@@ -25,11 +25,15 @@ and is more convenient to use.
 The synchronous API is the default and is a common way to work with Kuzu in Python.
 
 ```python
+import shutil
+
 import kuzu
 
 def main() -> None:
+    shutil.rmtree("./test_db", ignore_errors=True)
+
     # Create an empty on-disk database and connect to it
-    db = kuzu.Database("./demo_db")
+    db = kuzu.Database("./test_db")
     conn = kuzu.Connection(db)
 
     # Create schema
@@ -53,6 +57,9 @@ def main() -> None:
     )
     for row in response:
         print(row)
+
+if __name__ == "__main__":
+    main()
 ```
 
 </TabItem>
@@ -69,7 +76,8 @@ import shutil
 import kuzu
 
 shutil.rmtree("test_db", ignore_errors=True)
-db = kuzu.Database("test_db")
+
+db = kuzu.Database("./test_db")
 # Create the async connection
 # The underlying connection pool will be automatically created and managed by the async connection
 conn = kuzu.AsyncConnection(db, max_concurrent_queries=4)
@@ -88,51 +96,23 @@ async def copy_data(conn: kuzu.AsyncConnection) -> None:
     await conn.execute("COPY Follows FROM 'example_data/follows.csv'")
     await conn.execute("COPY LivesIn FROM 'example_data/lives-in.csv'")
 
-async def query_1(conn: kuzu.AsyncConnection) -> None:
+async def query(conn: kuzu.AsyncConnection) -> None:
     result = await conn.execute("MATCH (u:User)-[:LivesIn]->(c:City) RETURN u.*")
     for row in result:
         print(row)
 
 async def main():
     await create_tables(conn)
     await copy_data(conn)
-    # Run queries
-    await query_1(conn)
+    await query(conn)
 
 if __name__ == "__main__":
     asyncio.run(main())
 ```
 
 The async API in Python is backed by a thread pool. The thread pool is automatically
-created and managed by the async connection -- all you need to do is pass in the `max_concurrent_queries`
-parameter to the async connection constructor.
-
-:::caution[Note]
-The following features are not yet supported in the async API:
-- **`COPY FROM` or `LOAD FROM` with a Pandas or Polars DataFrame, or PyArrow Table**
-
-When using the async API, you may currently encounter the following error:
-```
-Binder exception: Variable df is not in scope.
-```
-
-This is due to the fact that async connections execute queries using a separate thread from the
-calling thread, and thus the dataframe isn't in scope for the executing thread. This will be handled
-appropriately and addressed in a future release.
-
-- **Projected Graphs in `vector` and `algo` extensions**
-
-When using the `PROJECT_GRAPH` feature with the async API, you may encounter the following error:
-
-```
-Binder exception: Cannot find graph projected_graph.
-```
-
-This error is due to the current design of projected graphs, which binds each projected graph to a specific Connection instance.
-The async API, however, executes queries using a connection pool, meaning that queries may be handled by different connections.
-As a result, a projected graph created in one connection may not be accessible from another. This is a known limitation
-and will be addressed in a future release.
-:::
+created and managed by the async connection. You can configure the number of concurrent
+queries by setting the `max_concurrent_queries` parameter, as shown above.
 
 </TabItem>
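
As a rough illustration of what "backed by a thread pool" can mean, the sketch below uses only the Python standard library. `ToyAsyncConnection`, `_run_blocking`, and `run_queries` are hypothetical names and not Kuzu's implementation; the sketch only assumes that each awaited call hands blocking work to a worker thread and that `max_concurrent_queries` caps the number of workers.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyAsyncConnection:
    """Hypothetical stand-in for an async connection; not Kuzu's implementation."""

    def __init__(self, max_concurrent_queries: int = 4) -> None:
        # Assumption: one worker thread per allowed concurrent query.
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent_queries)

    def _run_blocking(self, query: str) -> tuple[str, str]:
        # Simulate blocking database work and report which thread did it.
        time.sleep(0.01)
        return query, threading.current_thread().name

    async def execute(self, query: str) -> tuple[str, str]:
        # Hand the blocking call to the pool and await its result,
        # so the event loop stays free while the query runs.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, self._run_blocking, query)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


async def run_queries() -> list[tuple[str, str]]:
    conn = ToyAsyncConnection(max_concurrent_queries=2)
    try:
        # Four queries are submitted, but at most two run at the same time.
        return await asyncio.gather(*(conn.execute(f"q{i}") for i in range(4)))
    finally:
        conn.close()


results = asyncio.run(run_queries())
main_thread = threading.main_thread().name
```

In this sketch every query runs on a worker thread rather than on the calling thread, which is the property the removed caution refers to when it says that a DataFrame in the caller's scope is not visible to the executing thread.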

@@ -178,7 +158,7 @@ You can output the results of a Cypher query to a Pandas DataFrame using the `ge
 import kuzu
 import pandas as pd
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
@@ -219,7 +199,7 @@ You can output the results of a Cypher query to a Polars DataFrame using the `ge
 import kuzu
 import polars as pl
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
@@ -272,7 +252,7 @@ You can output the results of a Cypher query to a PyArrow Table using the `get_a
 import kuzu
 import pyarrow as pa
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
@@ -287,8 +267,10 @@ Using the `get_as_arrow()` method on your query result returns the result as a P
 ```
 pyarrow.Table
 p.name: string
+p.age: int64
 ----
 p.name: [["Adam","Karissa","Zhang"]]
+p.age: [[30,40,50]]
 ```
 </TabItem>

@@ -308,7 +290,7 @@ Scanning a DataFrame or Table does *not* copy the data into Kuzu, it only reads
 import kuzu
 import pandas as pd
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 df = pd.DataFrame({
@@ -334,7 +316,7 @@ Using the `get_as_df()` method on your query result returns the result as a Pand
 import kuzu
 import polars as pl
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 df = pl.DataFrame({
@@ -366,7 +348,7 @@ shape: (3, 2)
 import kuzu
 import pyarrow as pa
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 tbl = pa.table({
@@ -402,7 +384,7 @@ Copy from a Pandas DataFrame into a Kuzu table using the `COPY FROM` command:
 import kuzu
 import pandas as pd
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
@@ -434,7 +416,7 @@ Copy from a Polars DataFrame into a Kuzu table using the `COPY FROM` command:
 import kuzu
 import polars as pl
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
@@ -472,7 +454,7 @@ Copy from a PyArrow Table into a Kuzu table using the `COPY FROM` command:
 import kuzu
 import pyarrow as pa
 
-db = kuzu.Database("tmp")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
@@ -540,7 +522,7 @@ difference between two numbers, and then apply it in a Cypher query.
 ```py
 import kuzu
 
-db = kuzu.Database("test_db")
+db = kuzu.Database(":memory:")
 conn = kuzu.Connection(db)
 
 # define your function
@@ -588,18 +570,15 @@ Details on how to denote types are in the [type notation](#type-notation) sectio
Once the UDF is registered, you can apply it in a Cypher query. First, let's create some data.

```py
# create a table
conn.execute("CREATE NODE TABLE IF NOT EXISTS Item (id INT64 PRIMARY KEY, a INT64, b INT64, c INT64)")

# insert some data
conn.execute("CREATE (i:Item {id: 1}) SET i.a = 134, i.b = 123")
conn.execute("CREATE (i:Item {id: 2}) SET i.a = 44, i.b = 29")
conn.execute("CREATE (i:Item {id: 3}) SET i.a = 32, i.b = 68")
```

We're now ready to apply the UDF in a Cypher query:
```py
# apply the UDF and print the results
result = conn.execute("MATCH (i:Item) RETURN i.a AS a, i.b AS b, difference (i.a, i.b) AS difference")
print(result.get_as_df())
```
@@ -616,8 +595,7 @@ The output should be:
 In case you want to remove the UDF, you can call the `remove_function` method on the connection object.
 
 ```py
-# Use existing connection object
-conn.remove_function(difference)
+conn.remove_function("difference")
 ```
 
 ### Nested and complex types
@@ -642,21 +620,31 @@ def calculate_discounted_price(price: float, has_discount: bool) -> float:
     # Assume 10% discount on all items for simplicity
     return float(price) * 0.9 if has_discount else price
 
-# define the expected type of the UDF's parameters
 parameters = ['DECIMAL(7, 2)', kuzu.Type.BOOL]
 
-# define expected type of the UDF's returned value
 return_type = 'DECIMAL(7, 2)'
 
-# register the UDF
 conn.create_function(
     "current_price",
     calculate_discounted_price,
     parameters,
     return_type
 )
+
+result = conn.execute(
+    """
+    RETURN
+        current_price(100, true) AS discount,
+        current_price(100, false) AS no_discount;
+    """
+)
+print(result.get_as_df())
+```
+```
+   discount  no_discount
+0     90.00       100.00
 ```
 
 The second parameter is a built-in native type in Kuzu, i.e., `kuzu.Type.BOOL`. For the first parameter,
-we need to specify a string, i.e. `"DECIMAL(7,2)"` that's then parsed and used by the binder in Kuzu
-to map to the internal Decimal representation.
+we need to specify a string, i.e. `DECIMAL(7,2)` that is then parsed and used by Kuzu
+to map to the internal decimal representation.
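
To make the `DECIMAL(7, 2)` notation concrete, here is a small standard-library sketch. It assumes the usual SQL reading of `DECIMAL(precision, scale)`: at most 7 significant digits in total, 2 of them after the decimal point. The helpers `parse_decimal_type` and `coerce` are hypothetical and do not reflect how Kuzu parses the type string or handles out-of-range values.

```python
import re
from decimal import Decimal, ROUND_HALF_UP


def parse_decimal_type(type_string: str) -> tuple[int, int]:
    """Extract (precision, scale) from a string such as 'DECIMAL(7, 2)'."""
    match = re.fullmatch(
        r"\s*DECIMAL\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*", type_string, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"not a DECIMAL type string: {type_string!r}")
    return int(match.group(1)), int(match.group(2))


def coerce(value: float, type_string: str) -> Decimal:
    """Round value to the declared scale and check it fits the declared precision."""
    precision, scale = parse_decimal_type(type_string)
    quantum = Decimal(1).scaleb(-scale)  # Decimal('0.01') when scale is 2
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    # Assumption: values with too many digits are rejected rather than truncated.
    if len(rounded.as_tuple().digits) > precision:
        raise OverflowError(f"{rounded} does not fit {type_string}")
    return rounded


discount = coerce(100 * 0.9, "DECIMAL(7, 2)")
no_discount = coerce(100, "DECIMAL(7, 2)")
```

Under these assumptions, the two values print as `90.00` and `100.00`, matching the two-decimal formatting in the output shown in the diff.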
13 changes: 13 additions & 0 deletions src/content/docs/extensions/algo/index.mdx
@@ -169,6 +169,19 @@ A projected graph is evaluated _only_ when an algorithm is executed.
 Kuzu does not materialize projected graphs in memory, and the corresponding data
 is scanned from disk on the fly.
 
+:::caution[Note]
+When creating projected graphs with the Python async API, you may encounter the following error:
+
+```
+Binder exception: Cannot find graph filtered_graph.
+```
+
+This error occurs because projected graphs are bound to a specific `Connection` instance, as mentioned above.
+The async API, however, executes queries using a connection pool. Depending on which connection is used
+to execute a query, the projected graph may not be available.
+This is a known limitation and will be addressed in a future release.
+:::
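
The limitation in the note above can be pictured with a toy model that uses only the Python standard library. `ToyConnection` and `ToyPool` are hypothetical and say nothing about Kuzu's internals; the sketch only assumes that projected graphs live in per-connection state and that a pool may route consecutive queries to different connections.

```python
from itertools import cycle


class ToyConnection:
    """Hypothetical connection that keeps projected graphs in per-connection state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.projected_graphs: set[str] = set()

    def project_graph(self, graph_name: str) -> None:
        self.projected_graphs.add(graph_name)

    def run_algorithm(self, graph_name: str) -> str:
        if graph_name not in self.projected_graphs:
            raise LookupError(f"Cannot find graph {graph_name}.")
        return f"ran on {graph_name} via {self.name}"


class ToyPool:
    """Hands out connections round-robin, as a simplified pool might."""

    def __init__(self, size: int) -> None:
        self._connections = cycle([ToyConnection(f"conn{i}") for i in range(size)])

    def acquire(self) -> ToyConnection:
        return next(self._connections)


pool = ToyPool(size=2)
pool.acquire().project_graph("filtered_graph")  # handled by conn0

try:
    # The next query is handled by conn1, which never saw the projection.
    outcome = pool.acquire().run_algorithm("filtered_graph")
except LookupError as err:
    outcome = f"error: {err}"

# A single dedicated connection sees its own projected graph.
single = ToyConnection("solo")
single.project_graph("filtered_graph")
single_outcome = single.run_algorithm("filtered_graph")
```

In the toy model the pooled lookup fails while the single-connection lookup succeeds, which mirrors the "may not be available" wording of the note.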

## Edge direction

In Kuzu, both the base graph and projected graphs are directed. For algorithms that are only