`RTree` index results in significantly slower execution

I load a dataset of ~11.3 M points into DuckDB using the following sql:
```sql
CREATE TABLE my_points AS
    FROM 'input_data.csv';

INSTALL spatial;
LOAD spatial;

ALTER TABLE my_points ADD COLUMN geom GEOMETRY;
UPDATE my_points SET geom = ST_Point(geo_lon, geo_lat);
```
To reproduce, you can download the `input_data.csv` from [here](https://www.kaggle.com/datasets/mrdaniilak/russia-real-estate-2021?resource=download).

Then, I set the threads to one and run the following sql:

```sql
SET threads TO 1;

EXPLAIN ANALYSE SELECT count(*) FROM my_points WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON ((38.364258 56.992883, 35.81543 55.739482, 37.705078 52.975108, 39.155273 54.622978, 44.494629 53.566414, 43.59375 56.255557, 38.364258 56.992883))'));
```
here is the query profile:
```
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││               Total Time: 1.06s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│        count_star()       │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│ ST_Intersects(geom, '\x02 │
│  \x04\x00\x00\x00\x00\x00 │
│  \x00\x00C\x0FB\x82\xE6SB │
│  \x81\xFA1B\xB7\xF8cB\x02 │
│  \x00\x00\x00\x01\x00\x00 │
│  \x00\x07\x00\x00\x00\x00 │
│ \x00\x00\x007\xA7\x92\x01 │
│  \xA0.C@\x8F\xE0F\xCA\x16 │
│  \x7FL@\xB1\x16\x9F\x02`  │
│  \xE8A@\x87\xFD\x9EX\xA7  │
│  \xDEK@\x86\x90\xF3\xFE?  │
│ \xDAB@\xB3\x08\xC5V\xD0|J@│
│ \xD5yT\xFC\xDF\x93C@R\x10<│
│ \xBE\xBDOK@\x9CS\xC9\x00P │
│ ?F@\x8F\xFF\x02A\x80\xC8J@│
│ \x00\x00\x00\x00\x00\xCCE@│
│ \xC7\xA1~\x17\xB6 L@7\xA7 │
│  \x92\x01\xA0.C@\x8F\xE0F │
│ \xCA\x16\x7FL@'::GEOMETRY)│
│                           │
│        2061759 Rows       │
│          (0.96s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│         my_points         │
│                           │
│     Projections: geom     │
│                           │
│       11358150 Rows       │
│          (0.10s)          │
└───────────────────────────┘
```
Next I make an `RTree index` using:
```sql
CREATE INDEX my_idx ON my_points USING RTREE (geom);
```
and I run the same query again. here is the query profile with the index:
```
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││               Total Time: 4.69s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│        count_star()       │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│ ST_Intersects(geom, '\x02 │
│  \x04\x00\x00\x00\x00\x00 │
│  \x00\x00C\x0FB\x82\xE6SB │
│  \x81\xFA1B\xB7\xF8cB\x02 │
│  \x00\x00\x00\x01\x00\x00 │
│  \x00\x07\x00\x00\x00\x00 │
│ \x00\x00\x007\xA7\x92\x01 │
│  \xA0.C@\x8F\xE0F\xCA\x16 │
│  \x7FL@\xB1\x16\x9F\x02`  │
│  \xE8A@\x87\xFD\x9EX\xA7  │
│  \xDEK@\x86\x90\xF3\xFE?  │
│ \xDAB@\xB3\x08\xC5V\xD0|J@│
│ \xD5yT\xFC\xDF\x93C@R\x10<│
│ \xBE\xBDOK@\x9CS\xC9\x00P │
│ ?F@\x8F\xFF\x02A\x80\xC8J@│
│ \x00\x00\x00\x00\x00\xCCE@│
│ \xC7\xA1~\x17\xB6 L@7\xA7 │
│  \x92\x01\xA0.C@\x8F\xE0F │
│ \xCA\x16\x7FL@'::GEOMETRY)│
│                           │
│        2061759 Rows       │
│          (0.52s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│   my_points (RTREE INDEX  │
│       SCAN : my_idx)      │
│                           │
│     Projections: geom     │
│                           │
│        2394763 Rows       │
│          (4.17s)          │
└───────────────────────────┘
```
I am wondering if this a bug in RTree, or I am using the spatial extension wrong.
The other strange thing that I observed is that if I run the same sql when the DuckDB session that is persisted (i.e., DuckDB started with a database file), then the non-indexed query performs the same, but the query that uses the index is even slower. here is a query profile for the same query when the database is persisted:

```
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││               Total Time: 7.58s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│        count_star()       │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│ ST_Intersects(geom, '\x02 │
│  \x04\x00\x00\x00\x00\x00 │
│  \x00\x00C\x0FB\x82\xE6SB │
│  \x81\xFA1B\xB7\xF8cB\x02 │
│  \x00\x00\x00\x01\x00\x00 │
│  \x00\x07\x00\x00\x00\x00 │
│ \x00\x00\x007\xA7\x92\x01 │
│  \xA0.C@\x8F\xE0F\xCA\x16 │
│  \x7FL@\xB1\x16\x9F\x02`  │
│  \xE8A@\x87\xFD\x9EX\xA7  │
│  \xDEK@\x86\x90\xF3\xFE?  │
│ \xDAB@\xB3\x08\xC5V\xD0|J@│
│ \xD5yT\xFC\xDF\x93C@R\x10<│
│ \xBE\xBDOK@\x9CS\xC9\x00P │
│ ?F@\x8F\xFF\x02A\x80\xC8J@│
│ \x00\x00\x00\x00\x00\xCCE@│
│ \xC7\xA1~\x17\xB6 L@7\xA7 │
│  \x92\x01\xA0.C@\x8F\xE0F │
│ \xCA\x16\x7FL@'::GEOMETRY)│
│                           │
│        2061759 Rows       │
│          (0.24s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│   my_points (RTREE INDEX  │
│       SCAN : my_idx)      │
│                           │
│     Projections: geom     │
│                           │
│        2394763 Rows       │
│          (7.34s)          │
└───────────────────────────┘
```
As I have enough RAM that the index can totally fit in the RAM, I don't understand why it's slower when I run it when database is persisted.
If this is a specific scenario where an RTree index is not preferred, I think it might be best to detect it in query planning and don't use an index for that query.

Version of DuckDB: v1.1.2 f680b7d08f

p.s.: I have 128GB of RAM in my computer and I set the threads specifically to 1, otherwise the non-indexed query is even faster cause it processes the data in parallel while the query plan that uses the RTree index is being executed sequentially.  

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`RTree` index results in significantly slower execution #444

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RTree index results in significantly slower execution #444

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`RTree` index results in significantly slower execution #444