influxdb write api not correctly handle time field #1020

Closed · ttys3 opened this issue Jun 23, 2023 · 13 comments
Labels
bug Something isn't working

Comments

ttys3 commented Jun 23, 2023

Describe this problem

  1. doc issue: the documentation does not explicitly mention that users coming from InfluxDB should use Unix milliseconds for the time field,
    while InfluxDB itself uses nanosecond timestamps (see https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/, "Unix nanosecond timestamp").

  2. the database does not support storing nanoseconds in the time field, but when we send

curl --location 'http://localhost:5440/influxdb/v1/write' \
--header 'Content-Type: text/plain' \
--data 'system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842'

the data is valid InfluxDB line protocol, and the timestamp is always in nanoseconds, e.g. 1687513104655044842.
CeresDB should either auto-convert this to 1687513104655044842/1000000 or throw an error.

See https://github.com/CeresDB/ceresdb/blob/9a9c0f79116dff6df2a09fc8766e6952fd5f4432/proxy/src/influxdb/types.rs#L483C47-L483C47:

        let timestamp = match line.timestamp {
            Some(ts) => req.precision.try_normalize(ts).context(InternalNoCause {
                msg: "time outside range -9223372036854775806 - 9223372036854775806",
            })?,
            None => Timestamp::now().as_i64(),
        };

But if we only support milliseconds, the write API is not fully compatible with InfluxDB, and it can even cause data loss. See #1020 (comment)
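For illustration, the conversion being asked for here is just a division plus the range check already shown in the snippet above. A minimal Go sketch of the suggested behavior (illustrative only, not CeresDB code; the helper name is made up):

package main

import (
	"errors"
	"fmt"
)

// nsToMs normalizes a Unix-nanosecond timestamp to the Unix milliseconds
// the storage expects, rejecting values outside the range quoted in the
// error message above. Illustrative only.
func nsToMs(tsNs int64) (int64, error) {
	const lo, hi = -9223372036854775806, 9223372036854775806
	if tsNs < lo || tsNs > hi {
		return 0, errors.New("time outside range")
	}
	return tsNs / 1_000_000, nil
}

func main() {
	ms, err := nsToMs(1687513104655044842)
	if err != nil {
		panic(err)
	}
	fmt.Println(ms) // 1687513104655
}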

  3. queries use too much memory. A query like SELECT * FROM "system" order by time desc limit 5; used 22GB of memory on my machine for 10000 * 10000 rows of data (yes, this crashed my machine once: memory grew so quickly that the OOM killer was too slow to trigger).

After fixing the writes to use Unix milliseconds, memory usage decreased to 3GB, but only because just 9981906 rows actually got inserted (I was glad at first, but reality drove me into the abyss again).


SELECT MEAN("temperature") FROM "system" GROUP BY time(2m) LIMIT 10

returns this error:

{
    "code": 500,
    "message": "Failed to handle request, err:Internal error, msg:Failed to execute interpreter, err:Failed to execute select, err:Failed to execute logical plan, err:Failed to collect record batch stream, err:Stream error, msg:convert from arrow record batch, err:External error: Arrow error: Compute error: Overflow happened on: 1687518609978329692 * 1000000"
}
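That multiplication simply cannot fit in a signed 64-bit integer: the stored value is already a nanosecond timestamp, and multiplying it by 1,000,000 again (presumably the ms-to-ns step in the query path) needs roughly 1.7e24, far beyond the i64 range. A quick check in Go:

package main

import (
	"fmt"
	"math"
	"math/big"
)

func main() {
	stored := int64(1687518609978329692) // already nanoseconds, sitting in a ms column
	needed := new(big.Int).Mul(big.NewInt(stored), big.NewInt(1_000_000))

	fmt.Println("needed :", needed)               // ~1.69e24
	fmt.Println("i64 max:", int64(math.MaxInt64)) // ~9.22e18
	fmt.Println("fits   :", needed.IsInt64())     // false
}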

the data is like this:

mysql> SELECT * FROM "system" order by time desc limit 5010000,5;
+----------------------+---------------------+----------+-------------------+---------------+----------+--------+----------------------+---------------+--------------------+--------+
| tsid                 | time                | api_type | disk_free         | disk_total    | hostname | id     | mem_free             | mem_total     | temperature        | vendor |
+----------------------+---------------------+----------+-------------------+---------------+----------+--------+----------------------+---------------+--------------------+--------+
|  8419334968377245646 | 1687519193500451180 | blocking | 884.5372079676212 | 9499000000000 | host_99  | rack_9 |  4732955575044594108 | 9499000000000 |  56.49539751972941 | AWS    |
|  3010413116408139964 | 1687519193500450676 | blocking | 934.4715556549916 | 9499000000000 | host_98  | rack_8 | 15719136822748330796 | 9499000000000 |   9.77510533199514 | AWS    |
|  8291545101785022959 | 1687519193500450157 | blocking | 584.5101426032528 | 9499000000000 | host_97  | rack_7 | 15624829521003432967 | 9499000000000 |  71.64944410901087 | AWS    |
| 16312321373030829732 | 1687519193500449674 | blocking |  251.161030856011 | 9499000000000 | host_96  | rack_6 |  7068395414942882107 | 9499000000000 | 51.326979117765106 | AWS    |
|  8407531866066528452 | 1687519193500449143 | blocking | 367.7945846371885 | 9499000000000 | host_95  | rack_5 | 11735992339675738527 | 9499000000000 |   66.0831191068601 | AWS    |
+----------------------+---------------------+----------+-------------------+---------------+----------+--------+----------------------+---------------+--------------------+--------+
5 rows in set (6.39 sec)

Server version

CeresDB Server 
Version: 1.2.2
Git commit: VERGEN_IDEMPOTENT_OUTPUT
Git branch: VERGEN_IDEMPOTENT_OUTPUT
Opt level: 3
Rustc version: 1.69.0-nightly
Target: x86_64-unknown-linux-gnu
Build date: 2023-05-31T09:30:26.560795195Z

Steps to reproduce

  1. ensure the config sets default_timestamp_column_name = "time"

config file:

[server]
bind_addr = "0.0.0.0"
http_port = 5440
grpc_port = 8831

[server.default_schema_config]
default_timestamp_column_name = "time"

[logger]
level = "info"

[tracing]
dir = "/data"

[analytic.storage.object_store]
type = "Local"
data_dir = "/data"

[analytic.wal]
type = "RocksDB"
data_dir = "/data"

  2. make a first request to create the table
curl --location 'http://localhost:5440/influxdb/v1/write' \
--header 'Content-Type: text/plain' \
--data 'system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842'

the auto-created schema should look like this:

CREATE TABLE `system`
             (
                          `tsid` UINT64 NOT NULL,
                          `time` TIMESTAMP NOT NULL,
                          `api_type` STRING tag,
                          `disk_free` DOUBLE,
                          `disk_total` BIGINT,
                          `hostname` string tag,
                          `id` string tag,
                          `mem_free` uint64,
                          `mem_total` BIGINT,
                          `temperature` DOUBLE,
                          `vendor` string tag,
                          PRIMARY KEY(tsid,time),
                          timestamp KEY(   time)
             )

check what we got:

mysql> SELECT * FROM "system" where mem_free=14855517621364468624 limit 1;
+---------------------+---------------------+----------+-------------------+------------+----------+--------+----------------------+------------+--------------------+--------+
| tsid                | time                | api_type | disk_free         | disk_total | hostname | id     | mem_free             | mem_total  | temperature        | vendor |
+---------------------+---------------------+----------+-------------------+------------+----------+--------+----------------------+------------+--------------------+--------+
| 1698832110102542182 | 1687513104655044842 | blocking | 910.1630013889787 | 7500000000 | host_94  | rack_4 | 14855517621364468624 | 7500000000 | 20.642589504090367 | AWS    |
+---------------------+---------------------+----------+-------------------+------------+----------+--------+----------------------+------------+--------------------+--------+
1 row in set (0.10 sec)

Note that the time is still in nanoseconds, 1687513104655044842, completely untouched.
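To make the mismatch concrete: interpreting that nanosecond value as the Unix milliseconds a TIMESTAMP column implies pushes the row roughly 53 million years into the future. A small Go sketch:

package main

import (
	"fmt"
	"time"
)

func main() {
	raw := int64(1687513104655044842)

	fmt.Println(time.Unix(0, raw).UTC())   // 2023-06-23 09:38:24.655044842 +0000 UTC (what was meant)
	fmt.Println(time.UnixMilli(raw).UTC()) // a date ~53 million years from now (what a ms column implies)
}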

  3. batch insert data
package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"time"

	ceresdb "github.com/CeresDB/ceresdb-client-go/ceresdb"
	"github.com/cheggaaa/pb/v3"
)

func main() {
	endpoint := "127.0.0.1:8831"
	client, err := ceresdb.NewClient(endpoint, ceresdb.Direct,
		ceresdb.WithDefaultDatabase("public"),
	)
	if err != nil {
		panic(err)
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ceresdbDemoWriteBlocking(ctx, client)
}

func ceresdbDemoWriteBlocking(ctx context.Context, client ceresdb.Client) {
	batchSize := 5000
	totalRows := 10000 * 10000
	ts := time.Now()
	n := 0
	errNum := 0

	bar := pb.Full.Start(totalRows)
	defer bar.Finish()

	points := make([]ceresdb.Point, 0, 2)

out:
	for i := 0; i < totalRows; i++ {
		// Create point
		point, err := ceresdb.NewPointBuilder("system").
			SetTimestamp(time.Now().UnixNano()).
			AddTag("id", ceresdb.NewStringValue(fmt.Sprintf("rack_%v", i%10))).
			AddTag("vendor", ceresdb.NewStringValue("AWS")).
			AddTag("hostname", ceresdb.NewStringValue(fmt.Sprintf("host_%v", i%100))).
			AddTag("api_type", ceresdb.NewStringValue("blocking")).
			AddField("temperature", ceresdb.NewDoubleValue(rand.Float64()*80.0)).
			AddField("disk_free", ceresdb.NewDoubleValue(rand.Float64()*1000.0)).
			AddField("disk_total", ceresdb.NewInt64Value((int64(i/10+1)*1000000))).
			AddField("mem_total", ceresdb.NewInt64Value((int64(i/100+1)*10000000))).
			AddField("mem_free", ceresdb.NewUint64Value(rand.Uint64())).
			Build()
		if err != nil {
			panic(err)
		}

		// Add point to the batch
		points = append(points, point)

		select {
		case <-ctx.Done():
			break out
		default:

			// Flush the batch if it reaches the batch size
			if len(points) >= batchSize {
				resp, err := writeBatch(ctx, client, points)
				if err != nil {
					errNum += len(points)
					log.Printf("write error: %s", err.Error())
				} else {
					log.Printf("writeBatch success, resp=%+v", resp)
					n += len(points)
				}
				bar.Add(len(points))
				// reset points
				points = points[:0]
			}

		}
	}

	// Flush the remaining points in the batch
	resp, err := writeBatch(ctx, client, points)
	if err != nil {
		errNum += len(points)
		log.Printf("write error: %s", err.Error())
	} else {
		log.Printf("writeBatch success, resp=%+v", resp)
		n += len(points)
	}
	bar.Add(len(points))

	log.Printf("success=%v error=%v cost_ms=%v speed/s=%.2f", n, errNum, time.Now().Sub(ts).Milliseconds(), float64(n)/time.Now().Sub(ts).Seconds())
}

func writeBatch(ctx context.Context, client ceresdb.Client, batchPoints []ceresdb.Point) (ceresdb.WriteResponse, error) {
	req := ceresdb.WriteRequest{
		// ReqCtx: ctx,
		Points: batchPoints,
	}

	resp, err := client.Write(ctx, req)
	if err != nil {
		return resp, err
	}

	return resp, nil
}

  4. query with influxql
http://localhost:5440/influxdb/v1/query?q=SELECT MEAN("temperature") FROM "system" GROUP BY time(2m) LIMIT 10
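The query string needs URL encoding when issued directly. A small Go sketch of the same request, with the endpoint and q parameter exactly as above:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	q := url.Values{}
	q.Set("q", `SELECT MEAN("temperature") FROM "system" GROUP BY time(2m) LIMIT 10`)

	resp, err := http.Get("http://localhost:5440/influxdb/v1/query?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}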

Expected behavior

  1. query no error
  2. less memory usage (it uses 22GB for something like SELECT * FROM "system" order by time desc limit 5;)

Additional Information

see
#1020 (comment)

and
#1020 (comment)

ttys3 added the bug label on Jun 23, 2023

ttys3 commented Jun 23, 2023

Update:
According to the error Arrow error: Compute error: Overflow happened on: 1687518609978329692 * 1000000,

I changed the code to use Unix milliseconds for the time field:

-SetTimestamp(time.Now().UnixNano())
+SetTimestamp(time.Now().UnixMilli())

then dropped the table and regenerated the demo data (since I cannot remove the record with time 1687513104655044842),

and the error is gone.

But the /influxdb/v1/write API should fix the compatibility issue, as I mentioned above.

ttys3 changed the title from "influxql query failed with Arrow error: Compute error: Overflow happened ..." to "influxql write api not correctly handle time field" on Jun 23, 2023
ttys3 changed the title from "influxql write api not correctly handle time field" to "influxdb write api not correctly handle time field" on Jun 23, 2023
ShiKaiWi (Member) commented:

@ttys3 Thanks for your report. After the vacation, I guess @jiacai2050 will help solve this issue.


ttys3 commented Jun 23, 2023

Update again:

If I use milliseconds in the time field, the Overflow error disappears, but it also causes most of the data to go missing (because many rows end up with the same millisecond timestamp).

mysql> SELECT count(*) FROM "system";
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
|         9981906 |
+-----------------+
1 row in set (1.50 sec)

But in InfluxDB this works fine:

q=SELECT count(*) FROM "system"

we get the expected result:

{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "system",
                    "columns": [
                        "time",
                        "count_disk_free",
                        "count_disk_total",
                        "count_mem_free",
                        "count_mem_total",
                        "count_temperature"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            100000000,
                            100000000,
                            100000000,
                            100000000,
                            100000000
                        ]
                    ]
                }
            ]
        }
    ]
}
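For context on where the missing rows go: the demo loop produces only 100 distinct hostname/id tag combinations, and at these write rates many consecutive points fall into the same millisecond, so rows presumably collapse under the (tsid, time) primary key. A quick Go sketch of how few distinct millisecond timestamps a tight loop produces:

package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 100_000
	distinct := make(map[int64]struct{}, n)
	for i := 0; i < n; i++ {
		distinct[time.Now().UnixMilli()] = struct{}{}
	}
	// Typically prints only a handful of distinct values for 100k iterations,
	// so most rows in a batch share their millisecond timestamp.
	fmt.Printf("%d points -> %d distinct ms timestamps\n", n, len(distinct))
}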


jiacai2050 commented Jun 24, 2023

Hi, thanks for reporting.

For your cases 1 and 2: currently only ms precision is supported by CeresDB; other precisions are converted to ms on write.

But in order for that conversion to happen, users need to set the precision param, which defaults to ms. So your case 2 can be fixed this way:

curl --location 'http://localhost:5440/influxdb/v1/write?precision=ns' \
--header 'Content-Type: text/plain' \
--data 'system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842'

Of course this loses time precision when converting from ns, but this is how CeresDB is implemented now. If you want ns precision, feel free to open another issue and tell us your use case.
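The same write can also be issued from Go over HTTP; a sketch, with the endpoint, payload and precision parameter exactly as in the curl command above:

package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	line := `system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS ` +
		`disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,` +
		`mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842`

	// precision=ns tells the server the trailing timestamp is in nanoseconds,
	// so it can normalize it to milliseconds itself.
	resp, err := http.Post(
		"http://localhost:5440/influxdb/v1/write?precision=ns",
		"text/plain",
		strings.NewReader(line),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}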

PS: I filed an issue to update the docs: https://github.com/CeresDB/docs/issues/94

For your case 3, I need more time to reproduce it; I will update here when I have more context.


ttys3 commented Jun 24, 2023

Case 3 is reproducible.

I tested again; this time it uses about 19.7GB of memory:

mysql> SELECT * FROM "system" order by time desc limit 5;
+----------------------+---------------+----------+--------------------+---------------+----------+--------+----------------------+----------------+--------------------+--------+
| tsid                 | time          | api_type | disk_free          | disk_total    | hostname | id     | mem_free             | mem_total      | temperature        | vendor |
+----------------------+---------------+----------+--------------------+---------------+----------+--------+----------------------+----------------+--------------------+--------+
| 16783440087510120366 | 1790316217186 | blocking |  575.0676008151054 | 9999996000000 | host_58  | rack_8 |  8613071225237535455 | 10000000000000 | 37.376148430256194 | AWS    |
| 14903144881462136102 | 1790316057393 | blocking | 185.71644931843477 | 9999997000000 | host_60  | rack_0 | 12336037372503751218 | 10000000000000 |  15.41807464592409 | AWS    |
| 18036101797844328737 | 1790315980444 | blocking |  323.1994394796841 | 9999948000000 | host_71  | rack_1 |  5277818915129357015 |  9999950000000 | 3.0270463604499485 | AWS    |
|  3146672079379267971 | 1790315975444 | blocking |  712.1429296004604 | 9999981000000 | host_8   | rack_8 | 17916653639169366056 |  9999990000000 | 50.668711268872215 | AWS    |
|  7443242848413171290 | 1790315969714 | blocking |  762.1131729883782 | 9999942000000 | host_15  | rack_5 |  9737001516229547025 |  9999950000000 |  17.12921532169998 | AWS    |
+----------------------+---------------+----------+--------------------+---------------+----------+--------+----------------------+----------------+--------------------+--------+
5 rows in set (5.65 sec)

Also, it generates lots of small SST files. Is this expected?

(Yes, I used iotop to check, and waited for compaction to finish.)

❯ ls -1 /var/lib/ceresdb/store/2/2199023255553 | wc -l
17881

❯ sudo du -sh /var/lib/ceresdb/store/2/2199023255553
4.5G	/var/lib/ceresdb/store/2/2199023255553
.rw-r--r-- root root 328 KB Sun Jun 25 00:21:19 2023  9865.sst
.rw-r--r-- root root 343 KB Sun Jun 25 00:21:19 2023  9866.sst
.rw-r--r-- root root 337 KB Sun Jun 25 00:21:19 2023  9867.sst


jiacai2050 commented Jun 25, 2023

> Also, it generates lots of small SST files. Is this expected?

This depends on the table's write buffer (default 32M). However, there is a bug in 1.2.0 which causes flushing too frequently. Would you mind trying the latest version? We have a nightly image here:

docker pull ghcr.io/ceresdb/ceresdb-server:nightly-20230625-9a9c0f79

As for the high memory consumption, it's mostly caused by too many small SST files: there is a merge sort process inside each query, which reads the head of every SST file. Related issue:
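As a rough back-of-the-envelope for the directory listed earlier (illustrative only, not CeresDB's actual accounting): with SSTs this small, buffering even one head block per file during the merge already approaches the entire on-disk dataset, before any decode or sort overhead.

package main

import "fmt"

func main() {
	const (
		sstCount  = 17881     // files observed in /var/lib/ceresdb/store/2/2199023255553
		headBlock = 264 << 10 // ~264 KiB, roughly the size of one whole SST here
	)
	gib := float64(sstCount) * headBlock / (1 << 30)
	// Prints ~4.5 GiB, matching the du -sh output above, before any decode
	// or sort overhead is counted.
	fmt.Printf("resident head blocks: ~%.1f GiB across %d open cursors\n", gib, sstCount)
}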


ttys3 commented Jun 26, 2023

I tried the latest ghcr.io/ceresdb/ceresdb-server:nightly-20230626-03d9aa49 and ghcr.io/ceresdb/ceresdb-server:nightly-20230625-9a9c0f79.

The SST files are still small:

  264.0 KiB [################  ]  175.sst
  264.0 KiB [################  ]  243.sst
  264.0 KiB [################  ]  134.sst
  264.0 KiB [################  ]  80.sst
  264.0 KiB [################  ]  84.sst
  264.0 KiB [################  ]  79.sst
  264.0 KiB [################  ]  269.sst
  264.0 KiB [################  ]  394.sst
  264.0 KiB [################  ]  150.sst

Write speed becomes slower and slower.


ShiKaiWi commented Jun 26, 2023

> I tried the latest nightly images ... The SST files are still small ... Write speed becomes slower and slower.

Actually, we found that massive numbers of small SSTs are also generated when many small strings or null values are inserted. The problem is introduced by the current memtable implementation, and #1029 is one solution to it. As for the null values, another fix I'm working on will use a bitmap to reduce the memory consumption.

Finally, we are planning to replace the current memtable with a brand-new one, but that work is still in the design stage and will take some time to deliver.


ttys3 commented Jun 26, 2023

So, currently, an SST is flushed when the table's write buffer (default 32M) is full.

The current test data looks like this:

system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842

which has the strings "blocking", "host_xx", "rack_xx" and "AWS". Is this the reason why SSTs flush so fast?

That does not seem like that many strings, though.

jiacai2050 (Contributor) commented:

I will try to reproduce your case 3 today. One question about your code:

SetTimestamp(time.Now().UnixNano()).

It seems you use ns to set the timestamp. Did you change it to ms in your later tests?

jiacai2050 (Contributor) commented:

Hi, after some tests on my local dev machine, I found why there are so many small SSTs in your benchmark.

Your original code generates lots of duplicated rows, since you have code like this:

AddTag("id", ceresdb.NewStringValue(fmt.Sprintf("rack_%v", i%10))).
AddTag("hostname", ceresdb.NewStringValue(fmt.Sprintf("host_%v", i%100)))

For now, the memtable saves all duplicated rows in memory (tracked in #1035) and dedups them when flushing to SST, so you get lots of small SST files.

After changing your code like this, the SST size is as expected:

point, err := ceresdb.NewPointBuilder("system").
    SetTimestamp(time.Now().UnixNano()/1e6).
    AddTag("id", ceresdb.NewStringValue(fmt.Sprintf("rack_%v", i))).
    AddTag("vendor", ceresdb.NewStringValue("AWS")).
    AddTag("hostname", ceresdb.NewStringValue(fmt.Sprintf("host_%v", i))).
    AddTag("api_type", ceresdb.NewStringValue("blocking")).
    AddField("temperature", ceresdb.NewDoubleValue(rand.Float64()*80.0)).
    AddField("disk_free", ceresdb.NewDoubleValue(rand.Float64()*1000.0)).
    AddField("disk_total", ceresdb.NewInt64Value((int64(i/10+1)*1000000))).
    AddField("mem_total", ceresdb.NewInt64Value((int64(i/100+1)*10000000))).
    AddField("mem_free", ceresdb.NewUint64Value(rand.Uint64())).
    Build()
$ll -ltrh /tmp/ceresdb/store/2/2199023255553/
total 103M
-rw-r--r-- 1 chenxiang.ljc users 6.7M Jun 27 14:24 1.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 2.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 3.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 4.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 5.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 6.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:25 7.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:25 8.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 9.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 10.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 11.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 12.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 13.sst


jiacai2050 commented Jun 27, 2023

After fixing the too-many-small-SSTs issue, I tried executing your query

SELECT * FROM "system" order by timestamp desc limit 5;

It costs up to 13G of memory (RES), and we are optimizing such queries in these issues:

jiacai2050 (Contributor) commented:

Closing, since the original issue is explained above. Feel free to open a new issue if you have more questions.
