influxdb write api not correctly handle time field #1020

Closed · ttys3 opened this issue Jun 23, 2023 · 13 comments
Labels
bug Something isn't working

Comments

ttys3 commented Jun 23, 2023

Describe this problem

  1. doc issue: the documentation does not explicitly mention that users coming from InfluxDB should use Unix milliseconds for the time field,
    while InfluxDB itself uses nanosecond timestamps (see https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/, "Unix nanosecond timestamp").

  2. the database does not support storing nanoseconds in the time field, but when we send

curl --location 'http://localhost:5440/influxdb/v1/write' \
--header 'Content-Type: text/plain' \
--data 'system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842'

the data is valid InfluxDB line protocol, and the timestamp is always in nanoseconds, e.g. 1687513104655044842.
CeresDB should either auto-convert this to 1687513104655044842/1000000 or throw an error.

See https://github.com/CeresDB/ceresdb/blob/9a9c0f79116dff6df2a09fc8766e6952fd5f4432/proxy/src/influxdb/types.rs#L483C47-L483C47:

        let timestamp = match line.timestamp {
            Some(ts) => req.precision.try_normalize(ts).context(InternalNoCause {
                msg: "time outside range -9223372036854775806 - 9223372036854775806",
            })?,
            None => Timestamp::now().as_i64(),
        };

But if we only support milliseconds, the write API is not fully compatible with InfluxDB, and it can even cause data loss. See #1020 (comment)
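For illustration, the conversion being asked for here is just a division plus the range check already shown in the snippet above. A minimal Go sketch of the suggested behavior (illustrative only, not CeresDB code; the helper name is made up):

package main

import (
	"errors"
	"fmt"
)

// nsToMs normalizes a Unix-nanosecond timestamp to the Unix milliseconds
// the storage expects, rejecting values outside the range quoted in the
// error message above. Illustrative only.
func nsToMs(tsNs int64) (int64, error) {
	const lo, hi = -9223372036854775806, 9223372036854775806
	if tsNs < lo || tsNs > hi {
		return 0, errors.New("time outside range")
	}
	return tsNs / 1_000_000, nil
}

func main() {
	ms, err := nsToMs(1687513104655044842)
	if err != nil {
		panic(err)
	}
	fmt.Println(ms) // 1687513104655
}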

  3. queries use too much memory. A query like SELECT * FROM "system" order by time desc limit 5; used 22GB of memory on my machine for 10000 * 10000 rows of data (yes, this crashed my machine once: memory grew so quickly that the OOM killer was too slow to trigger).

After fixing the writes to use Unix milliseconds, memory usage decreased to 3GB, but only because just 9981906 rows actually got inserted (I was glad at first, but reality drove me into the abyss again).


SELECT MEAN("temperature") FROM "system" GROUP BY time(2m) LIMIT 10

returns this error:

{
    "code": 500,
    "message": "Failed to handle request, err:Internal error, msg:Failed to execute interpreter, err:Failed to execute select, err:Failed to execute logical plan, err:Failed to collect record batch stream, err:Stream error, msg:convert from arrow record batch, err:External error: Arrow error: Compute error: Overflow happened on: 1687518609978329692 * 1000000"
}
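That multiplication simply cannot fit in a signed 64-bit integer: the stored value is already a nanosecond timestamp, and multiplying it by 1,000,000 again (presumably the ms-to-ns step in the query path) needs roughly 1.7e24, far beyond the i64 range. A quick check in Go:

package main

import (
	"fmt"
	"math"
	"math/big"
)

func main() {
	stored := int64(1687518609978329692) // already nanoseconds, sitting in a ms column
	needed := new(big.Int).Mul(big.NewInt(stored), big.NewInt(1_000_000))

	fmt.Println("needed :", needed)               // ~1.69e24
	fmt.Println("i64 max:", int64(math.MaxInt64)) // ~9.22e18
	fmt.Println("fits   :", needed.IsInt64())     // false
}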

the data is like this:

mysql> SELECT * FROM "system" order by time desc limit 5010000,5;
+----------------------+---------------------+----------+-------------------+---------------+----------+--------+----------------------+---------------+--------------------+--------+
| tsid                 | time                | api_type | disk_free         | disk_total    | hostname | id     | mem_free             | mem_total     | temperature        | vendor |
+----------------------+---------------------+----------+-------------------+---------------+----------+--------+----------------------+---------------+--------------------+--------+
|  8419334968377245646 | 1687519193500451180 | blocking | 884.5372079676212 | 9499000000000 | host_99  | rack_9 |  4732955575044594108 | 9499000000000 |  56.49539751972941 | AWS    |
|  3010413116408139964 | 1687519193500450676 | blocking | 934.4715556549916 | 9499000000000 | host_98  | rack_8 | 15719136822748330796 | 9499000000000 |   9.77510533199514 | AWS    |
|  8291545101785022959 | 1687519193500450157 | blocking | 584.5101426032528 | 9499000000000 | host_97  | rack_7 | 15624829521003432967 | 9499000000000 |  71.64944410901087 | AWS    |
| 16312321373030829732 | 1687519193500449674 | blocking |  251.161030856011 | 9499000000000 | host_96  | rack_6 |  7068395414942882107 | 9499000000000 | 51.326979117765106 | AWS    |
|  8407531866066528452 | 1687519193500449143 | blocking | 367.7945846371885 | 9499000000000 | host_95  | rack_5 | 11735992339675738527 | 9499000000000 |   66.0831191068601 | AWS    |
+----------------------+---------------------+----------+-------------------+---------------+----------+--------+----------------------+---------------+--------------------+--------+
5 rows in set (6.39 sec)

Server version

CeresDB Server 
Version: 1.2.2
Git commit: VERGEN_IDEMPOTENT_OUTPUT
Git branch: VERGEN_IDEMPOTENT_OUTPUT
Opt level: 3
Rustc version: 1.69.0-nightly
Target: x86_64-unknown-linux-gnu
Build date: 2023-05-31T09:30:26.560795195Z

Steps to reproduce

  1. ensure the config sets default_timestamp_column_name = "time"

config file:

[server]
bind_addr = "0.0.0.0"
http_port = 5440
grpc_port = 8831

[server.default_schema_config]
default_timestamp_column_name = "time"

[logger]
level = "info"

[tracing]
dir = "/data"

[analytic.storage.object_store]
type = "Local"
data_dir = "/data"

[analytic.wal]
type = "RocksDB"
data_dir = "/data"

  2. make a first request to create the table
curl --location 'http://localhost:5440/influxdb/v1/write' \
--header 'Content-Type: text/plain' \
--data 'system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842'

the auto-created schema should look like this:

CREATE TABLE `system`
             (
                          `tsid` UINT64 NOT NULL,
                          `time` TIMESTAMP NOT NULL,
                          `api_type` STRING tag,
                          `disk_free` DOUBLE,
                          `disk_total` BIGINT,
                          `hostname` string tag,
                          `id` string tag,
                          `mem_free` uint64,
                          `mem_total` BIGINT,
                          `temperature` DOUBLE,
                          `vendor` string tag,
                          PRIMARY KEY(tsid,time),
                          timestamp KEY(   time)
             )

check what we got:

mysql> SELECT * FROM "system" where mem_free=14855517621364468624 limit 1;
+---------------------+---------------------+----------+-------------------+------------+----------+--------+----------------------+------------+--------------------+--------+
| tsid                | time                | api_type | disk_free         | disk_total | hostname | id     | mem_free             | mem_total  | temperature        | vendor |
+---------------------+---------------------+----------+-------------------+------------+----------+--------+----------------------+------------+--------------------+--------+
| 1698832110102542182 | 1687513104655044842 | blocking | 910.1630013889787 | 7500000000 | host_94  | rack_4 | 14855517621364468624 | 7500000000 | 20.642589504090367 | AWS    |
+---------------------+---------------------+----------+-------------------+------------+----------+--------+----------------------+------------+--------------------+--------+
1 row in set (0.10 sec)

Note that the time is still in nanoseconds, 1687513104655044842, completely untouched.
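To make the mismatch concrete: interpreting that nanosecond value as the Unix milliseconds a TIMESTAMP column implies pushes the row roughly 53 million years into the future. A small Go sketch:

package main

import (
	"fmt"
	"time"
)

func main() {
	raw := int64(1687513104655044842)

	fmt.Println(time.Unix(0, raw).UTC())   // 2023-06-23 09:38:24.655044842 +0000 UTC (what was meant)
	fmt.Println(time.UnixMilli(raw).UTC()) // a date ~53 million years from now (what a ms column implies)
}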

  3. batch insert data
package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"time"

	ceresdb "github.com/CeresDB/ceresdb-client-go/ceresdb"
	"github.com/cheggaaa/pb/v3"
)

func main() {
	endpoint := "127.0.0.1:8831"
	client, err := ceresdb.NewClient(endpoint, ceresdb.Direct,
		ceresdb.WithDefaultDatabase("public"),
	)
	if err != nil {
		panic(err)
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ceresdbDemoWriteBlocking(ctx, client)
}

func ceresdbDemoWriteBlocking(ctx context.Context, client ceresdb.Client) {
	batchSize := 5000
	totalRows := 10000 * 10000
	ts := time.Now()
	n := 0
	errNum := 0

	bar := pb.Full.Start(totalRows)
	defer bar.Finish()

	points := make([]ceresdb.Point, 0, 2)

out:
	for i := 0; i < totalRows; i++ {
		// Create point
		point, err := ceresdb.NewPointBuilder("system").
			SetTimestamp(time.Now().UnixNano()).
			AddTag("id", ceresdb.NewStringValue(fmt.Sprintf("rack_%v", i%10))).
			AddTag("vendor", ceresdb.NewStringValue("AWS")).
			AddTag("hostname", ceresdb.NewStringValue(fmt.Sprintf("host_%v", i%100))).
			AddTag("api_type", ceresdb.NewStringValue("blocking")).
			AddField("temperature", ceresdb.NewDoubleValue(rand.Float64()*80.0)).
			AddField("disk_free", ceresdb.NewDoubleValue(rand.Float64()*1000.0)).
			AddField("disk_total", ceresdb.NewInt64Value((int64(i/10+1)*1000000))).
			AddField("mem_total", ceresdb.NewInt64Value((int64(i/100+1)*10000000))).
			AddField("mem_free", ceresdb.NewUint64Value(rand.Uint64())).
			Build()
		if err != nil {
			panic(err)
		}

		// Add point to the batch
		points = append(points, point)

		select {
		case <-ctx.Done():
			break out
		default:

			// Flush the batch if it reaches the batch size
			if len(points) >= batchSize {
				resp, err := writeBatch(ctx, client, points)
				if err != nil {
					errNum += len(points)
					log.Printf("write error: %s", err.Error())
				} else {
					log.Printf("writeBatch success, resp=%+v", resp)
					n += len(points)
				}
				bar.Add(len(points))
				// reset points
				points = points[:0]
			}

		}
	}

	// Flush the remaining points in the batch
	resp, err := writeBatch(ctx, client, points)
	if err != nil {
		errNum += len(points)
		log.Printf("write error: %s", err.Error())
	} else {
		log.Printf("writeBatch success, resp=%+v", resp)
		n += len(points)
	}
	bar.Add(len(points))

	log.Printf("success=%v error=%v cost_ms=%v speed/s=%.2f", n, errNum, time.Now().Sub(ts).Milliseconds(), float64(n)/time.Now().Sub(ts).Seconds())
}

func writeBatch(ctx context.Context, client ceresdb.Client, batchPoints []ceresdb.Point) (ceresdb.WriteResponse, error) {
	req := ceresdb.WriteRequest{
		// ReqCtx: ctx,
		Points: batchPoints,
	}

	resp, err := client.Write(ctx, req)
	if err != nil {
		return resp, err
	}

	return resp, nil
}

  4. query with influxql
http://localhost:5440/influxdb/v1/query?q=SELECT MEAN("temperature") FROM "system" GROUP BY time(2m) LIMIT 10
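The query string needs URL encoding when issued directly. A small Go sketch of the same request, with the endpoint and q parameter exactly as above:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	q := url.Values{}
	q.Set("q", `SELECT MEAN("temperature") FROM "system" GROUP BY time(2m) LIMIT 10`)

	resp, err := http.Get("http://localhost:5440/influxdb/v1/query?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}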

Expected behavior

  1. query no error
  2. less memory usage (it uses 22GB for something like SELECT * FROM "system" order by time desc limit 5;)

Additional Information

see
#1020 (comment)

and
#1020 (comment)

ttys3 added the bug label on Jun 23, 2023

ttys3 commented Jun 23, 2023

Update:
According to the error Arrow error: Compute error: Overflow happened on: 1687518609978329692 * 1000000,

I changed the code to use Unix milliseconds for the time field:

-SetTimestamp(time.Now().UnixNano())
+SetTimestamp(time.Now().UnixMilli())

then dropped the table and regenerated the demo data (since I cannot remove the record with time 1687513104655044842),

and the error is gone.

But the /influxdb/v1/write API should fix the compatibility issue, as I mentioned above.

ttys3 changed the title from "influxql query failed with Arrow error: Compute error: Overflow happened ..." to "influxql write api not correctly handle time field" on Jun 23, 2023
ttys3 changed the title from "influxql write api not correctly handle time field" to "influxdb write api not correctly handle time field" on Jun 23, 2023
ShiKaiWi (Member) commented:

@ttys3 Thanks for your report. After the vacation, I guess @jiacai2050 will help solve this issue.


ttys3 commented Jun 23, 2023

Update again:

If I use milliseconds in the time field, the Overflow error disappears, but it also causes most of the data to go missing (because many rows end up with the same millisecond timestamp).

mysql> SELECT count(*) FROM "system";
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
|         9981906 |
+-----------------+
1 row in set (1.50 sec)

But in InfluxDB this works fine:

q=SELECT count(*) FROM "system"

we get the expected result:

{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "system",
                    "columns": [
                        "time",
                        "count_disk_free",
                        "count_disk_total",
                        "count_mem_free",
                        "count_mem_total",
                        "count_temperature"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            100000000,
                            100000000,
                            100000000,
                            100000000,
                            100000000
                        ]
                    ]
                }
            ]
        }
    ]
}
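For context on where the missing rows go: the demo loop produces only 100 distinct hostname/id tag combinations, and at these write rates many consecutive points fall into the same millisecond, so rows presumably collapse under the (tsid, time) primary key. A quick Go sketch of how few distinct millisecond timestamps a tight loop produces:

package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 100_000
	distinct := make(map[int64]struct{}, n)
	for i := 0; i < n; i++ {
		distinct[time.Now().UnixMilli()] = struct{}{}
	}
	// Typically prints only a handful of distinct values for 100k iterations,
	// so most rows in a batch share their millisecond timestamp.
	fmt.Printf("%d points -> %d distinct ms timestamps\n", n, len(distinct))
}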


jiacai2050 commented Jun 24, 2023

Hi, thanks for reporting.

For your cases 1 and 2: currently only ms precision is supported by CeresDB; other precisions are converted to ms on write.

But in order for that conversion to happen, users need to set the precision param, which defaults to ms. So your case 2 can be fixed this way:

curl --location 'http://localhost:5440/influxdb/v1/write?precision=ns' \
--header 'Content-Type: text/plain' \
--data 'system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842'

Of course this loses time precision when converting from ns, but this is how CeresDB is implemented now. If you want ns precision, feel free to open another issue and tell us your use case.
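The same write can also be issued from Go over HTTP; a sketch, with the endpoint, payload and precision parameter exactly as in the curl command above:

package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	line := `system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS ` +
		`disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,` +
		`mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842`

	// precision=ns tells the server the trailing timestamp is in nanoseconds,
	// so it can normalize it to milliseconds itself.
	resp, err := http.Post(
		"http://localhost:5440/influxdb/v1/write?precision=ns",
		"text/plain",
		strings.NewReader(line),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}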

PS: I filed an issue to update the docs: https://github.com/CeresDB/docs/issues/94

For your case 3, I need more time to reproduce it; I will update here when I have more context.


ttys3 commented Jun 24, 2023

Case 3 is reproducible.

I tested again; this time it uses about 19.7GB of memory:

mysql> SELECT * FROM "system" order by time desc limit 5;
+----------------------+---------------+----------+--------------------+---------------+----------+--------+----------------------+----------------+--------------------+--------+
| tsid                 | time          | api_type | disk_free          | disk_total    | hostname | id     | mem_free             | mem_total      | temperature        | vendor |
+----------------------+---------------+----------+--------------------+---------------+----------+--------+----------------------+----------------+--------------------+--------+
| 16783440087510120366 | 1790316217186 | blocking |  575.0676008151054 | 9999996000000 | host_58  | rack_8 |  8613071225237535455 | 10000000000000 | 37.376148430256194 | AWS    |
| 14903144881462136102 | 1790316057393 | blocking | 185.71644931843477 | 9999997000000 | host_60  | rack_0 | 12336037372503751218 | 10000000000000 |  15.41807464592409 | AWS    |
| 18036101797844328737 | 1790315980444 | blocking |  323.1994394796841 | 9999948000000 | host_71  | rack_1 |  5277818915129357015 |  9999950000000 | 3.0270463604499485 | AWS    |
|  3146672079379267971 | 1790315975444 | blocking |  712.1429296004604 | 9999981000000 | host_8   | rack_8 | 17916653639169366056 |  9999990000000 | 50.668711268872215 | AWS    |
|  7443242848413171290 | 1790315969714 | blocking |  762.1131729883782 | 9999942000000 | host_15  | rack_5 |  9737001516229547025 |  9999950000000 |  17.12921532169998 | AWS    |
+----------------------+---------------+----------+--------------------+---------------+----------+--------+----------------------+----------------+--------------------+--------+
5 rows in set (5.65 sec)

Also, it generates lots of small SST files. Is this expected?

(Yes, I used iotop to check, and waited for compaction to finish.)

❯ ls -1 /var/lib/ceresdb/store/2/2199023255553 | wc -l
17881

❯ sudo du -sh /var/lib/ceresdb/store/2/2199023255553
4.5G	/var/lib/ceresdb/store/2/2199023255553
.rw-r--r-- root root 328 KB Sun Jun 25 00:21:19 2023  9865.sst
.rw-r--r-- root root 343 KB Sun Jun 25 00:21:19 2023  9866.sst
.rw-r--r-- root root 337 KB Sun Jun 25 00:21:19 2023  9867.sst


jiacai2050 commented Jun 25, 2023

> Also, it generates lots of small SST files. Is this expected?

This depends on the table's write buffer (default 32M). However, there is a bug in 1.2.0 which causes flushing too frequently. Would you mind trying the latest version? We have a nightly image here:

docker pull ghcr.io/ceresdb/ceresdb-server:nightly-20230625-9a9c0f79

As for the high memory consumption, it's mostly caused by too many small SST files: there is a merge sort process inside each query, which reads the head of every SST file. Related issue:
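As a rough back-of-the-envelope for the directory listed earlier (illustrative only, not CeresDB's actual accounting): with SSTs this small, buffering even one head block per file during the merge already approaches the entire on-disk dataset, before any decode or sort overhead.

package main

import "fmt"

func main() {
	const (
		sstCount  = 17881     // files observed in /var/lib/ceresdb/store/2/2199023255553
		headBlock = 264 << 10 // ~264 KiB, roughly the size of one whole SST here
	)
	gib := float64(sstCount) * headBlock / (1 << 30)
	// Prints ~4.5 GiB, matching the du -sh output above, before any decode
	// or sort overhead is counted.
	fmt.Printf("resident head blocks: ~%.1f GiB across %d open cursors\n", gib, sstCount)
}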


ttys3 commented Jun 26, 2023

I tried the latest ghcr.io/ceresdb/ceresdb-server:nightly-20230626-03d9aa49 and ghcr.io/ceresdb/ceresdb-server:nightly-20230625-9a9c0f79.

The SST files are still small:

  264.0 KiB [################  ]  175.sst
  264.0 KiB [################  ]  243.sst
  264.0 KiB [################  ]  134.sst
  264.0 KiB [################  ]  80.sst
  264.0 KiB [################  ]  84.sst
  264.0 KiB [################  ]  79.sst
  264.0 KiB [################  ]  269.sst
  264.0 KiB [################  ]  394.sst
  264.0 KiB [################  ]  150.sst

Write speed becomes slower and slower.


ShiKaiWi commented Jun 26, 2023

> I tried the latest nightly images ... The SST files are still small ... Write speed becomes slower and slower.

Actually, we found that massive numbers of small SSTs are also generated when many small strings or null values are inserted. The problem is introduced by the current memtable implementation, and #1029 is one solution to it. As for the null values, another fix I'm working on will use a bitmap to reduce the memory consumption.

Finally, we are planning to replace the current memtable with a brand-new one, but that work is still in the design stage and will take some time to deliver.


ttys3 commented Jun 26, 2023

So, currently, an SST is flushed when the table's write buffer (default 32M) is full.

The current test data looks like this:

system,api_type=blocking,hostname=host_94,id=rack_4,vendor=AWS disk_free=910.1630013889787,disk_total=7500000000i,mem_free=14855517621364468624u,mem_total=7500000000i,temperature=20.642589504090367 1687513104655044842

which has the strings "blocking", "host_xx", "rack_xx" and "AWS". Is this the reason why SSTs flush so fast?

That does not seem like that many strings, though.

jiacai2050 (Contributor) commented:

I will try to reproduce your case 3 today. One question about your code:

SetTimestamp(time.Now().UnixNano()).

It seems you use ns to set the timestamp. Did you change it to ms in your later tests?

jiacai2050 (Contributor) commented:

Hi, after some tests on my local dev machine, I found why there are so many small SSTs in your benchmark.

Your original code generates lots of duplicated rows, since you have code like this:

AddTag("id", ceresdb.NewStringValue(fmt.Sprintf("rack_%v", i%10))).
AddTag("hostname", ceresdb.NewStringValue(fmt.Sprintf("host_%v", i%100)))

For now, the memtable saves all duplicated rows in memory (tracked in #1035) and dedups them when flushing to SST, so you get lots of small SST files.

After changing your code like this, the SST size is as expected:

point, err := ceresdb.NewPointBuilder("system").
    SetTimestamp(time.Now().UnixNano()/1e6).
    AddTag("id", ceresdb.NewStringValue(fmt.Sprintf("rack_%v", i))).
    AddTag("vendor", ceresdb.NewStringValue("AWS")).
    AddTag("hostname", ceresdb.NewStringValue(fmt.Sprintf("host_%v", i))).
    AddTag("api_type", ceresdb.NewStringValue("blocking")).
    AddField("temperature", ceresdb.NewDoubleValue(rand.Float64()*80.0)).
    AddField("disk_free", ceresdb.NewDoubleValue(rand.Float64()*1000.0)).
    AddField("disk_total", ceresdb.NewInt64Value((int64(i/10+1)*1000000))).
    AddField("mem_total", ceresdb.NewInt64Value((int64(i/100+1)*10000000))).
    AddField("mem_free", ceresdb.NewUint64Value(rand.Uint64())).
    Build()
$ll -ltrh /tmp/ceresdb/store/2/2199023255553/
total 103M
-rw-r--r-- 1 chenxiang.ljc users 6.7M Jun 27 14:24 1.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 2.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 3.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 4.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 5.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:24 6.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:25 7.sst
-rw-r--r-- 1 chenxiang.ljc users 6.8M Jun 27 14:25 8.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 9.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 10.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 11.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 12.sst
-rw-r--r-- 1 chenxiang.ljc users 6.5M Jun 27 14:25 13.sst


jiacai2050 commented Jun 27, 2023

After fixing the too-many-small-SSTs issue, I tried executing your query

SELECT * FROM "system" order by timestamp desc limit 5;

It costs up to 13G of memory (RES), and we are optimizing such queries in these issues:

jiacai2050 (Contributor) commented:

Closing, since the original issue is explained above. Feel free to open a new issue if you have more questions.
