feat: add new binary data parser #7030

yolkhovyy · 2020-02-16T10:52:10Z

Required for all PRs:

This is an implementation of feature request #6804 - binary data parser.

Signed CLA.
Associated README.md updated.
Has appropriate unit tests.

Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>

yolkhovyy · 2022-01-09T15:56:28Z

@srebhan I am also not sure what this means:
Semantic Pull Request — add a semantic commit or PR title
Could you please elaborate on this? Thanks

powersj · 2022-01-10T14:44:01Z

@srebhan I am also not sure what this mean: Semantic Pull Request — add a semantic commit or PR title Could you please elaborate on this? Thanks

It is looking for a commit message like this:

feat: add new binary data parser

srebhan

Thanks @yolkhovyy for the nice update! I have some more comments, but we are definitely getting closer...

srebhan · 2022-01-17T20:29:38Z

plugins/parsers/bindata/README.md

+  ## Numeric fields endiannes, "be" or "le", default "be"
+  # bindata_endiannes = "be"


I think the default should be the host's byte order, i.e. do not mess with endianness?

@srebhan Could you please elaborate on this? Not sure whether I understand this comment.

What I mean is that people often do not know the endianness of their data, and that's fine as long as they do not cross endianness borders. If, for example, you generate the data on a machine (with its endianness) and then parse it on that same machine, you probably do not need to know whether that host is big- or little-endian. Therefore I suggest a shortcut for these use cases, such as accepting native or host as the endianness instead of requiring it to be defined explicitly. Does that make things clearer?
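A minimal sketch of how such a shortcut could be resolved, not the PR's actual code. The values "be" and "le" come from the README excerpt above; the value "host" and the function names `hostByteOrder` and `byteOrderFor` are hypothetical, and treating an empty setting as "host" is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// hostByteOrder reports the byte order of the machine running the parser.
// (Hypothetical helper, not part of the PR.)
func hostByteOrder() binary.ByteOrder {
	var probe uint16 = 1
	// On a little-endian host the least significant byte is stored first.
	if *(*byte)(unsafe.Pointer(&probe)) == 1 {
		return binary.LittleEndian
	}
	return binary.BigEndian
}

// byteOrderFor maps the configured endianness to a binary.ByteOrder.
// "be" and "le" are the values from the README excerpt; "host" is an
// assumed name for the shortcut, and an empty value is treated the same way.
func byteOrderFor(endianness string) (binary.ByteOrder, error) {
	switch endianness {
	case "be":
		return binary.BigEndian, nil
	case "le":
		return binary.LittleEndian, nil
	case "", "host":
		return hostByteOrder(), nil
	}
	return nil, fmt.Errorf("unknown endianness %q", endianness)
}
```

With this shape, a configuration that omits the option keeps working when the data is produced and parsed on the same machine, while "be" and "le" remain available for data that crosses hosts.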

srebhan · 2022-01-17T20:30:39Z

plugins/parsers/bindata/README.md

+  bindata_time_format = "unix"
+
+  ## String encoding - "UTF-8" is the default
+  bindata_string_encoding = "UTF-8"


I think it's safe to assume UTF-8 as the default encoding, isn't it? If so, please comment out this line as shown above.

srebhan · 2022-01-17T20:34:58Z

plugins/parsers/bindata/parser.go

+	metricName     string
+	timeFormat     string
+	endiannes      string
+	byteOrder      binary.ByteOrder
+	stringEncoding string
+	fields         []Field


Please export these fields. Please also add toml-tags to those options to allow for the new parser format (see PR #8791 and the CSV parser as an example).
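For illustration, exported fields with toml tags might look roughly like the sketch below. The keys `bindata_endiannes`, `bindata_time_format` and `bindata_string_encoding` mirror the README excerpts in this review; every other key, the `Field` layout, and the `tomlKey` helper are assumptions made for the example and are not taken from PR #8791 or the CSV parser.

```go
package main

import (
	"encoding/binary"
	"reflect"
)

// Field describes one entry of the binary layout. The tag names here
// are assumptions for illustration only.
type Field struct {
	Name string `toml:"name"`
	Type string `toml:"type"`
	Size uint   `toml:"size"`
}

// BinData is a sketch of the parser struct with exported, tagged options.
type BinData struct {
	MetricName     string  `toml:"metric_name"`             // assumed key
	TimeFormat     string  `toml:"bindata_time_format"`     // key from the README excerpt
	Endianness     string  `toml:"bindata_endiannes"`       // key spelled as in the README excerpt
	StringEncoding string  `toml:"bindata_string_encoding"` // key from the README excerpt
	Fields         []Field `toml:"bindata_fields"`          // assumed key

	// byteOrder is derived from Endianness, so it stays unexported and untagged.
	byteOrder binary.ByteOrder
}

// tomlKey returns the toml tag of a BinData field, or "" if the field
// does not exist or carries no tag. (Hypothetical helper for the example.)
func tomlKey(fieldName string) string {
	f, ok := reflect.TypeOf(BinData{}).FieldByName(fieldName)
	if !ok {
		return ""
	}
	return f.Tag.Get("toml")
}
```

Keeping the derived `byteOrder` unexported is a design choice of this sketch: only values a user can set in the configuration carry a tag.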

srebhan · 2022-01-17T20:51:34Z

plugins/parsers/bindata/parser.go

+		if field.Type != "padding" {
+			fieldBuffer := data[offset : offset+field.Size]
+			switch field.Type {
+			case "string":
+				fields[field.Name] = string(fieldBuffer)
+			default:
+				fieldValue := reflect.New(fieldTypes[field.Type])
+				byteReader := bytes.NewReader(fieldBuffer)
+				binary.Read(byteReader, binData.byteOrder, fieldValue.Interface())
+				fields[field.Name] = fieldValue.Elem().Interface()
+			}
+		}


To be honest I'd like to see something like

Suggested change

-		if field.Type != "padding" {
-			fieldBuffer := data[offset : offset+field.Size]
-			switch field.Type {
-			case "string":
-				fields[field.Name] = string(fieldBuffer)
-			default:
-				fieldValue := reflect.New(fieldTypes[field.Type])
-				byteReader := bytes.NewReader(fieldBuffer)
-				binary.Read(byteReader, binData.byteOrder, fieldValue.Interface())
-				fields[field.Name] = fieldValue.Elem().Interface()
-			}
-		}
+		switch field.Type {
+		case "padding":
+			continue
+		case "bool":
+			var v bool
+			r := bytes.NewReader(data[offset : offset+1])
+			if err := binary.Read(r, binData.byteOrder, &v); err != nil {
+				return nil, err
+			}
+			fields[field.Name] = v
+		case "uint8":
+			var v uint8
+			r := bytes.NewReader(data[offset : offset+1])
+			if err := binary.Read(r, binData.byteOrder, &v); err != nil {
+				return nil, err
+			}
+			fields[field.Name] = v
+		case "int8":
+			...
+		case "uint16":
+			...
+		case "int16":
+			...
+		case "uint32":
+			...
+		case "int32":
+			...
+		case "uint64":
+			...
+		case "int64":
+			...
+		case "float32":
+			...
+		case "float64":
+			...
+		case "string":
+			fields[field.Name] = string(data[offset : offset+field.Size])
+		}

Totally agree with moving padding inside the switch. Regarding reflection: I thought it was cool :) Is reflection against InfluxData style/rules? It is compact and handles endianness nicely.
What bothers me here is the explicit size of the string field. I would go for null-terminated strings here, because they are pretty much standard in embedded code, which is usually written in C/C++.

Well it's probably a matter of taste, but I think the switch/case above is much more readable/understandable compared to reflection. There also might be a performance impact, but that's not my primary concern... So please switch to the switch-statement. :-)

Regarding the null-terminated strings: how about respecting the length if one is given, and falling back to null-termination otherwise? This would allow reading non-null-terminated strings (i.e. fixed-length fields), which you sometimes see in embedded devices.

null-terminated strings - sounds good, good point
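The rule agreed on here (respect a given length, otherwise read up to the terminator) could be sketched as follows. The function name `readString`, its signature, and the convention that a size of 0 means "no length given" are assumptions for illustration, not code from the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// readString extracts a string starting at offset and returns it together
// with the number of bytes consumed.
//
// size > 0:  fixed-length field, exactly size bytes are read.
// size == 0: no length given, read up to the first NUL byte.
// (Treating 0 as "no length given" is an assumption of this sketch.)
func readString(data []byte, offset, size int) (string, int, error) {
	if offset < 0 || offset > len(data) {
		return "", 0, fmt.Errorf("offset %d out of range", offset)
	}
	if size > 0 {
		if offset+size > len(data) {
			return "", 0, fmt.Errorf("string field needs %d bytes, only %d available", size, len(data)-offset)
		}
		return string(data[offset : offset+size]), size, nil
	}
	end := bytes.IndexByte(data[offset:], 0)
	if end < 0 {
		return "", 0, fmt.Errorf("no null terminator found after offset %d", offset)
	}
	// Consume the terminator too, so the next field starts right after it.
	return string(data[offset : offset+end]), end + 1, nil
}
```

Returning the number of consumed bytes lets the caller advance its offset the same way for both variants.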

yolkhovyy · 2022-01-23T13:23:54Z

@yolkhovyy this is very cool. I have some comments in the code. The main ones are the specification of the time field. IMO you should make this explicit with a sensible default e.g. "time" or empty == time.Now(). As a (later) extension please think about bitfields which are very common for boolean flags in embedded devices.

@srebhan I am addressing your remarks; good point about bitfields. I am not sure I understand your comment about the "time" field: if it is not present in the spec, it will be initialized to time.Now() at line 194. Or do you mean something else?

func (binData *BinData) getTime(fields map[string]interface{}) (time.Time, error) {
	t, found := fields[timeKey]
	if !found {
		return time.Now(), nil
	}
	delete(fields, timeKey)
...
}

telegraf-tiger · 2022-01-23T15:23:18Z

☺️ This pull request doesn't significantly change the Telegraf binary size (less than 1%)

📦 Looks like new artifacts were built from this PR.

Expand this list to get them here ! 🐯

Artifact URLs

DEB	RPM	TAR GZ	ZIP
amd64.deb	aarch64.rpm	darwin_amd64.tar.gz	windows_amd64.zip
arm64.deb	armel.rpm	darwin_arm64.tar.gz	windows_i386.zip
armel.deb	armv6hl.rpm	freebsd_amd64.tar.gz
armhf.deb	i386.rpm	freebsd_armv7.tar.gz
i386.deb	ppc64le.rpm	freebsd_i386.tar.gz
mips.deb	riscv64.rpm	linux_amd64.tar.gz
mipsel.deb	s390x.rpm	linux_arm64.tar.gz
ppc64el.deb	x86_64.rpm	linux_armel.tar.gz
riscv64.deb		linux_armhf.tar.gz
s390x.deb		linux_i386.tar.gz
		linux_mips.tar.gz
		linux_mipsel.tar.gz
		linux_ppc64le.tar.gz
		linux_riscv64.tar.gz
		linux_s390x.tar.gz
		static_linux_amd64.tar.gz

srebhan · 2022-01-25T10:19:21Z

@yolkhovyy I'm no longer sure about my comment regarding time. I think my idea was to add an explicit user setting that names the field holding the time, instead of implicitly using a field named "Time". One reason is case sensitivity and the fact that the user "needs to know" this. I would feel better with an option such as # time = "" in TOML, i.e. an option to specify the time field explicitly. The logic would then be: if this option is empty (the default), we use time.Now; otherwise we use the given field and error out if that field does not exist.
What do you think?
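The proposed logic could look roughly like this sketch. The name `resolveTime` and its `timeField` parameter are hypothetical, and the sketch assumes the time value was parsed as int64 Unix seconds (matching the "unix" time format in the README excerpt); other formats are left out.

```go
package main

import (
	"fmt"
	"time"
)

// resolveTime returns the metric timestamp.
//
// timeField stands for the proposed user option (a hypothetical name):
//   - empty (default): use the current time
//   - otherwise: use the named field and fail if it does not exist
func resolveTime(fields map[string]interface{}, timeField string) (time.Time, error) {
	if timeField == "" {
		return time.Now(), nil
	}
	raw, found := fields[timeField]
	if !found {
		return time.Time{}, fmt.Errorf("time field %q not found", timeField)
	}
	// Assumption: the field holds Unix seconds as int64.
	seconds, ok := raw.(int64)
	if !ok {
		return time.Time{}, fmt.Errorf("time field %q has unsupported type %T", timeField, raw)
	}
	// The timestamp is not reported as a regular field as well.
	delete(fields, timeField)
	return time.Unix(seconds, 0), nil
}
```

Compared with the `getTime` excerpt above, a missing field is silent only when the option is unset; a configured but absent field becomes an error, which also surfaces case mismatches in the field name.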

srebhan · 2022-06-01T21:01:17Z

@yolkhovyy can you please resolve the merge conflict and address the remaining comments? We are so close to getting this ready...

telegraf-tiger · 2022-07-21T18:09:58Z

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!

srebhan · 2022-07-22T13:09:17Z

As discussed, we should take over this one...

srebhan · 2022-07-26T15:31:08Z

Closing in favor of #11552.

yolkhovyy and others added 30 commits October 13, 2019 15:21

Initial binary metric implementation

1a64cbe

Metric time

2c1b466

Metric time

72591fc

Added string type

b76cea3

Re-factored and tests

746e956

More tests

1365ce8

More tests

bc81a4b

Renamed to BinData

7a511f2

Working bindata parser

5421933

Merge branch 'feature/fix_record_parser' into develop

87ca271

Optional field size

eda274e

Added bindata parser description in README.md

6f944e9

README.md updated

adfda86

Removed Protocol

3e85c0b

Fixed typo

ec20ac2

Remove unused const

c300d4f

Merge branch 'master' into develop

7a79c8e

Updated README.md

746ea4d

Reworking string encoding

72f8137

Unit tests for string encoding

4a1c812

Merge branch 'feature/new-string-encoding' into develop

7e9003c

Padding test cleaned up

7b065d7

UTF-8 unit test

e32d8a1

Unit tests cleaned up

ef749af

Merge branch 'feature/utf8-encoding-test' into develop

eb77521

Comments and commented out code

0a30031

README.md updated

c49fc04

../registry.go

54a0cc8

Added bindata factory, reworked unit tests

1ad2998

Comments, etc

b040ed3

yolkhovyy and others added 5 commits January 9, 2022 16:30

Update plugins/parsers/bindata/parser_test.go

275858d

Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>

Update plugins/parsers/bindata/parser_test.go

313c538

Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>

Fixed unit test

6be9cc2

Removed new lines in tests

52a45a6

Default endiannes

016372b

powersj changed the title ~~Add binary data parser plugin~~ feat: add new binary data parser Jan 10, 2022

srebhan reviewed Jan 17, 2022

yolkhovyy added 6 commits January 23, 2022 12:15

Resolved merge conflicts

013c539

Compilation error fixe

d110b54

Reworked getParserConfig in config/config.go

9ab9d4f

Lint errors fixed

a8185c4

Lint errors fixed

d127f83

Lint errors fixed

0b154c2

yolkhovyy added 2 commits January 23, 2022 15:39

Review comments addressed

9a93ace

Review comments addressed

e5ad2d1

sspaink added the "waiting for response" (waiting for response from contributor) label Jul 6, 2022

telegraf-tiger bot closed this Jul 21, 2022

srebhan removed the "waiting for response" (waiting for response from contributor) label Jul 22, 2022

srebhan reopened this Jul 22, 2022

srebhan mentioned this pull request Jul 26, 2022

feat: Add binary parser #11552

Merged

3 tasks

srebhan closed this Jul 26, 2022
