Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add new binary data parser #7030

Closed
wants to merge 64 commits into from
Closed

Conversation

yolkhovyy
Copy link

@yolkhovyy yolkhovyy commented Feb 16, 2020

Required for all PRs:

This is an implementation of feature request #6804 - binary data parser.

closes #6804

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

yolkhovyy and others added 5 commits January 9, 2022 16:30
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
@yolkhovyy
Copy link
Author

@srebhan I am also not sure what this mean:
Semantic Pull Request — add a semantic commit or PR title
Could you please elaborate on this? Thanks

@powersj
Copy link
Contributor

powersj commented Jan 10, 2022

@srebhan I am also not sure what this mean: Semantic Pull Request — add a semantic commit or PR title Could you please elaborate on this? Thanks

It is looking for a commit message like this:

feat: add new binary data parser

@powersj powersj changed the title Add binary data parser plugin feat: add new binary data parser Jan 10, 2022
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yolkhovyy for the nice update! I have some more comments, but we are definitively coming closer...

Comment on lines +22 to +23
## Numeric fields endiannes, "be" or "le", default "be"
# bindata_endiannes = "be"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default should be host, i.e. do not mess with endianess?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srebhan Could you please elaborate on this? Not sure whether I understand this comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that people tend to not knowing the endianess of data and that's fine as long as the do not cross endianess boarders. If you for example generate the data in a machine (with its endianess) and the parse it on that same machine, you probably do not need to know if that host has big- or little-endian. Therefore I suggest a shortcut for these use-cases like using native or host as endianess instead of explicitly defining it. Does that make things clearer?

bindata_time_format = "unix"

## String encoding - "UTF-8" is the default
bindata_string_encoding = "UTF-8"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's save to assume UTF-8 as a default encoding isn't it? If so, please comment this line as shown above.

plugins/parsers/bindata/README.md Show resolved Hide resolved
Comment on lines +30 to +35
metricName string
timeFormat string
endiannes string
byteOrder binary.ByteOrder
stringEncoding string
fields []Field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please export these fields. Please also add toml-tags to those options to allow for the new parser format (see PR #8791 and the CSV parser as an example).

plugins/parsers/bindata/parser.go Outdated Show resolved Hide resolved
Comment on lines 161 to 172
if field.Type != "padding" {
fieldBuffer := data[offset : offset+field.Size]
switch field.Type {
case "string":
fields[field.Name] = string(fieldBuffer)
default:
fieldValue := reflect.New(fieldTypes[field.Type])
byteReader := bytes.NewReader(fieldBuffer)
binary.Read(byteReader, binData.byteOrder, fieldValue.Interface())
fields[field.Name] = fieldValue.Elem().Interface()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I'd like to see something like

Suggested change
if field.Type != "padding" {
fieldBuffer := data[offset : offset+field.Size]
switch field.Type {
case "string":
fields[field.Name] = string(fieldBuffer)
default:
fieldValue := reflect.New(fieldTypes[field.Type])
byteReader := bytes.NewReader(fieldBuffer)
binary.Read(byteReader, binData.byteOrder, fieldValue.Interface())
fields[field.Name] = fieldValue.Elem().Interface()
}
}
switch field.Type {
case "padding":
continue
case "bool":
var v bool
r := bytes.NewReader(data[offset : offset+1])
if err := binary.Read(r, binData.byteOrder, &v); err != nil {
return nil, err
}
fields[field.Name] = v
case "uint8":
var v uint8
r := bytes.NewReader(data[offset : offset+1])
if err := binary.Read(r, binData.byteOrder, &v); err != nil {
return nil, err
}
fields[field.Name] = v
case "int8":
...
case "uint16":
...
case "int16":
...
case "uint32":
...
case "int32":
...
case "uint64":
...
case "int64":
...
case "float32":
...
case "float64":
...
case "string":
fields[field.Name] = string(data[offset:offset+field.Size)
}

Copy link
Author

@yolkhovyy yolkhovyy Jan 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree with padding inside the switch. Regarding reflection - I thought it was cool :) is reflection against IndluxData style/rules? It is compact and solves endianess nicely.
What bothers me here is the explicit size of the string field. I would go for null-terminated strings here - because it's pretty much standard in embedded - usually coded in c/c++.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's probably a matter of taste, but I think the switch/case above is much more readable/understandable compared to reflection. There also might be a performance impact, but that's not my primary concern... So please switch to the switch-statement. :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the null-terminated strings. How about, if a length is given, we respect this length, otherwise we go for null-termination. This would allow to read non-null-terminated strings (i.e. fixed length fields) which you sometimes see in embedded devices.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null-terminated strings - sounds good, good point

plugins/parsers/bindata/parser.go Outdated Show resolved Hide resolved
@yolkhovyy
Copy link
Author

yolkhovyy commented Jan 23, 2022

@yolkhovyy this is very cool. I have some comments in the code. The main ones are the specification of the time field. IMO you should make this explicit with a sensible default e.g. "time" or empty == time.Now(). As a (later) extension please think about bitfields which are very common for boolean flags in embedded devices.

@srebhan I am addressing your remarks, good point about bitfields. I am not sure I understand your comment about the "time" field - if it is not present in the spec, it will be initialized to time.Now() at line 194, or, do yo mean something else?

func (binData *BinData) getTime(fields map[string]interface{}) (time.Time, error) {
	t, found := fields[timeKey]
	if !found {
		return time.Now(), nil
	}
	delete(fields, timeKey)
...
} 

@srebhan
Copy link
Member

srebhan commented Jan 25, 2022

@yolkhovyy not sure anymore about my comment regarding time. I think my idea was to explicitly add a user setting saying in which field the time is to be expected instead of implicitly using a field named "Time". One reason is case-sensitivity and the fact that the user "needs to know" this. If there is an option # time = "" in TOML, i.e. an option to specify the time field explicitly I would feel better. The logic is then, if this option is empty (default) we are using time.Now otherwise we use the given field and error out if that field does not exist.
What do you think?

@srebhan
Copy link
Member

srebhan commented Jun 1, 2022

@yolkhovyy can you please resolve the merge conflict and address the remaining comments? We are so close to getting this ready...

@sspaink sspaink added the waiting for response waiting for response from contributor label Jul 6, 2022
@telegraf-tiger
Copy link
Contributor

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!

@telegraf-tiger telegraf-tiger bot closed this Jul 21, 2022
@srebhan srebhan removed the waiting for response waiting for response from contributor label Jul 22, 2022
@srebhan
Copy link
Member

srebhan commented Jul 22, 2022

As discussed, we should take over this one...

@srebhan srebhan reopened this Jul 22, 2022
@srebhan srebhan mentioned this pull request Jul 26, 2022
3 tasks
@srebhan
Copy link
Member

srebhan commented Jul 26, 2022

Close in preference of #11552.

@srebhan srebhan closed this Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Request for community participation, code, contribution new plugin plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binary data parser feature request
6 participants