-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add new binary data parser #7030
Conversation
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
@srebhan I am also not sure what this mean: |
It is looking for a commit message like this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yolkhovyy for the nice update! I have some more comments, but we are definitively coming closer...
## Numeric fields endiannes, "be" or "le", default "be" | ||
# bindata_endiannes = "be" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default should be host
, i.e. do not mess with endianess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srebhan Could you please elaborate on this? Not sure whether I understand this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that people tend to not knowing the endianess of data and that's fine as long as the do not cross endianess boarders. If you for example generate the data in a machine (with its endianess) and the parse it on that same machine, you probably do not need to know if that host has big- or little-endian. Therefore I suggest a shortcut for these use-cases like using native
or host
as endianess instead of explicitly defining it. Does that make things clearer?
bindata_time_format = "unix" | ||
|
||
## String encoding - "UTF-8" is the default | ||
bindata_string_encoding = "UTF-8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's save to assume UTF-8 as a default encoding isn't it? If so, please comment this line as shown above.
metricName string | ||
timeFormat string | ||
endiannes string | ||
byteOrder binary.ByteOrder | ||
stringEncoding string | ||
fields []Field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please export these fields. Please also add toml-tags to those options to allow for the new parser format (see PR #8791 and the CSV parser as an example).
plugins/parsers/bindata/parser.go
Outdated
if field.Type != "padding" { | ||
fieldBuffer := data[offset : offset+field.Size] | ||
switch field.Type { | ||
case "string": | ||
fields[field.Name] = string(fieldBuffer) | ||
default: | ||
fieldValue := reflect.New(fieldTypes[field.Type]) | ||
byteReader := bytes.NewReader(fieldBuffer) | ||
binary.Read(byteReader, binData.byteOrder, fieldValue.Interface()) | ||
fields[field.Name] = fieldValue.Elem().Interface() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I'd like to see something like
if field.Type != "padding" { | |
fieldBuffer := data[offset : offset+field.Size] | |
switch field.Type { | |
case "string": | |
fields[field.Name] = string(fieldBuffer) | |
default: | |
fieldValue := reflect.New(fieldTypes[field.Type]) | |
byteReader := bytes.NewReader(fieldBuffer) | |
binary.Read(byteReader, binData.byteOrder, fieldValue.Interface()) | |
fields[field.Name] = fieldValue.Elem().Interface() | |
} | |
} | |
switch field.Type { | |
case "padding": | |
continue | |
case "bool": | |
var v bool | |
r := bytes.NewReader(data[offset : offset+1]) | |
if err := binary.Read(r, binData.byteOrder, &v); err != nil { | |
return nil, err | |
} | |
fields[field.Name] = v | |
case "uint8": | |
var v uint8 | |
r := bytes.NewReader(data[offset : offset+1]) | |
if err := binary.Read(r, binData.byteOrder, &v); err != nil { | |
return nil, err | |
} | |
fields[field.Name] = v | |
case "int8": | |
... | |
case "uint16": | |
... | |
case "int16": | |
... | |
case "uint32": | |
... | |
case "int32": | |
... | |
case "uint64": | |
... | |
case "int64": | |
... | |
case "float32": | |
... | |
case "float64": | |
... | |
case "string": | |
fields[field.Name] = string(data[offset:offset+field.Size) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree with padding inside the switch. Regarding reflection - I thought it was cool :) is reflection against IndluxData style/rules? It is compact and solves endianess nicely.
What bothers me here is the explicit size of the string field. I would go for null-terminated strings here - because it's pretty much standard in embedded - usually coded in c/c++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's probably a matter of taste, but I think the switch/case above is much more readable/understandable compared to reflection. There also might be a performance impact, but that's not my primary concern... So please switch to the switch-statement. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the null-terminated strings. How about, if a length is given, we respect this length, otherwise we go for null-termination. This would allow to read non-null-terminated strings (i.e. fixed length fields) which you sometimes see in embedded devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null-terminated strings - sounds good, good point
@srebhan I am addressing your remarks, good point about bitfields. I am not sure I understand your comment about the "time" field - if it is not present in the spec, it will be initialized to time.Now() at line 194, or, do yo mean something else? func (binData *BinData) getTime(fields map[string]interface{}) (time.Time, error) {
t, found := fields[timeKey]
if !found {
return time.Now(), nil
}
delete(fields, timeKey)
...
} |
📦 Looks like new artifacts were built from this PR. Expand this list to get them here ! 🐯Artifact URLs |
@yolkhovyy not sure anymore about my comment regarding time. I think my idea was to explicitly add a user setting saying in which field the time is to be expected instead of implicitly using a field named |
@yolkhovyy can you please resolve the merge conflict and address the remaining comments? We are so close to getting this ready... |
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you! |
As discussed, we should take over this one... |
Close in preference of #11552. |
Required for all PRs:
This is an implementation of feature request #6804 - binary data parser.
closes #6804