Add parse_ints config in json parser to support parsing int or float properly #33699

Merged
merged 11 commits into from
Jul 3, 2024
27 changes: 27 additions & 0 deletions .chloggen/json_parser_number_data_type.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: pkg/stanza

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: expose json iterator config in json parser

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [33696]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
42 changes: 32 additions & 10 deletions pkg/stanza/docs/operators/json_parser.md
@@ -4,21 +4,43 @@ The `json_parser` operator parses the string-type field selected by `parse_from`

### Configuration Fields

| Field | Default | Description |
| --- | --- | --- |
| `id` | `json_parser` | A unique identifier for the operator. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `parse_from` | `body` | The [field](../types/field.md) from which the value will be parsed. |
| `parse_to` | `attributes` | The [field](../types/field.md) to which the value will be parsed. |
| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `if` | | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
| `timestamp` | `nil` | An optional [timestamp](../types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. |
| `severity` | `nil` | An optional [severity](../types/severity.md) block which will parse a severity field before passing the entry to the output operator. |
| Field | Default | Description |
| --- | --- | --- |
| `id` | `json_parser` | A unique identifier for the operator. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `parse_from` | `body` | The [field](../types/field.md) from which the value will be parsed. |
| `parse_to` | `attributes` | The [field](../types/field.md) to which the value will be parsed. |
| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `if` | | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
| `timestamp` | `nil` | An optional [timestamp](../types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. |
| `severity` | `nil` | An optional [severity](../types/severity.md) block which will parse a severity field before passing the entry to the output operator. |
| `jsoniter_config` | `nil` | An optional jsoniter config block. See below for details. |

### Embedded Operations

The `json_parser` can be configured to embed certain operations such as timestamp and severity parsing. For more information, see [complex parsers](../types/parsers.md#complex-parsers).

### `jsoniter_config` Configuration

The `json_parser` operator uses [json-iterator](https://github.com/json-iterator/go) as the underlying JSON parser. The default config is [ConfigFastest](https://pkg.go.dev/github.com/json-iterator/go#pkg-variables).

In addition, the `jsoniter_config` block lets you configure the JSON parser with a custom configuration. The available fields map to the corresponding fields in [json-iterator Config](https://pkg.go.dev/github.com/json-iterator/go#Config):

| Field | Default | Description |
|--------------------------------------|---------|----------------------------------------------------|
| `indention_step` | 0 | json-iterator.Config.IndentionStep |
| `marshal_float_with_6_digits` | `false` | json-iterator.Config.MarshalFloatWith6Digits |
| `escape_html` | `false` | json-iterator.Config.EscapeHTML |
| `sort_map_keys` | `false` | json-iterator.Config.SortMapKeys |
| `use_number` | `false` | json-iterator.Config.UseNumber |
| `disallow_unknown_fields` | `false` | json-iterator.Config.DisallowUnknownFields |
| `tag_key` | `` | json-iterator.Config.TagKey |
| `only_tagged_field` | `false` | json-iterator.Config.OnlyTaggedField |
| `validate_json_raw_message` | `false` | json-iterator.Config.ValidateJsonRawMessage |
| `object_field_must_be_simple_string` | `false` | json-iterator.Config.ObjectFieldMustBeSimpleString |
| `case_sensitive` | `false` | json-iterator.Config.CaseSensitive |

Numbers such as `int` and `float` are parsed as `float64` by default. When `use_number` is enabled, numbers are parsed as `json.Number` and then converted to `int64` or `float64` based on the value.

### Example Configurations

44 changes: 43 additions & 1 deletion pkg/stanza/operator/parser/json/config.go
@@ -32,6 +32,38 @@ func NewConfigWithID(operatorID string) *Config {
// Config is the configuration of a JSON parser operator.
type Config struct {
helper.ParserConfig `mapstructure:",squash"`

*JsoniterConfig `mapstructure:"jsoniter_config,omitempty"`
}

type JsoniterConfig struct {
IndentionStep int `mapstructure:"indention_step"`
Member

Do we want to expose all these options to the users?

Contributor Author

I agree. On second thought, at least for now we should expose only the options we need, so that we are not bound so tightly to json-iterator.

Contributor Author

Currently json_parser uses jsoniter.ConfigFastest, which has two options enabled:

var ConfigFastest = Config{
	EscapeHTML:                    false,
	MarshalFloatWith6Digits:       true,
	ObjectFieldMustBeSimpleString: true,
}.Froze()

To add the use_number config, it looks like we have to create a new jsoniter.Config object, probably with the same settings so that users are not surprised. I think it also makes sense to let users turn off MarshalFloatWith6Digits. Any thoughts or comments on this?

Member

I agree we should only expose what is needed, and we should consider that perhaps some day we need to switch json parsing libraries.

Contributor Author

I was wrong about MarshalFloatWith6Digits: it affects only marshaling, not unmarshaling, so it is unrelated to the parser. For now I have added config only for UseNumber.

MarshalFloatWith6Digits bool `mapstructure:"marshal_float_with_6_digits"`
EscapeHTML bool `mapstructure:"escape_html"`
SortMapKeys bool `mapstructure:"sort_map_keys"`
UseNumber bool `mapstructure:"use_number"`
DisallowUnknownFields bool `mapstructure:"disallow_unknown_fields"`
TagKey string `mapstructure:"tag_key"`
OnlyTaggedField bool `mapstructure:"only_tagged_field"`
ValidateJsonRawMessage bool `mapstructure:"validate_json_raw_message"`
ObjectFieldMustBeSimpleString bool `mapstructure:"object_field_must_be_simple_string"`
CaseSensitive bool `mapstructure:"case_sensitive"`
}

func (jc JsoniterConfig) toJsoniterAPI() jsoniter.API {
return jsoniter.Config{
IndentionStep: jc.IndentionStep,
MarshalFloatWith6Digits: jc.MarshalFloatWith6Digits,
EscapeHTML: jc.EscapeHTML,
SortMapKeys: jc.SortMapKeys,
UseNumber: jc.UseNumber,
DisallowUnknownFields: jc.DisallowUnknownFields,
TagKey: jc.TagKey,
OnlyTaggedField: jc.OnlyTaggedField,
ValidateJsonRawMessage: jc.ValidateJsonRawMessage,
ObjectFieldMustBeSimpleString: jc.ObjectFieldMustBeSimpleString,
CaseSensitive: jc.CaseSensitive,
}.Froze()
}

// Build will build a JSON parser operator.
@@ -41,8 +73,18 @@ func (c Config) Build(set component.TelemetrySettings) (operator.Operator, error
return nil, err
}

var json jsoniter.API
var convertNumber bool
if c.JsoniterConfig != nil {
json = c.JsoniterConfig.toJsoniterAPI()
convertNumber = c.JsoniterConfig.UseNumber
} else {
json = jsoniter.ConfigFastest
}

return &Parser{
ParserOperator: parserOperator,
json: jsoniter.ConfigFastest,
json: json,
useNumber: convertNumber,
}, nil
}
10 changes: 10 additions & 0 deletions pkg/stanza/operator/parser/json/config_test.go
@@ -110,6 +110,16 @@ func TestConfig(t *testing.T) {
return p
}(),
},
{
Name: "use_number",
Expect: func() *Config {
p := NewConfig()
p.JsoniterConfig = &JsoniterConfig{
UseNumber: true,
}
return p
}(),
},
},
}.Run(t)
}
46 changes: 45 additions & 1 deletion pkg/stanza/operator/parser/json/parser.go
@@ -5,6 +5,7 @@ package json // import "github.com/open-telemetry/opentelemetry-collector-contri

import (
"context"
"encoding/json"
"fmt"

jsoniter "github.com/json-iterator/go"
@@ -16,7 +17,8 @@ import (
// Parser is an operator that parses JSON.
type Parser struct {
helper.ParserOperator
json jsoniter.API
json jsoniter.API
useNumber bool
}

// Process will parse an entry for JSON.
@@ -36,5 +38,47 @@ func (p *Parser) parse(value any) (any, error) {
default:
return nil, fmt.Errorf("type %T cannot be parsed as JSON", value)
}

if p.useNumber {
Member

Doing the conversion explicitly here makes me wonder what the actual reason is for defining the UseNumber setting in the json object's config in the first place 🤔?

Contributor Author

I am not sure either. encoding/json does not appear to have as many options as json-iterator, so it makes more sense to me to expose only the use_number option for now.

Member

So what's the point of enabling the UseNumber of the jsoniter.Config? Is this required for some reason? If so I suggest we document this at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33699/files#diff-158189e84f05b177451492225bc83c2c23fa140cffdd4c5f9bf7db6ada8edc3aR54.

Contributor Author

Added a comment. Please also check this test, which shows that when UseNumber is false, numbers are parsed as float64 regardless of whether the data is an int or a float. https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33699/files#diff-f811622a59595f3ded728a9213b5e08d0c5fbe91c4cfff23b25d037f84e1f953R142-R155

p.convertNumbers(parsedValue)
}
return parsedValue, nil
}

func (p *Parser) convertNumbers(parsedValue map[string]any) {
for k, v := range parsedValue {
switch t := v.(type) {
case json.Number:
parsedValue[k] = p.convertNumber(t)
case map[string]any:
p.convertNumbers(t)
case []any:
p.convertNumbersArray(t)
}
}
}

func (p *Parser) convertNumbersArray(arr []any) {
for i, v := range arr {
switch t := v.(type) {
case json.Number:
arr[i] = p.convertNumber(t)
case map[string]any:
p.convertNumbers(t)
case []any:
p.convertNumbersArray(t)
}
}
}

func (p *Parser) convertNumber(value json.Number) any {
i64, err := value.Int64()
if err == nil {
return i64
}
f64, err := value.Float64()
if err == nil {
return f64
}
return value.String()
}
96 changes: 96 additions & 0 deletions pkg/stanza/operator/parser/json/parser_test.go
@@ -139,6 +139,102 @@ func TestParser(t *testing.T) {
ScopeName: "logger",
},
},
{
"use_number_simple",
func(p *Config) {
p.JsoniterConfig = &JsoniterConfig{UseNumber: true}
},
&entry.Entry{
Body: `{"int":1,"float":1.0}`,
},
&entry.Entry{
Attributes: map[string]any{
"int": int64(1),
"float": float64(1),
},
Body: `{"int":1,"float":1.0}`,
},
},
{
"use_number_nested",
func(p *Config) {
p.JsoniterConfig = &JsoniterConfig{UseNumber: true}
},
&entry.Entry{
Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0}}`,
},
&entry.Entry{
Attributes: map[string]any{
"int": int64(1),
"float": float64(1),
"nested": map[string]any{
"int": int64(2),
"float": float64(2),
},
},
Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0}}`,
},
},
{
"use_number_arrays",
func(p *Config) {
p.JsoniterConfig = &JsoniterConfig{UseNumber: true}
},
&entry.Entry{
Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0},"array":[1,2]}`,
},
&entry.Entry{
Attributes: map[string]any{
"int": int64(1),
"float": float64(1),
"nested": map[string]any{
"int": int64(2),
"float": float64(2),
},
"array": []any{int64(1), int64(2)},
},
Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0},"array":[1,2]}`,
},
},
{
"use_number_mixed_arrays",
func(p *Config) {
p.JsoniterConfig = &JsoniterConfig{UseNumber: true}
},
&entry.Entry{
Body: `{"int":1,"float":1.0,"mixed_array":[1,1.5,2]}`,
},
&entry.Entry{
Attributes: map[string]any{
"int": int64(1),
"float": float64(1),
"mixed_array": []any{int64(1), float64(1.5), int64(2)},
},
Body: `{"int":1,"float":1.0,"mixed_array":[1,1.5,2]}`,
},
},
{
"use_number_nested_arrays",
func(p *Config) {
p.JsoniterConfig = &JsoniterConfig{UseNumber: true}
},
&entry.Entry{
Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0,"array":[1,2]},"array":[3,4]}`,
},
&entry.Entry{
Attributes: map[string]any{
"int": int64(1),
"float": float64(1),
"nested": map[string]any{
"int": int64(2),
"float": float64(2),
"array": []any{int64(1), int64(2)},
},
"array": []any{int64(3), int64(4)},
},
Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0,"array":[1,2]},"array":[3,4]}`,
},
},
}

for _, tc := range cases {
4 changes: 4 additions & 0 deletions pkg/stanza/operator/parser/json/testdata/config.yaml
@@ -37,3 +37,7 @@ timestamp:
parse_from: body.timestamp_field
layout_type: strptime
layout: '%Y-%m-%d'
use_number:
type: json_parser
jsoniter_config:
use_number: true