ADDON-77801 Added support for ingesting json event using conte… #376

Open: wants to merge 2 commits into main

@@ -1402,6 +1402,10 @@ def hec_raw_replay(
url_with_hec_path = urllib.parse.urljoin(
url_with_port, "services/collector/raw"
)

Can we do this as an if/else above, rather than overriding the URL later?

Is there a test that covers this behavior?

if attack_data_file.endpoint:
url_with_hec_path = urllib.parse.urljoin(url_with_port, attack_data_file.endpoint)

with open(tempfile, "rb") as datafile:
try:
res = requests.post(

contentctl/objects/test_attack_data.py: 1 change, 1 addition, 0 deletions
@@ -11,3 +11,4 @@ class TestAttackData(BaseModel):
sourcetype: str = Field(...)
custom_index: str | None = None
host: str | None = None
endpoint: str | None = None

I am wondering whether it makes sense to add a parameter, since it would then need to be set externally. It is certainly doable; on the other hand, maybe this is generic enough for contentctl to detect that the event is in HEC format and decide on its own, as @spanchal-crest initially proposed?

I mean, there is a very specific JSON format, which requires an "event" field plus a handful of optional ones: https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector

On the other hand, if treating a record as HEC-formatted whenever it has an "event" field and no fields beyond those listed on that page seems too risky, we can stick with the field added here.
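A rough sketch of the auto-detection idea, assuming the attack data file is JSONL (one object per line). The allowed key set is an assumption taken from the Splunk page linked above (`event`, `time`, `host`, `source`, `sourcetype`, `index`, `fields`), and `looks_like_hec_event` / `file_is_hec_formatted` are hypothetical helpers, not contentctl functions.

```python
import json
from collections.abc import Iterable

# Assumption: top-level keys documented for the HEC event format.
HEC_EVENT_KEYS = frozenset({"event", "time", "host", "source", "sourcetype", "index", "fields"})


def looks_like_hec_event(line: str) -> bool:
    """True if the line is a JSON object with "event" and only documented HEC keys."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    return isinstance(record, dict) and "event" in record and set(record) <= HEC_EVENT_KEYS


def file_is_hec_formatted(lines: Iterable[str]) -> bool:
    # Require every non-blank line to match; an empty file is not treated as HEC.
    non_blank = [line for line in lines if line.strip()]
    return bool(non_blank) and all(looks_like_hec_event(line) for line in non_blank)
```

Requiring every line to match, rather than sampling the first one, keeps a mostly-raw JSON file from being misrouted, which is the risk raised in the previous paragraph.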

I'm not sure I follow your suggestion. Could you provide some pseudocode, by any chance?
Are you saying that, for example, the JSON data we want to replay today is actually JSONL, in a single file that looks like the following:

```json
{"field1": "somevalue", "field2": 4}
{"field1": "someothervalue", "field2": 2}
{"field1": "somethirdvalue", "field2": 1}
```

But must actually be converted to the following format before it is replayed:

```json
{"sourcetype": "some_sourcetype", "event": {"field1": "somevalue", "field2": 4}}
{"sourcetype": "some_sourcetype", "event": {"field1": "someothervalue", "field2": 2}}
{"sourcetype": "some_sourcetype", "event": {"field1": "somethirdvalue", "field2": 1}}
```

In line with the requirements for the HEC JSON endpoint?
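The conversion described in this question can be sketched as a per-line wrap, assuming one JSON object per line and a single sourcetype for the whole file. `wrap_for_hec_event` is a hypothetical helper name, not an existing contentctl function.

```python
import json


def wrap_for_hec_event(jsonl: str, sourcetype: str) -> list[str]:
    """Wrap each JSONL record in the envelope used by services/collector/event."""
    wrapped = []
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        wrapped.append(json.dumps({"sourcetype": sourcetype, "event": record}))
    return wrapped
```

With the default `json.dumps` separators, feeding in the three input lines above produces exactly the three wrapped lines shown.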

Secondly, instead of typing the field as `endpoint: str | None = None`, I would suggest:

```python
DEFAULT_HEC_ENDPOINT = "services/collector/raw"
endpoint: str = DEFAULT_HEC_ENDPOINT
```

or, even better:

```python
from enum import StrEnum  # StrEnum requires Python 3.11+

class ReplayEndpoint(StrEnum):
    HEC = "services/collector/raw"
    JSON = "services/collector/event"

# field on TestAttackData
endpoint: ReplayEndpoint = ReplayEndpoint.HEC
```


given that we already have thousands of tests, and all of them use the raw HEC endpoint (`services/collector/raw`). Typing it this way:
1. Ensures that `endpoint` is never `None`: the field MUST be defined at test object creation time, and giving it a reasonable default value means we don't need to go update 2,000 existing YMLs.
2. Makes the value explicit once the object is constructed, so the replay logic doesn't need to check whether it is `None`.
3. Means that if someone typos the endpoint, or tries to replay to an endpoint that contentctl does not support, they get an error at validation time.
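The three properties above can be shown without pydantic using a stdlib stand-in: a dataclass whose `__post_init__` coerces the value into the enum, roughly mimicking the validation pydantic would perform on `TestAttackData`. `AttackDataSketch` is a hypothetical class, and `(str, Enum)` is used in place of `StrEnum` only so the sketch also runs before Python 3.11.

```python
from dataclasses import dataclass
from enum import Enum


class ReplayEndpoint(str, Enum):
    # str mixin: members compare equal to their string values
    HEC = "services/collector/raw"
    JSON = "services/collector/event"


@dataclass
class AttackDataSketch:
    # Hypothetical stand-in for TestAttackData, reduced to the relevant fields.
    sourcetype: str
    endpoint: ReplayEndpoint = ReplayEndpoint.HEC  # existing YMLs keep the raw default

    def __post_init__(self) -> None:
        # Strings read from YAML are coerced into the enum here; an unsupported
        # or misspelled path raises ValueError at construction time, not at replay.
        self.endpoint = ReplayEndpoint(self.endpoint)
```

The replay code can then use `data.endpoint.value` directly, with no `None` check.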


A third option could be to make `ReplayEndpoint` its own abstract class, with HEC and JSON child classes that MUST implement a `transform` function. That way, if "raw" data (which comes directly from a file) must be changed in some way, the logic lives in each `ReplayEndpoint` subclass rather than in the replay logic itself.
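A sketch of that third option, assuming `transform` receives the file bytes and the test's sourcetype and returns the request body. All class names, `path`, and the `transform` signature are hypothetical. It also assumes HEC accepts several event objects concatenated in one request body, as Splunk's HEC documentation describes for batching; worth confirming against the target Splunk version.

```python
import json
from abc import ABC, abstractmethod


class ReplayEndpoint(ABC):
    # Hypothetical base: each subclass knows its HEC path and body format.
    path: str

    @abstractmethod
    def transform(self, raw: bytes, sourcetype: str) -> bytes:
        """Turn raw file contents into the body this endpoint expects."""


class HECEndpoint(ReplayEndpoint):
    path = "services/collector/raw"

    def transform(self, raw: bytes, sourcetype: str) -> bytes:
        # The raw endpoint takes the file contents unchanged.
        return raw


class JSONEndpoint(ReplayEndpoint):
    path = "services/collector/event"

    def transform(self, raw: bytes, sourcetype: str) -> bytes:
        # Wrap each non-blank JSONL record in the HEC event envelope.
        wrapped = [
            json.dumps({"sourcetype": sourcetype, "event": json.loads(line)})
            for line in raw.decode("utf-8").splitlines()
            if line.strip()
        ]
        return "\n".join(wrapped).encode("utf-8")
```

The replay function would then only call `endpoint.transform(...)` and post to `endpoint.path`, without branching on the endpoint type.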
Alternatively, the JSON event file could already be stored on disk in the `{"sourcetype": "xyz", "event":{...}}` format.