@chrisberkhout commented Nov 19, 2025

Proposed commit message

[o365] Simplification of data fetching logic

Flattens the structure of the CEL program.

Simplifications:
- Make only one request per evaluation (or none).
- Skip expired items during fetching rather than filtering them out in
  multiple places.
- The non-canonical header name `NextPageUri` is no longer considered,
  as it's always normalized by the HTTP client.
- Assume that items can be fetched in the order they are listed.
- Assume that content items will not be empty.
- Update the `last_for` times once based on the listing range, rather
  than repeatedly (with the same value) for each followed listing link.
- Unify handling of generated listing links (for initial requests) and
  received listing links (for later pages).
- Subscribe once per input start (an alternative to once for the life of
  cursor data, as introduced in #15476).

Other changes:
- Moves some state into `state.cursor`, so that it persists across
  restarts: `state.work.todo_content` → `state.cursor.todo_content`,
  `state.work.next_list` (string) → `state.cursor.todo_links` (array).
- Renames `state.work.todo_type` (array) → `state.todo_types` (plural
  name, array). This stays out of the cursor because it can be
  reconstructed.
- Do all subscriptions first, then rotate types so everything is roughly
  chronological rather than type-by-type.
- Adds `state.subscribed` (map). It's not in the cursor data because we
  want to resubscribe if restarted.
- Keep querying until the time 3 seconds before the start has been
  reached (exclusive). The 3 second buffer avoids requesting times that
  may have unstable results.
- Log an error if no type is configured.
- The `max_executions` limit is raised. Getting up to date means
  hour-long listings for 168 hours of data, possibly over multiple pages
  each, likely for multiple content types, and fetching everything that
  was listed.

A mock server is added and used for system tests.

Notes for the reviewer

I started this before the last 4 PRs, and I checked that there are no conflicts with those changes:

The new CEL code passes with the original system tests.

The system tests have been updated to use a mock o365 server rather than the stream tool. The mock server has configurability, assertions and logging beyond what is used in the system test, which may be useful for future debugging. The amount of code is not insignificant, and the quality is just okay, so if this is a maintenance concern, it can be moved to a separate PR or removed entirely. Let me know what you think.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

You can manually run the mock server like this:

go run ./_dev/deploy/docker/o365mock.go chunks_with_gaps_and_1_expired

In another terminal, run the CEL code in mito like this:

mito \
  -cfg <(echo '
auth:
  oauth2:
    client.id: test-cel-client-id
    client.secret: test-cel-client-secret
    provider: azure
    scopes:
      - "https://manage.office.com/.default"
    endpoint_params:
      grant_type: ["client_credentials"]
    token_url: http://localhost:9999/test-cel-tenant-id/oauth2/v2.0/token
') \
   -data <(echo '
{
	"url": "http://localhost:9999",
	"want_more": false,
	"base": {
		"tenant_id": "test-cel-tenant-id",
		"list_contents_start_time": "15h",
		"batch_interval": "1h",
		"maximum_age": "167h55m",
		"content_types": "Audit.AzureActiveDirectory, Audit.Exchange"
	}
}
') \
  -log_requests \
  <(awk '/^program:/{iscel=1; next} /^\{\{/{iscel=0} iscel' ./data_stream/audit/agent/stream/cel.yml.hbs)

Stop the mock server with Ctrl+C to trigger its shutdown report.

You can remove the mito auth configuration if you disable CheckAccessToken in the mock server's inline configuration.

This version of mito will run faster than the following one, which rate limits to 1 rps:

go install github.com/elastic/mito/cmd/mito@835128

Related issues

@chrisberkhout self-assigned this Nov 19, 2025
@chrisberkhout requested a review from a team as a code owner November 19, 2025 15:49
@chrisberkhout added the enhancement, Integration:o365 and Team:Security-Service Integrations labels Nov 19, 2025
@elasticmachine
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@chrisberkhout requested a review from efd6 November 20, 2025 15:59
Because we lean on repeated input loops more now, I think it may be worth dropping the retry events in the agent with a Beats processor, rather than sending them on to be dropped by the ingest pipeline. An example of this is here corresponding to this alternative way of saying {"retry": true}.

@elastic-vault-github-plugin-prod
elastic-vault-github-plugin-prod bot commented Nov 21, 2025

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@elasticmachine
💚 Build Succeeded

History

cc @chrisberkhout

Development

Successfully merging this pull request may close these issues.

o365: flatten structure, one request per evaluation
