Skip to content

feat: Standardize operational env vars #31161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from

Conversation

benjackwhite
Copy link
Contributor

@benjackwhite benjackwhite commented Apr 12, 2025

Problem

We have multiple names in different services doing the same thing. I want to have one place in our infra to apply across all services and the first step is standardizing the naming

Changes

  • Remove duplicate env var
  • Make rust service use the same key style as the node ones and same envs (this will allow us to have a single source of truth in charts repo)
  • Make overflow actually write to overflow

Does this work well for both Cloud and self-hosted?

How did you test this code?

@benjackwhite benjackwhite marked this pull request as ready for review April 12, 2025 17:15
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR standardizes operational environment variables across PostHog services, focusing on event dropping and overflow handling configurations.

  • Removes duplicate DROP_EVENTS_BY_TOKEN variable in favor of the more flexible DROP_EVENTS_BY_TOKEN_DISTINCT_ID in plugin-server
  • Renames DROPPED_KEYS to DROP_EVENTS_BY_TOKEN_DISTINCT_ID in Python capture code for consistency
  • Renames overflow_forced_keys to ingestion_force_overflow_by_token_distinct_id and dropped_keys to drop_events_by_token_distinct_id in Rust capture service
  • Updates OverflowLimiter and TokenDropper implementations to handle the standardized format (comma-separated token:distinct_id pairs)
  • Adds dedicated kafka_overflow_topic configuration in Rust capture service for consistent overflow handling

12 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

benjackwhite and others added 2 commits April 12, 2025 13:17
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@benjackwhite benjackwhite requested a review from a team April 14, 2025 16:00
@eli-r-ph
Copy link
Contributor

👋 Looking at fixing the Rust wiring and pushing to this PR today 👍

…tinct_id or IP address. doesn't effect functionality of new token or token:distinct_id overflow routing
@eli-r-ph eli-r-ph marked this pull request as draft April 15, 2025 19:06
@eli-r-ph
Copy link
Contributor

👋 setting this back to draft status while we cut it up for parts and ship separately 👍 cc @nickbest-ph @benjackwhite @pl

@benjackwhite benjackwhite marked this pull request as ready for review April 16, 2025 07:45
@benjackwhite benjackwhite marked this pull request as draft April 16, 2025 07:45
@benjackwhite
Copy link
Contributor Author

@eli-r-ph im gonna close given you are sorting stuff out in smaller chunks. Feel free to open your own PR (just want it off my backlog :D )

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

(updates since last review)

Based on the provided files and previous review, I'll focus on the most recent changes and important updates not yet covered:

This PR enhances the overflow handling implementation in PostHog's ingestion system.

  • Adds proper overflow topic routing in Rust's KafkaSink, ensuring overflow events are sent to a dedicated topic instead of the main one
  • Updates overflow limiter logic to support both full event keys and token-only checks in forced_keys list
  • Changes TokenDropper format to use comma-separated pairs and adds support for token-only dropping without distinct IDs
  • Improves test coverage for overflow scenarios and edge cases in both Rust and Node.js services

The changes maintain backward compatibility while providing more robust overflow handling capabilities. The implementation looks solid with appropriate error handling and comprehensive test coverage.

13 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +27 to +29
if let Some(ids) = entry {
ids.push(id.to_string());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: If entry is None (meaning drop all), we still try to push to it. This is dead code that will never execute.

Suggested change
if let Some(ids) = entry {
ids.push(id.to_string());
}
if let Some(Some(ids)) = entry.as_mut() {
ids.push(id.to_string());
}

Comment on lines +15 to 20
for part in config.split(',') {
let mut parts = part.trim().split(':');
let Some(token) = parts.next() else {
warn!("No distinct id's configured for pair {}", pair);
continue;
};
let Some(ids) = parts.next() else {
warn!("No distinct id's configured for token {}", token);
warn!("Invalid format in part {}", part);
continue;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Check if part.trim() is empty before processing to avoid unnecessary warning logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants