Data ingestion CPU efficiency improvements #13319

Open
lnbest0707-uber opened this issue Jun 5, 2024 · 1 comment

Comments

Contributor

lnbest0707-uber commented Jun 5, 2024

Pinot ingests data from Kafka with one consumer thread per Kafka partition, so ingestion scales only by increasing the number of partitions in the Kafka topic. However, because ingestion is compute-heavy, a Kafka broker can usually sustain far more traffic per partition than Pinot can.
For example, on the same type of hardware, Kafka can handle over 8 MB/s per partition, but Pinot can handle less than 2 MB/s per partition when doing complex transformation and index building (e.g. SchemaConformingTransformer and text index). As a result, Kafka partition expansion cannot always stay in sync with Pinot's system load.
In practice, we observe that on a Pinot server with tens of cores, only 20% of the cores are busy with ingestion while the others sit relatively idle.
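
To make the mismatch concrete, here is a rough back-of-the-envelope sketch. The per-partition limits are the figures quoted above (treated as exact values, although they are really bounds); the 64 MB/s topic rate is a made-up number for illustration only.

```python
import math

def partitions_needed(topic_mb_per_s, per_partition_mb_per_s):
    """Smallest partition count that keeps each partition under its limit."""
    return math.ceil(topic_mb_per_s / per_partition_mb_per_s)

# Per-partition limits quoted above (Kafka: "over 8", Pinot: "<2" MB/s).
KAFKA_MB_PER_S_PER_PARTITION = 8.0
PINOT_MB_PER_S_PER_PARTITION = 2.0

# Hypothetical topic throughput, chosen only for illustration.
topic_mb_per_s = 64.0

kafka_partitions = partitions_needed(topic_mb_per_s, KAFKA_MB_PER_S_PER_PARTITION)
pinot_partitions = partitions_needed(topic_mb_per_s, PINOT_MB_PER_S_PER_PARTITION)

print(f"Kafka is comfortable with {kafka_partitions} partitions")
print(f"Pinot needs at least {pinot_partitions} partitions (one thread each)")
```

Under these assumptions the topic would have to be split about 4x finer than Kafka itself requires, just to give Pinot enough consumer threads.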

Hence, there is a need to improve computation efficiency and to parallelize (at least part of) the message processing within a single partition.
image
Based on the attached picture, a few components could be improved:

  • gzip compression -> switch to zstd with a proper compression level
  • transformers -> use batch and parallel processing; some other OSS projects, such as uForwarder, already do batch message processing
  • indexing -> TBD
  • Kafka polling -> batch polling
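
As a rough illustration of the batch and parallel transformer idea, here is a minimal Python sketch. It is not Pinot code: `transform`, `batches`, and `process_partition` are hypothetical names, and the upper-casing step merely stands in for a real transformer. Python threads also do not give true CPU parallelism for pure-Python work, so this only shows the structure: transform the records of a batch in parallel, then hand the results to indexing in the original offset order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def batches(messages, batch_size):
    """Yield consecutive lists of up to batch_size messages."""
    it = iter(messages)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def transform(record):
    """Hypothetical stand-in for a per-record transformer."""
    return {"key": record["key"], "value": record["value"].upper()}

def process_partition(messages, batch_size=4, workers=4):
    """Transform each batch in parallel, keeping the original order."""
    indexed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(messages, batch_size):
            # Executor.map returns results in input order, so per-partition
            # ordering survives even though the work runs concurrently.
            transformed = list(pool.map(transform, batch))
            # Stand-in for the (still sequential) indexing step.
            indexed.extend(transformed)
    return indexed

messages = [{"key": i, "value": f"msg-{i}"} for i in range(10)]
result = process_partition(messages)
print([r["key"] for r in result])
```

The design choice here is to parallelize only the stateless transform step and keep the hand-off to indexing sequential, so ordering within the partition is unchanged.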