Multithreaded input and output plugins #5638
Can you try specifying the plugin multiple times with the same consumer group and see if that gives you the performance you are expecting?
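A minimal sketch of what running the input twice in the same consumer group might look like (broker address, topic, and group name are hypothetical):

```toml
# Two identical kafka_consumer inputs sharing one consumer group; Kafka
# balances the topic's partitions across the two consumers, so they can
# read in parallel within a single Telegraf process.
[[inputs.kafka_consumer]]
  brokers = ["broker1:9092"]        # hypothetical broker
  topics = ["metrics"]              # hypothetical topic
  consumer_group = "telegraf_group" # same group on both inputs
  data_format = "influx"

[[inputs.kafka_consumer]]
  brokers = ["broker1:9092"]
  topics = ["metrics"]
  consumer_group = "telegraf_group"
  data_format = "influx"
```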
No, it does not help. I think it needs to be something similar to Logstash pipelines, where one flow is completely logically separate from the others.
About how many messages are being processed with a single Telegraf, and which [...]
From the InfluxDB points measurements, we are processing 1.6 million points/node/min, =~27 million/min in total.
Could you enable the [...]
PFA, this is for one cluster, taken after 2 pm; 7am-2pm is normally peak hours. The graph is configured with a 10s interval, so the rollup is 10 seconds. I currently have 3 clusters; the other two are a bit smaller, but eventually, if possible, I want to merge all configs into one multithreaded Telegraf config (I tried that initially, but it performs extremely poorly).
Agent configs
Can you show just a single Telegraf instance? I'm trying to get an idea of how badly it is performing with a single plugin. Also, can you add your kafka_consumer configuration (remove any sensitive info)?
I'm not able to tell the actual rate from the image; could you switch it to a table panel? What is your Telegraf version?
I am using Telegraf 1.8.
In Telegraf 1.8, the speed of all inputs is limited by the speed of the output. This prevents you from reading faster than you can write. If you are running up against this limit, then adding additional inputs wouldn't help. In 1.9 the shared input limit is removed, but a new option [...]. Either way, the kafka_consumer plugin in Telegraf will refuse to read faster than it can write, which means we should also look at the performance of the output. The easiest way to see if this helps is to write the metrics out to a [...]. Can you show your output configurations? Let's also take a look at the [...].
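One way to rule out InfluxDB write latency is to temporarily swap in the file output and see whether the inputs can keep up on their own; a minimal sketch (the output path is hypothetical):

```toml
# Hypothetical test config: write metrics to a local file instead of
# InfluxDB to measure how fast the inputs can actually read from Kafka.
[[outputs.file]]
  files = ["/tmp/telegraf-throughput-test.out"]  # hypothetical path
  data_format = "influx"
```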
@danielnelson I tripled the number of partitions in Kafka and duplicated the input plugin to match the partition count. The input plugin seems to be keeping up well, but now the output plugin seems to be running slow. I see millions of objects in memory, and the log message shows a million messages in the buffer.
A single Telegraf does send sequentially to InfluxDB. Usually this isn't a problem, since it is common to have multiple Telegraf instances sending to the database, which usually gives plenty of concurrency. How many Telegraf instances in total are sending to InfluxDB, is it just this one instance now?
No, I have 20 nodes sending to InfluxDB. How can I make InfluxDB writes go concurrent on the same node?
If you have 20 nodes then I would expect plenty of concurrency in aggregate, so any gain from adding more concurrency may be offset by longer request times. But to answer your question: in a single Telegraf process, the only way is to define the output plugin multiple times and then use the namepass/namedrop or tagpass/tagdrop options to split the data. It can be quite difficult to balance the outputs, but this is currently the only way to "shard" the data; see the sketch below. You can see an example of this in the configuration docs (second example from the bottom). Telegraf flushes immediately after receiving a full [...]. Are you still using Telegraf 1.8? Normally this version shouldn't fill the [...].
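A minimal sketch of splitting the write load across two InfluxDB outputs with tagpass/tagdrop (the URL and the host-tag split are hypothetical; in practice you'd pick a tag that divides your data roughly evenly):

```toml
# Two influxdb outputs, each handling part of the metrics, selected by a
# filter on the (hypothetical) "host" tag. Using tagpass on one output and
# the matching tagdrop on the other ensures every metric goes to exactly one.
[[outputs.influxdb]]
  urls = ["http://influxdb.example.com:8086"]  # hypothetical URL
  database = "telegraf"
  [outputs.influxdb.tagpass]
    host = ["node0*", "node1*"]   # hosts matching these globs go here

[[outputs.influxdb]]
  urls = ["http://influxdb.example.com:8086"]
  database = "telegraf"
  [outputs.influxdb.tagdrop]
    host = ["node0*", "node1*"]   # everything else goes here
```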
I just moved to the latest Telegraf to see if the new one is better; since then I started seeing a full buffer and high reads from Kafka. The performance gain was quite significant, though that could also be due to the multiple partitions. Regarding tagpass and tagdrop: I have multiple clusters, and I originally had a single Telegraf config with multiple rules using namepass/namedrop and tagpass/tagdrop, but because of the earlier issues with parallel processing I moved to multiple processes on one box and simplified the configs. My understanding is that all output plugins share the same queue and need to process the same amount of events. For k_c1 we get 1 million points/min, whereas for k_c2 we get only 100K/min, 1.1 million in total. I have a Kafka input plugin reading from both topics and two output plugins defined, one sending to cluster 1 and the second sending to cluster 2. Now my question is, [...]
Example Config
Each output plugin actually has its own separate buffer; it will only contain the metrics which make it past the tagpass step. In the latest Telegraf, one output being down should not affect the others, because we have removed the input blocking behavior mentioned earlier.
Does it mean that if one output is down and its internal queue is full, the input plugin will drop events?
Speaking about Telegraf 1.9 and newer: with most other input plugins, a down output could drop its metrics, but metrics going to another output would be unaffected because each output has its own metric buffer. The [...]
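For the kafka_consumer input specifically, Telegraf 1.9 added a backpressure option; a minimal sketch of how it might be set, assuming the max_undelivered_messages option and hypothetical broker/topic names:

```toml
# Hypothetical sketch: cap how many messages may be read from Kafka before
# they have been delivered to an output, so the consumer pauses instead of
# buffering without bound when an output falls behind.
[[inputs.kafka_consumer]]
  brokers = ["broker1:9092"]          # hypothetical broker
  topics = ["metrics"]                # hypothetical topic
  consumer_group = "telegraf_group"
  max_undelivered_messages = 1000     # assumed option name (added in 1.9)
  data_format = "influx"
```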
@danielnelson As per your suggestion, I split the InfluxDB output into multiple outputs using the tagpass/tagdrop mechanism. It seems to be helping a bit, but we are still dropping a lot of points; I don't think I have a proper distribution of metrics per queue yet. Is it possible to see per-plugin queue usage? E.g., if I have 4 InfluxDB output plugins, the metrics should show influxdb plugin1 with x number of writes and queue size, influxdb plugin2, and so on; then I can balance it better.
Unfortunately the metrics we keep in the [...]
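Telegraf's internal input plugin does expose per-output write and buffer metrics; a minimal sketch of enabling it is below (whether multiple instances of the same output type can be told apart in this version is unclear from the thread, so treat this as an assumption):

```toml
# Hypothetical sketch: collect Telegraf's own metrics, including the
# internal_write measurement (buffer_size, buffer_limit, metrics_written
# per output), to see which output buffer is filling up.
[[inputs.internal]]
  collect_memstats = true
```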
@danielnelson I think your suggestion helped us get to the scale we need, almost doing 30 million+/min :). But we need the capability to adjust the queue size per plugin; should I open a feature request for this?
Which queue size are we talking about? |
@rbkasat Glad things are working well overall. I'm going to close this issue, but if you find you still need a change then just go ahead and open a feature request, and we can discuss it further.
Feature Request
Opening a feature request kicks off a discussion.
Proposal:
The Kafka input plugin should be multithreaded to support multiple partitions.
Current behavior:
Performs very badly with a single process.
Desired behavior:
Use case: [Why is this important (helps with prioritizing requests)]