Batch inserts #110
The original version of the plugin did exactly this (back in the Logstash 1.4 era). The problem was that if one insert in the batch failed, the entire batch would fail, and there was no way to figure out which statement in the batch was responsible. It turned out that a lot of people aren't capable of writing statements that won't fail, or they had inputs which weren't always providing the data they expected. This resulted in a loss of events. Unless I'm mistaken, there's no way to deal with this other than the way the plugin now functions?
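To illustrate the failure mode described above, here is a minimal sketch using Python's stdlib `sqlite3` as a stand-in for a JDBC driver (the table, column names, and data are invented for the example): one bad row makes the whole batch fail, and the driver's error alone does not say which row was at fault.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT NOT NULL)")

# Row 2 violates the NOT NULL constraint.
rows = [(1, "ok"), (2, None), (3, "also ok")]

try:
    conn.executemany("INSERT INTO events (id, message) VALUES (?, ?)", rows)
except sqlite3.IntegrityError as e:
    # The exception reports the constraint, not which row triggered it.
    conn.rollback()
    print("batch failed:", e)

# The good rows are lost along with the bad one.
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # → 0
```

This is the "loss of events" problem: without extra bookkeeping, rows 1 and 3 are discarded along with row 2.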
Yes, but that depends on the use case. In some use cases it may be important to know which event caused the batch to fail; in others, throughput may be more important. Or maybe errors are already ignored (with something like INSERT IGNORE). Shouldn't that decision be made by the user? Maybe the batch insert feature can be optional and disabled by default. Regarding dealing with batch failure and pinpointing the culprit, what about this approach: https://vladmihalcea.com/how-to-find-which-statement-failed-in-a-jdbc-batch-update/
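The linked article works in JDBC terms (inspecting `BatchUpdateException`). As a driver-agnostic sketch of the same idea, again using Python's `sqlite3` purely for illustration (the helper name and schema are invented), one option is to retry the batch row by row when it fails, so the culprit rows can be reported without losing the rest:

```python
import sqlite3

def insert_batch(conn, sql, rows):
    """Try one fast batch insert; on failure, fall back to per-row
    inserts so the offending rows can be identified and reported."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(sql, rows)
        return []
    except sqlite3.Error:
        pass
    failed = []
    for row in rows:
        try:
            with conn:
                conn.execute(sql, row)
        except sqlite3.Error as e:
            failed.append((row, str(e)))
    return failed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT NOT NULL)")
sql = "INSERT INTO events (id, message) VALUES (?, ?)"
bad = insert_batch(conn, sql, [(1, "a"), (2, None), (3, "c")])
print(bad)  # only row (2, None) is reported; rows 1 and 3 are inserted
```

The trade-off is that the happy path gets the batch speedup, while a failing batch degrades to per-row cost only for that one batch.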
Ah, it's starting to come back to me now. I did start to implement something, and I think I found that some of the JDBC drivers I was testing with behaved differently when it came to how they handled failures. Some stopped processing immediately with an error and some carried on. This made the retry logic a pain in the arse to deal with, and I think I just decided to give up at that point, as retrying was far more important at the time. In theory there's nothing wrong with implementing batch support as a switch/option. For support purposes I'd prefer to leave it off by default, based on my previous experience. It's unlikely I'll get around to implementing this any time soon, as it's not something I need at work (our rate of events in the JDBC output is really low) and my personal time is limited at the moment. If you'd like to work on a patch I'll happily accept it, after review. Just let me know if you do want to start on it. Otherwise this'll likely be at least a few weeks/months before I get around to looking at it.
I see what you mean. For now, our use case is much slower than it should/could be (about 10M~20M records). But since the total time is still tolerable, we also won't start working on it immediately. If we do, we'll let you know. Thanks!
No problem 👍 I'll keep the issue open, if you don't mind? That way I don't forget about this as a feature to look into in the future.
Sure. 👍
Hello, just to let you know that I'm also inserting a lot of data into MS SQL, and I have the same issue. 👍
Need batch inserts for sure +1
Pull requests are welcome. It's unlikely I'm going to implement this any time soon; I have no need for this feature at work and I cannot justify the time at home, as I'm flat out on other things.
Is there any way we can increase the pipeline worker thread count to achieve the same thing until the enhancement is done?
Increase the number of Logstash workers (and ensure your connection pool size is workers + 1) and it should increase the output, as long as your SQL server can keep up. Or at least that's what Logstash used to do; I have not needed to tweak this myself and I have not kept on top of it.
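As a rough sketch of that tuning (the exact connection-pool option name depends on the plugin version, so check its README), the worker count is set in `logstash.yml`:

```yaml
# logstash.yml: raise the number of pipeline worker threads
# (the default is the number of CPU cores)
pipeline.workers: 8
```

Per the comment above, the plugin's connection pool would then need to allow at least workers + 1 connections.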
I had a need for batching using both safe and unsafe statements, with retry support. Changes have been added to my fork at: https://github.com/ibspoof/logstash-output-jdbc/tree/add_batching. If my implementation looks good I can file an MR.
Edit: thank you for your work so far @ibspoof. It looks good on paper, but I've not tried it out (I'm away from home and the office today). The only thing that's probably missing is some specs; if you wish, I can probably add some, but if you want to have a go, feel free. Whenever you're ready, file the MR. Regardless, I'll try and find some time Monday to have a quick play and test.
I can add specs this weekend and submit. |
Expecting batch inserts ~~~
The main problem I see with the batch implementation, @ibspoof, is that you're always putting all the batch_events back on the retry list. I was going to merge it straight in until I noticed this :/ Depending on the errors, you're either going to end up duplicating a bunch of events, or potentially not inserting any at all if it's the first statement that fails. I believe you need to account for this by working out which statements in the batch actually succeeded. If you don't need/want to worry about the above, are you happy if I take your work and do something with it?
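One way to account for that, sketched here in Python with `sqlite3` as an illustrative stand-in (the function and queue names are invented, not the plugin's API): on batch failure, bisect the batch inside transactions, so halves that commit are never re-queued and only rows that genuinely fail land on the retry list.

```python
import sqlite3

def submit(conn, sql, rows, retry_queue):
    """Insert rows as one batch; on failure, split the batch in half
    and recurse. Committed halves are never retried, so no event is
    duplicated, and only truly failing rows reach the retry queue."""
    if not rows:
        return
    try:
        with conn:  # commit on success, rollback on error
            conn.executemany(sql, rows)
    except sqlite3.Error:
        if len(rows) == 1:
            retry_queue.append(rows[0])  # single row that still fails
        else:
            mid = len(rows) // 2
            submit(conn, sql, rows[:mid], retry_queue)
            submit(conn, sql, rows[mid:], retry_queue)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT NOT NULL)")
retries = []
submit(conn, "INSERT INTO events (id, message) VALUES (?, ?)",
       [(1, "a"), (2, None), (3, "c"), (4, "d")], retries)
print(retries)  # only the bad row; the other three are committed
```

Because each attempted sub-batch is wrapped in its own transaction, a partial failure never leaves half-inserted rows behind, which sidesteps both the duplication and the loss cases described above.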
Hi.
I noticed the plugin issues one insert statement per event.
Would it be possible for the plugin to implement batch inserts? It would increase insert throughput and reduce server load considerably for large workloads.
Maybe something like this: https://www.tutorialspoint.com/jdbc/jdbc-batch-processing.htm
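In JDBC terms that is the `addBatch()`/`executeBatch()` pattern from the linked tutorial. As a quick illustration of the idea (using Python's `sqlite3` as a stand-in; in a real client/server deployment the gain comes mostly from saved network round trips, so it would be larger than an in-memory test shows):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, message TEXT)")
rows = [(i, f"event {i}") for i in range(10_000)]

# One statement per event (what the plugin currently does):
t0 = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO events VALUES (?, ?)", row)
per_row = time.perf_counter() - t0

# The same rows submitted as one batch:
t0 = time.perf_counter()
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
batched = time.perf_counter() - t0

print(f"per-row: {per_row:.3f}s, batched: {batched:.3f}s")
```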