[SPARK-4537][Streaming] Expand StreamingSource to add more metrics#3466
[SPARK-4537][Streaming] Expand StreamingSource to add more metrics#3466jerryshao wants to merge 5 commits intoapache:masterfrom
Conversation
|
Test build #23868 has started for PR 3466 at commit
|
|
Test build #23868 has finished for PR 3466 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Will this counter work? I think that gauges are collected only on request of a source, so if nobody is consuming the metric or consuming it too often, we will have a wrong count.
There was a problem hiding this comment.
Aha, you're right, so it is hard to collect the total batch records without modifying the StreamingJobProgressListener code.
There was a problem hiding this comment.
I will fix this, thanks a lot.
|
Test build #23913 has started for PR 3466 at commit
|
|
Test build #23913 has finished for PR 3466 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Its better to name this "processingTime"
|
Thanks a lot TD for your comments, I will factor out the old code and fix the above issues you mentioned. |
|
Test build #24751 has started for PR 3466 at commit
|
|
Hey TD, I've addressed the problem you mentioned in this way, I'm not is this what you want, would you mind taking a look at it ? Thanks a lot. |
There was a problem hiding this comment.
I think its better to keep the Option here (and document that defaultValue is used when f returns null. And other places should not have to use Option. This is safer for any one to use and also minimizes the changes.
There was a problem hiding this comment.
OK, I will revert it back and try a better way.
There was a problem hiding this comment.
Hi TD, what's your meaning of "And other places should not have to use Option", If here as an example, change to
registerGauge("lastCompletedBatch_submissionTime",
_.lastCompletedBatch.map(_.submissionTime).get, -1L)get will throw exception when there's no completed batch. I'm not sure what's actual meaning, sorry if I misunderstand anything.
There was a problem hiding this comment.
Hi TD, sorry to bother you again, I'm not if there's a better way to address this problem, would you mind giving me some hints, thanks a lot.
There was a problem hiding this comment.
I understand the problem. Good catch, I did not realize that. How about this. Lets make two versions of registerGauge, one that takes f: StreamingProgressListener => T without any default value, another that takes f: StreamingProgressListener => Option[T] and the default value. Each version will be used accordingly.
There was a problem hiding this comment.
OK, got it, I will change the code as you suggested.
|
Test build #24751 has finished for PR 3466 at commit
|
|
Test PASSed. |
|
Test build #24808 has started for PR 3466 at commit
|
|
Test build #24808 has finished for PR 3466 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
nit: The existing version of registerGauge could have used the new version. Not a big deal, very small amount of duplicate code.
|
Just a couple of more comments for making the name more consistent with existing ones. Otherwise I approve of the how the |
|
Test build #24827 has started for PR 3466 at commit
|
|
Hi TD, thanks a lot for your comments, I just change the code style as you suggested, also add one more metrics |
|
LGTM. Merging this. Thanks @jerryshao |
Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting. Author: jerryshao <saisai.shao@intel.com> Closes #3466 from jerryshao/SPARK-4537 and squashes the following commits: 00f5f7f [jerryshao] Change the code style and add totalProcessedRecords 44721a6 [jerryshao] Further address the comments c097ddc [jerryshao] Address the comments 02dd44f [jerryshao] Fix the addressed comments c7a9376 [jerryshao] Expand StreamingSource to add more metrics (cherry picked from commit f205fe4) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting. Author: jerryshao <saisai.shao@intel.com> Closes #3466 from jerryshao/SPARK-4537 and squashes the following commits: 00f5f7f [jerryshao] Change the code style and add totalProcessedRecords 44721a6 [jerryshao] Further address the comments c097ddc [jerryshao] Address the comments 02dd44f [jerryshao] Fix the addressed comments c7a9376 [jerryshao] Expand StreamingSource to add more metrics (cherry picked from commit f205fe4) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
Test build #24827 has finished for PR 3466 at commit
|
|
Test PASSed. |
Add
processingDelay,schedulingDelayandtotalDelayfor the last completed batch. AddlastReceivedBatchRecordsandtotalReceivedBatchRecordsto the received records counting.