Codecov Report
@@              Coverage Diff              @@
##             master    #9011       +/-   ##
=============================================
- Coverage     70.01%   26.45%   -43.57%
+ Complexity     4959        1     -4958
=============================================
  Files          1826     1818        -8
  Lines         95975    95745      -230
  Branches      14350    14328       -22
=============================================
- Hits          67197    25327    -41870
- Misses        24134    68012    +43878
+ Partials       4644     2406     -2238
for (TableConfig tableConfig : enabledTableConfigs) {
  try {
    TaskGeneratorMostRecentRunInfoUtils.saveErrorRunMessageToZk(_pinotHelixResourceManager.getPropertyStore(),
        tableConfig.getTableName(), taskGenerator.getTaskType(), System.currentTimeMillis(), e.getMessage());
We should capture the whole stack trace, not just the message.
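As a rough sketch of what this comment asks for, a full stack trace (including any "Caused by" chain) can be rendered to a string with standard JDK classes. The class name `StackTraceText` and the `maxChars` cap are hypothetical and not part of this PR; the cap is included only because ZNodes have a size limit (about 1 MB by default), so an unbounded trace may not be safe to store.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper, not part of the PR: turns a Throwable into text.
final class StackTraceText {
  private StackTraceText() {
  }

  /** Renders the throwable, its frames, and any "Caused by" chain as one string. */
  static String of(Throwable t) {
    StringWriter buffer = new StringWriter();
    try (PrintWriter writer = new PrintWriter(buffer)) {
      t.printStackTrace(writer);
    }
    return buffer.toString();
  }

  /** Same as of(t), cut to at most maxChars characters (assumed, caller-chosen limit). */
  static String truncated(Throwable t, int maxChars) {
    String text = of(t);
    return text.length() <= maxChars ? text : text.substring(0, maxChars);
  }
}
```

With something like this, the call in the excerpt above could pass `StackTraceText.truncated(e, someLimit)` in place of `e.getMessage()`, assuming the stored field is a plain string.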
I believe this PR covers generator-side errors, not task errors.
Hmm, so we are using ZK like a log? Why not just add a metric and set alerts on it? You can also pipe logs into log processors. We don't want to pollute ZK with information that can be obtained by other means.
I think it's okay, given we're limiting it to just the 5 most recent runs. Most minion-related errors and confusion are seen by users when they are getting started. Given the popularity of minion tasks for ingestion and rollups, we're seeing an influx of questions in the community Slack about minion-related debugging. Most of the time, users are asking whether there's an API to quickly see scheduling-side and task-execution-side errors. Plus, during the getting-started phase, a metrics reporting backend, alerting, or log processors usually aren't set up yet.
Closing this as the work is being picked up by @saurabhd336 in #9043
Changes
Save task generator info to ZK to help debug task generation. Specifically:
1/ The ZK path is /MINION_TASK_GENERATOR_INFO/${tableNameWithType}/${taskType}. We don't reuse /MINION_TASK_METADATA/${tableNameWithType}/${taskType} because we don't want to interfere with the existing ZNodes.
2/ Save the timestamps of the 5 most recent successful task generation runs to ZK.
3/ Save the timestamps and error messages of the 5 most recent failed task generation runs to ZK.
This logic change is not on the main data or control path, so it should be safe.
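
The layout described above can be sketched in plain Java, without any ZK or Helix calls. This is an illustration under assumptions, not the PR's implementation: the class `TaskGeneratorRunInfoSketch` and its methods are hypothetical names, and the in-memory deques stand in for whatever record format the PR actually writes to the ZNode.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one /MINION_TASK_GENERATOR_INFO/... ZNode.
final class TaskGeneratorRunInfoSketch {
  static final int MAX_RUNS_TO_KEEP = 5;

  // Newest entry sits at the head of each deque.
  private final Deque<Long> successRuns = new ArrayDeque<>();
  private final Deque<Map.Entry<Long, String>> errorRuns = new ArrayDeque<>();

  /** Builds the ZNode path from the PR description for a table and task type. */
  static String zkPath(String tableNameWithType, String taskType) {
    return "/MINION_TASK_GENERATOR_INFO/" + tableNameWithType + "/" + taskType;
  }

  /** Records a successful generation run, dropping the oldest beyond the cap. */
  void addSuccessRun(long timestampMs) {
    successRuns.addFirst(timestampMs);
    while (successRuns.size() > MAX_RUNS_TO_KEEP) {
      successRuns.removeLast();
    }
  }

  /** Records a failed generation run with its message, dropping the oldest beyond the cap. */
  void addErrorRun(long timestampMs, String message) {
    errorRuns.addFirst(new SimpleImmutableEntry<>(timestampMs, message));
    while (errorRuns.size() > MAX_RUNS_TO_KEEP) {
      errorRuns.removeLast();
    }
  }

  /** Success timestamps, newest first. */
  List<Long> successRuns() {
    return new ArrayList<>(successRuns);
  }

  /** Error timestamp/message pairs, newest first. */
  List<Map.Entry<Long, String>> errorRuns() {
    return new ArrayList<>(errorRuns);
  }
}
```

Because each list is capped at 5 entries, the stored data per table and task type stays bounded no matter how often the generator runs, which is the property the discussion above relies on.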