-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-21521 Expose master startup status via web UI #3667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@apurtell @saintstack Could you please help review this PR ? Screenshots are attached in jira. Thank you ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a MonitoredTask for tracking HMaster startup which is exposed in the "Tasks" header on the Master UI today.
What is your solution here doing differently than what we already have with MonitoredTask and the Tasks page on the Master UI?
If there's a gap in understanding/visibility, I think it'd be better to reuse what we already have than create something brand new. With this change, we would have two things which are "tracking" master startup.
*/ | ||
@InterfaceAudience.Private | ||
public class PhaseStatus { | ||
private String phaseName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a Phase
?
* a clone of the data. | ||
*/ | ||
@InterfaceAudience.Private | ||
public class MasterStartupProgress { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have MonitoredTask and MonitoredTaskImpl which appear to do very similar things to what you're doing here.
Did you look at MonitoredTask and is there a reason we cannot use/extend it instead of writing net-new code?
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Thank you @joshelser for the feedback.
The current Tasks page is not very clear in the sense it doesn't give me all the different phases view. Once the startup finishes, all the tasks disappears. Also there are some limitations in MonitoredTask which doesn't allow me to track start time and end time for each state. Also it doesn't show the states that are completed. Since MonitoredTask is being used at many places, I didn't want to change just for HMaster startup use case. WDYT ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works, but it would be cleaner if somehow Phases can be MonitoredTasks. Take what we have already for task monitoring and add the ability to group them. Startup activity can be the first official grouping.
TaskGroup createTaskGroup(String name)
TaskGroup getTaskGroup(String name)
TaskGroup#addTask(MonitoredTask task)
Collection<MonitoredTask> TaskGroup#getTasks()
etc.
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package org.apache.hadoop.hbase.master.startupprogress; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all seems redundant with what MonitoredTask provides.
*/ | ||
@InterfaceAudience.Private | ||
public class PhaseTracking { | ||
private long startTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all redundant with what MonitoredTask provides.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
IMO, if start/end time is the only thing you're missing, then just add that to MonitoredTask and adapt your UI to use that class instead. I think that Andrew came to the same conclusion about duplicating MonitoredTask, the best idea is to try to reuse it :) The UI component is great and I agree that the "Tasks" page currently on the UI leaves something to be desired. I think your presentation of this data is excellent. |
5a43d2d
to
9aac82f
Compare
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Test passing on my local cluster:
|
🎊 +1 overall
This message was automatically generated. |
Attached 2 new screenshots in HBASE-21521. Attaching here also. This now uses the @joshelser @apurtell please review again. Thank you ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach a lot better! Good work.
One concern I have which this new approach inadvertently creates is that the Master will potentially OOME itself if it's waiting for RegionServers to come up for a long period of time. The journal
in MonitoredTaskImpl can grow unbounded. Thus, like you pointed out where we get multiple journal entries for the same "waiting for regionservers" update, we continue to hold those in memory for (best as I can tell) the lifetime of the HBase Master.
Every WAIT_ON_REGIONSERVERS_TIMEOUT seconds (which defaults to 4.5 seconds), it will create a new status with updated message. I like the way it tells the operator that how many RS it was waiting for to become online.
What do you think about one more extension to MonitoredTask/StatusJournal such that if we publish the same StatusJournalEntry multiple times, we drop the tail of that journal
before we add the new entry to the journal. From the POV of the user, they would see the latest "waiting on regionservers" log message with the time incrementing, but this UI page wouldn't grow unbounded.
List<Phase> phases = new ArrayList<>(); | ||
// Make a copy of the list to avoid ConcurrentModificationException. | ||
List<MonitoredTask.StatusJournalEntry> journalEntries = | ||
ImmutableList.copyOf(this.monitoredTask.getStatusJournal()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect the caller to be modifying this list (especially given that Phase
should also be immutable).
How about avoid the copy by using Collections.unmodifiableList()
?
if (elapsedTime == Long.MAX_VALUE) { | ||
return "-"; | ||
} | ||
long hours, minutes, seconds, milliseconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HBase follows https://www.oracle.com/java/technologies/javase/codeconventions-declarations.html which avoids multiple initializations on the same line. I believe we do not care if they are done in a block at the start of the method (each on their own line) or if you declare them as you use them the first time.
@joshelser It's tricky to find out which StatusJournalEntry are the same. How about we keep the last 500 status in memory and evict the older ones ? |
That's acceptable to start. I think this is not ideal as, if the system is stuck in this situation where we are waiting on regionservers (or similar), we will eventually lose all of the "good" status updates prior to this. That said, we should also make sure "perfect" isn't the enemy of "good" :) |
Any progress here? |
No description provided.