-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL Server HA/DR Availability Group queries #8379
SQL Server HA/DR Availability Group queries #8379
Conversation
Merging original master to forked master
@denzilribeiro could you please review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally please split the PR as it has HADR changes, a config change and Perfmon counter fixes.. Limit this one to HADR related, and then create a new PR for other stuff so is easily trackable.
e7b0eef
to
742d58a
Compare
That makes sense, pulled out the perfmon counters' commit from this PR. Now this only has HADR related changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, other than 2 checks requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avinash I am super sorry about this was my mistake. I had asked to add this check
IF SERVERPROPERTY('EngineEdition') NOT IN (2,3,4) OR SERVERPROPERTY('IsHadrEnabled')
But now am thinking if someone doesn't have HADR enabled, the log will be flooded with messages.
I think would be better to revert this way. My apologies for having you make the changes that I didn't think out well.
IF SERVERPROPERTY('EngineEdition') NOT IN (2,3,4)
... Error message
IF SERVERPROPERTY('IsHadrEnabled') ==1
.. Execute query
SELECT | ||
'sqlserver_hadr_avreplica_states' AS [measurement], | ||
REPLACE(@@SERVERNAME,'\',':') AS [sql_instance], | ||
convert(nvarchar(36), hars.replica_id) as replica_id, | ||
ar.replica_server_name, | ||
convert(nvarchar(36), hars.group_id) as group_id, | ||
ag.name AS group_name, | ||
ar.replica_metadata_id, | ||
availability_mode, | ||
availability_mode_desc, | ||
failover_mode, | ||
failover_mode_desc, | ||
session_timeout, | ||
primary_role_allow_connections, | ||
primary_role_allow_connections_desc, | ||
secondary_role_allow_connections, | ||
secondary_role_allow_connections_desc, | ||
seeding_mode, | ||
seeding_mode_desc, | ||
basic_features, | ||
is_distributed, | ||
is_local, | ||
role, | ||
role_desc, | ||
operational_state, | ||
operational_state_desc, | ||
connected_state, | ||
connected_state_desc, | ||
recovery_health, | ||
recovery_health_desc, | ||
synchronization_health, | ||
synchronization_health_desc, | ||
last_connect_error_number, | ||
last_connect_error_description, | ||
last_connect_error_timestamp | ||
from sys.dm_hadr_availability_replica_states AS hars | ||
inner join sys.availability_replicas AS ar on hars.replica_id = ar.replica_id | ||
inner join sys.availability_groups AS ag on ar.group_id = ag.group_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, very nice idea :) May I suggest to add info about AG state in this query (or maybe in a different query to keep consistent) ? :
SELECT
'sqlserver_hadr_avreplica_states' AS [measurement],
REPLACE(@@SERVERNAME,'\',':') AS [sql_instance],
convert(nvarchar(36), hars.replica_id) as replica_id,
ar.replica_server_name,
convert(nvarchar(36), hars.group_id) as group_id,
ag.name AS group_name,
ag.basic_features,
ag.is_distributed,
hags.synchronization_health_desc AS ag_synchronization_health_desc,
ar.replica_metadata_id,
ar.availability_mode ,
ar.availability_mode_desc,
ar.failover_mode,
ar.failover_mode_desc,
ar.session_timeout,
ar.primary_role_allow_connections,
ar.primary_role_allow_connections_desc,
ar.secondary_role_allow_connections,
ar.secondary_role_allow_connections_desc,
ar.seeding_mode,
ar.seeding_mode_desc,
hars.is_local,
hars.role,
hars.role_desc,
hars.operational_state,
hars.operational_state_desc,
hars.connected_state,
hars.connected_state_desc,
hars.recovery_health,
hars.recovery_health_desc,
hars.synchronization_health AS replica_synchronization_health,
hars.synchronization_health_desc AS replica_synchronization_health_desc,
hars.last_connect_error_number,
hars.last_connect_error_description,
hars.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states AS hars
inner join sys.availability_replicas AS ar on hars.replica_id = ar.replica_id
inner join sys.availability_groups AS ag on ar.group_id = ag.group_id
INNER JOIN sys.dm_hadr_availability_group_states AS hags ON hags.group_id = ag.group_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not for having a separate query, which means yet another one to run in an iteration. @avinash-nigam if adding group state you may have to name the columns appropriately for state :)
One more thing @avinash-nigam - can we rename the measurement to sqlserver_hadr_replica_states ( no need for the av part?)
Thanks much!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedavem included AG state as per your and Denzil's suggestions
@denzilribeiro removed the "av" part from the measurement
@denzilribeiro made changes as per your suggestion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @Trovalo - can you also take a look at this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I got overwhelmed and forgot about it last week. Looks fine to me.
Thank you @denzilribeiro and @Trovalo for the review and approvals. |
DatabaseType string `toml:"database_type"` | ||
IncludeQuery []string `toml:"include_query"` | ||
ExcludeQuery []string `toml:"exclude_query"` | ||
IgnoreDefaultServer bool `toml:"ignore_default_server"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we need an "ignore default server" flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ssoroka for taking a look at this. The "ignore default server" flag was added for the case when the server list is empty and no server should be monitored. Without the flag, an empty server list would cause the localhost to be monitored, which would throw errors when using the agent for remote monitoring. Also, the localhost mechanism is valid only for SQL Server database type, but not for Azure SQL Database or Azure SQL Managed Instance. By adding the flag, the previous experience would work as is, and whoever does not want to monitor localhost can use this flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't spot that. tbh, I don't get why someone should leave the list empty even for localhost, I know the plugin will look for the default instance (port 1433) on localhost but I've never used that anyway as I want all to be explicitly set.
I'm not against the change, I just find it odd, if you don't want to use the plugin you can simply comment it out, or if you use multiple .conf file and the --config-directory
you can simply change the file extension to something different than .conf and it will be ignored (I personally use this pattern input_sqlserver.conf.ignore
.
In any case, this looks out of scope for this PR, as it should just add the new queries, can you propose this in a new PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't how we normally do defaults anyway. the default should go in the init() function, and then you don't need conditional checks like
if len(s.Servers) == 0 {
s.Servers = append(s.Servers, defaultServer)
}
that assume empty lists are unset. Let's remove this boolean and move the default to the init() function, and then explicitly empty lists will work properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ssoroka - Could you please help merge this PR if there are no concerns?
if len(s.Servers) == 0 { | ||
s.Servers = append(s.Servers, defaultServer) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to move to the init() function to be override-able. You won't need the conditional there, so just s.Servers = []string{defaultServer}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ssoroka - That's an existing code, I can address it when I try to address the default server aspect in a separate PR. This PR is only for the HA/DR queries now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Please do make another PR for it, as it's currently busted, as you've noticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do that. Thank you!
Required for all PRs:
Includes changes for