update diagnosis scripts to support AKS RP (#231)
update diagnosis scripts to support AKS RP, improve documentation
songjiaxun authored Jun 18, 2021
1 parent 01cbca4 commit 737ae20
Showing 13 changed files with 240 additions and 494 deletions.
60 changes: 37 additions & 23 deletions diagnosis/README.md
@@ -1,12 +1,20 @@
# Troubleshooting AKS Engine on Azure Stack
# Troubleshooting AKS Cluster Issues on Azure Stack

This short [guide](https://github.com/Azure/aks-engine/blob/master/docs/howto/troubleshooting.md) from Azure's AKS Engine team has a good high level explanation of how AKS Engine interacts with the Azure Resource Manager (ARM) and lists common reasons that can cause AKS Engine commands to fail. That guide applies to Azure Stack as well as it ships with its own ARM instance. If you are facing a problem that is not part of this guide, then you will need extra information to figure out the root cause.
## Introduction
To troubleshoot some AKS cluster issues, you may need to collect logs directly from the cluster nodes. Without this script, you would typically need to connect to each node in the cluster, then locate and download the logs manually.

Typically, to collect logs from servers you manage, you have to start a remote session using SSH and browse for relevant log files. The scripts in this directory are aim to simplify the collection of relevant logs from your Kubernetes cluster. Just download/unzip the latest [release](https://github.com/msazurestackworkloads/azurestack-gallery/releases/tag/diagnosis-v0.1.2) and execute script `getkuberneteslogs.sh`.
The scripts in this directory aim to simplify the collection of relevant logs from your Kubernetes cluster. The script will automatically create a snapshot of the cluster, and connect to each node to collect logs. In addition, the script can, optionally, upload the collected logs to a storage account.

> Before you execute `getkuberneteslogs.sh`, make sure that you can login to your Azure Stack instance using `Azure CLI`. Follow this [article](https://docs.microsoft.com/azure-stack/user/azure-stack-version-profiles-azurecli2) to learn how to configure Azure CLI to manage your Azure Stack cloud.
This tool is mainly designed for the Microsoft support team to collect comprehensive cluster logs. For self-diagnosis purposes, see the [`az aks kollect`](https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az_aks_kollect) command and the [aks-periscope](https://github.com/Azure/aks-periscope) application.

The logs retrieved by `getkuberneteslogs.sh` are the following:
## Requirements
- A machine that has access to your Kubernetes cluster, or the same machine you used to deploy your cluster. On a Windows machine, install [Git Bash](https://gitforwindows.org/) in order to run bash scripts.
- `Azure CLI` installed on the machine where the script will be run. Make sure that you can log in to your Azure Stack environment using `Azure CLI` from that machine. Follow this [article](https://docs.microsoft.com/azure-stack/user/azure-stack-version-profiles-azurecli2) to learn how to install and configure Azure CLI to manage your Azure Stack cloud.
- Switch to the subscription where the Kubernetes cluster is deployed by using `az account set --subscription <Subscription ID>`.
- Download the latest [release](https://github.com/msazurestackworkloads/azurestack-gallery/releases) of the script to your machine and extract the scripts.
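
The requirements above can be checked before running the collection script. The following is a minimal sketch, not part of the released scripts: the helper names `require_cmd` and `preflight` are hypothetical, and it assumes that `az account show` fails when no login session exists.

```bash
# Hypothetical preflight helpers; names are illustrative, not from the release.

# Return 0 if the named command can be resolved by the shell, non-zero otherwise.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check the prerequisites and select the subscription that hosts the cluster.
# $1 is the subscription ID.
preflight() {
    subscription_id="$1"
    if ! require_cmd az; then
        echo "Azure CLI (az) not found on PATH" >&2
        return 1
    fi
    # Assumption: `az account show` exits non-zero when not logged in.
    if ! az account show >/dev/null 2>&1; then
        echo "Not logged in; run 'az login' first" >&2
        return 1
    fi
    az account set --subscription "$subscription_id"
}
```

Calling `preflight "<Subscription ID>"` before `./getkuberneteslogs.sh` stops early with a readable message instead of failing partway through log collection.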

## Logs
This script automates the process of gathering the following logs:

- Log files in directory `/var/log/azure/`
- Log files in directory `/var/log/kubeaudit` (kube audit logs)
@@ -18,8 +26,10 @@ The logs retrieved by `getkuberneteslogs.sh` are the following:
- kubelet status and journal
- etcd status and journal
- docker status and journal
- containerd status and journal
- kube-system snapshot
- Azure CNI config files
- kubelet config files

Some additional logs are retrieved for Windows nodes:

@@ -31,21 +41,25 @@ Some additional logs are retrieved for Windows nodes:
- ETW events for Hyper-V
- Azure CNI config files

## Required Parameters

`-u, --user` - The administrator username for the cluster VMs

`-i, --identity-file` - RSA private key tied to the public key used to create the Kubernetes cluster (usually named 'id_rsa')

`-g, --resource-group` - Kubernetes cluster resource group

## Optional Parameters

`--disable-host-key-checking` - Sets SSH's `StrictHostKeyChecking` option to `no` while the script executes. Only use in a safe environment.

`--upload-logs` - Persists retrieved logs in an Azure Stack storage account. Logs can be found in `KubernetesLogs` resource group.

`--api-model` - Persists apimodel.json file in an Azure Stack Storage account.
Upload apimodel.json file to storage account happens when `--upload-logs` parameter is also provided.

`-h, --help` - Print script usage
## Parameters
| Parameter | Description | Required | Example |
|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|--------------------------------------------------|
| -h, --help | Print command usage. | no | |
| -u, --user | The administrator username for the cluster VMs. | yes | azureuser (default value) |
| -i, --identity-file | RSA private key tied to the public key used to create the Kubernetes cluster (sometimes named 'id_rsa'). | yes | /rsa.pem (Putty)<br>~/.ssh/id_rsa (SSH) |
| -g, --resource-group | Kubernetes cluster resource group. For clusters created by the AKS service, the managed resource group name follows the pattern 'MC_RESOURCEGROUP_CLUSTERNAME_LOCATION'. | yes | k8sresourcegroup<br>MC_AKSRP_k8scluster1_redmond |
| -n, --user-namespace | Collect logs from containers in the specified namespaces. If not specified, logs from ALL namespaces are collected. | no | monitoring |
| --upload-logs | Persists retrieved logs in an Azure Stack Hub storage account. Logs can be found in the KubernetesLogs resource group. | no | |
| --api-model | Persists the apimodel.json file in an Azure Stack Hub storage account. The file is uploaded only when the --upload-logs parameter is also provided. | no | ./apimodel.json |
| --disable-host-key-checking | Sets SSH's StrictHostKeyChecking option to "no" while the script executes. Only use in a safe environment. | no | |
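
For clusters created by the AKS service, the value passed to `-g` is the managed resource group, whose name follows the pattern given in the table. A small sketch of composing it; the function name `managed_resource_group` is hypothetical and the inputs are the example values from the table:

```bash
# Hypothetical helper: compose the managed resource group name from the
# cluster's resource group, cluster name, and location, following the
# 'MC_RESOURCEGROUP_CLUSTERNAME_LOCATION' pattern described in the table.
managed_resource_group() {
    printf 'MC_%s_%s_%s\n' "$1" "$2" "$3"
}

# Example with the values from the table:
managed_resource_group AKSRP k8scluster1 redmond
# prints MC_AKSRP_k8scluster1_redmond
```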

## Examples
```bash
az account set --subscription <Subscription ID>
# cd to the directory that contains the scripts.
./getkuberneteslogs.sh -u azureuser -i private.key.1.pem -g k8s-rg
./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg --disable-host-key-checking
./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg -n default -n monitoring
./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg --upload-logs --api-model clusterDefinition.json
./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg --upload-logs
```
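
As the examples show, `-n` is repeated once per namespace. When the namespace list comes from elsewhere, the flags can be generated rather than typed by hand. This is a sketch under that assumption; `namespace_flags` is a hypothetical helper, not part of the released scripts:

```bash
# Hypothetical helper: print one "-n <namespace>" pair per argument.
namespace_flags() {
    for ns in "$@"; do
        printf '%s %s ' -n "$ns"
    done
}

# Usage (left unquoted on purpose so the shell splits the flags into words;
# Kubernetes namespace names cannot contain whitespace):
#   ./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg $(namespace_flags default monitoring)
```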
9 changes: 6 additions & 3 deletions diagnosis/azs-collect-windows-logs.ps1
@@ -1,6 +1,6 @@
$ProgressPreference = "SilentlyContinue"

$lockedFiles = "kubelet.err.log", "kubelet.log", "kubeproxy.log", "kubeproxy.err.log", "azure-vnet-telemetry.log", "azure-vnet.log", "network-interfaces.json", "interfaces.json"
$lockedFiles = "kubelet.err.log", "kubelet.log", "kubeproxy.log", "kubeproxy.err.log", "azure-vnet-telemetry.log", "azure-vnet.log", "network-interfaces.json", "interfaces.json", "azure-vnet-ipam.log", "windowsnodereset.log", "csi-proxy.log", "csi-proxy.err.log"

$timeStamp = get-date -format 'yyyyMMdd-hhmmss'
$zipName = "win_log_$env:computername.zip"
@@ -56,7 +56,10 @@ if (-not (Test-Path 'c:\k\debug\collectlogs.ps1')) {
& 'c:\k\debug\collectlogs.ps1' | write-Host
$netLogs = Get-ChildItem (Get-ChildItem -Path c:\k\debug -Directory | Sort-Object LastWriteTime -Descending | Select-Object -First 1).FullName | Select-Object -ExpandProperty FullName
$paths += $netLogs
$paths += "c:\AzureData\CustomDataSetupScript.log"
$setupLog = "c:\AzureData\CustomDataSetupScript.log"
if (Test-Path $setupLog) {
$paths += $setupLog
}

Write-Host "Collecting containerd hyperv logs"
if ((Test-Path "$Env:ProgramFiles\containerd\diag.ps1") -And (Test-Path "$Env:ProgramFiles\containerd\ContainerPlatform.wprp")) {
@@ -75,5 +78,5 @@ else {
Write-Host "Compressing all logs to $zipName"
$paths | Format-Table FullName, Length -AutoSize
Compress-Archive -LiteralPath $paths -DestinationPath $zipName
Remove-Item -Path $paths
Remove-Item -Path $paths -ErrorAction SilentlyContinue
Get-ChildItem $zipName # this puts a FileInfo on the pipeline so that another script can get it on the pipeline