Commit 7c4751c

naman-msft committed: mass-testing docs now
1 parent 7d6fc55, commit 7c4751c

10 files changed (+136, -84 lines)

scenarios/AKSClientSecretError/aks-client-secret-error.md

Lines changed: 2 additions & 2 deletions
@@ -40,8 +40,8 @@ The issue that generates this service principal alert usually occurs for one of
  Use the following commands to retrieve the service principal profile for your AKS cluster and check the expiration date of the service principal. Make sure to set the appropriate variables for your AKS resource group and cluster name.

  ```azurecli
- SP_ID=$(az aks show --resource-group RESOURCE_GROUP_NAME \
- --name AKS_CLUSTER_NAME \
+ SP_ID=$(az aks show --resource-group $RESOURCE_GROUP_NAME \
+ --name $AKS_CLUSTER_NAME \
  --query servicePrincipalProfile.clientId \
  --output tsv)
  az ad app credential list --id "$SP_ID"
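To act on the output of `az ad app credential list`, each credential's expiration date can be compared with the current time. The following is a minimal POSIX shell sketch, assuming the `endDateTime` values are ISO 8601 UTC timestamps (the format Microsoft Graph uses for password credentials). `is_expired` is a hypothetical helper name, not part of the Azure CLI.

```bash
# Hypothetical helper: succeeds (exit 0) when the ISO 8601 UTC timestamp in $1
# is at or before the current time. Only the first 19 characters
# (YYYY-MM-DDTHH:MM:SS) are compared, so fractional seconds and the trailing
# "Z" are ignored. Equal-width ISO timestamps sort lexicographically.
is_expired() {
  expiry=$(printf '%s' "$1" | cut -c1-19)
  current=$(date -u +%Y-%m-%dT%H:%M:%S)
  # Concatenating "" forces awk to compare the two values as strings.
  awk -v expiry="$expiry" -v current="$current" \
    'BEGIN { exit !((expiry "") <= (current "")) }'
}

# Usage (assumes the Azure CLI and SP_ID from the previous step):
#   az ad app credential list --id "$SP_ID" --query "[].endDateTime" --output tsv |
#     while read -r end; do
#       if is_expired "$end"; then
#         echo "expired: $end"
#       else
#         echo "valid until: $end"
#       fi
#     done
```

Comparing strings rather than epoch seconds avoids depending on GNU `date -d`, which isn't available on every platform.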

scenarios/AKSHealthProbeMode/aks-health-probe-mode.md

Lines changed: 15 additions & 12 deletions
@@ -42,7 +42,7 @@ To troubleshoot these issues, follow these steps:
  ```azurecli
  export RESOURCE_GROUP="aks-rg"
  export AKS_CLUSTER_NAME="aks-cluster"
- az aks show --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --query "loadBalancerProfile"
+ az aks show --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --query "networkProfile.loadBalancerProfile"
  ```
  Results:

@@ -66,24 +66,27 @@ To troubleshoot these issues, follow these steps:
  }
  ```

- 2. Check the *overlaymgr* log to see if the cloud provider secret is updated. The keyword to look for is `cloudConfigSecretResolver`. Or check the contents of the cloud-provider-config secret in the `ccp` namespace. You can use the `kubectl get secret` command to view the secret.
+ 2. Check the cloud provider configuration. In modern AKS clusters, the cloud provider configuration is managed internally and the `ccp` namespace doesn't exist. Instead, check for cloud provider related resources and verify the cloud-node-manager pods are running properly:

- ```shell
- kubectl get secret cloud-provider-config -n ccp -o yaml
+
+ ```bash
+ # Check for cloud provider related ConfigMaps in kube-system
+ kubectl get configmap -n kube-system | grep -i azure
+
+ # Check if cloud-node-manager pods are running (indicates cloud provider integration is working)
+ kubectl get pods -n kube-system | grep cloud-node-manager
+
+ # Check the azure-ip-masq-agent-config if it exists
+ kubectl get configmap azure-ip-masq-agent-config-reconciled -n kube-system -o yaml 2>/dev/null || echo "ConfigMap not found"
  ```
  Results:

  <!-- expected_similarity=0.3 -->

  ```output
- apiVersion: v1
- data:
- cloud-config: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
- kind: Secret
- metadata:
- name: cloud-provider-config
- namespace: ccp
- ...
+ configmap/azure-ip-masq-agent-config-reconciled 1 11h
+
+ cloud-node-manager-rfb2w 2/2 Running 0 16m
  ```

  3. Check the chart or overlay daemonset cloud-node-manager to see if the health-probe-proxy sidecar container is enabled. You can use the `kubectl get ds` command to view the daemonset.
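Instead of reading the `grep cloud-node-manager` output by eye, the READY and STATUS columns can be checked mechanically. A minimal sketch, assuming the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, RESTARTS, AGE). `check_cloud_node_manager` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `kubectl get pods -n kube-system --no-headers`
# output on stdin and fails if any cloud-node-manager pod is not fully ready
# and Running, or if no such pod is listed at all.
check_cloud_node_manager() {
  awk '
    $1 ~ /^cloud-node-manager/ {
      found++
      split($2, ready, "/")   # READY column, for example "2/2"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print "unhealthy: " $1 " ready=" $2 " status=" $3
        bad++
      }
    }
    END {
      if (found == 0) { print "no cloud-node-manager pods found"; exit 1 }
      if (bad > 0) { exit 1 }
      print found " cloud-node-manager pod(s) ready and Running"
    }'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get pods -n kube-system --no-headers | check_cloud_node_manager
```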

scenarios/AKSPreviewAPILifecycle/aks-preview-api-lifecycle.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ API version as deprecation approaches.
  If you're unsure what client or tool is using this API version, check the [activity logs](/azure/azure-monitor/essentials/activity-log)
  using the following command:

- Set the API version you want to inspect for recent usage in the activity log.
+ Set the API version you want to inspect for recent usage in the activity log. In this example, we are checking for the `2022-04-01-preview` API version.

  ```bash
  export API_VERSION="2022-04-01-preview"

scenarios/AzureCNIPodSubnet/azure-cni-pod-subnet.md

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ Azure CNI Pod Subnet assigns IP addresses to pods from a separate subnet from yo
  - Azure CLI version `2.37.0` or later and the `aks-preview` extension version `2.0.0b2` or later.
  - Register the subscription-level feature flag for your subscription: 'Microsoft.ContainerService/AzureVnetScalePreview'.

+ ## Enable Container Insights (AKS monitoring)
+
  If you have an existing cluster, you can enable Container Insights (AKS monitoring) using the following command **only if your cluster was created with monitoring enabled or is associated with a valid Log Analytics Workspace in the same region**. Otherwise, refer to Microsoft Docs for additional workspace setup requirements.

  ```azurecli-interactive

scenarios/CSEErrorsAKS/cse-errors-aks.md

Lines changed: 3 additions & 2 deletions
@@ -107,7 +107,8 @@ Set up your custom Domain Name System (DNS) server so that it can do name resolu
  > **Important:** You must specify the `--name` of a valid VM in an availability set in your resource group. Here is a template for running network checks.

  ```azurecli
- export DNS_IP_ADDRESS="10.0.0.10"
+ export API_FQDN=$(az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query fqdn -o tsv)
+
  az vm run-command invoke \
  --resource-group $NODE_RESOURCE_GROUP \
  --name $AVAILABILITY_SET_VM \
@@ -121,7 +122,7 @@ Set up your custom Domain Name System (DNS) server so that it can do name resolu
  --command-id RunShellScript \
  --output tsv \
  --query "value[0].message" \
- --scripts "nslookup <api-fqdn> $DNS_IP_ADDRESS"
+ --scripts "nslookup $API_FQDN $DNS_IP_ADDRESS"
  ```

  For more information, see [Name resolution for resources in Azure virtual networks](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances) and [Hub and spoke with custom DNS](/azure/aks/private-clusters#hub-and-spoke-with-custom-dns).
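The run-command template depends on `RG_NAME`, `CLUSTER_NAME`, `NODE_RESOURCE_GROUP`, `AVAILABILITY_SET_VM`, and `DNS_IP_ADDRESS` being set beforehand. This change removes the `export DNS_IP_ADDRESS=...` line while `--scripts` still references `$DNS_IP_ADDRESS`, so that value has to come from elsewhere; if it's empty, `nslookup` silently falls back to the VM's default resolver. A minimal fail-fast guard, sketched in POSIX shell; `require_vars` is a hypothetical helper name.

```bash
# Hypothetical helper: fail (exit status 1) and name every variable in the
# argument list that is unset or empty.
require_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh: value=${<name>:-}
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing required variables:$missing" >&2
    return 1
  fi
}

# Usage, before running the az vm run-command invoke template:
#   require_vars RG_NAME CLUSTER_NAME NODE_RESOURCE_GROUP AVAILABILITY_SET_VM DNS_IP_ADDRESS || exit 1
```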

scenarios/CniDownloadFailureAKS/cni-download-failure-aks.md

Lines changed: 27 additions & 6 deletions
@@ -18,6 +18,7 @@ This article discusses how to identify and resolve the `CniDownloadTimeoutVMExte
  ## Prerequisites

  - The [Curl](https://curl.se/download.html) command-line tool
+ - Network access from the same environment where AKS nodes will be deployed (same VNet, firewall rules, etc.)

  ## Symptoms

@@ -52,7 +53,7 @@ Run a Curl command to verify that your nodes can download the binaries:
  First, attempt a test download of the Azure CNI package for Linux from the official mirror endpoint.

  ```bash
- curl https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz
+ curl -I https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz
  ```

  Results:
@@ -72,33 +73,53 @@ accept-ranges: bytes
  date: Thu, 05 Jun 2025 00:00:00 GMT
  ```

- This command will display binary archive data in the terminal if the download succeeds.
+ This command checks if the endpoint is reachable and returns the HTTP headers. If you see a `200 OK` response, it indicates that the endpoint is accessible.

  Next, attempt a download with validation and save the file locally for further troubleshooting. This will help determine if SSL or outbound connectivity is correctly configured.

  ```bash
- curl --fail --ssl https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz --output /opt/cni/downloads/azure-vnet-cni-linux-amd64-v1.0.25.tgz
+ # Create a temporary directory for testing
+ mkdir -p /tmp/cni-test
+
+ # Download the CNI package to the temp directory
+ curl -L --fail https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz --output /tmp/cni-test/azure-vnet-cni-linux-amd64-v1.0.25.tgz && echo "Download successful" || echo "Download failed"
  ```

  Results:

  <!-- expected_similarity=0.3 -->

  ```output
+ % Total % Received % Xferd Average Speed Time Time Time Current
+ Dload Upload Total Spent Left Speed
+ 100 6495k 100 6495k 0 0 8234k 0 --:--:-- --:--:-- --:--:-- 8230k
+ Download successful
  ```

- Or as an alternative with output to /tmp and success message for clarity:
+ Verify the downloaded file:

  ```bash
- curl -L --fail https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz --output /tmp/azure-vnet-cni-test.tgz && echo "Download successful" || echo "Download failed"
+ ls -la /tmp/cni-test/
+ file /tmp/cni-test/azure-vnet-cni-linux-amd64-v1.0.25.tgz
  ```

  Results:

  <!-- expected_similarity=0.3 -->

  ```output
- Download successful
+ total 6500
+ drwxr-xr-x 2 user user 4096 Jun 20 10:30 .
+ drwxrwxrwt 8 root root 4096 Jun 20 10:30 ..
+ -rw-r--r-- 1 user user 6651392 Jun 20 10:30 azure-vnet-cni-linux-amd64-v1.0.25.tgz
+
+ /tmp/cni-test/azure-vnet-cni-linux-amd64-v1.0.25.tgz: gzip compressed data, from Unix, original size modulo 2^32 20070400
+ ```
+
+ Clean up the test files:
+
+ ```bash
+ rm -rf /tmp/cni-test/
  ```

  If you can't download these files, make sure that traffic is allowed to the downloading endpoint. For more information, see [Azure Global required FQDN/application rules](/azure/aks/outbound-rules-control-egress#azure-global-required-fqdn--application-rules).
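The `file` check confirms that the download starts like gzip data, but not that the archive is complete. A somewhat stronger check is to list the archive's members with `tar -tzf`, which typically exits with a non-zero status on truncated, corrupt, or non-gzip input. A minimal sketch; `verify_cni_archive` is a hypothetical helper, and the path in the usage comment reuses the test location from this change.

```bash
# Hypothetical helper: succeed only if $1 is a non-empty file that tar can
# list as a gzip-compressed archive.
verify_cni_archive() {
  archive=$1
  if [ ! -s "$archive" ]; then
    echo "missing or empty: $archive" >&2
    return 1
  fi
  # -t lists members without extracting; -z decompresses with gzip.
  if ! entries=$(tar -tzf "$archive" 2>/dev/null); then
    echo "not a readable gzip tarball: $archive" >&2
    return 1
  fi
  count=$(printf '%s\n' "$entries" | wc -l | tr -d ' ')
  echo "archive OK: $archive ($count entries)"
}

# Usage, after the download step and before cleanup:
#   verify_cni_archive /tmp/cni-test/azure-vnet-cni-linux-amd64-v1.0.25.tgz
```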

scenarios/ForbiddenErrorAKS/forbidden-error-aks.md

Lines changed: 15 additions & 2 deletions
@@ -44,14 +44,28 @@ az aks show -g $RESOURCE_GROUP -n $CLUSTER_NAME --query aadProfile.enableAzureRb

  Results:

- <!-- expected_similarity=0.3 -->
  ```output
  false
  ```

+ - If the result is **null** or empty, the cluster doesn't have Azure AD integration enabled. See [Solving permission issues in local Kubernetes RBAC clusters](#solving-permissions-issues-in-local-kubernetes-rbac-clusters).
  - If the result is **false**, the cluster uses Kubernetes RBAC. See [Solving permission issues in Kubernetes RBAC-based AKS clusters](#solving-permissions-issues-in-kubernetes-rbac-based-aks-clusters).
  - If the result is **true**, the cluster uses Azure RBAC. See [Solving permission issues in Azure RBAC-based AKS clusters](#solving-permissions-issues-in-azure-rbac-based-aks-clusters).

+ ### Solving permissions issues in local Kubernetes RBAC clusters
+
+ If your cluster doesn't have Azure AD integration (result was null), it uses cluster admin credentials:
+
+ ```bash
+ # Get admin credentials for full access
+ az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --admin
+
+ # Verify access
+ kubectl get nodes
+ ```
+
+ **Warning**: Admin credentials provide full cluster access. Use carefully and consider enabling Azure AD integration for better security.
+
  ### Solving permissions issues in Kubernetes RBAC-based AKS clusters

  If the cluster uses Kubernetes RBAC, permissions for the user account are configured through the creation of RoleBinding or ClusterRoleBinding Kubernetes resources. For more information, see [Kubernetes RBAC documentation](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
@@ -74,7 +88,6 @@ You can create a custom RoleBinding or ClusterRoleBinding resource to grant the

  Results:

- <!-- expected_similarity=0.3 -->
  ```output
  [
  "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
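The three `enableAzureRbac` outcomes (null or empty, false, true) can be folded into a small dispatcher that names the matching troubleshooting section. A minimal sketch, assuming the value comes from `az aks show ... --query aadProfile.enableAzureRbac --output tsv`, where a null result prints as an empty string; `rbac_mode` and its output labels are hypothetical.

```bash
# Hypothetical helper: map the enableAzureRbac query result to the matching
# troubleshooting section. With --output tsv, a null value prints as "".
rbac_mode() {
  case "$1" in
    true) echo "azure-rbac" ;;
    false) echo "kubernetes-rbac" ;;
    "" | null) echo "local-kubernetes-rbac" ;;
    *)
      echo "unexpected value: $1" >&2
      return 1
      ;;
  esac
}

# Usage (requires the Azure CLI):
#   rbac_mode "$(az aks show -g "$RESOURCE_GROUP" -n "$CLUSTER_NAME" \
#     --query aadProfile.enableAzureRbac --output tsv)"
```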

scenarios/KubeletIOTroubleshooting/kubelet-io-troubleshooting.md

Lines changed: 11 additions & 0 deletions
@@ -16,6 +16,17 @@ keywords:

  TCP timeouts can be caused by blockages of internal traffic that runs between nodes. To investigate TCP time-outs, verify that this traffic isn't being blocked, for example, by [network security groups](/azure/aks/concepts-security#azure-network-security-groups) (NSGs) on the subnet for your cluster nodes.

+ ## Connect to the cluster
+
+ First, connect to your Azure Kubernetes Service (AKS) cluster by running the following command:
+
+ ```bash
+ export RESOURCE_GROUP=<your-resource-group>
+ export CLUSTER_NAME=<your-cluster-name>
+
+ az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME
+ ```
+
  ## Symptoms

  Tunnel functionalities, such as `kubectl logs` and code execution, work only for pods that are hosted on nodes on which tunnel service pods are deployed. Pods on other nodes that have no tunnel service pods cannot reach to the tunnel. When viewing the logs of these pods, you receive the following error message:

scenarios/NodeNotReadyAKS/node-not-ready-aks.md

Lines changed: 25 additions & 6 deletions
@@ -51,12 +51,31 @@ The [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/
  Examine the output of the `kubectl describe nodes` command to find the [Conditions](https://kubernetes.io/docs/reference/node/node-status/#condition) field and the [Capacity and Allocatable](https://kubernetes.io/docs/reference/node/node-status/#capacity) blocks. Do the content of these fields appear as expected? (For example, in the **Conditions** field, does the `message` property contain the "kubelet is posting ready status" string?) In this case, if you have direct Secure Shell (SSH) access to the node, check the recent events to understand the error. Look within the */var/log/syslog* file instead of */var/log/messages* (not available on all distributions). Or, generate the kubelet and container daemon log files by running the following shell commands:

  ```bash
- # To check syslog file (useful on Ubuntu-based AKS nodes),
- cat /var/log/syslog
-
- # To check kubelet and containerd daemon logs,
- journalctl -u kubelet > kubelet.log
- journalctl -u containerd > containerd.log
+ # First, identify the NotReady node
+ export NODE_NAME=$(kubectl get nodes --no-headers | grep NotReady | awk '{print $1}' | head -1)
+
+ if [ -z "$NODE_NAME" ]; then
+ echo "No NotReady nodes found"
+ kubectl get nodes
+ else
+ echo "Found NotReady node: $NODE_NAME"
+
+ # Use kubectl debug to access the node
+ kubectl debug node/$NODE_NAME -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0 -- chroot /host bash -c "
+ echo '=== Checking syslog ==='
+ if [ -f /var/log/syslog ]; then
+ tail -100 /var/log/syslog
+ else
+ echo 'syslog not found'
+ fi
+
+ echo '=== Checking kubelet logs ==='
+ journalctl -u kubelet --no-pager | tail -100
+
+ echo '=== Checking containerd logs ==='
+ journalctl -u containerd --no-pager | tail -100
+ "
+ fi
  ```

  After you run these commands, examine the syslog and daemon log files for more information about the error.
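The updated script inspects only the first NotReady node (`head -1`), and `grep NotReady` matches the text anywhere on the line. A minimal sketch that lists every node whose STATUS column starts with `NotReady` (which also covers values such as `NotReady,SchedulingDisabled`), assuming the default `kubectl get nodes --no-headers` column order (NAME, STATUS, ROLES, AGE, VERSION). `list_notready_nodes` is a hypothetical helper name.

```bash
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the name of every node whose STATUS column starts with NotReady.
list_notready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get nodes --no-headers | list_notready_nodes
```

Each printed name can then be passed to the `kubectl debug node/...` command in the script above.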
