chore: add operations scripts for maintaining infrastructure #4103

Merged
merged 76 commits into from
Nov 12, 2020
76 commits
76390cc
reconcile lock file after removing api-report
awentzel Sep 15, 2020
ac8c01f
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Sep 15, 2020
b50b5bf
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Sep 16, 2020
0bf11be
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Sep 18, 2020
adf4c92
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Sep 28, 2020
8c0d4a8
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Oct 9, 2020
48f2cef
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Oct 19, 2020
6ebddfb
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Oct 20, 2020
88409c5
Merge branch 'master' of https://github.com/microsoft/fast-dna
awentzel Nov 4, 2020
f441504
add initial bash scripts for azure cli
awentzel May 20, 2020
ab0b706
add slot for apps
awentzel May 20, 2020
83b7a10
updates to front door
awentzel May 20, 2020
07c987a
add server.js file to use in production
awentzel May 21, 2020
26c1c97
update for wip
awentzel May 21, 2020
df68a8c
update todo list for web apps
awentzel May 22, 2020
a257c28
updates to front door documentation
awentzel May 23, 2020
c14fd55
update to rename front door
awentzel May 23, 2020
425b48b
add new slot for last-known-good for use in production
awentzel May 23, 2020
c56671b
update to cdn and storage
awentzel May 23, 2020
8efa4e3
update to include issues and todo's for later
awentzel May 26, 2020
3bf24d5
udpate to todos
awentzel May 26, 2020
d97189b
update CDN documentation
awentzel May 27, 2020
6cf609d
update to documentation
awentzel Jun 3, 2020
decac9f
add security testing for apps
awentzel Jun 19, 2020
ab99e3b
updated automated network access restrictions
awentzel Jun 22, 2020
0904f23
update to documentation
awentzel Jun 24, 2020
3fcc618
update configuration after configuring slots for security
awentzel Jun 26, 2020
98a3332
update notes for front door limitations
awentzel Jun 26, 2020
4a33091
update dns notes
awentzel Jun 26, 2020
da6ba2b
add deployment process documentation
awentzel Jun 27, 2020
ae2392a
update to disable ftps state and add network restriction source
awentzel Jun 27, 2020
c116881
add deploy script
awentzel Jun 27, 2020
70152cc
update to show currently logged in subscription
awentzel Jun 29, 2020
b838352
update notes for CDN w/CORS policy
awentzel Jun 29, 2020
4c66064
update notes on FD and AAD
awentzel Jul 1, 2020
f78c9a6
update for deploying
awentzel Jul 16, 2020
e573818
adds todo notes for better quality
awentzel Jul 29, 2020
e2da6d0
add network exceptions for testing
awentzel Aug 5, 2020
15b08f2
add removal of access restrictions
awentzel Aug 5, 2020
3a088b3
updated to include deployment scenarios in loops
awentzel Aug 5, 2020
f6773e5
update to terminal processing and styles
awentzel Aug 5, 2020
4f0ae31
update to finalize deployment script for any number of apps
awentzel Aug 5, 2020
72f8d83
update to finalize styling
awentzel Aug 6, 2020
c3c1d41
update to enable swapping
awentzel Aug 6, 2020
1cc82e5
update to add todo item
awentzel Aug 6, 2020
2d1a188
update comments for next todo
awentzel Aug 6, 2020
24c32dd
add pr template
awentzel Aug 6, 2020
83c3279
update to config files
awentzel Aug 11, 2020
12e975c
add logic to pick between environments
awentzel Aug 11, 2020
8d0657a
update to scripts for testing
awentzel Aug 12, 2020
3c8ef0b
update to fix
awentzel Aug 12, 2020
888f822
update to include environment selection process
awentzel Aug 13, 2020
90b3b41
update to refactor for supporting single service actions
awentzel Sep 2, 2020
ead3f68
update to include purging with additional refinements
awentzel Sep 10, 2020
9325db2
update to fix bug in swapping
awentzel Sep 11, 2020
62239ef
updated documentation
awentzel Sep 29, 2020
55e51b7
create functions file with functions for code reuse
awentzel Sep 29, 2020
35ae912
update to create resource group based on cli instructions
awentzel Oct 1, 2020
0d51229
update to add delete of all services in resource group
awentzel Oct 1, 2020
55e8aff
add restoring snapshots ability
awentzel Oct 1, 2020
7952b19
update ops region docs
awentzel Oct 1, 2020
fd5572f
update to initilizer
awentzel Oct 1, 2020
812a1d2
update to remove unneeded services
awentzel Oct 1, 2020
f313a96
implement delete rg
awentzel Oct 2, 2020
25192ba
update to creating resource groups
awentzel Oct 2, 2020
4df702a
update to create asp
awentzel Oct 2, 2020
5235e34
update to refactor and test complete creation/deletion for rg and asp
awentzel Oct 5, 2020
fede923
starting to create app service
awentzel Oct 5, 2020
2ee17f5
update to creating apps
awentzel Oct 20, 2020
34eb42f
update with refactor for installing complete infastructure
awentzel Oct 23, 2020
a85dcf2
update to isolate resource groups by service and testing
awentzel Nov 4, 2020
f56774d
update to finalize for hand-off
awentzel Nov 4, 2020
5da0b29
Merge branch 'master' into users/awentzel/add-ops-scripts
awentzel Nov 9, 2020
312ca2d
Merge branch 'master' into users/awentzel/add-ops-scripts
awentzel Nov 12, 2020
8c93089
Update build/operations/README.md
awentzel Nov 12, 2020
6dca8b4
Merge branch 'master' into users/awentzel/add-ops-scripts
awentzel Nov 12, 2020
171 changes: 171 additions & 0 deletions build/operations/README.md
@@ -0,0 +1,171 @@
# Azure Cloud Documentation

## Getting Started
A series of Bash scripts is provided to perform infrastructure-related tasks. The scripts must be executed from within the `./build/operations` folder.

### Installation
Multiple options exist to use the Azure CLI for working with FAST Infrastructure.

Begin by [installing](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) the Azure CLI for your preferred platform.

Running `bash recipes/login.sh` signs you in to the Azure CLI interactively. It launches a web browser for authentication, leverages the security groups within the Azure tenant, and then performs FAST configuration.

```bash
bash recipes/login.sh
```

When using the FAST Bash scripts, it isn't necessary to log in manually or to run CLI commands yourself. Each script documents its purpose, and the scripts are organized as recipes that execute services for installation, configuration, and maintenance tasks.

Experienced Azure CLI users may find it useful to perform management tasks directly using the Azure CLI rather than relying on FAST recipes.

### Configuration
System configuration occurs automatically upon first executing `bash recipes/login.sh` and is stored in each user's home directory.

* On Linux or macOS it's stored at `$HOME/.azure`. To view it, run `cat ~/.azure/config`.
* On Windows it's stored at `%USERPROFILE%\.azure`.

For additional details, see [Azure CLI Configuration](https://docs.microsoft.com/en-us/cli/azure/azure-cli-configuration?view=azure-cli-latest).
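As a minimal sketch, assuming Bash on Linux or macOS, the config file can be located the same way the CLI does: use `AZURE_CONFIG_DIR` when it is set (as the `azure.ini` header notes) and fall back to `$HOME/.azure` otherwise. The function name `azure_config_file` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: resolve the Azure CLI config file path.
# Uses $AZURE_CONFIG_DIR when set, otherwise the default $HOME/.azure.
azure_config_file() {
  local config_dir="${AZURE_CONFIG_DIR:-$HOME/.azure}"
  echo "$config_dir/config"
}

# Print the config file if it exists.
config_file="$(azure_config_file)"
if [ -f "$config_file" ]; then
  cat "$config_file"
else
  echo "No Azure CLI config found at $config_file" >&2
fi
```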

### Recipes Commands
Recipes are available for performing most infrastructure actions. While these scripts are tailored to FAST, with slight modifications anyone with a valid Azure subscription and the necessary permissions to read/write/execute in the Azure Cloud could use them to onboard to Azure quickly.

For a complete list of recipes, review the files inside `./build/operations/recipes`.
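A small sketch for listing the available recipes, assuming every recipe is a `*.sh` file directly inside the recipes folder. `list_recipes` is a hypothetical helper, not one of the FAST scripts.

```bash
#!/bin/bash
# Hypothetical helper: print recipe names (file names without .sh).
# Assumes recipes are the *.sh files directly inside the folder.
list_recipes() {
  local recipes_dir="${1:-./recipes}"
  local file
  for file in "$recipes_dir"/*.sh; do
    # Skip the literal glob pattern when the folder has no .sh files.
    [ -e "$file" ] || continue
    basename "$file" .sh
  done
}
```

Run from `./build/operations`, `list_recipes recipes` would print names such as `login`.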

### Azure CLI Commands
Azure provides comprehensive management tooling, accessible through Azure PowerShell or the Azure CLI (both available on Windows, macOS, and Linux), to streamline Azure Resource Management.

![Consistent Management Layer](diagrams/consistent-management-layer.png)
_Fast Azure Resource Management_

#### Deletion
Configuration management requires frequent build-up and tear-down of Azure Cloud resources during development and testing. The command below deletes an entire resource group and all resources that reside within it.

This is intentionally a manual process that relies on the interactive shell's confirmation prompt to protect existing infrastructure from accidental harm.

Beginners should experiment with the Azure CLI in a development or testing subscription to avoid irreparable damage to production resources.

Three resource groups are used by FAST:
1. fast-westus-rg
2. fast-eastus-rg
3. fast-ops-rg

```bash
az group delete --name "some-resource-group-name"
```
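One way to add a second safety net is a wrapper that refuses any group outside the three FAST groups listed above while keeping the CLI's own confirmation prompt (no `--yes`). This is a sketch only; `delete_resource_group`, `is_fast_resource_group`, and the `DRY_RUN` switch are hypothetical names, not part of the FAST recipes.

```bash
#!/bin/bash
# Hypothetical allowlist: the three resource groups used by FAST.
fast_resource_groups=("fast-westus-rg" "fast-eastus-rg" "fast-ops-rg")

is_fast_resource_group() {
  local name="$1" rg
  for rg in "${fast_resource_groups[@]}"; do
    [ "$rg" = "$name" ] && return 0
  done
  return 1
}

delete_resource_group() {
  local name="$1"
  if ! is_fast_resource_group "$name"; then
    echo "Refusing to delete unknown resource group: $name" >&2
    return 1
  fi
  # DRY_RUN=1 prints the command instead of running it.
  # Without --yes, az still asks for confirmation before deleting.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "az group delete --name $name"
  else
    az group delete --name "$name"
  fi
}
```

For example, `DRY_RUN=1 delete_resource_group fast-westus-rg` prints the command without touching Azure.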

## Architecture
FAST uses Azure Cloud's Platform as a Service for highly available multi-regional web applications.

This architecture uses an active/passive with hot standby approach: the primary region receives all traffic while the other region waits on hot standby. If the primary region fails for any reason, the secondary region picks up the load. With hot standby, the secondary region is allocated and running at all times. App Services use staging slots for pre-production testing.

For improved isolation and availability in business continuity and disaster recovery (BCDR), regional pairing is used.

Learn more from [Azure Documentation](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/app-service-web-app/multi-region).

![Web Architecture](diagrams/multi-region-web-app-diagram.png)
_Fast Production Subscription_

FAST aims to serve optimized web traffic through extensive use of caching across several different services and application layers. Compression can be applied in two ways: through middleware (for example, Express when running on Node.js) or through a reverse proxy (for example, a load balancer or a web server such as IIS, Apache, or nginx).

Brotli (br) is a newer compression algorithm that aims to further improve compression ratios, which can result in even faster page loads. It is supported by the latest versions of most browsers. When a request accepts multiple compression types, Brotli takes precedence.
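The precedence rule can be illustrated with a small function that picks an encoding from an `Accept-Encoding` header. This is an illustration of the rule described above, not FAST's server code, and it simplifies on purpose: quality values (`;q=...`) are stripped and ignored.

```bash
#!/bin/bash
# Illustrative only: choose a response encoding from an Accept-Encoding
# header, preferring Brotli (br) over gzip. Quality values are ignored.
pick_encoding() {
  local header="$1" token
  local has_br=false has_gzip=false
  local -a tokens
  IFS=',' read -ra tokens <<< "$header"
  for token in "${tokens[@]}"; do
    token="${token%%;*}"            # drop any ;q=... parameter
    token="${token//[[:space:]]/}"  # trim whitespace
    case "$token" in
      br) has_br=true ;;
      gzip) has_gzip=true ;;
    esac
  done
  if $has_br; then
    echo "br"
  elif $has_gzip; then
    echo "gzip"
  else
    echo "identity"
  fi
}
```

For example, `pick_encoding "gzip, deflate, br"` prints `br`.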

All traffic enters through Azure Front Door (AFD), which is used for traffic management, load balancing, failover, and dynamic acceleration with caching. AFD can cache requests for a duration of 1-3 days, which is assigned dynamically and randomly. A purge feature allows cache busting for releases and deployments. Requests served from the AFD cache are not forwarded to the backend Azure Web Apps, which drastically improves page load performance through reduced network latency.
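A purge after a release could be scripted with the Azure CLI. This is a sketch, assuming the `front-door` CLI extension is installed (`az extension add --name front-door`); the resource group default `fast-ops-rg` comes from the groups listed earlier, while the Front Door name `fast-fd` and the `DRY_RUN` switch are hypothetical.

```bash
#!/bin/bash
# Sketch: purge cached content from Azure Front Door after a release.
# Requires the Azure CLI "front-door" extension.
# The Front Door name "fast-fd" is a hypothetical placeholder.
purge_front_door() {
  local resource_group="${1:-fast-ops-rg}"
  local front_door="${2:-fast-fd}"
  local paths="${3:-/*}"
  local cmd=(az network front-door purge-endpoint
    --resource-group "$resource_group"
    --name "$front_door"
    --content-paths "$paths")
  # DRY_RUN=1 prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```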

All FAST websites are built on Azure Web Apps for Linux, running a Node.js technology stack with Express middleware. The Helmet package is installed for security protection. Express serves files with a 3-day cache lifetime. This works because webpack writes content-hashed file names into a bundles folder: when new files are released, their names change, so the cache is effectively busted and the new files are requested.
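For reference, the 3-day lifetime expressed in seconds, assuming it is sent as a standard `Cache-Control` `max-age` value (the exact header FAST's Express server emits is not shown in this document). `max_age_for_days` is a hypothetical helper.

```bash
#!/bin/bash
# Worked arithmetic: days -> seconds for a Cache-Control max-age value
# (days x 24 h x 60 min x 60 s).
max_age_for_days() {
  echo $(( $1 * 24 * 60 * 60 ))
}

echo "Cache-Control: public, max-age=$(max_age_for_days 3)"
# prints: Cache-Control: public, max-age=259200
```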

All websites serve assets such as images, scripts, and other media files from Azure CDN at https://static.fast.design/assets. These files are automatically deployed based on changes to source code located in `./sites/site-utilities/statics/*`. When a request accepts `gzip` compression, the CDN returns the cached compressed file; when none is found, Azure CDN performs gzip compression directly on the POP server.

Any file not cached by the web app or the CDN is cached on Azure Front Door.

### Organizational Structure
This hierarchy uses the Workload Separation strategy involving management groups.

* Fast Design Management Group
  * Fast Production
    * Active Resource Group (Primary Region - West US)
    * Global Operations Resource Group (non-regional specific)
    * Standby Resource Group (Secondary Region - East US)
  * Fast Development
    * Active Resource Group (Primary Region - West US)
    * Global Operations Resource Group (non-regional specific/dependent)
    * Standby Resource Group (Secondary Region - East US)

### Front Door
This is considered a global resource and a type of Application Delivery Network (ADN) as a service, offering load-balancing capabilities for global routing to applications across availability regions using an active/passive with hot standby approach. Performance is improved with dynamic site acceleration. Full end-to-end encryption is achieved using TLS/SSL offloading with auto-rotating certificates. Configuration changes to Front Door are deployed across all POPs globally in 3 to 5 minutes. Any updates to the backend pools are seamless and cause zero downtime. For greater scale as traffic increases, we may implement an Azure Load Balancer behind Front Door.

Front Door is a globally distributed multi-tenant platform with huge volumes of capacity to cater to each application's scalability needs. Delivered from the edge of Microsoft's global network, Front Door provides global load balancing that allows you to fail over your entire application, or even individual microservices, across regions or different clouds. Front Door provides faster failover than DNS-based Traffic Manager because it is a reverse proxy that sits on the network between the customer and your backend services. As a reverse proxy, Front Door also offers features that Traffic Manager cannot provide.

FAST's Front Door performs caching and dynamic content acceleration. Front Door can cache static content and return cached assets directly to customers without a trip to the backend, further reducing latency.

#### Limitations
Several limitations exist between Azure Front Door and Azure Active Directory (AAD).

* Front Door does not support response rewriting.
* MIME types: there are certain limitations on fonts, images, and data files.

For additional limitations, see [Azure Front Door service limits](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits#azure-front-door-service-limits) and [Front Door caching](https://docs.microsoft.com/en-us/azure/frontdoor/front-door-caching).

_Note_ that Application Gateway has this capability. A new feature request has been sent to Azure Front Door.

**Issue**
We use Front Door to route web traffic between two regions across multiple subdomains. Some subdomains, for example https://stage.www.fast.design, should be protected behind AAD. This works when the AAD reply URLs match the backend server name/URL, in this example https://www-west-app.azurewebsites.net/.auth/login/aad/callback. However, we want users to return to the original URL, https://stage.www.fast.design, not the backend URL.

How can this be achieved through the Azure Portal, not web app code?

**Cause**
- The issue happens because one of the parameters of the HTTP redirect to AAD contains a URL string, which AAD then uses to redirect the user.
- That URL was the one configured on the web app.

**Resolution**
- We configured a custom domain on both the web app and Front Door so that both listen on the same hostname. That hostname (FQDN) is a CNAME to Front Door's domain.
- The custom domain was then used in the URL string within the AAD redirect, so the redirect worked as expected. However, because the AAD module responds with 302 to every unauthenticated user agent, the Front Door health probes mark the backend as unhealthy:

  https://docs.microsoft.com/en-us/azure/frontdoor/front-door-health-probes

- With multiple backends, Front Door would then round-robin requests, making the site unusable.
- Usually a dummy path is set up in the application to answer the probes with 200 OK, but App Services support verified that this cannot be done with the pre-built AAD authentication module. To customize AAD behavior, the suggestion is to implement authentication in code.
- This scenario doesn't happen with Application Gateway (AppGW), since it can rewrite the URL parameter and also allow non-200 OK health responses.
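The probe problem can be modeled in a few lines. A minimal sketch, assuming (as the Front Door health probe documentation linked above describes) that only a 200 OK response counts as healthy; `probe_is_healthy` and `probe_status` are hypothetical helpers.

```bash
#!/bin/bash
# Model of the health-probe behavior described above: only 200 OK is
# healthy, so a backend whose AAD module answers probes with 302 fails.
probe_is_healthy() {
  [ "$1" = "200" ]
}

# Hypothetical helper: fetch only the HTTP status code of a probe path.
probe_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' "$1"
}

for status in 200 302; do
  if probe_is_healthy "$status"; then
    echo "$status: healthy"
  else
    echo "$status: unhealthy"
  fi
done
# prints:
# 200: healthy
# 302: unhealthy
```

Against a live backend, `probe_is_healthy "$(probe_status https://www-west-app.azurewebsites.net/)"` would report what a probe to that path sees.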

#### Risks
* Failure Points: Front Door is a possible failure point in the system. If the service fails, clients cannot access your application during the downtime. Review the Front Door service level agreement (SLA) and determine whether using Front Door alone meets your business requirements for high availability. If not, consider adding another traffic management solution as a fallback. If the Front Door service fails, change your canonical name (CNAME) records in DNS to point to the other traffic management service. This step must be performed manually, and your application will be unavailable until the DNS changes are propagated.
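The manual DNS step could look like the sketch below, assuming the `fast.design` zone is hosted in Azure DNS. The resource group, record name, fallback host, and the `failover_cname` helper with its `DRY_RUN` switch are all hypothetical.

```bash
#!/bin/bash
# Sketch of the manual failover: repoint a CNAME record at a fallback
# traffic management service. All names passed in are placeholders.
failover_cname() {
  local resource_group="$1" zone="$2" record="$3" target="$4"
  local cmd=(az network dns record-set cname set-record
    --resource-group "$resource_group"
    --zone-name "$zone"
    --record-set-name "$record"
    --cname "$target")
  # DRY_RUN=1 prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 failover_cname fast-ops-rg fast.design www fallback.example.net` prints the command for review before it is run.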

### Key Vault
FAST uses one Key Vault per environment (development, staging, and production) for an additional layer of security: if one environment is compromised, the others remain safe. Key Vault takes backups on a regular cadence as the objects stored within it change. Subscriptions are stored in Azure Key Vault and can be retrieved using the Azure CLI by authorized administrators only.
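Retrieving a value with the Azure CLI might look like this sketch. The vault naming scheme `fast-kv-<environment>` and the secret name passed in are hypothetical; only the three environment names come from this document.

```bash
#!/bin/bash
# Hypothetical mapping from environment to its Key Vault name.
vault_for_environment() {
  case "$1" in
    production|staging|development) echo "fast-kv-$1" ;;
    *) echo "Unknown environment: $1" >&2; return 1 ;;
  esac
}

# Read a secret value as plain text (requires an authorized az login).
get_secret() {
  local environment="$1" secret_name="$2" vault
  vault="$(vault_for_environment "$environment")" || return 1
  az keyvault secret show \
    --vault-name "$vault" \
    --name "$secret_name" \
    --query value \
    --output tsv
}
```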

### Storage
Read-access geo-redundant storage (RA-GRS) is used, where the data is replicated to a secondary region. You have read-only access to the data in the secondary region through a separate endpoint. If there is a regional outage or disaster, the Azure Storage team might decide to perform a geo-failover to the secondary region. There is no customer action required for this failover.
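With RA-GRS, the read-only secondary endpoint is the account name with a `-secondary` suffix. A minimal sketch, assuming a hypothetical account name `fastassets`; the commented `az storage account create` line shows how such an account could be created with the `Standard_RAGRS` SKU.

```bash
#!/bin/bash
# Create a storage account with RA-GRS replication (names are hypothetical):
#   az storage account create --name fastassets --resource-group fast-ops-rg \
#     --location westus --sku Standard_RAGRS

# Build the read-only secondary blob endpoint for an RA-GRS account.
secondary_blob_endpoint() {
  local account="$1"
  echo "https://${account}-secondary.blob.core.windows.net"
}

secondary_blob_endpoint fastassets
# prints: https://fastassets-secondary.blob.core.windows.net
```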

### Azure CDN
For Azure CDN Standard from Microsoft profiles, propagation usually completes in 10 minutes. If you're setting up compression for the first time for your CDN endpoint, consider waiting 1-2 hours before you troubleshoot to ensure the compression settings have propagated to the POPs.

FAST CDN leverages Blob Storage to cache infrequently updated application assets (logos, images, fonts, etc.). This technique offers greater durability, though it requires more maintenance because assets must be deployed separately from the web applications.

In code, GitHub Actions pulls assets from within `/site-utilities/statics/assets`. Any site that has CDN dependencies should store these files in this folder. These files are deployed to the CDN on each `push` to `master`, for example whenever a pull request is merged. CDN resources can be referenced using `https://static.fast.design/assets/` followed by the path matching the folder in source code. For example, `https://static.fast.design/assets/favicon.ico`.
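The path-to-URL mapping can be expressed as a small function. This sketch assumes the repository layout `sites/site-utilities/statics/assets` described above; `cdn_url_for` is a hypothetical helper.

```bash
#!/bin/bash
# Map a file in the statics assets folder to its CDN URL.
# Assumes the layout sites/site-utilities/statics/assets/<path>.
cdn_url_for() {
  local source_path="$1"
  local assets_dir="sites/site-utilities/statics/assets/"
  case "$source_path" in
    "$assets_dir"*)
      echo "https://static.fast.design/assets/${source_path#"$assets_dir"}" ;;
    *)
      echo "Not a CDN asset: $source_path" >&2
      return 1 ;;
  esac
}
```

For example, `cdn_url_for sites/site-utilities/statics/assets/favicon.ico` prints `https://static.fast.design/assets/favicon.ico`.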

A CORS policy exists for allowing all production and staging sites for `*.*.fast.design` to accept requests for `.json` and `.js` files.

### Building for Resiliency
https://docs.microsoft.com/en-us/azure/architecture/framework/resiliency/overview

### Naming Standards
Follow all naming standards from https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/naming-and-tagging with one exception: use a suffix instead of a prefix. For example, the recommendation is to prefix resource groups with `rg-`; FAST instead uses `-rg` as a suffix to allow grouping and sorting by product or area. This works well because the Azure Portal already has a column for Resource Type.

* Use lowercase letters
* Prefix with the product name, for example "fast"
* Append an abbreviation of the Azure service type made from the first letter of each word, for example "as" for App Service
* Append a random number from 1-9 for services of the same type, for example fast-as-1
* When services dictate, it's ok to deviate from this naming
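The rules above can be sketched as two small functions, assuming the product prefix is always `fast`. `resource_name` and `resource_group_name` are hypothetical helpers; the `-rg` suffix form matches the resource group names listed earlier (for example `fast-westus-rg`).

```bash
#!/bin/bash
# Sketch of the naming rules: lowercase, "fast" prefix, service
# abbreviation, and a number from 1-9 for services of the same type.
resource_name() {
  local service="$1" number="$2"
  if ! [[ "$number" =~ ^[1-9]$ ]]; then
    echo "Number must be 1-9, got: $number" >&2
    return 1
  fi
  echo "fast-${service}-${number}" | tr '[:upper:]' '[:lower:]'
}

# Resource groups use the "-rg" suffix instead of the "rg-" prefix.
resource_group_name() {
  echo "fast-$1-rg" | tr '[:upper:]' '[:lower:]'
}
```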

## Log Analytics Reporting
To query Front Door Web Application Firewall (WAF) diagnostics, run the following in Azure Log Analytics:

```kusto
AzureDiagnostics
| where ResourceType == "FRONTDOORS" and Category == "FrontdoorWebApplicationFirewallLog"
| where action_s == "Block"
```
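The same query can be run from a terminal. A sketch, assuming the `az monitor log-analytics query` command is available in your Azure CLI installation and that the hypothetical variable `LOG_ANALYTICS_WORKSPACE_ID` holds the workspace ID.

```bash
#!/bin/bash
# Sketch: run the WAF "Block" query from the Azure CLI.
# LOG_ANALYTICS_WORKSPACE_ID is a hypothetical variable name.
waf_block_query='AzureDiagnostics | where ResourceType == "FRONTDOORS" and Category == "FrontdoorWebApplicationFirewallLog" | where action_s == "Block"'

query_waf_blocks() {
  if [ -z "${LOG_ANALYTICS_WORKSPACE_ID:-}" ]; then
    echo "Set LOG_ANALYTICS_WORKSPACE_ID first" >&2
    return 1
  fi
  az monitor log-analytics query \
    --workspace "$LOG_ANALYTICS_WORKSPACE_ID" \
    --analytics-query "$waf_block_query" \
    --output table
}
```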

## Troubleshooting
### Status
Azure Status https://status.azure.com/en-us/status
12 changes: 12 additions & 0 deletions build/operations/azure.ini
@@ -0,0 +1,12 @@
# File is located at $AZURE_CONFIG_DIR/config and generated on first run of `$ bash login.sh`
# On Linux/MacOS: $HOME/.azure
# On Windows: %USERPROFILE%\.azure

[core]
disable_confirm_prompt=false
output=table

[logging]
enable_log_file=yes
log_dir=/var/log/azure

59 changes: 59 additions & 0 deletions build/operations/config.sh
@@ -0,0 +1,59 @@
#!/bin/bash

: 'FAST CLI CONFIGURATIONS
Contains all system, application, and Azure cli required configurations.

File is located at $AZURE_CONFIG_DIR/config and generated on first run of `$ bash login.sh`
* On Linux/MacOS: $HOME/.azure
* On Windows: %USERPROFILE%\.azure
'
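# Persist the core and logging settings shown in azure.ini
# (az configure --defaults only sets default argument values).
az config set core.output=table core.disable_confirm_prompt=false logging.enable_log_file=yes logging.log_dir=/var/log/azure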

# TERMINAL CONFIGURATIONS

black=$(tput setaf 0)
red=$(tput setaf 1)
green=$(tput setaf 2)
yellow=$(tput setaf 3)
blue=$(tput setaf 4)
magenta=$(tput setaf 5)
cyan=$(tput setaf 6)
white=$(tput setaf 7)
white_f=$(tput setab 7)
standout=$(tput smso)
bold=$(tput bold)
blink=$(tput blink)
reset=$(tput sgr0)

# COMMON VARIABLES

debug=false
dir=$(pwd)

declare -a applications=("color" "create" "explore" "www")
declare -a environments=("production" "staging" "development")
declare -a locations=("westus" "eastus" "centralus")
declare -a subscriptions=("production" "development")

system=fast
status=false

# GETTING STARTED

echo "----------------------------------------------------------------"
echo "${bold}${magenta}FAST AZURE OPERATIONS CLI${reset}"
echo "${bold}${magenta}Scripts are required to run from inside './fast/build/operations'${reset}"
echo "${magenta}To exit this program press [CTRL+c]${reset}" && echo ""
echo "${bold}${green}Configuring session ...${reset}"
echo "${green}Default configurations found.${reset}" && echo ""

# SHELL References
source functions.sh
source inputs.sh

# SHELL Prompting
setEnvironment
setLocation

# SHELL Operations
getSubscription
Binary file not shown.
75 changes: 75 additions & 0 deletions build/operations/docs/testing.md
@@ -0,0 +1,75 @@
# Testing Procedures for Azure Web Apps
## This test validates that the backends are not accessible except through Azure Front Door
At each step below, refresh the web page. This procedure should be used for each subdomain. A check mark indicates Pass.

1. [ ] https://www.fast.design
2. [ ] https://color.fast.design
3. [ ] https://explore.fast.design
4. [ ] https://create.fast.design
5. [ ] https://motion.fast.design
6. [ ] https://app.fast.design

### Validate all regions turned off
1. Turn off East Region
2. Turn off West Region
3. Refresh
Expected result: 403, Web app is stopped
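Turning a region on or off can be done with `az webapp start` and `az webapp stop`. A sketch under stated assumptions: the resource group names come from the README (`fast-westus-rg`, `fast-eastus-rg`), the app naming follows the `www-west-app` example and is assumed for other apps and the East region, and `set_region_state` with its `DRY_RUN` switch is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: turn a region's web app on or off before refreshing.
set_region_state() {
  local region="$1" state="$2" app="${3:-www}"
  local resource_group action
  case "$region" in
    west) resource_group="fast-westus-rg" ;;
    east) resource_group="fast-eastus-rg" ;;
    *) echo "Unknown region: $region" >&2; return 1 ;;
  esac
  case "$state" in
    on) action="start" ;;
    off) action="stop" ;;
    *) echo "State must be 'on' or 'off': $state" >&2; return 1 ;;
  esac
  local cmd=(az webapp "$action" --resource-group "$resource_group" --name "${app}-${region}-app")
  # DRY_RUN=1 prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For the "Validate passive region" steps, this would be `set_region_state east on` followed by `set_region_state west off`.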

### Validate passive region
1. Turn on East Region
2. Turn off West Region
3. Refresh
Expected result: 200, Web app loads

### Validate passive region with direct access failing
1. Turn on East Region
2. Turn off West Region
3. Configure East Networking Access Restrictions
   1. Set the following for IPv4:
      * Name: Front Door IPv4
      * Priority: 100
      * Action: Allow
      * Description: Deny access to all except Front Door
      * IPv4 Address Block: 147.243.0.0/16
   2. Set the following for IPv6:
      * Name: Front Door IPv6
      * Priority: 200
      * Action: Allow
      * Description: Deny access to all except Front Door
      * IPv6 Address Block: 2a01:111:2050::/44
4. Refresh
Expected: 403, Forbidden "blocked your access"

### Validate active and passive regions turned off
1. Turn off East Region
2. Turn off West Region
3. Refresh
Expected: Service Unavailable

### Validate active region
1. Turn on West Region
2. Turn off East Region
3. Refresh
Expected: 200, Web app loads

### Validate active region with direct access failing
1. Turn off East Region
2. Turn on West Region
3. Configure West Networking Access Restrictions
   1. Set the following for IPv4:
      * Name: Front Door IPv4
      * Priority: 100
      * Action: Allow
      * Description: Deny access to all except Front Door
      * IPv4 Address Block: 147.243.0.0/16
   2. Set the following for IPv6:
      * Name: Front Door IPv6
      * Priority: 200
      * Action: Allow
      * Description: Deny access to all except Front Door
      * IPv6 Address Block: 2a01:111:2050::/44
4. Refresh
Expected: 403, Forbidden "blocked your access"
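The two access-restriction rules in these procedures could also be applied with the Azure CLI instead of the portal. A sketch only: the rule values are copied from the steps above, while the resource group and app names passed in, plus the `run` and `add_front_door_restrictions` helpers, are hypothetical. Once an Allow rule exists, App Service denies other traffic by default.

```bash
#!/bin/bash
# DRY_RUN=1 prints each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Add the Front Door IPv4 and IPv6 allow rules to one web app.
add_front_door_restrictions() {
  local resource_group="$1" app_name="$2"
  run az webapp config access-restriction add \
    --resource-group "$resource_group" --name "$app_name" \
    --rule-name "Front Door IPv4" --action Allow \
    --ip-address 147.243.0.0/16 --priority 100
  run az webapp config access-restriction add \
    --resource-group "$resource_group" --name "$app_name" \
    --rule-name "Front Door IPv6" --action Allow \
    --ip-address 2a01:111:2050::/44 --priority 200
}
```

For example, `DRY_RUN=1 add_front_door_restrictions fast-westus-rg www-west-app` prints both commands for review.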

### Resources
https://stackoverflow.com/questions/62461510/how-to-configure-web-apps-such-that-they-cannot-be-accessed-directly