This article explains why, when, and how to encrypt the credentials that the Batch Shipyard backend uses for various purposes. Please refer to the installation doc for the software required to use the credential encryption features of Batch Shipyard.
Azure Batch must handle potentially sensitive information that users submit for job scheduling, such as command lines for processes, environment variables, and URLs for resource files. This information is protected from its point of origin, whether the submitting machine or the Azure Portal, through to storage in the Azure Batch service. All REST API calls are encrypted in transit over HTTPS to the Azure Batch service endpoints, and the sensitive information described above is then encrypted by the Azure Batch service. The information stays encrypted until it is needed, for example when executing the task that contains the command line to run.
If the Azure Batch service takes care of encrypting all of the sensitive user information, then why does Batch Shipyard need to encrypt credentials? The answer depends on your scenario. Because Batch Shipyard needs credentials for some of its components to work, such as Azure Storage, these credentials must be passed to the compute node through environment variables or command line arguments. As explained above, the strict encryption policies enforced by the Azure Batch service mean that these credentials are not at risk of exposure on their own. However, tools such as the Azure Portal, Batch Labs, Azure CLI, or Azure PowerShell cmdlets can expose them, because the Azure Batch service decrypts command lines and environment variables and sends them over HTTPS back to the user so that they can be viewed for status monitoring and diagnosis. Again, there is no risk of exposure to other parties while in transit. The credentials can, however, be viewed once the data reaches the point of display, such as the web browser showing the Azure Portal or the Batch Labs UI.
The question for you is whether this matters. Is there a risk of credential leakage through these UI or command line display mechanisms? If the answer is no, then no action needs to be taken with regard to credential encryption. However, if you believe that credentials may be exposed when displayed through the aforementioned mechanisms, then please read on for steps to enable credential encryption with Batch Shipyard.
There are various places where credentials supplied by the user in configuration input files are passed to the compute nodes. When credential encryption is enabled, these strings are replaced with encrypted text, so viewing them is inconsequential without the private key. The following steps enable credential encryption:
1. Create certificates and keys locally
2. Modify the global configuration file to reference these certificates
3. Add the certificate to your Batch account
For step 1, invoke the `cert create` command with `shipyard.py`, which will create the necessary certificates and keys. The end result should be two files, whose names you will be prompted for:
- A PFX file for use with the Azure Batch service
- An RSA public key PEM file for use locally to encrypt
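To make the two output files more concrete, the following is a minimal sketch of how comparable artifacts could be produced with the system `openssl` from Python. It is not necessarily what `cert create` does internally. The function names, the intermediate `key.pem` and `cert.pem` files, the 2048-bit key size, the 365-day validity, and the certificate subject are all assumptions made for illustration.

```python
import subprocess


def build_cert_commands(pfx_file, pem_file, passphrase,
                        key_file="key.pem", cert_file="cert.pem"):
    """Return openssl command lines that yield a PFX and an RSA public key PEM.

    key_file and cert_file are hypothetical intermediate files.
    """
    return [
        # 1. Self-signed certificate with a new RSA private key
        #    (key size, validity and subject are illustrative assumptions).
        ["openssl", "req", "-x509", "-nodes", "-newkey", "rsa:2048",
         "-days", "365", "-subj", "/CN=credential-encryption-example",
         "-keyout", key_file, "-out", cert_file],
        # 2. Bundle private key and certificate into a passphrase-protected PFX.
        #    Note: a passphrase given on the command line can be visible to
        #    other local users through the process list.
        ["openssl", "pkcs12", "-export", "-inkey", key_file, "-in", cert_file,
         "-out", pfx_file, "-passout", "pass:" + passphrase],
        # 3. Extract the RSA public key, used locally to encrypt.
        ["openssl", "rsa", "-in", key_file, "-pubout", "-out", pem_file],
    ]


def create_cert_files(pfx_file, pem_file, passphrase):
    """Run the commands above; requires openssl on the PATH."""
    for cmd in build_cert_commands(pfx_file, pem_file, passphrase):
        subprocess.run(cmd, check=True)
```

Calling `create_cert_files("encrypt.pfx", "encrypt.pem", "mysupersecretpassword")` would leave a PFX for the Azure Batch service and a public key PEM for local encryption, along with the intermediate files, which should be removed or protected afterwards.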
For step 2, there is one property that must be configured under `batch_shipyard` in the global configuration file prior to taking any action:
    encryption:
      enabled: true
      pfx:
        filename: encrypt.pfx
        passphrase: mysupersecretpassword
        sha1_thumbprint: 123456789...
      public_key_pem: encrypt.pem
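As a rough aid for checking this property before use, the sketch below inspects an already-parsed global configuration (a plain Python dict) and reports what looks missing. The function name and the split into errors and warnings are assumptions, not Batch Shipyard behavior. It also assumes that `public_key_pem` sits directly under `encryption` and `sha1_thumbprint` under `pfx`.

```python
def check_encryption_settings(global_config):
    """Return (errors, warnings) for batch_shipyard.encryption in a parsed config."""
    errors, warnings = [], []
    enc = global_config.get("batch_shipyard", {}).get("encryption")
    if enc is None:
        return ["batch_shipyard.encryption is missing"], warnings

    if enc.get("enabled") is not True:
        errors.append("encryption.enabled is not set to true")

    pfx = enc.get("pfx") or {}
    if not pfx.get("filename"):
        errors.append("encryption.pfx.filename is not set")
    # passphrase is deliberately not checked: this sketch makes no assumption
    # about whether it may be supplied in some other way.

    # Recommended members, so they need not be generated on each use.
    if not pfx.get("sha1_thumbprint"):
        warnings.append("encryption.pfx.sha1_thumbprint is not set")
    if not enc.get("public_key_pem"):
        warnings.append("encryption.public_key_pem is not set")
    return errors, warnings


if __name__ == "__main__":
    example = {
        "batch_shipyard": {
            "encryption": {
                "enabled": True,
                "pfx": {"filename": "encrypt.pfx"},
            }
        }
    }
    print(check_encryption_settings(example))
```

For the example dict, the errors list is empty and the warnings list names the two recommended members that were left out.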
Ensure that the `enabled` property is set to `true` and that the `pfx` members are correctly populated. It is recommended to fill in the `public_key_pem` and `sha1_thumbprint` members (the thumbprint is output at the end of `cert create`) so that they do not need to be generated each time encryption is required.
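For reference, a certificate's SHA-1 thumbprint is conventionally the SHA-1 digest of its DER encoding. The sketch below computes it from a PEM-encoded certificate using only the standard library. The function name is hypothetical, and it assumes the certificate has already been exported from the PFX in PEM form. The preferred source for the value remains the output of `cert create`.

```python
import base64
import hashlib


def sha1_thumbprint_from_pem(pem_certificate):
    """Return the lowercase hex SHA-1 digest of a PEM certificate's DER bytes."""
    # Drop the BEGIN/END marker lines and join the base64 body.
    body = "".join(
        line.strip()
        for line in pem_certificate.splitlines()
        if line.strip() and not line.strip().startswith("-----")
    )
    der_bytes = base64.b64decode(body)
    return hashlib.sha1(der_bytes).hexdigest()
```

The result is a 40-character hex string. Whether a given tool expects it in upper or lower case is not covered by this sketch.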
Step 3 does not have to be performed explicitly, but you may invoke `cert add` with `shipyard.py` to add the certificate to the Batch account specified in the credentials config file. If `encryption` is enabled, then this `cert add` action is automatically invoked for every subsequent `pool add`.
Note that encryption is not applied retroactively to existing pools. If you are adding encryption to your global configuration file, please make sure that you recreate any pools on which you wish to schedule jobs.
System-installed `openssl` is used in all certificate, encryption and decryption routines. RSA asymmetric encryption is used directly, instead of symmetric key enveloping techniques, because the amount of data that needs to be encrypted is small. This keeps the process simple and understandable.
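The "small data" constraint can be made concrete: direct RSA encryption can only handle a plaintext somewhat shorter than the key modulus, with the exact limit depending on the padding scheme (see RFC 8017). The sketch below computes those limits. The function names are hypothetical, and the 2048-bit key size and the padding schemes shown are assumptions, since the text above does not state which ones are used.

```python
def max_rsa_plaintext_bytes(key_bits, padding="pkcs1v15", hash_bytes=20):
    """Largest plaintext, in bytes, that one RSA operation can encrypt.

    padding: "pkcs1v15" (11 bytes of overhead) or
             "oaep" (2 * hash length + 2 bytes of overhead; SHA-1 is 20 bytes).
    """
    modulus_bytes = (key_bits + 7) // 8
    if padding == "pkcs1v15":
        return modulus_bytes - 11
    if padding == "oaep":
        return modulus_bytes - 2 * hash_bytes - 2
    raise ValueError("unknown padding: " + padding)


def fits_in_single_rsa_block(secret, key_bits=2048, padding="pkcs1v15"):
    """True if the UTF-8 encoded secret fits in one RSA encryption."""
    return len(secret.encode("utf-8")) <= max_rsa_plaintext_bytes(key_bits, padding)


if __name__ == "__main__":
    print(max_rsa_plaintext_bytes(2048))             # 245
    print(max_rsa_plaintext_bytes(2048, "oaep"))     # 214
    print(fits_in_single_rsa_block("k" * 88))        # True
```

Under these assumptions, a short credential such as a hypothetical 88-character key fits comfortably in a single RSA operation, whereas anything beyond a couple of hundred bytes would need an enveloping scheme.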
All applicable Azure Storage account keys, generated SAS keys, Docker login passwords, and Azure Batch credentials are encrypted if credential encryption is enabled.
Please see this page for a full explanation of each credential encryption configuration option.