Studio Go Runner Storage
The StudioML ecosystem supports capturing data related to experiments. Captured data is rolled into artifacts and stored in the experiment bucket before the platform initiates the experiment. Capturing artifacts is an important step in making experiments reproducible, and artifacts can also be reused in downstream experiments. These features are documented further at http://docs.studio.ml/en/latest/artifacts.html.
Captured artifacts are protected by the access policies of the experiment bucket in which they are placed. Experiment initiation uses the AWS credentials found in the environment variables of the shell the user has active at that time. Once the experiment commences within the runner, the runner uses the credentials specified by the migrated ~/.studioml/config.yaml file to unpack any artifacts that accompany the experiment. Experiments can also use external data sources that have been copied to local S3 or Minio servers on private or public infrastructure, provided the data is either publicly accessible or was uploaded using the same private credentials as those in the env section of the config.yaml file.
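The two stages of credential handling described above can be sketched as follows. This is a minimal illustration and not StudioML code: the function names are hypothetical, and it assumes the parsed config.yaml is a dictionary whose env section is a flat mapping of variable names to values. The AWS variable names are the standard ones read by AWS SDKs.

```python
import os
from typing import Mapping, Optional

# Standard environment variable names recognized by AWS SDKs and CLIs.
AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def credentials_at_initiation(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return the AWS credentials visible in the user's shell at submission time."""
    environ = os.environ if environ is None else environ
    return {key: environ[key] for key in AWS_KEYS if key in environ}


def credentials_in_runner(config: Mapping) -> dict:
    """Return the AWS credentials found in the env section of a parsed config.

    Hypothetical layout: config is a dict and config["env"] is a flat
    mapping of variable names to values. A missing or empty env section
    yields no credentials.
    """
    env_section = config.get("env") or {}
    return {key: str(env_section[key]) for key in AWS_KEYS if key in env_section}


# Example: the shell and the config may legitimately carry different credentials.
shell = {
    "AWS_ACCESS_KEY_ID": "user-key",
    "AWS_SECRET_ACCESS_KEY": "user-secret",
    "HOME": "/home/user",
}
config = {
    "env": {
        "AWS_ACCESS_KEY_ID": "runner-key",
        "AWS_SECRET_ACCESS_KEY": "runner-secret",
    }
}
initiation_creds = credentials_at_initiation(shell)
runner_creds = credentials_in_runner(config)
```

Keeping the two lookups separate mirrors the text: the shell credentials matter only while the experiment is being initiated, while the env section travels with the configuration and is what the runner sees.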
Currently StudioML supports a single set of private credentials in addition to the default public access credentials. Because artifacts are aggregated centrally when the experiment starts, they can be kept under a single protected umbrella as the experiment progresses.
It is recommended that AWS or Minio policies be used to protect the experiment artifacts at the highest level of rigor applied to any single source bucket used within the experiments. If experiment reproducibility is not a goal, it is also possible to grant access to source data using temporary credentials that are available only during experiment initiation, and to revoke those credentials once experiment execution has completed. The artifacts on the StudioML data store can then also be destroyed on completion, or locked down, for example by changing their ownership.
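As one illustration of a restrictive policy, the sketch below builds an S3-style bucket policy document that denies every action on the experiment bucket to all principals except one named identity. The function name, bucket name, and ARN are placeholders chosen for this example, and Minio's support for individual policy elements may differ from AWS, so treat it as a starting point rather than a recommended policy.

```python
import json


def restrictive_bucket_policy(bucket: str, allowed_principal_arn: str) -> dict:
    """Build a bucket policy that denies all S3 actions to everyone except
    the given principal.

    The policy covers both the bucket itself and every object inside it.
    """
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptExperimentOwner",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # The deny applies whenever the caller's ARN is not the allowed one.
                "Condition": {
                    "StringNotEquals": {"aws:PrincipalArn": allowed_principal_arn}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID, for illustration only.
    policy = restrictive_bucket_policy(
        "example-experiment-bucket",
        "arn:aws:iam::123456789012:user/example-runner",
    )
    print(json.dumps(policy, indent=2))
```

A deny statement like this only removes access; the allowed identity still needs its own permissions to read and write the bucket. Because it can also lock out administrators, it is worth trying on a non-critical bucket first.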
When using private AWS-based Kubernetes clusters, securing resources and data becomes an intrinsic part of cluster deployment. In these cases, AWS-native EKS combined with IAM offers a good way to secure all components of the solution end to end. The StudioML Go runner can be deployed as a single pod per node and given appropriate account-level privileges, without exposing to the outside world either the runners or the data they gain access to through artifacts.
Copyright © 2019-2020 Cognizant Digital Business, Evolutionary AI. All rights reserved. Issued under the Apache 2.0 license.