This repository demonstrates how to run standard Apache Spark applications with BigDL PPML and Occlum on Azure Intel SGX enabled confidential virtual machines (DCsv3) or on Azure Kubernetes Service (AKS). These Azure virtual machines include the Intel SGX extensions.
Key points:
- Azure DC Series: We run distributed Spark 3.1.2 examples on an Azure DCsv3 machine running Docker. These machines are backed by 3rd generation Intel Xeon Scalable processors with large Enclave Page Cache (EPC) memory.
- Occlum: To run Spark inside an Intel SGX enclave, we leverage Occlum, a library OS that wraps the open-source Spark code with its enclave runtime so that Spark can run inside SGX enclaves (a task that otherwise requires deep knowledge of the SGX ecosystem, in which Occlum's developers are experts).
- For Azure attestation details in the Occlum init process, please refer to maa_init.
- Set up an SGX-enabled Azure VM (DCsv3).
- Prepare the Spark image.
- (Required for the distributed Spark examples only) Download Spark 3.1.2, extract the Spark binary, install OpenJDK-8, and export SPARK_HOME=${Spark_Binary_dir} (as in the sketch below).
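For example, on Ubuntu this prerequisite setup could look like the following sketch (the Apache archive URL and install commands are assumptions; adjust them for your environment):
# Install OpenJDK 8 (Ubuntu example)
sudo apt-get update && sudo apt-get install -y openjdk-8-jdk
# Download and extract the Spark 3.1.2 binary (any Apache mirror works)
wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
tar -xzf spark-3.1.2-bin-hadoop3.2.tgz
# Point SPARK_HOME at the extracted Spark directory
export SPARK_HOME=$(pwd)/spark-3.1.2-bin-hadoop3.2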
Pull the image from Docker Hub:
docker pull intelanalytics/bigdl-ppml-azure-occlum:2.1.0
Alternatively, you can clone this repository and build the image yourself with build-docker-image.sh. Configure the environment variables in build-docker-image.sh, then build the Docker image:
bash build-docker-image.sh
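The exact variable names are defined at the top of build-docker-image.sh; the snippet below is only an illustrative sketch of a typical proxy setup and is not taken from the script itself:
# Hypothetical proxy variables -- check build-docker-image.sh for the real names
export HTTP_PROXY_HOST=your.proxy.host
export HTTP_PROXY_PORT=3128
bash build-docker-image.sh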
The single-node Spark examples require one Azure VM with SGX. All examples run inside SGX. You can apply the same approach to your own application with a few changes to the Dockerfile or scripts, as sketched below.
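For instance, one minimal way to try your own application is to mount its jar into the container and adapt the run script inside; the jar name and mount path below are illustrative and not part of this repository:
docker run --rm -it \
    --name=azure-ppml-example-with-occlum \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    -v $(pwd)/my-app.jar:/opt/my-app.jar \
    intelanalytics/bigdl-ppml-azure-occlum:2.1.0 bash
# Inside the container, edit /opt/run_spark_on_occlum_glibc.sh so that it launches
# /opt/my-app.jar instead of the bundled example classes.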
Run the SparkPi example with run_spark_on_occlum_glibc.sh:
docker run --rm -it \
--name=azure-ppml-example-with-occlum \
--device=/dev/sgx/enclave \
--device=/dev/sgx/provision \
intelanalytics/bigdl-ppml-azure-occlum:2.1.0 bash
cd /opt
bash run_spark_on_occlum_glibc.sh pi
Run the Nytaxi example with run_azure_nytaxi.sh:
docker run --rm -it \
--name=azure-ppml-example-with-occlum \
--device=/dev/sgx/enclave \
--device=/dev/sgx/provision \
intelanalytics/bigdl-ppml-azure-occlum:2.1.0 bash
bash run_azure_nytaxi.sh
You should see the Nytaxi dataframe count and the aggregation duration in the output when the job succeeds.
Configure the environment variables in run_spark_pi.sh, driver.yaml and executor.yaml. Then you can submit the SparkPi task with run_spark_pi.sh (a simplified sketch of the submission follows the command below):
bash run_spark_pi.sh
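Conceptually, the submission is a spark-submit against the Kubernetes API server; the sketch below is illustrative only (the master URL, pod template paths and jar location are assumptions, not the script's exact contents):
${SPARK_HOME}/bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=intelanalytics/bigdl-ppml-azure-occlum:2.1.0 \
    --conf spark.kubernetes.driver.podTemplateFile=./driver.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=./executor.yaml \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar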
Configure the environment variables in run_nytaxi_k8s.sh, driver.yaml and executor.yaml. Then you can submit the Nytaxi query task with run_nytaxi_k8s.sh:
bash run_nytaxi_k8s.sh
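If driver or executor pods stay pending, you can check whether your AKS nodes expose SGX EPC memory to the scheduler; the resource name below assumes the Intel SGX device plugin that Azure confidential computing node pools typically provide:
# List the SGX EPC capacity/allocatable reported by each node (assumes the Intel SGX device plugin)
kubectl describe nodes | grep "sgx.intel.com/epc"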
- If you see the following error when running the Docker image:
aesm_service[10]: Failed to set logging callback for the quote provider library.
aesm_service[10]: The server sock is 0x5624fe742330
This may be related to SGX DCAP. It is an expected message when not all interfaces in the quote provider library are valid, and it does not cause a failure.
- If you see the following error when running the MAA example:
[get_platform_quote_cert_data ../qe_logic.cpp:352] p_sgx_get_quote_config returned NULL for p_pck_cert_config.
thread 'main' panicked at 'IOCTRL IOCTL_GET_DCAP_QUOTE_SIZE failed', /opt/src/occlum/tools/toolchains/dcap_lib/src/occlum_dcap.rs:70:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ERROR] occlum-pal: The init process exit with code: 101 (line 62, file src/pal_api.c)
[ERROR] occlum-pal: Failed to run the init process: EINVAL (line 150, file src/pal_api.c)
[ERROR] occlum-pal: Failed to do ECall: occlum_ecall_broadcast_interrupts with error code 0x2002: Invalid enclave identification. (line 26, file src/pal_interrupt_thread.c)
/opt/occlum/build/bin/occlum: line 337: 3004 Segmentation fault (core dumped) RUST_BACKTRACE=1 "$instance_dir/build/bin/occlum-run" "$@"
This may be related to the known issue "[RFC] IOCTRL IOCTL_GET_DCAP_QUOTE_SIZE failed".