Merged
33 changes: 28 additions & 5 deletions README.md
@@ -32,7 +32,7 @@ LakeBench exists to bring clarity, trust, accessibility, and relevance to engine


## ✅ Why LakeBench?
- **Multi-Engine**: Benchmark Spark, DuckDB, Polars, and many more planned, side-by-side
- **Multi-Engine**: Benchmark Spark, DuckDB, Polars, Daft, Sail, and others, side-by-side
- **Lifecycle Coverage**: Ingest, transform, maintain, and query—just like real workloads
- **Diverse Workloads**: Test performance across varied data shapes and operations
- **Consistent Execution**: One framework, many engines
@@ -46,7 +46,7 @@ LakeBench empowers data teams to make informed engine decisions based on real wo

LakeBench currently supports four benchmarks with more to come:

- **ELTBench**: A benchmark with various modes (`light`, `full`) that simulates typical ELT workloads:
- **ELTBench**: A benchmark that simulates typical ELT workloads:
- Raw data load (Parquet → Delta)
- Fact table generation
- Incremental merge processing
@@ -65,7 +65,10 @@ LakeBench supports multiple lakehouse compute engines. Each benchmark scenario d

| Engine | ELTBench | TPC-DS | TPC-H | ClickBench |
|-----------------|:--------:|:------:|:-------:|:----------:|
| Spark (Fabric) | ✅ | ✅ | ✅ | ✅ |
| Spark (Generic) | ✅ | ✅ | ✅ | ✅ |
| Fabric Spark | ✅ | ✅ | ✅ | ✅ |
| Synapse Spark | ✅ | ✅ | ✅ | ✅ |
| HDInsight Spark | ✅ | ✅ | ✅ | ✅ |
| DuckDB | ✅ | ✅ | ✅ | ✅ |
| Polars | ✅ | ⚠️ | ⚠️ | 🔜 |
| Daft | ✅ | ⚠️ | ⚠️ | 🔜 |
@@ -77,6 +80,28 @@ LakeBench supports multiple lakehouse compute engines. Each benchmark scenario d
> 🔜 = Coming Soon
> (Blank) = Not currently supported

## Where Can I Run LakeBench?
LakeBench's flexibility doesn't end at benchmarks and engines: it also supports multiple runtimes and storage backends:

**Runtimes**:
- Local (Windows)
- Fabric
- Synapse
- HDInsight
- Google Colab ⚠️

**Storage Systems**:
- Local filesystem (Windows)
- OneLake
- ADLS Gen2 (currently only in Fabric, Synapse, and HDInsight)
- S3 ⚠️
- GS ⚠️

_⚠️ denotes experimental support_

## What Table Formats Are Supported?
LakeBench currently only supports Delta Lake.

## 🔌 Extensibility by Design

LakeBench is designed to be _extensible_, both for additional engines and benchmarks.
@@ -123,8 +148,6 @@ Install from PyPI:
pip install lakebench[duckdb,polars,daft,tpcds_datagen,tpch_datagen,sparkmeasure]
```

_Note: in this initial beta version, all engines have only been tested inside Microsoft Fabric Python and Spark Notebooks._

## Example Usage
To run any LakeBench benchmark, first perform a one-time generation of the data required for the benchmark and scale of interest. LakeBench provides datagen classes to quickly generate the Parquet datasets required by the benchmarks.
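As a sketch of the end-to-end flow: the engine and benchmark calls below mirror the example notebooks added in this PR, while the datagen class name is hypothetical — check the package for the exact classes exposed by the `tpch_datagen` extra. Storage URIs are placeholders.

```python
from lakebench.engines import SynapseSpark
from lakebench.benchmarks import TPCH

# One-time datagen step (class name is hypothetical — see the package
# for what the `tpch_datagen` extra actually exposes):
# from lakebench.datagen import TPCHDataGen
# TPCHDataGen(scale_factor=10, target_folder_uri='abfss://<container>@<storage_account_name>.dfs.core.windows.net/Files/tpch_sf10').run()

# Point the benchmark at the generated Parquet data and run all queries.
engine = SynapseSpark(
    schema_name='spark_tpch_sf10',
    spark_measure_telemetry=False
)

benchmark = TPCH(
    engine=engine,
    scenario_name="SF10 - All Queries",
    input_parquet_folder_uri='abfss://<container>@<storage_account_name>.dfs.core.windows.net/Files/tpch_sf10',
    save_results=True,
    result_table_uri='abfss://<container>@<storage_account_name>.dfs.core.windows.net/Tables/dbo/results'
)
benchmark.run(mode="query")
```

The same pattern applies to the other benchmarks: swap `TPCH` for `TPCDS` or `ELTBench` and pass the corresponding `mode`.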

182 changes: 182 additions & 0 deletions examples/benchmarks/hdi_spark.ipynb
@@ -0,0 +1,182 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "04aa8c89",
"metadata": {},
"outputs": [],
"source": [
"%%configure -f\n",
"{\n",
" \"conf\": {\n",
" \"spark.jars\": \"abfss://<container>@<storage_account_name>.dfs.core.windows.net/jars/delta-core_2.12-2.1.1.jar\"\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c398c05",
"metadata": {},
"outputs": [],
"source": [
"# build lakebench zip and upload to ADLS Gen2\n",
"sc.addPyFile(\"abfss://<container>@<storage_account_name>.dfs.core.windows.net/libs/lakebench.zip\")\n",
"import lakebench"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ab46f85",
"metadata": {},
"outputs": [],
"source": [
"# Enable arbitrary Delta table properties to prevent failure if LakeBench attempts to set newer properties that are not the HDI compatible version of Delta Lake\n",
"spark.conf.set('spark.databricks.delta.allowArbitraryProperties.enabled', True)"
]
},
{
"cell_type": "markdown",
"id": "24c7f205",
"metadata": {},
"source": [
"## Run ELTBench in `light` mode"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "feb7d1b3",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import HDISpark\n",
"from lakebench.benchmarks import ELTBench\n",
"\n",
"engine = HDISpark(\n",
" schema_name ='spark_eltbench_test',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = ELTBench(\n",
" engine=engine,\n",
" scenario_name=\"SF1\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds_sf1',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/lakebench/results'\n",
" )\n",
"benchmark.run(mode=\"light\")"
]
},
{
"cell_type": "markdown",
"id": "6d1ab723",
"metadata": {},
"source": [
"## Run TPCDS `power_test` (Load tables and run all queries)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "feaf7122",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import HDISpark\n",
"from lakebench.benchmarks import TPCDS\n",
"\n",
"engine = HDISpark(\n",
" schema_name = 'spark_tpcds_sf1',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = TPCDS(\n",
" engine=engine,\n",
" scenario_name=\"SF1 - Power Test\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds_sf1',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/dbo/results'\n",
" )\n",
"benchmark.run(mode=\"power_test\")"
]
},
{
"cell_type": "markdown",
"id": "88ac860b",
"metadata": {},
"source": [
"## Run TPCDS `query` test: q1 run 4 times"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cae6db9b",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import HDISpark\n",
"from lakebench.benchmarks import TPCDS\n",
"\n",
"engine = HDISpark(\n",
" schema_name = 'spark_tpcds_sf1',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = TPCDS(\n",
" engine=engine,\n",
" scenario_name=\"SF1 - Q4*4\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds/sf1_parquet',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/dbo/results',\n",
" query_list=['q1'] * 4\n",
" )\n",
"benchmark.run(mode=\"query\")"
]
},
{
"cell_type": "markdown",
"id": "52a01f5b",
"metadata": {},
"source": [
"## Run TPCH Query Test (Run all queries)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0768e9b8",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import HDISpark\n",
"from lakebench.benchmarks import TPCH\n",
"\n",
"engine = HDISpark(\n",
" schema_name = 'spark_tpch_sf10',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = TPCH(\n",
" engine=engine,\n",
" scenario_name=\"SF10 - All Queries\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds/sf10_parquet',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/dbo/results'\n",
" )\n",
"benchmark.run(mode=\"query\")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
144 changes: 144 additions & 0 deletions examples/benchmarks/synapse_spark.ipynb
@@ -0,0 +1,144 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "24c7f205",
"metadata": {},
"source": [
"## Run ELTBench in `light` mode"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "feb7d1b3",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import SynapseSpark\n",
"from lakebench.benchmarks import ELTBench\n",
"\n",
"engine = SynapseSpark(\n",
" schema_name ='spark_eltbench_test',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = ELTBench(\n",
" engine=engine,\n",
" scenario_name=\"SF1\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds_sf1',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/lakebench/results'\n",
" )\n",
"benchmark.run(mode=\"light\")"
]
},
{
"cell_type": "markdown",
"id": "6d1ab723",
"metadata": {},
"source": [
"## Run TPCDS `power_test` (Load tables and run all queries)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "feaf7122",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import SynapseSpark\n",
"from lakebench.benchmarks import TPCDS\n",
"\n",
"engine = SynapseSpark(\n",
" schema_name = 'spark_tpcds_sf1',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = TPCDS(\n",
" engine=engine,\n",
" scenario_name=\"SF1 - Power Test\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds_sf1',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/dbo/results'\n",
" )\n",
"benchmark.run(mode=\"power_test\")"
]
},
{
"cell_type": "markdown",
"id": "88ac860b",
"metadata": {},
"source": [
"## Run TPCDS `query` test: q1 run 4 times"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cae6db9b",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import SynapseSpark\n",
"from lakebench.benchmarks import TPCDS\n",
"\n",
"engine = SynapseSpark(\n",
" schema_name = 'spark_tpcds_sf1',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = TPCDS(\n",
" engine=engine,\n",
" scenario_name=\"SF1 - Q4*4\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds/sf1_parquet',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/dbo/results',\n",
" query_list=['q1'] * 4\n",
" )\n",
"benchmark.run(mode=\"query\")"
]
},
{
"cell_type": "markdown",
"id": "52a01f5b",
"metadata": {},
"source": [
"## Run TPCH Query Test (Run all queries)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0768e9b8",
"metadata": {},
"outputs": [],
"source": [
"from lakebench.engines import SynapseSpark\n",
"from lakebench.benchmarks import TPCH\n",
"\n",
"engine = SynapseSpark(\n",
" schema_name = 'spark_tpch_sf10',\n",
" spark_measure_telemetry = False\n",
")\n",
"\n",
"benchmark = TPCH(\n",
" engine=engine,\n",
" scenario_name=\"SF10 - All Queries\",\n",
" input_parquet_folder_uri='abfss://........./Files/tpcds/sf10_parquet',\n",
" save_results=True,\n",
" result_table_uri='abfss://......../Tables/dbo/results'\n",
" )\n",
"benchmark.run(mode=\"query\")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}