spiceai · sgrebnov · Apr 1, 2025 · Mar 31, 2025 · Mar 31, 2025 · Mar 31, 2025
diff --git a/website/docs/components/data-connectors/abfs.md b/website/docs/components/data-connectors/abfs.md
@@ -76,6 +76,7 @@ SELECT COUNT(*) FROM cool_dataset;
 | `abfs_disable_tagging`      | Disable tagging objects. Use this if your backing store doesn't support tags                                                                                                                                    |
 | `allow_http`                | Allow insecure HTTP connections                                                                                                                                                                                 |
 | `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false`                                                                                                                |
+| `schema_source_path`        | Specifies the URL used to infer the dataset schema. Defaults to the most recently modified file                                                                                                              |
 
 #### Authentication parameters
 

diff --git a/website/docs/components/data-connectors/file.md b/website/docs/components/data-connectors/file.md
@@ -55,6 +55,7 @@ SELECT COUNT(*) FROM cool_dataset;
 | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `file_format`               | Specifies the data file format. Required if the format cannot be inferred from the `from` path. Refer to [Object Store File Formats](/docs/components/data-connectors/index.md#object-store-file-formats) for details. |
 | `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false`                                                                                                                  |
+| `schema_source_path`        | Specifies the path used to infer the dataset schema. Defaults to the most recently modified file                                                                                                                    |
 
 For CSV-specific parameters, see [CSV Parameters](/docs/reference/file_format.md#csv).
 

diff --git a/website/docs/components/data-connectors/s3.md b/website/docs/components/data-connectors/s3.md
@@ -64,7 +64,8 @@ SELECT COUNT(*) FROM cool_dataset;
 | `s3_auth`                   | Authentication type. Options: `public`, `key` and `iam_role`. Defaults to `public` if `s3_key` and `s3_secret` are not provided, otherwise defaults to `key`.                                                                          |
 | `s3_key`                    | Access key (e.g. `AWS_ACCESS_KEY_ID` for AWS)                                                                                                                                                                                          |
 | `s3_secret`                 | Secret key (e.g. `AWS_SECRET_ACCESS_KEY` for AWS)                                                                                                                                                                                      |
-| `allow_http`                | Allow insecure HTTP connections to `s3_endpoint`. Defaults to `false`                                                                                                                                                                  |
+| `allow_http`                | Enables insecure HTTP connections to `s3_endpoint`. Defaults to `false`.                                                                                                                                                                  |
+| `schema_source_path`        | Specifies the URL used to infer the dataset schema. Defaults to the most recently modified file.                                                                                                   |
 
 For additional CSV parameters, see [CSV Parameters](/docs/reference/file_format.md#csv)
 
@@ -148,6 +149,18 @@ datasets:
       hive_partitioning_enabled: true
 ```
 
+### Schema Source Path example
+
+Use `schema_source_path` to speed up dataset registration by specifying the URL of the file or folder to infer the schema from.
+
+```yaml
+datasets:
+  - from: s3://spiceai-demo-datasets/taxi_trips/
+    name: taxi_trips
+    params:
+      file_format: parquet
+      schema_source_path: s3://spiceai-demo-datasets/taxi_trips/2014/1/trips_01.parquet # or s3://spiceai-demo-datasets/taxi_trips/2014/1/
+```
+
 ## Secrets
 
 Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/docs/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/docs/components/secret-stores#using-secrets).
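The default described for `schema_source_path` (infer the schema from the most recently modified file) can be illustrated with a short Python sketch against a local directory. This is a hypothetical illustration, not Spice's implementation: the function names `most_recent_file` and `schema_source`, the `.parquet` suffix filter, and returning an explicit `schema_source_path` unchanged are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def most_recent_file(root: str, suffix: str = ".parquet") -> Optional[Path]:
    """Hypothetical helper: return the most recently modified file under `root`
    (searched recursively) whose name ends with `suffix`, or None if none exist."""
    candidates = [p for p in Path(root).rglob(f"*{suffix}") if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def schema_source(
    root: str, schema_source_path: Optional[str] = None, suffix: str = ".parquet"
) -> Optional[Path]:
    """Hypothetical helper: use the explicit schema source if one is given;
    otherwise fall back to the most recently modified matching file."""
    if schema_source_path is not None:
        # Assumption: an explicit path is used as-is, with no directory listing.
        return Path(schema_source_path)
    return most_recent_file(root, suffix)
```

The sketch shows the trade-off the parameter addresses: the fallback has to list every matching file before it can choose one, while an explicit `schema_source_path` skips that listing.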