elastic · benironside · Apr 3, 2025
@@ -19,9 +19,29 @@ TIP: Click https://elastic.navattic.com/automatic-import[here] to access an inte
 .Requirements
 [sidebar]
 --
-- A working <<llm-connector-guides, LLM connector>>. Recommended models: `Claude 3.5 Sonnet`; `GPT-4o`; `Gemini-1.5-pro-002`. 
+- A working <<llm-connector-guides, LLM connector>>. 
 - An https://www.elastic.co/pricing[Enterprise] subscription.
-- A sample of the data you want to import, in a structured or unstructured format (including JSON, NDJSON, and Syslog). 
+- A sample of the data you want to import.
+--
+
+.Notes on sample data
+[sidebar]
+--
+To use Automatic Import, you must provide a sample of the data you wish to import. An LLM will process that sample and automatically create an integration suitable for processing the data represented by the sample. **Any structured or unstructured format is acceptable, including but not limited to JSON, NDJSON, CSV, and Syslog.**
+
+* You can upload a sample of arbitrary size. The LLM will detect its format and select up to 100 documents for detailed analysis.
+* The more variety in your sample, the more accurate the pipeline will be. For best results, include a wide range of unique log entries in your sample instead of repeating similar logs.
+* When you upload a CSV, a header with column names is automatically recognized. If no header is present, the LLM still attempts to create descriptive field names based on field formats and values.
+* For JSON and NDJSON samples, each object in your sample should represent an event, and you should avoid deeply nested object structures.
+* When you select `API (CEL input)` as one of the sources, you are prompted to provide the associated OpenAPI specification (OAS) file, which is used to generate a CEL program that consumes the API.
+--
+
+WARNING: CEL generation in Automatic Import is in beta and is subject to change. The design and code are less mature than official GA features and are being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.
+
+.Recommended models
+[sidebar]
+--
+You can use Automatic Import with any LLM, but model performance varies. Models that perform well for Attack Discovery also perform well for Automatic Import. Refer to the <<llm-performance-matrix, LLM performance matrix>>.
 --
 
 IMPORTANT: Using Automatic Import allows users to create new third-party data integrations through the use of third-party generative AI models (“GAI models”). Any third-party GAI models that you choose to use are owned and operated by their respective providers. Elastic does not own or control these third-party GAI models, nor does it influence their design, training, or data-handling practices. Using third-party GAI models with Elastic solutions, and using your data with third-party GAI models is at your discretion. Elastic bears no responsibility or liability for the content, operation, or use of these third-party GAI models, nor for any potential loss or damage arising from their use. Users are advised to exercise caution when using GAI models with personal, sensitive, or confidential information, as data submitted may be used to train the models or for other purposes. Elastic recommends familiarizing yourself with the development practices and terms of use of any third-party GAI models before use. You are responsible for ensuring that your use of Automatic Import complies with the terms and conditions of any third-party platform you connect with.
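
Review note: the sample-data guidance in this hunk (one event per JSON object, shallow nesting, varied entries rather than repeats) could be checked locally before upload. The following is a minimal sketch, not part of Automatic Import. The function names, the `depth_limit` and `per_shape` defaults, and the idea of treating events with the same top-level field names as "similar" are all assumptions made for illustration.

```python
import json
from collections import Counter


def max_depth(value, depth=0):
    """Return the nesting depth of a parsed JSON value (scalars are depth 0)."""
    if isinstance(value, dict):
        return max((max_depth(v, depth + 1) for v in value.values()), default=depth + 1)
    if isinstance(value, list):
        return max((max_depth(v, depth + 1) for v in value), default=depth + 1)
    return depth


def shape(event):
    """Hypothetical similarity key: the sorted top-level field names."""
    return tuple(sorted(event))


def prepare_sample(ndjson_text, depth_limit=3, per_shape=5):
    """Trim an NDJSON sample so that no single event shape dominates it.

    Returns (kept_lines, warnings). Both thresholds are arbitrary
    illustrative values, not limits defined by Automatic Import.
    """
    kept, warnings = [], []
    seen = Counter()
    for lineno, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            warnings.append(f"line {lineno}: not valid JSON")
            continue
        if not isinstance(event, dict):
            warnings.append(f"line {lineno}: not a JSON object")
            continue
        if max_depth(event) > depth_limit:
            # Deeply nested events are kept but flagged for manual review.
            warnings.append(f"line {lineno}: nesting deeper than {depth_limit}")
        key = shape(event)
        if seen[key] < per_shape:
            seen[key] += 1
            kept.append(line)
    return kept, warnings


if __name__ == "__main__":
    sample = "\n".join([
        '{"user": "a", "action": "login"}',
        '{"user": "b", "action": "login"}',
        '{"user": "c", "action": "login"}',
        '{"src": "10.0.0.1", "bytes": 42}',
        '{"a": {"b": {"c": {"d": 1}}}}',
    ])
    kept, warnings = prepare_sample(sample, per_shape=2)
    print("\n".join(kept))
    print(warnings)
```

Because the docs state that Automatic Import itself selects up to 100 documents from a sample of any size, the sketch does not cap the total. It only thins out repeated shapes so that rarer event types are more likely to be represented.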
@@ -41,15 +61,6 @@ image::images/auto-import-create-new-integration-button.png[The Integrations pag
 7. Define your **Data stream title**, **Data stream description**, and **Data stream name**. These fields appear on the integration's configuration page to help identify the data stream it writes to.
 8. Select your {filebeat-ref}/configuration-filebeat-options.html[**Data collection method**]. This determines how your new integration will ingest the data (for example, from an S3 bucket, an HTTP endpoint, or a file stream).
 9. Upload a sample of your data. Make sure to include all the types of events that you want the new integration to handle. 
-+
-.Best practices for sample data
-[sidebar]
---
-- For JSON and NDJSON samples, each object in your sample should represent an event, and you should avoid deeply nested object structures. 
-- The more variety in your sample, the more accurate the pipeline will be. Include a wide range of unique log entries instead of just repeating the same type of entry. Automatic Import will select up to 100 different events from your sample to use as the basis for the new integration. 
-- Ideally, each field name should describe what the field does.
---
-+
 10. Click **Analyze logs**, then wait for processing to complete. This may take several minutes.
 11. After processing is complete, the pipeline's field mappings appear, including ECS and custom fields.
 +