Stream various logs to a Kusto Cluster (Azure Data Explorer Cluster), such as:
- Log Analytics logs, via export functionality and Event Hub
- Diagnostics logs, via Event Hub
- External logs, via plugins
Some bits were taken from the azure-quickstart-templates, but I chose the cheapest SKUs and the simplest testable setup, batteries included.
- `privateDnsZoneGroups` for the Kusto private endpoint can be deployed via the Configure private DNS Zones for an Azure Data Explorer cluster groupID policy, or via Bicep by setting `deployZoneGroupsViaPolicy` to `false`.
- Create an Entra ID group for read permissions on the database, and provide its object id to the `entraIdGroupDataViewersObjectId` var in Bicep, as in the sketch below.
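A minimal usage sketch, assuming the template is consumed as a module; the module path and the object id are placeholders, only the two parameter names come from the notes above:

```bicep
// Sketch only: 'main.bicep' and the object id are placeholders.
module logsToAdx 'main.bicep' = {
  name: 'logs-to-adx'
  params: {
    // false = deploy the privateDnsZoneGroups via Bicep instead of via the policy
    deployZoneGroupsViaPolicy: false
    // object id of the Entra ID group that gets read permissions on the database
    entraIdGroupDataViewersObjectId: '00000000-0000-0000-0000-000000000000'
  }
}
```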
For the Kusto Language Server extension, which is installed via the VS Code recommendations, please install version `3.4.1` and not `3.4.2`, because of issue Language Server v3.4.2 not working #218.
Event Hubs auto-inflate can be enabled on the Standard tier, but it does not deflate again; that can be handled by something like eventHubAutoDeflate in this repo.
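For reference, a minimal sketch of what enabling auto-inflate looks like in Bicep; names and values are assumptions, and `maximumThroughputUnits` is only a ceiling for scaling up:

```bicep
// Sketch with assumed names: auto-inflate raises the throughput units
// automatically under load, but never lowers them again.
resource eventHubNamespace 'Microsoft.EventHub/namespaces@2021-11-01' = {
  name: 'evhns-logs-weu'
  location: resourceGroup().location
  sku: {
    name: 'Standard'
    tier: 'Standard'
    capacity: 1 // starting throughput units
  }
  properties: {
    isAutoInflateEnabled: true
    maximumThroughputUnits: 10 // inflate ceiling, deflating is up to you
  }
}
```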
Because an Event Hub can only receive logs from resources in the same region, consider the following simplified design for connecting multiple regions and sources:
```mermaid
flowchart LR
    ext[External Sources]
    ext -- plugins --> misctable
    msdef[Microsoft Defender]
    msdef --> evhdeweu --> defetable
    subgraph weu[Azure - West Europe]
        reslaweu[Log Analytics Resources]
        resweu[Azure Resources]
        subgraph ehnsweu[Event Hub Namespace]
            evhlaweu[Event Hub - Log Analytics]
            evhdiweu[Event Hub - Diagnostics]
            evhdeweu[Event Hub - Defender]
        end
        subgraph adxdb[Azure Data Explorer Db]
            lawtable[Azure Monitor Table]
            diagtable[Diagnostics Table]
            defetable[Defender Table]
            misctable[Miscellaneous Tables]
        end
        reslaweu -- Export functionality --> evhlaweu --> lawtable
        resweu -- Diagnostic settings --> evhdiweu --> diagtable
    end
    subgraph neu[Azure - North Europe]
        resneu[Azure Resources]
        subgraph ehnsneu[Event Hub Namespace]
            evhdineu[Event Hub - Diagnostics]
        end
        resneu -- Diagnostic settings --> evhdineu --> diagtable
    end
```
Generic handling of events is possible because of the standardization in logs:
- The Azure Monitor Table follows the Standard columns in Azure Monitor Logs.
- The Diagnostics Table follows the Azure resource log common schema.
- The Defender Table follows the schema of the events in Azure Event Hubs.
- The Defender for Cloud Table follows the Workflow automation and export data types schemas.
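As an illustration of why that works, a table only has to mirror the documented columns; a sketch for the Diagnostics Table, with an assumed subset of the common schema columns:

```kusto
// Sketch: the column choice is an assumption, derived from the common schema.
.create-merge table Diagnostics (
    TimeGenerated: datetime,
    ResourceId: string,
    Category: string,
    OperationName: string,
    ResultType: string,
    Properties: dynamic
)
```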
Note: specific to exports from a Log Analytics workspace.
Remove the `eventHubName` element from the `Microsoft.OperationalInsights/workspaces/dataExport` resource to dynamically route each table to an event hub named after it, then create a `Microsoft.Kusto/clusters/databases/dataConnections` resource for each event hub, as sketched below.
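A hedged Bicep sketch of both sides; resource names, API versions, and the `am-<tablename>` event hub naming are assumptions:

```bicep
param location string = resourceGroup().location

resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' existing = {
  name: 'law-logs' // assumed name
}
resource eventHubNamespace 'Microsoft.EventHub/namespaces@2021-11-01' existing = {
  name: 'evhns-logs-weu' // assumed name
}
resource kustoCluster 'Microsoft.Kusto/clusters@2023-08-15' existing = {
  name: 'adxlogs' // assumed name
}
resource kustoDatabase 'Microsoft.Kusto/clusters/databases@2023-08-15' existing = {
  parent: kustoCluster
  name: 'logs' // assumed name
}

// No eventHubName in the destination, so the export routes each table
// to its own event hub in the namespace.
resource dataExport 'Microsoft.OperationalInsights/workspaces/dataExport@2020-08-01' = {
  parent: logAnalyticsWorkspace
  name: 'export-to-eventhub'
  properties: {
    destination: {
      resourceId: eventHubNamespace.id
    }
    tableNames: [
      'SecurityEvent'
    ]
    enable: true
  }
}

// One data connection per resulting event hub.
resource dataConnection 'Microsoft.Kusto/clusters/databases/dataConnections@2023-08-15' = {
  parent: kustoDatabase
  name: 'dc-securityevent'
  location: location
  kind: 'EventHub'
  properties: {
    eventHubResourceId: '${eventHubNamespace.id}/eventhubs/am-securityevent' // assumed hub name
    consumerGroup: '$Default'
    tableName: 'SecurityEvent'
    dataFormat: 'MULTIJSON'
  }
}
```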
Alternatively, make the Kusto query smarter and use the `Type` column to place the records in specific tables, using something like the sketch below to place data directly in a specific table. You can also use generic tables, as mentioned at Generic table design.
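A hedged sketch of that pattern as an update policy; the raw table layout, the `Records` column, and the target columns are assumptions:

```kusto
// Sketch: pick one record type out of the exported stream.
.create-or-alter function LAW_RouteSecurityEvent() {
    LAW_raw
    | mv-expand records = Records
    | where tostring(records.Type) == 'SecurityEvent'
    | project
        TimeGenerated = todatetime(records.TimeGenerated),
        Computer = tostring(records.Computer),
        EventID = toint(records.EventID)
}

// Attach the function as an update policy on the target table.
.alter table SecurityEvent policy update
@'[{"IsEnabled": true, "Source": "LAW_raw", "Query": "LAW_RouteSecurityEvent()", "IsTransactional": false}]'
```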
To end up in specific tables in a performant way, you can first put the records in a generic table and then set update policies to push them into specific tables. This scales a bit better, since you only do `mv-expand` once for each record, when putting the records in the generic table. See DIAG_generic and DIAG_ADXCommand for a sample, plus the sketch after the diagram. Visually it will look like this:
```mermaid
flowchart TD
    rawlogs[Raw Log table, with softdelete = 0d]
    gendiag[Generic Diagnostics table, with softdelete = 0d]
    ts[Table storage logs]
    bs[Blob storage logs]
    qs[Queue storage logs]
    rawlogs -- policy: mv-expand --> gendiag
    gendiag -- policy: category==table --> ts
    gendiag -- policy: category==queue --> qs
    gendiag -- policy: category==blob --> bs
```
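A hedged sketch of the two stages; table, function, and column names are assumptions, DIAG_ADXCommand in this repo holds the real commands:

```kusto
// Stage 0: the raw table retains nothing, it only feeds the update policy.
.alter-merge table RawLogs policy retention softdelete = 0d

// Stage 1: expand the Event Hub batch once, into the generic table.
.create-or-alter function DIAG_Expand() {
    RawLogs
    | mv-expand records = Records
    | project Category = tostring(records.category), Record = records
}

.alter table DIAG_generic policy update
@'[{"IsEnabled": true, "Source": "RawLogs", "Query": "DIAG_Expand()", "IsTransactional": false}]'

// Stage 2: fan out per category, no further mv-expand needed.
.create-or-alter function DIAG_TableStorage() {
    DIAG_generic
    | where Category == 'table'
    | project
        TimeGenerated = todatetime(Record.time),
        OperationName = tostring(Record.operationName)
}

.alter table TableStorageLogs policy update
@'[{"IsEnabled": true, "Source": "DIAG_generic", "Query": "DIAG_TableStorage()", "IsTransactional": false}]'
```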
This model is similar to a Medallion architecture. To monitor the performance impact, please use `.show queries`.
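For example, something along these lines surfaces the heaviest recent queries; the aggregation is an assumption, adjust to taste:

```kusto
// Sketch: recent queries, grouped by query text, slowest first.
.show queries
| where StartedOn > ago(1h)
| summarize count(), max(Duration), avg(Duration) by Text
| top 10 by max_Duration
```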
Stream Analytics can be placed between Event Hub and Azure Data Explorer; with the no-code editor it might look like this in a Stream Analytics job:
Considerations:
- Because events get batched at Event Hub, you still have to expand the actual events from the `records` array inside a job.
- Every ADX table is one output, and there is a hard limit of 60 outputs per Stream Analytics job; you could work around this by creating multiple Event Hub consumer groups and processing the same events in multiple jobs.
- The designer is nice, but it is not possible to switch from no-code to code and back; you will be presented with the following message, maybe this is a preview limitation:
  > Once confirmed to edit the query, no-code editor will no longer be available.
- Without the designer, you have to work with the Stream Analytics Query Language, where Stream Analytics User Defined Functions (UDFs), in either JavaScript or C#, can provide reusable snippets. UDFs are limited to 60 per job.
- With a multi-region design, you end up with an event hub input for each region. In a single job, within the designer, this is not practical to work with, since you cannot connect more than one input to an operation (such as `filter` or `expand`).
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.