Design and IaC setup for streaming various logs to a Kusto Cluster (Azure Data Explorer Cluster)

Streaming logs to a Kusto Cluster 🤽🏻‍♂️


Stream various logs to a Kusto Cluster (Azure Data Explorer Cluster), such as:

  • Log Analytics logs, via export functionality and Event Hub
  • Diagnostics logs, via Event Hub
  • External logs, via plugins

Some bits were taken from the azure-quickstart-templates, but I picked the cheapest SKUs and the simplest testable setup, batteries included.

Configuration

  1. privateDnsZoneGroups for the Kusto private endpoint can be deployed via the Configure private DNS Zones for an Azure Data Explorer cluster groupID policy, or via Bicep by setting deployZoneGroupsViaPolicy to false.

  2. Create an Entra ID group for read permissions on the database, and provide its object ID to the entraIdGroupDataViewersObjectId variable in Bicep.

Kusto extension

For the Kusto Language Server extension, which installs via the VS Code recommendations, install version 3.4.1 specifically and not 3.4.2, because of issue Language Server v3.4.2 not working #218.

Event Hubs deflating

Event Hubs auto-inflate can be enabled on the Standard tier, but it does not deflate automatically; this can be handled by something like eventHubAutoDeflate in this repo.

Multi-region design

Because Event Hubs can only connect to resources from the same region, consider the following simplified design for connecting multiple regions and sources:

```mermaid
flowchart LR

ext[External Sources]
ext -- plugins --> misctable

msdef[Microsoft Defender]
msdef --> evhdeweu --> defetable

subgraph Azure - West Europe
    reslaweu[Log Analytics Resources]
    resweu[Azure Resources]

    subgraph Event Hub Namespace
        evhlaweu[Event Hub - Log Analytics]
        evhdiweu[Event Hub - Diagnostics]
        evhdeweu[Event Hub - Defender]
    end

    subgraph Azure Data Explorer Db
        lawtable[Azure Monitor Table]
        diagtable[Diagnostics Table]
        defetable[Defender Table]
        misctable[Miscellaneous Tables]
    end

    reslaweu--Export functionality-->evhlaweu-->lawtable

    resweu--Diagnostic settings-->evhdiweu-->diagtable
end

subgraph Azure - North Europe
    resneu[Azure Resources]

    subgraph Event Hub Namespace
        evhdineu[Event Hub - Diagnostics]
    end

    resneu--Diagnostic settings-->evhdineu-->diagtable
end
```

Generic table design

Generic handling of events is possible because of the standardization in logs:
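Because every exported batch arrives as a JSON document with a records array, a single table with one dynamic column can land any of them. A minimal sketch, with hypothetical table and mapping names (RawLogs, RawLogsMapping):

```kusto
// Generic landing table: one dynamic column holds the whole batch envelope.
.create table RawLogs (Records: dynamic)

// Ingestion mapping that maps the entire JSON document into that column.
.create table RawLogs ingestion json mapping 'RawLogsMapping'
    '[{"column": "Records", "path": "$", "datatype": "dynamic"}]'
```

The mapping name would then be referenced from the data connection, so all event hubs can share one table shape.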

Routing options

Event Hub routing

Note: Specific to exports from a Log Analytics workspace.

Remove the eventHubName element from the Microsoft.OperationalInsights/workspaces/dataExport resource to dynamically route to an event hub named after each table, then create a Microsoft.Kusto/clusters/databases/dataConnections resource for each event hub.

ADX routing

Make the Kusto query smarter: use the Type column to route records, using something like this to place data directly in a specific table. You can also use generic tables, as mentioned at Generic table design.

To end up in specific tables in a performant way, you can first put records in a generic table and then set update policies to push them into specific tables. This scales a bit better, since you only run mv-expand once per record, when putting the records into the generic table. See DIAG_generic and DIAG_ADXCommand for a sample. Visually it looks like this:

```mermaid
flowchart TD

rawlogs[Raw Log table, with softdelete = 0d]
gendiag[Generic Diagnostics table, with softdelete = 0d]
ts[Table storage logs]
bs[Blob storage logs]
qs[Queue storage logs]

rawlogs -- policy: mv-expand --> gendiag
gendiag -- policy: category==table --> ts
gendiag -- policy: category==queue --> qs
gendiag -- policy: category==blob --> bs
```

This model is similar to a Medallion architecture. To monitor performance impact, please use .show queries.
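The two-stage flow above can be sketched with update policies. All table and function names here (RawLogs, DIAG_generic, TableStorageLogs, ExpandDiagnostics, RouteTableLogs) are hypothetical; in a real setup the query output of each policy must match the schema of its target table:

```kusto
// Stage 1: expand the batched "records" array exactly once,
// from the raw landing table into a generic diagnostics table.
.create function ExpandDiagnostics() {
    RawLogs
    | mv-expand record = Records.records
    | project Category = tostring(record.category), Record = record
}
.alter table DIAG_generic policy update
    @'[{"IsEnabled": true, "Source": "RawLogs", "Query": "ExpandDiagnostics()", "IsTransactional": true}]'

// Stage 2: fan out to category-specific tables with cheap filters,
// no further mv-expand needed.
.create function RouteTableLogs() {
    DIAG_generic
    | where Category == "StorageTable"
}
.alter table TableStorageLogs policy update
    @'[{"IsEnabled": true, "Source": "DIAG_generic", "Query": "RouteTableLogs()", "IsTransactional": true}]'
```

Setting softdelete = 0d on the source tables, as in the diagram, keeps the intermediate copies from being retained.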

Stream Analytics routing

Stream Analytics can be placed between Event Hub and Azure Data Explorer; with the no-code editor, it might look like this in a Stream Analytics job:

Stream Analytics

Considerations:

  1. Because events get batched at Event Hub, you still have to expand the actual events from the records array inside a job.
  2. Every ADX table is one output, and there's a hard limit of 60 outputs per Stream Analytics job. You could work around this by creating multiple Event Hub consumer groups and processing the same events in multiple jobs.
  3. The designer is nice, but it's not possible to switch from no-code to code and back; you will be presented with the following message (maybe a preview limitation):

    Once confirmed to edit the query, no-code editor will no longer be available.

  4. Without the designer, you have to work with the Stream Analytics Query Language, where Stream Analytics User Defined Functions (UDFs), in either JavaScript or C#, can provide reusable snippets. UDFs are limited to 60 per job.
  5. With a Multi-region design, you end up with an event hub input for each region. In a single job, within the designer, this is not practical to work with, since you cannot connect more than one input to an operation (such as filter or expand).

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

