CyberEstate with Threat & Vulnerability Management to the Data Lake

This article is a follow-up to my TVMIngestion. As a refresher, the background is as follows:

There is no Sentinel connector option to ingest Microsoft's XDR Threat & Vulnerability Management data into Sentinel. Since the release of my LogicApp, which works flawlessly for smaller orgs, API limitations have come up. Working alongside two other colleagues (Seyed Amirhossein Nouraie & Mike Palitto), I was exposed to Data Factory.

The why! When querying API data in XDR there are call limitations, and when using the LogicApp it is easy to hit them. Data Factory allows pagination (continually looping) on the OData call within the API until all of the data is retrieved and ingested to X (your endpoint). This is a huge deal because all of our Federal customers have a mandate to track their TVM data and send it to another agency. Regardless of whether you are a Federal customer, you will want this solution because: A - there is no connector to Sentinel or Streaming API option in XDR for this data, B - only 30 days of data reside in XDR, C - the data needs long-term storage at X (another endpoint). The great piece of this solution is that we send to a blob container compressed, so the data arrives on the storage account at roughly half the size as a gz file. We can then query the data from ADX.

XDR to ADLS/ADX use case. You CAN send your XDR data via the Streaming API to ADLS; Jeffrey Appel has saved me time by showing how to accomplish this, seen here. You CAN also export logs from Sentinel/LAW to ADLS using Data Export and either choose to query them externally or ingest them to ADX. If you choose to do this as well, follow the steps to export logs to ADLS and then continue with the steps below to ingest continually; it follows the same flow.

An overview of what a small solution for all of this enterprise logging around the Cyber Estate can look like is directly below.

The Data Lake was chosen specifically for MSFT products. There are use cases where the need is to send out via Event Hub for interagency collaboration around dashboarding; for those situations please refer to your architect and look at Event Hubs. This is simply another working solution that uses ADLS for lifecycle management of the (compressed) block blobs and ADX ingestion. It is important to note that externally querying the data in ADLS from ADX is also possible, but that is not in scope.

Let's start by setting up the Data Lake. You are going to deploy a standard Azure Data Lake Storage Gen2 account, and we will use blob containers.

Deploy Storage Account - follow the steps below.

1 - In Azure, Create Storage Account.

2 - Enable Hierarchical Namespace. This is what turns the account into Data Lake Storage Gen2.

3 - Uncheck the recovery features. If you do not, the deployment will be blocked.
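The steps above can also be scripted. A minimal PowerShell sketch using the Az module is below; the resource group, account, region, and container names are placeholders you would swap for your own.

# Create the resource group and the ADLS Gen2 account (Hierarchical Namespace on)
New-AzResourceGroup -Name "rg-cyberestate" -Location "eastus"

New-AzStorageAccount -ResourceGroupName "rg-cyberestate" `
    -Name "sttvmdatalake" `
    -Location "eastus" `
    -SkuName "Standard_LRS" `
    -Kind "StorageV2" `
    -EnableHierarchicalNamespace $true

# Container that will hold the compressed (gz) TVM exports
New-AzStorageContainer -Name "tvmdata" `
    -Context (New-AzStorageContext -StorageAccountName "sttvmdatalake" -UseConnectedAccount)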

Deploy Data Factory - follow the steps below.

1 - In Azure, Create Data Factory.

2 - After creating the Data Factory, navigate to Managed Identities just under the Settings blade. Click "Azure Role Assignments".

3 - As seen in the image, add "Storage Blob Data Contributor" for this managed identity on the Data Lake created earlier (a PowerShell equivalent is sketched after the script below).

4 - Open a PowerShell session with the AzureAD module available and run the script you see below, substituting the name of your ADF's Managed Identity.

# Requires the AzureAD module
Connect-AzureAD

$miObjectID = $null
Write-Host "Looking for the Managed Identity of the Data Factory..."
$miObjectIDs = @()
$miObjectIDs = (Get-AzureADServicePrincipal -SearchString "YourADFManagedIdentity").ObjectId
if ($null -eq $miObjectIDs) {
   $miObjectIDs = Read-Host -Prompt "Enter ObjectId of the Managed Identity (from the Data Factory):"
}

# The app ID of the WindowsDefenderATP (Microsoft Defender for Endpoint) API where we want to assign the permissions
$appId = "fc780465-2017-40d4-a0c5-307022471b92"
$permissionsToAdd = @("Vulnerability.Read.All","Software.Read.All")
$app = Get-AzureADServicePrincipal -Filter "AppId eq '$appId'"

# Assign each API permission (app role) to the managed identity
foreach ($miObjectID in $miObjectIDs) {
    foreach ($permission in $permissionsToAdd) {
        Write-Host $permission
        $role = $app.AppRoles | where Value -Like $permission | Select-Object -First 1
        New-AzureADServiceAppRoleAssignment -Id $role.Id -ObjectId $miObjectID -PrincipalId $miObjectID -ResourceId $app.ObjectId
    }
}
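The role assignment from step 3 can also be done from PowerShell instead of the portal. A minimal sketch with the Az module, reusing the placeholder resource names; the object ID is the ADF managed identity you just looked up.

# Grant the Data Factory managed identity Storage Blob Data Contributor on the Data Lake
$storage = Get-AzStorageAccount -ResourceGroupName "rg-cyberestate" -Name "sttvmdatalake"

New-AzRoleAssignment -ObjectId "<ADF-managed-identity-object-id>" `
    -RoleDefinitionName "Storage Blob Data Contributor" `
    -Scope $storage.Id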

Configuring Data Factory - follow the steps below.

Disclaimer - it is important to note that for this demo I chose a commercial instance. You can change the endpoints to match other Azure environments, as seen right below.

Commercial URL = https://api.securitycenter.microsoft.com/api/machines/SoftwareVulnerabilitiesByMachine?deviceName
Commercial Audience = https://api.securitycenter.microsoft.com

GCC URL = https://api-gcc.securitycenter.microsoft.us/api/machines/SoftwareVulnerabilitiesByMachine?deviceName
GCC Audience = https://api-gcc.securitycenter.microsoft.us

GCCH URL = https://api-gov.securitycenter.microsoft.us/api/machines/SoftwareVulnerabilitiesByMachine?deviceName
GCCH Audience = https://api-gov.securitycenter.microsoft.us
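If you want to sanity check an endpoint and watch the pagination behavior that Data Factory loops on, a quick PowerShell sketch is below. It assumes you already have a bearer token for the Defender API in a $token variable (how you obtain it is out of scope here); the API hands back pages with a value array and an @odata.nextLink until everything has been served.

# Manually page through SoftwareVulnerabilitiesByMachine to confirm the endpoint and pagination
$uri = "https://api.securitycenter.microsoft.com/api/machines/SoftwareVulnerabilitiesByMachine"
$headers = @{ Authorization = "Bearer $token" }   # $token assumed to hold a valid bearer token
$all = @()

while ($uri) {
    $page = Invoke-RestMethod -Method Get -Uri $uri -Headers $headers
    $all += $page.value
    # Follow @odata.nextLink until the API stops returning one
    $uri = $page.'@odata.nextLink'
}

Write-Host "Total records pulled: $($all.Count)"

This mirrors what the pipeline's pagination does for you at scale, so you only need it for troubleshooting.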
Working w/ Data Factory - follow the steps below.

1 - Click "Launch Studio"

2 - Click Author > the + sign > Pipeline > "Import from pipeline template".

3 - When prompted for a ZIP file, download and save the TVM Data Factory Template file, then upload it as the template. Once uploaded, the default pipeline will look like the image below.

4 - You will need to create a Linked Service for each dataset to work. On "TVM_Rest_Vuln_Connection (Rest dataset)", click the drop-down and click "New". Follow the snippet below, test the connection, and create (a scripted equivalent is sketched after step 8). NOTE - none of these connections will work unless you have set the API permissions for the ADF managed identity via the script.


5 - Navigate to the next Linked Service, "TVM_Out", click the drop-down and click "New". Follow the snippet, test the connection, and create. NOTE - this connection will not work if you did not give the Managed Identity the Storage Blob Data Contributor role earlier.

6 - On "TVM_Rest_Software_Connection (Rest dataset)", click the drop-down and click "New". Follow the snippet below, test the connection, and create. NOTE - as above, this connection will not work unless the API permissions were set via the script.


7 - On "TVM_Rest_Firmware_Connection (Rest dataset)", click the drop-down and click "New". Follow the snippet below, test the connection, and create. NOTE - as above, this connection will not work unless the API permissions were set via the script.


8 - Verify all connections are good and hit Complete.
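For reference, each of those REST linked services boils down to a managed identity connection against the Defender API audience. If you ever want to script one instead of clicking through the Studio, a rough sketch with the Az.DataFactory module is below; the resource group, factory name, and file path are placeholders, and the JSON mirrors what the Studio generates for a REST linked service using managed identity authentication.

# Rough scripted equivalent of the TVM_Rest_Vuln_Connection linked service
$definition = @'
{
    "properties": {
        "type": "RestService",
        "typeProperties": {
            "url": "https://api.securitycenter.microsoft.com/api/machines/SoftwareVulnerabilitiesByMachine",
            "enableServerCertificateValidation": true,
            "authenticationType": "ManagedServiceIdentity",
            "aadResourceId": "https://api.securitycenter.microsoft.com"
        }
    }
}
'@
$definition | Out-File -FilePath ".\TVM_Rest_Vuln_Connection.json" -Encoding utf8

Set-AzDataFactoryV2LinkedService -ResourceGroupName "rg-cyberestate" `
    -DataFactoryName "adf-tvm" `
    -Name "TVM_Rest_Vuln_Connection" `
    -DefinitionFile ".\TVM_Rest_Vuln_Connection.json"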

Validate & Publish Data Factory Pipelines - follow the steps below.

1 - After the last step of configuration, you'll be brought back to the pipeline menu. Click Debug. Everything should check out if the steps were followed.


2 - Navigate over to your Data Lake and verify the folders have been created. Once there, check that the files are gz block blobs (they are; a quick PowerShell check is sketched after this list).

3 - Once confirmed successful, click "Publish All".

4 - Add a trigger to your pipeline - you choose the schedule.
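If you would rather check the output from PowerShell than the portal, a quick sketch using the placeholder storage account and container names from earlier:

# List the exported blobs and confirm they are block blobs with the gz extension
$ctx = New-AzStorageContext -StorageAccountName "sttvmdatalake" -UseConnectedAccount

Get-AzStorageBlob -Container "tvmdata" -Context $ctx |
    Select-Object Name, BlobType, Length, LastModified |
    Format-Table -AutoSize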

Azure Data Explorer - You will have to follow these instructions for each TVM table.

Disclaimer - this assumes an ADX cluster and database have already been created; to create ADX, follow the official documentation.

1 - Right-click your DB and click "Get Data". Give the table a name accordingly and select the container you have saved your ADF data to.

2 - On the Inspect the data tab, review the data (I removed two columns to keep only value).

3 - You MAY be prompted to grant ADX permission to READ the blob data if you have not done so already. The snippet below shows what that would look like (a scripted version of the grant is sketched at the end of this article).

4 - Review the summary and hit Close. You may now query the data in ADX.

5 - Query your data.

TVMDeviceVuln
| project value
| mv-expand value
| evaluate bag_unpack(value)

6 - The preceding steps were only a one-time ingestion. In order to continually ingest from ADLS, you will need to create an Event Grid data connection; follow the instructions to continually ingest new data for the fully automated solution. When using Event Grid it automatically kicks off a new Event Hub, but please understand the limitations around Standard Event Hubs, namespaces, etc. I recommend going with a Premium Event Hub in any org.
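Whether it is the one-time ingestion in step 3 or the Event Grid flow, ADX ultimately needs read access to the container. A minimal sketch of that grant, assuming your ADX cluster has a system-assigned identity (copy its object ID from the cluster's Identity blade) and reusing the placeholder names from earlier:

# Give the ADX cluster's system-assigned identity read access to the Data Lake blobs
$storage = Get-AzStorageAccount -ResourceGroupName "rg-cyberestate" -Name "sttvmdatalake"

New-AzRoleAssignment -ObjectId "<ADX-cluster-identity-object-id>" `
    -RoleDefinitionName "Storage Blob Data Reader" `
    -Scope $storage.Id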