To avoid unplanned costs for Microsoft Sentinel, it is recommended to set a daily cap and to create an analytics rule that triggers an alert when the cap is reached. Microsoft has published general guidance for monitoring costs here.
In the past months I have deployed a number of Microsoft Sentinel instances, and in many cases the root cause for reaching the daily cap was data ingested into the AADNonInteractiveUserSignInLogs table. When analyzing the data, we often found an individual user that generated an unusually high number of events. This can happen for various reasons, such as:
- The user is still logged on to a device, but has changed their password on another device
- The user has left the company, but is still logged on to some virtual desktops
- The user account is disabled, but the user is still logged on somewhere
- The user has left the company and their account is deactivated, but their mobile phone is still trying to pull e-mails
Okay, let’s start at the beginning.
Data Cap
To avoid bill shock, we set a daily cap on the Log Analytics workspace.
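To see how close the workspace already is to the cap, today’s billable ingestion can be summarized with a query like the one below (a sketch against the standard Usage table; Quantity is reported in MB, hence the division):

```
Usage
| where TimeGenerated > startofday(now())
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000
```

Compare the result against the configured daily cap to see how much headroom is left.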
Analytics Rule
If we want to get alerted, we can set up an analytics rule within Microsoft Sentinel as shown in the example below.
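The rule query itself can be based on the _LogOperation table, which Azure Monitor writes an ingestion event to when data collection is stopped (a minimal sketch; the Operation value below matches the documented ingestion events, but verify it in your workspace and match the lookback to your rule’s schedule):

```
_LogOperation
| where Category == "Ingestion"
| where Operation == "Data collection stopped"
```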
The Alert
With the analytics rule in place, we get an alert as shown below when the daily data cap is reached.
Analyzing the Data Usage
Now that we have an alert, we have to investigate what caused the high data volume. Log on to the Azure Portal and navigate to the Usage and estimated costs blade of the Microsoft Sentinel Log Analytics workspace. Here we can already identify which solution caused the increase in data ingestion. Select the Open chart in analytics button.
Log Analytics opens with a predefined query that shows the usage. Here we see that LogManagement had an increase in data ingestion. Remove the start date filter and set the time range to 24 hours.
Usage
| where IsBillable == true
| summarize TotalVolumeGB = sum(Quantity) / 1000 by bin(StartTime, 1d), Solution
| render columnchart
Change the query to display DataType instead of Solution, then re-run it.
Usage
| where IsBillable == true
| summarize TotalVolumeGB = sum(Quantity) / 1000 by bin(StartTime, 1d), DataType
| render columnchart
Next, remove the | render instruction from the query to see the details.
Usage
| where IsBillable == true
| summarize TotalVolumeGB = sum(Quantity) / 1000 by bin(StartTime, 1d), DataType
Now let’s find the user(s) that cause the high event volume.
AADNonInteractiveUserSignInLogs
| summarize count() by UserPrincipalName
Next, we drill down into the events for the user that triggers the most of them.
AADNonInteractiveUserSignInLogs
| where UserPrincipalName == "john.doe@foocorp.com"
| summarize count() by UserPrincipalName, ClientAppUsed, AppDisplayName
Here we see a lot of Windows Sign In events. Next, let’s drill into the details to identify the device.
AADNonInteractiveUserSignInLogs
| where UserPrincipalName == "john.doe@foocorp.com"
| where AppDisplayName == "Windows Sign In"
| extend DeviceDetailJson = parse_json(DeviceDetail)
| extend DeviceName = tostring(DeviceDetailJson.displayName)
| extend trustType = tostring(DeviceDetailJson.trustType)
| extend deviceId_ = tostring(DeviceDetailJson.deviceId)
| extend operatingSystem = tostring(DeviceDetailJson.operatingSystem)
Next, let’s see how many devices are involved by adding the following KQL line.
| summarize count() by DeviceName
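The manual drill-down above can also be condensed into a single query that surfaces the noisiest user, app, and device combinations directly (a sketch; the top 10 cut-off is an arbitrary choice):

```
AADNonInteractiveUserSignInLogs
| extend DeviceName = tostring(parse_json(DeviceDetail).displayName)
| summarize Events = count() by UserPrincipalName, AppDisplayName, DeviceName
| top 10 by Events
```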
That’s it for today, I hope you found this useful. I’m currently working on early detection of unusual log growth, so that IT operations or security teams can take immediate action and prevent the daily cap from being reached.
Bye
Alex