Exploration and Instrumentation of Per-Process Windows Telemetry via ETW


  1. Motivation
  2. ETW Background
  3. Exploring ETW Providers
  4. Instrumenting Per-Process DNS and TCP Telemetry


Tracking anomalous behavior across disparate data sources to a “Patient Zero Process” on a single machine is often a challenging task for defenders. Imagine a scenario where defenders have found a pseudo-periodic beacon to an attacker-controlled IP while looking at network logs. Identifying the compromised machine is relatively straightforward, but identifying the culprit process and gaining context around it can be a time-consuming affair.

One potential solution to this challenge is to instrument per-process optics for network, inter-process, and other activity, allowing a defender to trace attack lifecycles even through evasion-heavy TTPs like process injection and encryption. This post will explore how a detections engineer can approach instrumentation to support such optics using Event Tracing for Windows (ETW), specifically:

  1. Exploring interesting ETW providers
  2. Ascertaining if there is useful signal from a given provider
  3. Rigging callback functions to handle the event and extract the signal from the noise

ETW Background

ETW is a native kernel-level tracing infrastructure originally provided by Microsoft for debugging purposes. It has begun to see re-purposing by the security community for its detailed tracing optics which are otherwise unavailable in classic Windows Events. ETW relies on three core components: Controllers, Providers, and Consumers.

For the purposes of this post, Controllers start and stop trace sessions and enable or disable Providers, Providers generate the event data, and Consumers consume and interpret the events. More information on the structure of the ETW API can be found in the official docs here:

Historically, instrumenting ETW in order to rapidly perform detections research, investigate Providers of interest, and tailor callback functions to extract usable signal has been a source of friction.

To reduce this friction and focus on the goal of obtaining usable detections from ETW events, we will utilize a Microsoft tool called Message Analyzer to extract ETW provider information and a Python ETW wrapper from our friends at FireEye called Pywintrace to take a deeper dive into the events themselves.

Exploring ETW Providers

We start by asking a simple question: what ETW providers exist out of the box that could provide interesting data for engineering new detections?

To answer that, we will obtain a list of default ETW providers from Message Analyzer, loop through each one to gauge a baseline, non-stimulated data volume, then focus on the providers that generate data by default without additional tuning.

After downloading Message Analyzer from this link, click New Session > Live Trace > Add New Provider. You will be presented with a list of available System ETW Providers:

Instead of individually running traces from the Message Analyzer GUI, it is much more useful to copy/paste the thousands of rows of providers into a separate file, where they will appear as tab separated values. (The provider list could also have been obtained by running the query “logman query providers” on a command line, however extracting them directly from Message Analyzer allows us to immediately grab them in TSV form.)

We will now iterate through the list of ETW providers, listen to each for 10 seconds, and output how many events were captured. This will allow us to understand which providers generate events out of the box and which require additional instrumentation. Assuming Pywintrace is downloaded and installed, the code is as follows:
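A minimal sketch of such a harness, not necessarily the exact script used for this post, might look like the following. It assumes the provider list was saved as tab-separated rows whose first two columns are the provider name and GUID (the column layout is an assumption), and that Pywintrace exposes etw.ETW, etw.ProviderInfo, and etw.GUID as described in its README. The names load_providers, EventCounter, and sample_provider are hypothetical.

```python
import csv
import io
import time


def load_providers(tsv_text):
    """Parse tab-separated rows into (name, guid) pairs.

    Assumes column 0 is the provider name and column 1 is its GUID.
    """
    providers = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            # Normalize to the braced form, which etw.GUID is assumed to expect.
            guid = "{" + row[1].strip().strip("{}") + "}"
            providers.append((row[0].strip(), guid))
    return providers


class EventCounter:
    """Callable that counts every event handed to it by the ETW session."""

    def __init__(self):
        self.count = 0

    def __call__(self, event):
        self.count += 1


def sample_provider(name, guid, seconds=10):
    """Listen to one provider for `seconds` and return the event count."""
    import etw  # Pywintrace; imported lazily because it is Windows-only

    counter = EventCounter()
    job = etw.ETW(
        providers=[etw.ProviderInfo(name, etw.GUID(guid))],
        event_callback=counter,
    )
    job.start()
    try:
        time.sleep(seconds)
    finally:
        job.stop()
    return counter.count
```

Looping over load_providers(...) and writing each (name, count) pair to a CSV row gives the per-provider baseline used in the analysis that follows.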

Observing the traces run for each provider, we notice that some providers are not capturing anything (at least without surgical stimulus) whereas others are actively capturing events. Note that both WMI and the LSA subsystem produce events which may be of future interest:

After the script has run through all 1000+ providers that come out of the box, we can utilize Pandas inside a Jupyter Notebook to do some quick exploratory analysis and find the top talkers.

First, we will load the output CSV into a Pandas dataframe and look at the first few rows using df.head().

Next we’ll remove the providers that are not outputting any events and sort the others by event count:

Great. Now that we know which providers are generating data and have a representative volume of each over a 10 second interval, we have a nice starting point from which to investigate what kind of signal the individual providers have to offer. We can also address the initial problem presented in the discussion – how to generate interesting per-process optics.

Instrumenting Per-Process DNS and TCP Telemetry

Investigating a compromised machine routinely leads to finding one or two nefarious processes that kicked off the remainder of the attack chain. The goal of a defender at that point is to understand context around that process across multiple dimensions: what did it spawn, where did it connect, what handles did it open, and the like. One question that comes in handy here is: what DNS requests has a particular process made?

To facilitate answering this question as well as prototype some live per-process DNS telemetry, we can utilize the Microsoft-Windows-DNS-Client ETW provider to rig up some exploratory tooling.

To set up the capture, we’ll utilize code that is similar to the testing harness used above:

Note that when it’s time to handle an ETW event, the process() callback function will be used to perform additional work on the event. This can be used to parse, transform, or aggregate the data into more useful representations.

Let’s instrument the callback function to:

  1. Exclude ETW setup prologue events and localhost lookups
  2. Only return events querying for a specific site in order to focus on how the events are sequenced, in this case ‘www.twitter.com’
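The two filters above can be sketched as a callback factory. This is a sketch under assumptions: that Pywintrace hands the callback an (event_id, event_dict) tuple, that DNS-Client events carry the looked-up name in a QueryName field, and that setup prologue events carry no such field. The names make_dns_callback and sink are hypothetical.

```python
def make_dns_callback(target, sink):
    """Build a callback that appends matching DNS events to `sink`."""
    target = target.lower().rstrip(".")
    local_names = {"localhost", "127.0.0.1", "::1"}

    def process(event):
        event_id, data = event
        query = str(data.get("QueryName", "")).lower().rstrip(".")
        # Assumption: session setup events carry no QueryName field.
        if not query or query in local_names:
            return
        # Keep only lookups for the site we are studying.
        if query == target:
            sink.append((event_id, data))

    return process
```

The returned function can be passed as the event_callback of the ETW session, and sink can then be inspected to see the order in which event types fire for a single lookup.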

Running the program and navigating to “www.twitter.com”, we see the ETW provider yielding several different events of interest. It is left as an exercise to the reader to understand what the various event types represent, but a hint is that both network and cached DNS are involved.

Let us take a close look at the very first event that is fired, Event 3006. It is here that we finally get the connective tissue we want – both process information and DNS query information in the same place:

Modifying our callback function, we can put together some logic for real time monitoring of DNS queries by various PIDs, resolve those PIDs to executable paths using a WMI helper call, and then look for anomalies in the data.
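One way to sketch that monitoring logic is a small accumulator keyed by PID. It assumes the PID is available under EventHeader and ProcessId in the parsed event and the query under QueryName (both assumptions about the parsed event layout), and it takes the PID-to-path lookup as an injected function rather than calling WMI directly. DnsByProcess and resolve_path are hypothetical names.

```python
from collections import defaultdict


class DnsByProcess:
    """Accumulate DNS query names per process ID from event 3006."""

    def __init__(self, resolve_path=None):
        # resolve_path: optional callable mapping pid -> executable path
        # (for example a WMI lookup); a hypothetical hook.
        self.resolve_path = resolve_path or (lambda pid: "<unknown>")
        self.queries = defaultdict(set)

    def process(self, event):
        event_id, data = event
        if event_id != 3006:
            return
        # Assumption: the PID sits under EventHeader -> ProcessId.
        pid = data.get("EventHeader", {}).get("ProcessId")
        name = data.get("QueryName")
        if pid is None or not name:
            return
        self.queries[pid].add(name)

    def report(self):
        """Return (pid, path, sorted query names) rows ordered by PID."""
        return [
            (pid, self.resolve_path(pid), sorted(names))
            for pid, names in sorted(self.queries.items())
        ]
```

Anomaly checks, such as a process that rarely resolves names suddenly querying many distinct domains, can then be run over report() at whatever interval suits the environment.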

The result is a streaming feed of DNS calls mapped to processes:

With slight modifications to the ETW provider we are targeting, we can also obtain per-process TCP data indicating bytes sent and received (although it would be prudent to aggregate these with some bucketing function):
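As an illustration of the bucketing idea, the sketch below sums bytes per process and direction within fixed-width time windows. The input tuple layout and the bucket_bytes name are assumptions made for the example; the actual field names depend on the TCP provider being consumed.

```python
from collections import defaultdict


def bucket_bytes(samples, bucket_seconds=60):
    """Sum bytes per (bucket start, pid, direction) in fixed-width windows.

    `samples` is an iterable of (timestamp_seconds, pid, direction, size)
    tuples, where direction is "sent" or "received".
    """
    totals = defaultdict(int)
    for timestamp, pid, direction, size in samples:
        # Floor the timestamp to the start of its bucket.
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        totals[(bucket_start, pid, direction)] += size
    return dict(totals)
```

Aggregating this way keeps the shipped volume proportional to the number of active processes per window rather than to the number of individual send and receive events.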

For a production environment, we can now utilize a collector to grab the logs and ship them to a centralized location or SIEM knowing that we have a good sense of the data volume involved for network<>process relationships.

Hopefully this has been a useful overview of the ETW investigation process and how it can be utilized to engineer useful detections. Some areas of potential future research:

  • What other ETW providers provide useful info out of the box?
  • How can we automate stimulating ETW providers to see if they generate data given certain conditions?
  • What kind of correlation can be performed amongst the various providers?
  • How can we use ETW to enrich classic Windows Events?

Intuitive Detections Research With Graph Analytics and Neo4J

By Nik Seetharaman


I explore using graph analytics tools to visualize simulated attacks during the course of detections research to gain an intuition of what an attack is doing across multiple axes. I also discuss streamlining detections research workflows by automating Sysmon setup, teardown, and log export.


On the heels of DEFCON, Black Hat, and the overall 2018 security research season, there are a number of new offensive techniques I’m psyched to begin researching through the lens of the Blue Team, especially some of SpecterOps’ and Will Schroeder’s new C#-based tooling. 

One of the challenges in doing Blue Team and Detections research is being able to easily digest relationships among various entities on a system during the execution of a particular attack. Event Viewer isn’t sufficient to gain such contextual understanding, and tools like Splunk and ELK aren’t much better when trying to execute link-based analysis. Additionally, it’s a pain to time-bound logs to the exact events you care about after running a simulated attack. Ideally this constraining would happen upstream of the analysis, such that a high percentage of the events we receive are directly related to the attack we are attempting to research.

In this post I’ll walk through how to use graph analytics with Neo4J to visualize what happens during execution of an attack, as well as how I think about the overall workflow of:

  • Firing the attack while only capturing the events we want
  • Extracting objects and relationships out of the event logs
  • Loading the objects and relationships into Neo4J to visualize them

The ultimate goal is to be able to execute a desired simulated attack (i.e. one of the many in Red Canary’s Atomic Red Team) and then quickly and intuitively evaluate what the footprint of that attack is along various dimensions.


Developing the graph-based research workflow is reliant on the following:

1. A VM on which we will be running our atomic attack test.
2. An analysis machine separate from the victim VM.
3. Sysmon with a modified version of Swift on Security’s configuration file.
4. Batch scripts to automate Sysmon configuration, log clearing, and exporting of logs.
5. Python to parse the exported Sysmon logs and generate queries that we’ll need to use in Neo4J.
6. Neo4J Desktop and Neo4J Browser to visualize our resulting graphs.

Automating Sysmon Setup and Teardown

We can obtain a victim VM from Chris Long’s Detection Lab project and then install Sysmon with the config located here. 

We can also download additional configuration files for more tailored research from Olaf Hartong’s modular Sysmon project.

We can then utilize a batch script on the victim VM to automate the process of setting up Sysmon after specifying the configuration file that we want depending on our research goals. The modified Swift on Security configuration provided above is a good start. After we run the attack that we want to research and capture the relevant logs, another batch file can be used to tear down that configuration and return Sysmon to a baseline state.

The reason for doing this is that we may want to use several versions of a Sysmon configuration in sequence to evaluate the footprint of a given attack under each specific set of Sysmon rules. Some of those rulesets may cause high system load, and in order to preserve VM and host system performance, we’ll want to revert Sysmon to a base state of not collecting anything when we’re not actively running attacks.

That no-capture configuration will look like this:

The following batch commands can then be used for the pre-attack setup:

In the first command, we use the Windows Event command line utility to clear any existing logs. Line 2 sets up a variable to capture the first user specified command line argument for the Sysmon configuration file we want to use. Line 3 executes Sysmon with that configuration.

After we run the atomic attack that we intend to research, we use the next set of batch commands in order to export logs, tear down and revert Sysmon to a no-capture baseline:

Line 1 again uses the Windows Event command line utility to query Sysmon logs and output an XML file that we will use later on. Line 2 reverts Sysmon to a no-capture state by using a special configuration file called “nocapture.xml” and finally line 3 clears the logs.
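For readers who would rather drive these steps from Python than from batch files, the same setup and teardown sequence can be sketched as command lists. The wevtutil cl and qe verbs, the /f:xml and /e: options, and sysmon -c are existing command-line features; the function names, the assumption that sysmon.exe is on the PATH, and the Events root element name are choices made for this example.

```python
import subprocess

SYSMON_LOG = "Microsoft-Windows-Sysmon/Operational"


def setup_commands(config_path):
    """Commands for pre-attack setup: clear the log, then load the config."""
    return [
        ["wevtutil", "cl", SYSMON_LOG],
        ["sysmon.exe", "-c", config_path],
    ]


def teardown_commands(nocapture_path="nocapture.xml"):
    """Commands for teardown: export as XML, revert Sysmon, clear the log."""
    return [
        # /e:Events wraps the exported events in a single root element.
        ["wevtutil", "qe", SYSMON_LOG, "/f:xml", "/e:Events"],
        ["sysmon.exe", "-c", nocapture_path],
        ["wevtutil", "cl", SYSMON_LOG],
    ]


def run_all(commands, export_path=None):
    """Run each command; write the output of the `qe` query to export_path."""
    for command in commands:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        if export_path and command[:2] == ["wevtutil", "qe"]:
            with open(export_path, "w", encoding="utf-8") as handle:
                handle.write(result.stdout)
```

Keeping the command lists separate from run_all makes it easy to print and review exactly what will be executed on the victim VM before running it from an elevated prompt.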

(Note – instead of reverting to a no-capture state, we could simply uninstall Sysmon by running the command Sysmon -u. This, however, can take longer and introduce latency into your workflow.)

We’ll now run the setup script, execute Casey Smith’s remote COM scriptlet execution attack (MITRE T1117) as our simulated attack, and then tear down Sysmon:

We now have the log data we want to visualize in the form of an XML file. The next step will be to extract entities and relationships between them in order to populate the graph.

Modeling Objects and Relationships

Moving from log analysis to graph analysis requires that we take the event logs in the XML output and extract entities and relationships from them, which can then be displayed on the graph in the form of nodes and edges (links). To do this, we’ll use a Python script to turn each Sysmon event into a Neo4J Cypher command that expresses the relationships between entities in that Sysmon event.

Cypher is Neo4J’s graph query language that can create graph objects and describe relationships between those objects using ASCII art syntax. For example, to express that calc.exe is a child process of cmd.exe, we would say:

This would show up in the Neo4J Graph as:

To do this for a Sysmon Process Create event, we would write the following to translate from the XML to Cypher:
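A minimal sketch of such a translation is shown here, assuming the event has already been parsed into a dictionary keyed by the Sysmon field names ProcessGuid, Image, ParentProcessGuid, and ParentImage. The Process label, the guid and name properties, and the function names are illustrative choices, not necessarily the schema used to produce the graphs in this post.

```python
def cypher_str(value):
    """Quote a value as a Cypher string literal (escape backslashes and quotes)."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def process_create_to_cypher(data):
    """Translate Sysmon Event ID 1 fields into one Cypher statement."""
    # Use the file name portion of each image path as the display name.
    parent_name = data["ParentImage"].rsplit("\\", 1)[-1]
    child_name = data["Image"].rsplit("\\", 1)[-1]
    return (
        "MERGE (parent:Process {guid: %s, name: %s}) "
        "MERGE (child:Process {guid: %s, name: %s}) "
        "MERGE (parent)-[:Spawned]->(child);"
        % (
            cypher_str(data["ParentProcessGuid"]),
            cypher_str(parent_name),
            cypher_str(data["ProcessGuid"]),
            cypher_str(child_name),
        )
    )
```

Keying nodes on the Process GUID rather than the PID avoids collisions when the operating system reuses a PID during the capture window.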

In this case, we are using the command MERGE instead of CREATE so that Neo4J will bind relationships to objects if they already exist, or create objects and then bind relationships to them if they do not.

Each Sysmon Event ID will require merging potentially new types of objects (files for example) and then modeling the relationship between the Process GUID and the resultant object.

Let’s write XML to Cypher translations for most major Sysmon event types that we’d care about:
– Process Creates
– Network Connections
– Image Loads
– Process Accesses
– File Creates
– Registry Value Sets

We’ll also have a small helper function called getFilename to parse out a display-friendly name from the full path shown by Sysmon.
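A possible shape for that helper, using the standard library’s ntpath module so that Windows paths are split correctly even when the analysis machine is not running Windows (the fallback to the original string is an assumption about the desired behavior):

```python
import ntpath


def getFilename(path):
    """Return the final component of a Windows path, or the input if none."""
    name = ntpath.basename(path.rstrip("\\/"))
    return name or path
```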


Parsing the XML Logfile

After creating our Cypher translation functions, we’re ready to parse the XML logfile containing the event data for Casey Smith’s Squiblydoo attack.

In the above code, we create a dictionary-like object using the collections.defaultdict() class to hold the Sysmon Event Data we need to extract entities and links. We then select the proper Cypher translation function depending on the Event ID and in the case of Process Access events, suppress anything with MsMpEng.exe (a Windows Defender process) and because I’m using VirtualBox, VBoxService.exe (VirtualBox’s Guest Additions service).
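The parsing and suppression steps described above can be sketched roughly as follows. This assumes the export is wrapped in a single root element and uses the standard Windows event XML namespace; parse_sysmon_xml, keep_event, and SUPPRESSED_IMAGES are hypothetical names, and the dispatch to the individual Cypher translation functions is left out.

```python
import collections
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def parse_sysmon_xml(xml_text):
    """Yield (event_id, fields) for each Event element in the export."""
    root = ET.fromstring(xml_text)
    for event in root.iter("{%s}Event" % NS["e"]):
        event_id = int(event.findtext("e:System/e:EventID", namespaces=NS))
        # defaultdict(str) lets later code read absent fields as "".
        fields = collections.defaultdict(str)
        for item in event.findall("e:EventData/e:Data", NS):
            fields[item.get("Name")] = item.text or ""
        yield event_id, fields


SUPPRESSED_IMAGES = ("MsMpEng.exe", "VBoxService.exe")


def keep_event(event_id, fields):
    """Drop Process Access (ID 10) events involving suppressed images."""
    if event_id != 10:
        return True
    images = (fields["SourceImage"] + "|" + fields["TargetImage"]).lower()
    return not any(name.lower() in images for name in SUPPRESSED_IMAGES)
```

Each surviving (event_id, fields) pair would then be handed to the translation function for that Event ID to produce one Cypher statement per event.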

Our resulting Cypher queries look like so:


Generating the Graph

After downloading Neo4J and opening the Desktop client, we are presented with something similar to the following:

Click on New Graph and select “Create Local Graph.” Enter a name for the graph and set a password. Once the graph card populates with the new graph, click “Start.” Then click “Manage” and in the resulting window “Open Browser.”

This will bring you to the meat of the operation, the Neo4J Graph Browser:

Cypher commands may be run from the command bar at the top, with results of any commands / queries sequentially appearing in the results area below. New results will show up at the top of the result stack. To clear the stack, you can type “:clear” in the command bar.

To generate our graph, we’ll keep it simple and copy / paste our generated Cypher code into the command bar, and then hit CTRL+Enter to execute it. Each query will then run in sequence.

To view the resulting graph, we can click on the button labeled “*(9)” on the left under “Node Labels.”

The above graph serves as an intuitive visual reference for the different interactions between the objects involved when we ran our Squiblydoo attack, namely various Process Accesses, cmd.exe spawning regsvr32.exe, and regsvr32.exe subsequently spawning calc.exe. Finally we have the fairly obvious network connection from regsvr32.exe to the external IP address where it pulled down the remote scriptlet from GitHub.

The relationship type label on each link lets us know what the nature of the relationship between nodes is. In the case of this attack, we observe types of:

  • Accessed
  • Spawned
  • ConnectedTo

Note that you can constrain what is visualized on the graph by selecting Node Labels or a certain Relationship Type on the left. The result will be generated in a new panel. So if I only wanted to see Process Spawns, I would select the “Spawned” relationship type and only be presented with the cmd->regsvr32->calc flow.

To clear your work and start from scratch (i.e. to try a new cypher import), clear the current database by typing

Then type

It is left as an exercise to the reader to:

  • Implement the automation discussed
  • Run several other atomic red team tests and capture the event logs via Sysmon
  • Export the logs to XML and translate them to Cypher
  • Load the translated Cypher into Neo4J to visualize each attack

In the coming weeks I’ll be working on a follow-on to this post focused on utilizing graph analysis to research the nuances of unmanaged PowerShell.


Detecting CMSTP-Enabled Code Execution and UAC Bypass With Sysmon

By Nik Seetharaman


An old Microsoft tool can be weaponized in several ways and has reportedly been used by at least one nation-state actor. I explore ways for defenders to detect its abuse.


As I was perusing MITRE’s ATT&CK framework the other day to learn about techniques I’m less familiar with, I came across the ambiguous-sounding CMSTP (T1191 in ATT&CK) which MITRE states can be used for UAC Bypass and code execution. Being that it’s also allegedly been used by a nation-state actor recently, I wanted to research potential detection strategies and wrap my head around possible blind spots.

Initial research showed that CMSTP is an old remote access configuration tool that comes with a config wizard called the Connection Manager Admin Kit. This wizard spits out, among other things, an INF configuration file that can be weaponized along various dimensions.

Invoking the weaponized INF with CMSTP results in the ability to run both arbitrary scripts (local and remote) and bypass User Account Control to elevate security contexts from medium to high integrity.

Because CMSTP is a legitimate signed Microsoft binary living in the System32 directory, the implication is that an attacker could land on a system, utilize CMSTP to bypass poorly configured application whitelisting, and obtain elevated command shells or pull down arbitrary code remotely via WebDAV.

For more background reading, Oddvar Moe wrote up some great research into how CMSTP works, which gave me a good baseline to build on.

This post will explore various considerations in trying to detect CMSTP exploitation along these various axes using Windows Sysinternals’ Sysmon tool configured with Swift on Security’s baseline configuration, found here.

CMSTP Abuse Vectors

I investigated detection strategies for three different categories of CMSTP abuse, all of which involve arbitrary code execution and two of which allow for code execution with UAC bypass:

  1. Invoking weaponized .INF setup files to run local or remote .SCT scripts containing malicious VBScript or JScript code.
  2. Invoking weaponized .INF files to run local executables while enabling UAC bypass / elevating integrity levels, allowing for spinup of elevated command shells.
  3. Direct utilization of the COM interfaces that CMSTP hooks into allowing for (slightly) stealthier UAC bypass.

Let’s dive into the detections considerations for each of these methods.

Method 1 – INF-SCT Launch

Bohops wrote a great article with some background and context around INF-SCT fetch and execute techniques here.

The gist is that the ‘UnRegisterOCXSection’ in the malicious INF file can be modified to invoke scrobj.dll and have it execute either a local or remotely fetched .SCT script containing malicious VBScript or JScript code.

Let’s take a look at an example (T1191.inf) pulled from the Atomic Red Team repo that maps to the CMSTP MITRE technique (T1191):

Executing the command “cmstp.exe /s t1191.inf” will pull down and execute the SCT script located at https://raw.githubusercontent.com/redcanaryco/atomic-red-team/master/atomics/T1191/T1191.sct

That script (spawning what looks to be an Advanced Persistent Calculator) looks like so:

Digging into the Sysmon logs in Event Viewer after running the command, we see several Sysmon events generated. Notice that the spawned calc.exe has c:\windows\system32\cmstp.exe as the ParentImage and that the IntegrityLevel is Medium, i.e. no integrity elevation occurred.

Let’s now take a look at the Sysmon 3 Network Connections. One of the connections looks to be to localhost over a high-numbered port. The other shows cmstp.exe as the Image calling out to an external address (GitHub) over port 443.

It follows then, that potential Sysmon detection rules for Method 1 could be:

  • Sysmon Event 1 where ParentImage contains cmstp.exe
  • Sysmon Event 3 where Image contains cmstp.exe and DestinationIP is external
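Expressed as code, the two candidate rules might look like the sketch below. It assumes events have already been parsed into an Event ID plus a dictionary of Sysmon fields, and it treats any address that is not private, loopback, or link-local as “external”, which is one possible definition among several. The function names are hypothetical.

```python
import ipaddress


def is_external(ip_text):
    """True if the address parses and is not private, loopback, or link-local."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


def matches_method1(event_id, fields):
    """Apply the two candidate Method 1 rules to a parsed Sysmon event."""
    if event_id == 1:
        return "cmstp.exe" in fields.get("ParentImage", "").lower()
    if event_id == 3:
        return (
            "cmstp.exe" in fields.get("Image", "").lower()
            and is_external(fields.get("DestinationIp", ""))
        )
    return False
```

In an environment with internal proxies the Event 3 rule would need adjusting, since the destination seen by Sysmon would then be the proxy rather than the external host.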

Method 2 – UAC Bypass via INF RunPreSetupCommandSection

As Oddvar Moe found in his research, it turns out that the RegisterOCXSection of the INF file is not the only section susceptible to weaponization. Looking at a different INF file generated by the Connection Manager Admin Kit, it’s possible to insert arbitrary binaries for execution under the RunPreSetupCommandSection. In this case, we’re spawning a command shell and then subsequently killing the cmstp executable.

Getting this method to work on the command line is slightly different than in Method 1, requiring some new options, making sure “All Users” is checked in a dialog box that pops up, and hitting OK.

Once done, we have our command shell. Notice that unlike the previous method, executables run in this fashion elevate their security context with no notice to the user, resulting in UAC Bypass. We’ll look at a stealthier way to do this in Method 3 that doesn’t involve a popup.

Note the Sysmon 12 and Sysmon 13 registry value add and value set events:

Sysmon 12 – Registry Object Added

Sysmon 13 – Registry Value Set

Dllhost.exe is creating the object cmmgr32.exe in the Sysmon 12 then setting the ProfileInstallPath value to C:\ProgramData\Microsoft\Network\Connections\Cm in the subsequent Sysmon 13.

Let’s take a look at the Sysmon 1 event where the cmd.exe was actually spawned:

Unlike Method 1 where cmstp.exe was the ParentImage and the target binary was the child, here Dllhost.exe is the parent.

We see in the ParentCommandLine field that Dllhost.exe utilizes a ProcessID option with what appears to be some kind of GUID. To understand what that GUID is doing there, we’re going to rerun the attack but this time using a modified Sysmon configuration that allows us to obtain Sysmon Event 10s (Process Access).

To limit the collection aperture for the Event 10s and avoid grinding the system to a halt, we’re going to follow Tim Burrell’s great writeup here and set up Sysmon such that we’re pulling only those Sysmon 10 events requesting highly privileged levels of process access or containing an “unknown” string in the CallTrace:

We’ll need to let Sysmon know to use the updated configuration by running:

sysmon -c <modified_config.xml>

Re-running the attack, we see several additional Sysmon 10 events. One of them in particular, where Dllhost.exe accesses the TargetImage cmd.exe, is interesting.

Note the CallTrace data. One of the DLLs called was cmlua.dll – which @hFireF0X has called out as containing an autoelevated COM interface called CMLUAUTIL. We’ll see CMLUAUTIL again when we get to Method 3. For now, let’s recap our potential detections for Method 2:

  • Sysmon 1 where ParentImage contains dllhost.exe and Image contains cmd.exe (a strategy which may produce lots of noise and not bracket you to a CMSTP exploit)
  • Sysmon 10 where CallTrace contains cmlua.dll
  • Sysmon 12 or 13 where TargetObject contains cmmgr32.exe

Method 3 – UAC Bypass via Direct Utilization of COM Interfaces

As @hFireF0X stated in his tweet, cmlua.dll references the autoelevated COM interfaces CMLUAUTIL and CMSTPLUA via cmlua.dll and cmstplua.dll respectively. His UAC Bypass project UACME (https://github.com/hfiref0x/UACME) enumerates several bypass methods. Method #41 contains a proof of concept that executes the same attack we saw in Method 2, except that instead of dealing with the cmstp.exe executable and its popup dialog, and relying on the DLLs to reach the COM interfaces, we interface with them directly.

What’s the potential impact on our Sysmon visibility if one were to utilize this method?

To execute this UACME-powered attack as of July 2018, we’ll need to grab a previous commit of the UACME repo with the “Compiled” and “Source” directories still in place (he’s removed the executable we need for whatever reason – so grab a commit from May or June of 2018). Under the Compiled directory, let’s run “Akagi32.exe 41.”

If we navigate back to the Sysmon 10 event that we analyzed in Method 2 where Dllhost.exe accessed cmd.exe and look at the CallTrace, there is NO mention of cmlua.dll. Also note that there are NO Sysmon 12 or 13 events. This indicates that looking for cmlua.dll or registry adds / mods is potentially brittle:

No cmlua.dll to be found…

Let’s revisit the Sysmon 1 event where dllhost.exe spawned cmd.exe. It turns out that the GUID we see in the ParentCommandLine field is actually the Class ID for the COM object we’re hooking into, in this case autoelevate-capable CMSTPLUA.

A potential way forward then for detecting both Method 2 and 3 is to alert on dllhost.exe in the ParentCommandLine along with the GUID of CMSTPLUA:

  • Sysmon 1 where ParentCommandLine contains dllhost.exe and contains GUID for CMSTPLUA COM object (3E5FC7F9-9A51-4367-9063-A120244FBEC7)
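As a sketch, this rule reduces to a case-insensitive substring check on the ParentCommandLine field of a parsed Sysmon Event 1. The function name and the dictionary-based event representation are assumptions made for the example; the CLSID is the one given above.

```python
CMSTPLUA_CLSID = "3E5FC7F9-9A51-4367-9063-A120244FBEC7"


def matches_cmstplua_spawn(event_id, fields):
    """True for Sysmon Event 1 whose parent is dllhost.exe hosting CMSTPLUA."""
    if event_id != 1:
        return False
    parent_cmd = fields.get("ParentCommandLine", "").lower()
    return "dllhost.exe" in parent_cmd and CMSTPLUA_CLSID.lower() in parent_cmd
```

Matching on the CLSID rather than on the child image means the rule does not depend on the attacker choosing cmd.exe as the elevated payload.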

I’ll need to do further research to figure out how this might be further obfuscated by an adversary but it could be a good base.

To summarize, CMSTP and its dependencies are capable of facilitating a few different methods of code execution and UAC bypass, each with its own detection nuances and footprint. Note that before deploying any of these detections to production, it’s important to baseline what’s happening on your network and develop a hypothesis around why implementing any of these will produce a high signal / low false positive rate for you.