Tuesday, March 31, 2015

Needle in a haystack: searching from the Windows command line

A key part of security involves basic command line skills. Read on for some tips for command-line searches on Windows.
Part of network security involves fancy technology, specialized devices, and ever-advancing techniques. The crooks are constantly improving their craft, and so must the defenders. But an equally important part of security involves mundane and boring tasks, tasks such as looking through log files for indications that something undesirable happened or that someone has gained unauthorized access - i.e. Forensics 101.

There are myriad tools available for searching, whether on Windows, Linux, or Mac. I am of the opinion that a security expert (or system administrator) needs to understand the command line and built-in tools first. There are times when you don't have the luxury of installing or using custom tools and have to make do with what comes with the operating system. If that system is Windows, you get Find and Findstr.

Find and Findstr are built-in Windows command-line utilities for searching text files for specific strings. They have different features and different limitations.

Find has been around since the days of MS-DOS, and is meant for searching for a single string in one or more files. It does that job well. By itself, it will display every line that contains the target string. There are a handful of switches to make it behave differently:

find /c - instead of showing the matching lines, just give a count of how many lines match
find /v - display the lines that don't contain the target string
find /i - perform a case-insensitive search (ignore upper/lower case)

This is handy, but it's limited: you can only search for one word or phrase at a time, and you can only search for a literal string, not for a pattern.
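
To illustrate, the switches can be combined. Here is a minimal sketch, assuming a hypothetical log file named access.log in the current directory (the file name and the search word are placeholders, not taken from a real system):

```batch
rem Count the lines that mention "failed", regardless of case
find /c /i "failed" access.log

rem Count the lines that do NOT mention "failed"
find /v /c /i "failed" access.log
```

The target string always goes in double quotes, even when it is a single word.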

Findstr began its life as a non-public utility written by a Windows developer for his own personal use, and showed up in the Windows 2000 resource kit as an add-on to Windows. It eventually made its way into the operating system with Windows XP and newer releases. In other words, every modern Windows system should have it.

Findstr can search for more than one string at a time, and it supports some types of regular expressions (a way of crafting a search pattern that is broader than a literal string; for instance, "Thursday" is a literal string that means only that particular day, while ".*day" is a regular expression that will match any text ending in "day").
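
A rough sketch of the difference, assuming a hypothetical text file named schedule.txt. The /l switch forces a literal search, /r forces a regular expression, and /c: marks what follows as a single search string:

```batch
rem Literal: only lines that contain "Thursday"
findstr /l /c:"Thursday" schedule.txt

rem Regular expression: lines that contain any text followed by "day"
findstr /r /c:".*day" schedule.txt
```

The second command would also report lines that mention Monday or Friday.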

Despite its wider feature set, findstr has a few key limitations. It cannot handle lines of text longer than about 8 KB (8192 characters), and it cannot handle text stored in the 16-bit Unicode format (UTF-16) that Windows commonly uses. US English uses an alphabet of 26 letters; even with upper and lower case, numerals, punctuation, and a variety of special characters, the English character set has fewer than 256 characters, so each character fits in 8 bits. Many languages use far more characters, and Unicode makes room for them by using more bits per character.

Now for some practical examples.

One common type of forensic search is to go through the Windows registry looking for anomalies. While this can be done from the Regedit graphical tool, it is far faster to export the registry to a text file and use the command line tools. To export the registry, we can use the built-in utility reg.exe, and select which registry "hive" we want:

reg export HKLM registry-HKLM-dump.txt
reg export HKCU registry-HKCU-dump.txt
reg export HKCR registry-HKCR-dump.txt
reg export HKCC registry-HKCC-dump.txt
reg export HKU  registry-HKU-dump.txt
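
A full hive can be large. If I already suspect a particular area, reg export also accepts a key path below the hive. As a sketch, the following exports only the services key (the output file name is my own choice; /y overwrites an existing file without prompting):

```batch
reg export HKLM\SYSTEM\CurrentControlSet\Services registry-services-dump.txt /y
```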

In this first example, I will assume there is a malware file named either wordpad.exe or notepad.exe (yes, both are actually valid Windows text editors; this is just for the sake of illustration). I know the files exist on my hard drive, and suspect that one or the other is installed as either a service or a scheduled task so it will run automatically.

The simplest search would be to use find separately for each filename (registry-dump.txt here stands for whichever of the export files above is being searched):

find /i "wordpad.exe" registry-dump.txt
find /i "notepad.exe" registry-dump.txt

This gets the job done, but if I had more than a couple of filenames to hunt for, it would get cumbersome quickly. Findstr would be simpler for a list:

findstr /i "wordpad.exe notepad.exe" registry-dump.txt
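
Because findstr treats a space as a separator between search strings, a phrase that itself contains a space needs the /c: switch, which can be repeated. A sketch, with "program files" standing in for any phrase of interest:

```batch
rem Match lines containing the phrase "program files" or the name "notepad.exe"
findstr /i /c:"program files" /c:"notepad.exe" registry-dump.txt
```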

Findstr has a handy /s flag that searches for the string in every file in the working directory and all of its subdirectories. It is a convenient way to search for text in any file on disk, much like "grep -r" on Unix. Just be aware that since it is examining every file on disk, it may take a while:

findstr /s /i "wordpad.exe notepad.exe" c:\*
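
When all I need is the names of the files that contain a match, rather than every matching line, the /m switch trims the output. A sketch that also narrows the search to one file type (the *.ini pattern is only an example):

```batch
rem List only the names of .ini files under c:\ that contain either string
findstr /s /i /m "wordpad.exe notepad.exe" c:\*.ini
```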

Alas, exporting the registry to a text file using the built-in reg.exe utility saves the export in Unicode format, which findstr does not support. I have to use a little wizardry now. I have two reasonably easy solutions: use find, reading a list of search terms from a file; or find a way to convert the registry dump into a format that findstr can handle. In both cases, I am going to put my search terms into a text file to make the next step easier. So I create a file called query.txt with the following (adding any additional words / filenames / phrases I wish to find):

wordpad.exe
notepad.exe

Windows very helpfully supports a "for" loop on the command line, meaning I can tell the system to run the same command over and over. In fact, Windows' for loop is quite powerful. A for loop is programmer-speak for "do something to every item in a list." For example, I might need to order lunch for each of my five children, or I might want to run a search for each word or phrase in a list. In this case, I am going to tell Windows to read the query file, and treat each line of the file as a separate search:

for /f %i in (query.txt) do find /i "%i" registry-dump.txt
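
Two caveats, offered as a sketch rather than a definitive recipe. First, inside a batch file the loop variable needs a doubled percent sign (%%i instead of %i). Second, for /f splits each line at spaces by default and keeps only the first piece, so a multi-word phrase in query.txt would be cut short; the "delims=" option turns that splitting off. A batch-file version (the file name search-terms.cmd is hypothetical) might look like this:

```batch
@echo off
rem search-terms.cmd: run one case-insensitive find per line of query.txt
for /f "delims=" %%i in (query.txt) do find /i "%%i" registry-dump.txt
```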

The for loop becomes even more powerful if I have multiple files to search for multiple target strings. Instead of looking only in registry-dump.txt, I can search for any string in query.txt, in any file in the current directory:

for /f %i in (query.txt) do find /i "%i" *

This works great for a simple string, but is not helpful if I want to use a regular expression (as I might want to do if I am only interested in notepad.exe at the end of a line, rather than buried in the middle of another string). Since findstr cannot properly search the Unicode-formatted registry export though, I have to get the registry export into a format that findstr can handle. The built-in "type" command displays the contents of a file in ASCII form, which findstr can then handle. The "/g" switch tells findstr to read the search patterns from the file instead of expecting search patterns on the command line:

type registry-dump.txt | findstr /i /g:"query.txt"
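
Two more findstr switches are useful here, shown as a sketch: /n prefixes each match with its line number in the export, and /l forces the search terms to be taken literally, so the dot in "notepad.exe" matches only a dot rather than any character:

```batch
type registry-dump.txt | findstr /n /i /l /g:"query.txt"
```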

Finally, let's combine a few ideas into an ambitious script. In this example I want to search for a list of strings, to find out if the strings exist *anywhere* in any file on the entire disk. findstr /s will search through subdirectories, but will not give any results for Unicode or binary files. Find will properly search a binary file, but does not descend into subdirectories on its own. The following command line will create a list of every file on disk, and examine every file to see if any of the target strings is found. It filters out any file where the target strings are not found, and creates a file "match.out" with a list of files that contain the target strings. The "delims=" option and the quotes around %f keep file names that contain spaces intact. The last command (findstr /v "^$") strips out any blank lines that would otherwise flood the output file.

for /f "delims=" %f in ('dir /b /s c:\') do for /f "delims=" %i in (c:\temp\query.txt) do find /c /i "%i" "%f" | find /v ": 0" | findstr /v "^$" >> match.out

There you have it. None of this is highly complicated, nor is it glamorous - but it's the type of fundamental technical skill critical to a successful security analyst.