Note

This software has been discontinued. Please use HexDive (it has all HAPI features plus lots more).

Also, check our other tools.

Old post

In one of my previous posts (Extracting Strings from PE sections), I demonstrated (ya… right, what a big word) how easy it is to extract the sections of a PE file into separate files using 7-Zip so that they can later be used for targeted strings analysis. As I mentioned, splitting a file into sections can be really useful, as it helps reduce the number of random string-like non-strings we see in the output of ‘strings’-type tools. Just to be on the safe side, though: you may want to refer to my original post to find out more about the caveats of this approach, as there are cases when it may not be such a good idea.

There are many other techniques that can help in noise reduction and I am going to demonstrate one more today.

Analyzing Portable Executable (PE) files usually kicks off with running multiple static analysis tools, including ‘strings’ and other tools that help determine which APIs a sample uses. One can use tools like PEDump, LordPE, PETools, Stud_PE, Dependency Walker, and lots of others that process the sample’s import/export tables and help in guessing what specific functionality is embedded in the sample.

Now, before we proceed further – three warnings here.

You should never, ever conclude your malware analysis with the output of ‘strings’ or PE parsing tools. This is the first step to shooting yourself in the foot. Always do code analysis. I will come back to this topic in the future in a separate post.

Ensure you actually know how these PE tools work. I know I don’t need to say this, but I once saw a person using the Dependency Walker tool and analyzing a malicious file by looking at the full list of functions exported from one of the operating system DLLs. The DLL was not even linked directly to the malware; it was referenced only by a DLL that was directly linked to the malicious .exe. In other words, sample.exe was linked to kernel32.dll, and kernel32.dll links to ntdll.dll. The guy was looking at the pane listing all functions exported by ntdll.dll. And while he was right that ntdll.dll does contain a lot of APIs typically used by malware, he was completely off track! Oh, boy…

Obviously, APIs can often be found outside the import table, since many packers, protectors, and wrappers move them from import tables to internal data structures. They are often visible only when the memory of the protected process is dumped to a file; thus, none of the typical PE parsing tools can ‘see’ them.

So, now back to the original topic.

One simple noise reduction technique that is well known and used by many analysts is based on lists of patterns; these can be keywords, ANSI or Unicode strings, regular expressions, and, practically speaking, any string of bytes that is unique and can be helpful in identifying interesting stuff inside the samples. This technique is used to some extent by projects like Yara and PEiD, and of course it is extensively used by antivirus and IDS software. Having a good pattern list that identifies a certain class of artifacts inside a file is a very attractive idea, and I must confess that I have been using such lists myself for a number of years.
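To make the pattern-list idea a bit more concrete, here is a minimal Python sketch. It is only an illustration: the names `KEYWORDS`, `REGEXES` and `scan` are made up for this example, and the tiny pattern list is a placeholder for a real one. Each keyword is searched for both as an ANSI string and as a UTF-16LE ("Unicode") string, and the regular expressions are run over the raw bytes.

```python
import re

# Hypothetical pattern list; a real one would be much longer.
KEYWORDS = ["CreateRemoteThread", "WriteProcessMemory"]
REGEXES = [rb"https?://[\x21-\x7e]{4,}"]


def scan(data: bytes, keywords=KEYWORDS, regexes=REGEXES):
    """Return sorted (offset, kind, text) tuples for every pattern hit."""
    hits = []
    for word in keywords:
        # Try the keyword both as ANSI and as UTF-16LE ("Unicode").
        for kind, needle in (("ansi", word.encode("ascii")),
                             ("unicode", word.encode("utf-16-le"))):
            start = data.find(needle)
            while start != -1:
                hits.append((start, kind, word))
                start = data.find(needle, start + 1)
    for rx in regexes:
        # Regular expressions are applied to the raw bytes.
        for m in re.finditer(rx, data):
            hits.append((m.start(), "regex", m.group().decode("ascii")))
    return sorted(hits)
```

Calling `scan(open(path, "rb").read())` on a sample or a memory dump would give a list of offsets worth a closer look. Note that this walks the data once per keyword, which is fine for a short list only.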

After thinking one day about how to improve the typical ‘strings’ analysis process, I cooked up a little program that focuses on one class of such patterns: APIs.

First, I built a list of over 50,000 clean API names, including:

Windows API

native APIs

kernel mode APIs

All of these are exported and imported by native Windows programs, drivers and DLLs. I combined them together into a large list. I then created a program that uses this list and searches for all of these inside the analyzed binary (note again: I run it most of the time on memory dumps, since many malicious samples come protected).

Yup. It’s that simple.

Now, you may be telling yourself: searching for 10-15 strings using a naive method (i.e. walking through the whole data 10-15 times, once per string, or even using one regular expression) works well, but for 50,000 or more strings we quite probably need to do better.

You are right.

This is a non-trivial problem, and the naive algorithm doesn’t work here. Luckily, there are smart people out there who already figured it out. I looked around and researched various multi-pattern search algorithms, eventually deciding to use a very well-known one: Aho-Corasick. It uses a very clever method of finding patterns by walking a trie each time a new character is fetched from the input, so it can search for a large set of patterns simultaneously (well, it’s more complicated than that, but let’s say it is very fast even for 50k patterns).
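For the curious, here is a minimal Aho-Corasick sketch in Python. It is not HAPI's actual code, just an illustration of the technique; the function names `build_automaton` and `search` are my own choices for this example. The automaton is kept as three flat tables (transitions, failure links, outputs), and the input is walked exactly once no matter how many patterns there are.

```python
from collections import deque


def build_automaton(patterns):
    """Build flat Aho-Corasick tables from an iterable of bytes patterns."""
    goto = [{}]   # goto[state][byte] -> next state (the trie)
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for byte in pat:
            nxt = goto[state].get(byte)
            if nxt is None:
                nxt = len(goto)
                goto[state][byte] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # Inherit patterns that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(data, automaton):
    """Yield (offset, pattern) for every match, in one pass over data."""
    goto, fail, out = automaton
    state = 0
    for pos, byte in enumerate(data):
        # Follow failure links until this byte can be consumed.
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pat in out[state]:
            yield pos - len(pat) + 1, pat
```

A call like `list(search(data, build_automaton([b"CreateFileA", b"ReadFile"])))` returns every occurrence with its offset. Overlapping hits are all reported (a pattern such as `File` would also fire inside `ReadFile`), so a real tool would probably want to filter the results further, for example by keeping only hits that line up with whole strings.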

Since building the search trie that the Aho-Corasick algorithm relies on takes quite some time, I precompiled it and included it directly in the executable. So, here it is: a simple tool that extracts known API names from a given binary.
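The precompilation step can be sketched in Python too. This is an assumption about the general idea, not a description of how HAPI stores its trie: the already-built tables are serialized once into a blob, and the blob is loaded at startup instead of rebuilding everything from the 50k-name list. The names `freeze` and `thaw`, and the tiny stand-in `tables` value, are hypothetical.

```python
import pickle
import zlib


def freeze(tables) -> bytes:
    """Serialize already-built search tables into a compact blob."""
    raw = pickle.dumps(tables, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw)


def thaw(blob: bytes):
    """Restore the tables at startup instead of rebuilding them.

    Only load blobs you produced yourself; pickle is not safe
    for untrusted input.
    """
    return pickle.loads(zlib.decompress(blob))


# Stand-in for real automaton tables (transitions, failure links,
# outputs); purely illustrative.
tables = ([{67: 1}, {}], [0, 0], [[], [b"C"]])
blob = freeze(tables)
restored = thaw(blob)
```

The blob can then be shipped next to the program or embedded in it as a bytes constant, so the expensive build happens only once, at release time.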

I hope you will find it useful.

Usage:

hapi <filename>

Download

Example

Used on a random malicious sample, it produces the following results:

————————————————————–

HAPI v0.1 (c) Hexacorn 2012. All rights reserved.

Visit us at https://www.hexacorn.com

————————————————————–

DnsQuery_A

DnsRecordListFree

EnumDeviceDrivers

GetDeviceDriverBaseNameA

UuidToStringW

SRRemoveRestorePoint

SRSetRestorePointA

ConvertStringSidToSidA

GetAdaptersInfo

IsUserAdmin

InternetOpenUrlA

HttpOpenRequestA

InternetCloseHandle

InternetConnectA

InternetOpenA

InternetSetOptionA

InternetQueryOptionA

HttpQueryInfoA

HttpSendRequestA

InternetReadFile

HttpAddRequestHeadersA

memmove

memcmp

_itoa

malloc

free

memset

wcstombs

strtok

mbstowcs

strlen

_itow

srand

rand

memcpy

wcsrchr

tolower

towlower

atoi

strcpy

__dllonexit

_onexit

_XcptFilter

_initterm

_amsg_exit

exit

_adjust_fdiv

lstrlenA

lstrcpyA

lstrcatA

CreateFileA

DeviceIoControl

CloseHandle

GetVersionExA

CreateFileW

WriteFile

FlushFileBuffers

GetFileSize

VirtualAlloc

ReadFile

VirtualFree

CreateThread

GetModuleFileNameW

lstrcpyW

lstrlenW

OpenMutexW

WaitForSingleObject

WaitForMultipleObjects

GetExitCodeThread

SetFilePointer

SetEndOfFile

CreateMutexW

ReleaseMutex

GetModuleFileNameA

DisableThreadLibraryCalls

ExitProcess

LoadLibraryW

Sleep

GetLastError

InitializeCriticalSection

DeleteCriticalSection

EnterCriticalSection

lstrcatW

LeaveCriticalSection

GetCurrentThreadId

TerminateThread

GetSystemTimeAsFileTime

GetProcAddress

GetModuleHandleA

OpenProcess

RaiseException

VirtualAllocEx

WriteProcessMemory

CreateRemoteThread

VirtualFreeEx

CreateToolhelp32Snapshot

Process32First

lstrcmpiA

Process32Next

GetCurrentProcess

FreeLibrary

LoadLibraryA

lstrcmpiW

GetWindowsDirectoryA

GetVolumeInformationA

GetSystemTime

SystemTimeToFileTime

GetTickCount

GetLogicalDriveStringsW

GetDriveTypeW

DeleteFileW

CreateDirectoryW

LocalFree

CreateProcessW

OpenMutexA

OpenEventA

GetCurrentThread

SetFileTime

CreateEventW

TerminateProcess

DeleteFileA

WideCharToMultiByte

HeapAlloc

GetProcessHeap

HeapFree

SetFileAttributesW

InterlockedIncrement

InterlockedDecrement

GetVersion

InterlockedExchange

InterlockedCompareExchange

RtlUnwind

QueryPerformanceCounter

GetCurrentProcessId

UnhandledExceptionFilter

SetUnhandledExceptionFilter

CallNextHookEx

SetWindowsHookExA

PostMessageA

wsprintfA

CharUpperW

GetSystemMetrics

RegQueryValueExW

RegSetValueExW

RegFlushKey

RegCloseKey

RegOpenKeyExW

OpenProcessToken

LookupPrivilegeValueA

AdjustTokenPrivileges

GetTokenInformation

RegCreateKeyExW

SetEntriesInAclA

SetSecurityInfo

DuplicateTokenEx

OpenSCManagerA

OpenServiceA

ControlService

ChangeServiceConfigA

AllocateAndInitializeSid

CheckTokenMembership

FreeSid

InitializeSecurityDescriptor

SetSecurityDescriptorDacl

SetTokenInformation

GetLengthSid

SetThreadToken

IsValidSid

ConvertSidToStringSidW

RegDeleteValueW

RegQueryValueW

RegQueryInfoKeyW

RegEnumKeyExW

RegEnumValueW

RegDeleteKeyW

CloseServiceHandle

QueryServiceConfigA

QueryServiceStatusEx

StartServiceA

SHGetFolderPathW

SHGetFolderPathA

CoCreateInstance

CoInitialize

CoUninitialize

CoCreateGuid

CoTaskMemFree

_except_handler3

_local_unwind2

_CxxThrowException

DllCanUnloadNow

DllGetClassObject

Okay, it’s not random. It’s the same one I used to demonstrate Anti-forensics – live examples 🙂