When I scan executables on a Windows machine looking for malware or suspicious files, I often use the Reference Data Set of the National Software Reference Library to filter out known benign files.

nsrl.py is the program I wrote to do this. It can read the Reference Data Set directly from the ZIP file provided by the NSRL; there is no need to unzip it.

Usage: nsrl.py [options] filemd5 [NSRL-file]

NSRL tool

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -s SEPARATOR, --separator=SEPARATOR
                        separator to use (default is ; )
  -H HASH, --hash=HASH  NSRL hash to use, options: SHA-1, MD5, CRC32 (default
                        MD5)
  -f, --foundonly       only report found hashes
  -n, --notfoundonly    only report missing hashes
  -a, --allfinds        report all matching hashes, not just first one
  -q, --quiet           do not produce console output
  -o OUTPUT, --output=OUTPUT
                        output to file
  -m, --man             Print manual

Manual:

nsrl.py looks up a list of hashes in the NSRL database and reports the results as a CSV file.

The program takes as input a list of hashes (a text file). By default, the hash used for lookup in the NSRL database is MD5. You can use option -H to select hash algorithm SHA-1 or CRC32. The list of hashes is read into memory, and then the NSRL database is read and compared with the list of hashes. If there is a match, a line is added to the CSV report for this hash. The list of hashes is deduplicated before matching occurs, so a hash that appears more than once in the list is only matched once. If a hash has more than one entry in the NSRL database, only the first occurrence is reported, unless option -a is used to report all matching entries for that hash. The first part of the CSV report contains all matching hashes, and the second part all non-matching hashes (hashes that were not found in the NSRL database). Use option -f to report only matching hashes, and option -n to report only non-matching hashes.
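
The matching logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual code of nsrl.py: the function name lookup_hashes, the helper sample_rows and the sample data are made up for this example, and upper-casing the hashes before comparison is an assumption. The column names are those of the header line of NSRLFile.txt.

```python
import csv
import io

# Column names as they appear in the header line of NSRLFile.txt.
HASH_COLUMNS = ("SHA-1", "MD5", "CRC32")


def lookup_hashes(wanted, nsrl_rows, hash_column="MD5", all_finds=False):
    """Hypothetical helper: match a list of hashes against NSRL rows.

    Returns (found, missing): found maps each matched hash to a list of
    NSRL rows, missing is a sorted list of hashes that were not found.
    """
    if hash_column not in HASH_COLUMNS:
        raise ValueError("unknown hash column: %s" % hash_column)
    # Deduplicate the input list; upper-casing is an assumption made here
    # so that lower-case input still matches.
    wanted_set = {h.strip().upper() for h in wanted if h.strip()}
    found = {}
    for row in nsrl_rows:  # single pass over the (large) database
        value = row[hash_column].upper()
        if value not in wanted_set:
            continue
        if value not in found:
            found[value] = [row]        # first occurrence is always kept
        elif all_finds:
            found[value].append(row)    # later occurrences only with -a
    missing = sorted(wanted_set - set(found))
    return found, missing


# Made-up sample data in the NSRLFile.txt layout (not real NSRL entries).
SAMPLE_NSRL = (
    '"SHA-1","MD5","CRC32","FileName","FileSize",'
    '"ProductCode","OpSystemCode","SpecialCode"\n'
    '"' + "A" * 40 + '","' + "AB" * 16 + '","0000000A","first.dll",10,1,"WIN",""\n'
    '"' + "B" * 40 + '","' + "AB" * 16 + '","0000000B","second.dll",20,2,"WIN",""\n'
    '"' + "C" * 40 + '","' + "CD" * 16 + '","0000000C","other.exe",30,3,"WIN",""\n'
)


def sample_rows():
    """Return a fresh row iterator over the sample data."""
    return csv.DictReader(io.StringIO(SAMPLE_NSRL))


if __name__ == "__main__":
    hashes = ["ab" * 16, "AB" * 16, "EF" * 16]  # contains a duplicate
    found, missing = lookup_hashes(hashes, sample_rows())
    for value, rows in found.items():
        for row in rows:
            print(";".join([value, row["FileName"]]))
    for value in missing:
        print(value)
```

Keeping the list of hashes in a set and streaming the database row by row means memory use depends on the size of the hash list, not on the size of the NSRL database.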

The CSV report is written to the console and to a CSV file with the same name as the list of hashes, but with a timestamp appended. To prevent output to the console, use option -q. To choose the output filename, use option -o. The separator used in the CSV file is ;. This can be changed with option -s.
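
As a rough illustration of the output step, the sketch below builds a timestamped report filename and formats rows with a configurable separator. The function names report_filename and format_report are hypothetical, and the exact timestamp format is an assumption: the manual only says that a timestamp is appended.

```python
import csv
import io
import os
from datetime import datetime


def report_filename(hash_list_path, now=None):
    """Hypothetical naming scheme: <name of hash list>-<timestamp>.csv.

    The timestamp layout (YYYYMMDD-HHMMSS) is an assumption for this sketch.
    """
    now = now or datetime.now()
    base, _extension = os.path.splitext(hash_list_path)
    return "%s-%s.csv" % (base, now.strftime("%Y%m%d-%H%M%S"))


def format_report(rows, separator=";"):
    """Format rows as CSV text using the given separator (option -s)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [["AB" * 16, "first.dll"], ["EF" * 16, "not found"]]  # made-up data
    text = format_report(rows)
    print(text, end="")  # skipped when option -q is given
    with open(report_filename("hashes.txt"), "w", newline="") as handle:
        handle.write(text)
```

Using the csv module rather than joining strings by hand means a field that happens to contain the separator is quoted automatically.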

The second argument given to nsrl.py is the NSRL database. This can be the NSRL database text file (NSRLFile.txt), the gzip-compressed NSRL database text file, or the ZIP file containing the NSRL database text file. I use the “reduced set” or minimal hashset (each hash appears only once) found on http://www.nsrl.nist.gov/Downloads.htm. The second argument can be omitted if a gzip-compressed NSRL database text file NSRLFile.txt.gz is stored in the same directory as nsrl.py.
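
Reading the database without unpacking it first can be done with Python's standard zipfile and gzip modules. The sketch below is one possible approach and not necessarily how nsrl.py does it: the function name open_nsrl is hypothetical, the container type is guessed from the file extension, and the latin-1 encoding is an assumption chosen because it never fails to decode.

```python
import gzip
import io
import zipfile


def open_nsrl(path):
    """Hypothetical helper: open NSRLFile.txt as text, whatever the container.

    Accepts a plain text file, a gzip-compressed file (.gz) or a ZIP file
    (.zip) that contains a member whose name ends with NSRLFile.txt.
    """
    lower = path.lower()
    if lower.endswith(".zip"):
        archive = zipfile.ZipFile(path)
        # Assumption: the database is the member named NSRLFile.txt.
        member = next(
            name for name in archive.namelist()
            if name.lower().endswith("nsrlfile.txt")
        )
        # Wrap the binary member stream so that it yields decoded text lines.
        return io.TextIOWrapper(
            archive.open(member), encoding="latin-1", newline=""
        )
    if lower.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1", newline="")
    return open(path, "r", encoding="latin-1", newline="")
```

The returned object can be passed straight to csv.DictReader, so the file is decompressed as a stream while it is read and never has to be extracted to disk.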

nsrl_V0_0_1.zip (https)

MD5: 5063EEEF7345C65D012F65463754A97C

SHA256: ADD3E82EDABA7F956CDEBE93135096963B0B11BB48473EEC2C45FC21CFB32BAA