TL;DR
NSRLConvert can change the new SQLite NSRL hashsets into a plain list of MD5 hashes. The program is available here and can be used with the following syntax: NSRLConvert.exe {databasename} {outputfile}
Example:
NSRLConvert.exe RDS_2022.12.1_modern_minimal.db MD5.txt
In Detail
This last example will convert the Modern Minimal database to a list of MD5 hashes, 1 hash per line. If you need to add “MD5” to the top of the file, I suggest Pilotedit for editing large text files.
I’m sure that many of you are familiar with the NSRL hashsets. For those of you who are not, you probably should be! They are a set of hashes of “known good” files i.e. ones that are basically irrelevant to a forensic investigation. These include a lot of operating system files and common applications like Word and many more.
The NSRL hash sets were produced as very large text files that could be imported into pretty much any forensic tool to then ignore these irrelevant files. This speeds up processing, produces fewer files for review and other generally good things. However, as of version 3, these are now produced as SQLite databases. The current hash sets can be found here: https://www.nist.gov/itl/ssd/software-quality-group/national-software-reference-library-nsrl/nsrl-download/current-rds
There is nothing wrong with changing the format per se, but there are a number of tools which have not yet updated to support this format or tools which are no longer being updated, but are still useful in the state they are, and will therefore never support this file format. I was also hearing some horror stories of the length of time it was taking people to get these hashes into other tools. So…..why not write a small program to change them for us! I originally wrote this as a proof of concept to show that it could be done in a reasonable time frame, but since then there seems to have been a number of people who want to use it and therefore I decided to release it on here
For those who simply want a copy, it’s available here and can be used with the following syntax:
NSRLConvert.exe {databasename} {outputfile}
Example:
NSRLConvert.exe RDS_2022.12.1_modern_minimal.db MD5.txt
This last example will convert the Modern Minimal database to a list of MD5 hashes, 1 hash per line. If you need to add “MD5” to the top of the file, I suggest Pilotedit for editing large text files.
For those who are interested, I’ve also provided a breakdown of the code used, so that people can easily edit this to fit their own needs. Full code can be seen at the end of this post
The code
The most interesting part of the code is how to increase the speed whilst working with such a large database and for reference the Modern Minimal database is about 90GB.
I use SQLite for a lot of my programs to store data in the back end, and often for the user at the end, because it’s lightweight and because you can create In-Memory databases that can then be dumped to disk. This is one of the fastest ways to store query-able data, but with a 90GB file, this isn’t a viable option for everyone (Though I could do this on my forensic workstation!).
They also provide the native C code for their implementation, so you can compile it straight into a C program and I’m a massive fan of using C for anything that we want speed to be a priority. It should be noted, I’m not going to teach you C in this post. The code is here, but you should already be familiar with C if you want to understand it fully. I use Codeblocks as my IDE, I will be adding a blog on how to set this up at some point but until then there are a lot of resources on the Internet already.
Given that we cannot load the whole database to memory, how do we speed up the database process (apart from using C – only takes you so far!). After some searching and looking at the different ways, the answer seems to be playing with the options the database is opened with in the form of Pragmas. A full list of pragmas can be seen here and the ones used in the program are listed below:
const char* pragma [] = {"pragma journal_mode = WAL;",
"pragma synchronous = normal;",
"pragma temp_store = memory;",
"pragma mmap_size = 30000000000;"};
I’m not saying that these are the most optimal, but they certainly worked for me and I use the, for several projects. Some of the are not relevant to this one, but would be useful if we were writing data. The mmap_size relates to the amount of data stored in memory. A larger size obviously uses more memory, but means faster read times. The other Pragmas are all more relevant for writing data, which we don’t do in this project.
The code starts with some basic checks to make sure the right number of arguments are there. The validation is quite light here, so you could easily crash it by giving it invalid data, but then thats your own fault 🙂
The basic flow of the program is:
- Open the database
- Query the data we need
- Step through each record in our query
- Write MD5 has to our output txt file
- Close the database
Open the database
This code opens the database, argv[1] is the first argument passed to the program and should be the database file we want to open. If it’s not a valid database, the program will error and exit. The set_pragma function is setting the database to use the settings shown in the pragma variable above.
rc = sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY,NULL);
if( rc ) {
fprintf(stderr, "Can't open database: %s\n", sqlite3_errmsg(db));
return(0);
}
else {
fprintf(stderr, "Opened database successfully\n");
}
set_pragma(db);
Now we have the database open, we need to run our query. The RDS Modern Minimal has a number of tables, such as “FILE”, “MFG” and “OS” but it also has a view named “DISTINCT_HASH”. A view is a virtual table, so basically a query that displays as a table we can query. So we select all data from this view and although you can use the sqlite_exec function to do this, it’s often best to use the prepare then step as described here. This way, if you do make a mistake in the SQL statement, it will fail on the prepare and you will know it’s because the SQL is incorrect. It’s also useful for where you need to bind value, but we won’t be doing that here. So the following code prepare the SQL statement we are going to execute and then we will step through the results in the next section:
//run query
rc = sqlite3_prepare_v2(db, "SELECT * from DISTINCT_HASH;", -1, &stmt, 0);
if (rc != SQLITE_OK) {
fprintf(stderr, "Failed to fetch data: %s\n", sqlite3_errmsg(db));
sqlite3_close(db);
return 1;
}
Once we have prepared our query, we can step through each row and then we want to write the data out. We open a file (md5 below) which is the second argument passed to the program. If we wanted to add the text “MD5” to the first line of the text file, we can add this line after the line “md5 = fopen(argv[2],”w”);”:
fputs("MD5\n",md5);
The code after this runs a loop so that while we still have a row to process (rc==SQLITE_ROW), the loop will continue.
In each iteration of the loop, the program takes the 3rd column (SQLite numbers from 0), which is the MD5 column in the distinct_hash view, as text using the sqlite3_column_text function.
The first two columns are SHA256 and SHA1 so we could extract these if we wanted.
The program then writes this data to the output file and appends a “\n” character, which creates a newline in the text file. Without this, we would have one very long string of text!
Lastly, after the loop has completed, the program closes the text file and the database.
rc = sqlite3_step(stmt);
md5 = fopen(argv[2],"w");
while (rc==SQLITE_ROW)
{
const char* data= (const char*)sqlite3_column_text(stmt, 2);
fwrite(data, sizeof(char), strlen(data), md5);
fputs("\n",md5);
rc = sqlite3_step(stmt);
}
fclose(md5);
print_time("End Time is:");
sqlite3_close(db);
Hopefully this gives you a good understanding of the code so that you can modify it to suit your needs. The full code can be seen below, so enjoy!
Full code
Note, you will need the sqlite code for this, available here
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite/sqlite3.h"
const char* pragma [] = {"pragma journal_mode = WAL;",
"pragma synchronous = normal;",
"pragma temp_store = memory;",
"pragma mmap_size = 30000000000;"};
const int pragma_size = 4;
void print_time(const char* message)
{
// `time_t` is an arithmetic time type
time_t now;
// Obtain current time
time(&now);
// Convert to local time format and print to stdout
printf("%s %s", message, ctime(&now));
}
void set_pragma(sqlite3* db)
{
for (int i=0;i<pragma_size;i++)
{
sqlite3_exec(db,pragma[i],0,0,0);
}
}
void print_usage()
{
printf("Usage:\n");
printf("\NSRLConvert.exe {input database} {output textfile}\n");
}
int main(int argc, char** argv)
{
sqlite3 *db;
sqlite3_stmt* stmt;
FILE* md5;
char *zErrMsg = 0;
int rc;
if (argc != 3)
{
printf("%i arguments given, 2 expected\n", argc-1);
print_usage();
return 0;
}
print_time("Start Time is:");
rc = sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY,NULL);
if( rc ) {
fprintf(stderr, "Can't open database: %s\n", sqlite3_errmsg(db));
return(0);
}
else {
fprintf(stderr, "Opened database successfully\n");
}
set_pragma(db);
//run query
rc = sqlite3_prepare_v2(db, "SELECT * from DISTINCT_HASH;", -1, &stmt, 0);
if (rc != SQLITE_OK) {
fprintf(stderr, "Failed to fetch data: %s\n", sqlite3_errmsg(db));
sqlite3_close(db);
return 1;
}
rc = sqlite3_step(stmt);
md5 = fopen(argv[2],"w");
while (rc==SQLITE_ROW)
{
const char* data= (const char*)sqlite3_column_text(stmt, 2);
fwrite(data, sizeof(char), strlen(data), md5);
fputs("\n",md5);
rc = sqlite3_step(stmt);
}
fclose(md5);
print_time("End Time is:");
sqlite3_close(db);
}
Using this command line:
NSRLConvert.exe RDS_2022.03.1_ios.db test.txt
I get an error: “Failed to fetch data: no such table: DISTINCT_HASH”.
Can’t see a DISTINCT_HASH view in the schema. Any idea what I might be doing wrong?
LikeLike
Please ignore my previous post. I see that it only works on the _minimal datasets. Apologies for the noise.
LikeLike
I actually didn’t realise there was a difference between the two databases. I’ll see if I can update it to do it for the IOS/Android ones as well. Watch this space!
LikeLike