Importing NSRL V3 hashsets into legacy tools

TL;DR

NSRLConvert can change the new SQLite NSRL hashsets into a plain list of MD5 hashes. The program is available here and can be used with the following syntax: NSRLConvert.exe {databasename} {outputfile}
Example:
NSRLConvert.exe RDS_2022.12.1_modern_minimal.db MD5.txt

This converts the Modern Minimal database to a list of MD5 hashes, 1 hash per line. If you need to add “MD5” to the top of the file, I suggest Pilotedit for editing large text files.

In Detail

I’m sure that many of you are familiar with the NSRL hashsets. For those of you who are not, you probably should be! They are a set of hashes of “known good” files, i.e. ones that are basically irrelevant to a forensic investigation. These include a lot of operating system files and common applications such as Word, among many others.

The NSRL hash sets were produced as very large text files that could be imported into pretty much any forensic tool, which could then ignore these irrelevant files. This speeds up processing, produces fewer files for review and brings other general benefits. However, as of version 3, they are now produced as SQLite databases. The current hash sets can be found here: https://www.nist.gov/itl/ssd/software-quality-group/national-software-reference-library-nsrl/nsrl-download/current-rds

There is nothing wrong with changing the format per se, but there are a number of tools which have not yet been updated to support it, as well as tools which are no longer maintained but are still useful as they are, and which will therefore never support this file format. I was also hearing some horror stories about the length of time it was taking people to get these hashes into other tools. So…..why not write a small program to convert them for us? I originally wrote this as a proof of concept to show that it could be done in a reasonable time frame, but since then a number of people have wanted to use it, so I decided to release it on here.

For those who simply want a copy, it’s available here and can be used with the following syntax:
NSRLConvert.exe {databasename} {outputfile}

Example:
NSRLConvert.exe RDS_2022.12.1_modern_minimal.db MD5.txt

This last example will convert the Modern Minimal database to a list of MD5 hashes, 1 hash per line. If you need to add “MD5” to the top of the file, I suggest Pilotedit for editing large text files.

For those who are interested, I’ve also provided a breakdown of the code used, so that people can easily edit it to fit their own needs. The full code can be seen at the end of this post.

The code

The most interesting part of the code is how to increase the speed whilst working with such a large database; for reference, the Modern Minimal database is about 90GB.
I use SQLite in a lot of my programs to store data in the back end, and often as output for the end user, because it’s lightweight and because you can create in-memory databases that can then be dumped to disk. This is one of the fastest ways to store queryable data, but with a 90GB file it isn’t a viable option for everyone (though I could do this on my forensic workstation!).
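As a quick aside, here is a minimal sketch of that in-memory-then-dump pattern, using Python’s sqlite3 module for brevity (the conversion program itself is C, and the table and file names here are just illustrative):

import sqlite3

# Build the database entirely in memory...
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE hashes (md5 TEXT)")
mem.execute("INSERT INTO hashes VALUES ('d41d8cd98f00b204e9800998ecf8427e')")

# ...then dump it to disk in one go using the online backup API
disk = sqlite3.connect("dump.db")
with disk:
    mem.backup(disk)
disk.close()
mem.close()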

The SQLite project also provides the native C source for its implementation, so you can compile it straight into a C program, and I’m a massive fan of using C for anything where speed is a priority. It should be noted that I’m not going to teach you C in this post. The code is here, but you should already be familiar with C if you want to understand it fully. I use Codeblocks as my IDE; I will be adding a blog on how to set this up at some point, but until then there are a lot of resources on the Internet already.

Given that we cannot load the whole database into memory, how do we speed up the database side (apart from using C, which only takes you so far)? After some searching and looking at the different options, the answer seems to be tuning the settings the database is opened with, in the form of pragmas. A full list of pragmas can be seen here and the ones used in the program are listed below:

const char* pragma [] = {"pragma journal_mode = WAL;",
                "pragma synchronous = normal;",
                "pragma temp_store = memory;",
                "pragma mmap_size = 30000000000;"};

I’m not saying that these are the most optimal, but they certainly worked for me and I use them for several projects. Some of them are not relevant to this one, but would be useful if we were writing data. The mmap_size pragma sets the maximum amount of the database file that is memory-mapped; a larger size obviously uses more memory, but means faster read times. The other pragmas are all more relevant for writing data, which we don’t do in this project.

The code starts with some basic checks to make sure the right number of arguments are present. The validation is quite light here, so you could easily crash it by giving it invalid data, but then that’s your own fault 🙂

The basic flow of the program is:

  • Open the database
  • Query the data we need
  • Step through each record in our query
  • Write the MD5 hash to our output txt file
  • Close the database

Open the database

This code opens the database, argv[1] is the first argument passed to the program and should be the database file we want to open. If it’s not a valid database, the program will error and exit. The set_pragma function is setting the database to use the settings shown in the pragma variable above.

    rc = sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY,NULL);
    if( rc ) {
      fprintf(stderr, "Can't open database: %s\n", sqlite3_errmsg(db));
      return(0);
    }
    else {
      fprintf(stderr, "Opened database successfully\n");
    }
    set_pragma(db);

Now we have the database open, we need to run our query. The RDS Modern Minimal database has a number of tables, such as “FILE”, “MFG” and “OS”, but it also has a view named “DISTINCT_HASH”. A view is a virtual table: essentially a stored query that we can query as if it were a table. So we select all data from this view, and although you could use the sqlite3_exec function to do this, it’s often best to use prepare then step, as described here. This way, if you do make a mistake in the SQL statement, it will fail on the prepare and you will know it’s because the SQL is incorrect. It’s also useful where you need to bind values, but we won’t be doing that here. The following code prepares the SQL statement we are going to execute, and then we will step through the results in the next section:

    //run query
    rc = sqlite3_prepare_v2(db, "SELECT * from DISTINCT_HASH;", -1, &stmt, 0);
    if (rc != SQLITE_OK) {

        fprintf(stderr, "Failed to fetch data: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);

        return 1;
    }

Once we have prepared our query, we can step through each row and write the data out. We open a file (md5 below) whose name is the second argument passed to the program. If we wanted to add the text “MD5” to the first line of the text file, we could add this line after the line md5 = fopen(argv[2],"w");:

fputs("MD5\n",md5);

The code after this runs a loop: while there is still a row to process (rc==SQLITE_ROW), the loop continues.
In each iteration of the loop, the program takes the 3rd column (SQLite numbers columns from 0), which is the MD5 column in the DISTINCT_HASH view, as text using the sqlite3_column_text function.
The first two columns are SHA256 and SHA1, so we could extract these if we wanted.
The program then writes this data to the output file and appends a “\n” character, which creates a newline in the text file. Without this, we would have one very long string of text!

Lastly, after the loop has completed, the program closes the text file and the database.

    rc = sqlite3_step(stmt);
    md5 = fopen(argv[2],"w");
    while (rc==SQLITE_ROW)
    {
        const char* data=  (const char*)sqlite3_column_text(stmt, 2);
        fwrite(data, sizeof(char), strlen(data), md5);
        fputs("\n",md5);
        rc = sqlite3_step(stmt);
    }
    fclose(md5);
    print_time("End Time is:");
    sqlite3_close(db);

Hopefully this gives you a good understanding of the code so that you can modify it to suit your needs. The full code can be seen below, so enjoy!

Full code

Note: you will need the SQLite source code for this, available here

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite/sqlite3.h"

const char* pragma [] = {"pragma journal_mode = WAL;",
                "pragma synchronous = normal;",
                "pragma temp_store = memory;",
                "pragma mmap_size = 30000000000;"};

const int pragma_size = 4;


void print_time(const char* message)
{
    // `time_t` is an arithmetic time type
    time_t now;

    // Obtain current time
    time(&now);

    // Convert to local time format and print to stdout
    printf("%s %s", message, ctime(&now));

}

void set_pragma(sqlite3* db)
{
    for (int i=0;i<pragma_size;i++)
    {
        sqlite3_exec(db,pragma[i],0,0,0);
    }
}

void print_usage()
{
    printf("Usage:\n");
    printf("\NSRLConvert.exe {input database} {output textfile}\n");
}

int main(int argc, char** argv)
{
    sqlite3 *db;
    sqlite3_stmt* stmt;
    FILE* md5;
    char *zErrMsg = 0;
    int rc;

    if (argc != 3)
    {
        printf("%i arguments given, 2 expected\n", argc-1);
        print_usage();
        return 0;
    }

    print_time("Start Time is:");
    rc = sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY,NULL);
    if( rc ) {
      fprintf(stderr, "Can't open database: %s\n", sqlite3_errmsg(db));
      return(0);
    }
    else {
      fprintf(stderr, "Opened database successfully\n");
    }
    set_pragma(db);
    //run query
    rc = sqlite3_prepare_v2(db, "SELECT * from DISTINCT_HASH;", -1, &stmt, 0);
    if (rc != SQLITE_OK) {

        fprintf(stderr, "Failed to fetch data: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);

        return 1;
    }
    rc = sqlite3_step(stmt);
    md5 = fopen(argv[2],"w");
    while (rc==SQLITE_ROW)
    {
        const char* data=  (const char*)sqlite3_column_text(stmt, 2);
        fwrite(data, sizeof(char), strlen(data), md5);
        fputs("\n",md5);
        rc = sqlite3_step(stmt);
    }
    fclose(md5);
    print_time("End Time is:");
    sqlite3_close(db);
}

Decrypting Mega Preferences (Part 2)

The first post seemed to gain a lot of attention, with someone from a police force contacting me to help with one of their cases, so I’ve spent a little time making the script into more of a finished product than a POC. The new script can be found here and is now completely Python, though you will need to install the pycryptodomex library. Don’t worry, I’ll cover everything needed in this post, including how to run it on air gapped machines, as a lot of digital investigation takes place on these types of machines.

In the previous post, I took the Java code and made a small ‘compiled’ .jar executable that was called by a Python script to decode the data. This was the simplest way to do it, but it had several limitations; I found examples where fields were not encrypted or had NULL values, and this caused issues.

The TL;DR is that the new version of the script can be downloaded from here. Usage of the script is:

python megapreferencedecryptor1.0 -i {inputdatabase} -o {outputname}

Improving the Script

So, in order to convert the Java code to Python, I needed to identify which algorithm was being used. This starts with an examination of the Java “aes_decrypt” function:

public static byte[] aes_decrypt(byte[] bArr, byte[] bArr2) throws Exception {
        SecretKeySpec secretKeySpec = new SecretKeySpec(bArr, "AES");
        Cipher instance = Cipher.getInstance("AES");
        instance.init(2, secretKeySpec);
        return instance.doFinal(bArr2);
    }

Documentation for the Cipher.init function (instance.init in the code) can be located here. One of the overloads matches our implementation: it takes two values, opmode and key.

Values for opmode are shown here, but the ones we are interested in are:

  • DECRYPT_MODE == 2
  • ENCRYPT_MODE == 1

I’m not sure whether the values have been hardcoded into the Mega App or if this is just how the decompilation process has interpreted it, but either way it doesn’t take us any further.

Looking at the creation of the secretKeySpec object, we can see the string “AES” as a parameter as well as a byte array, which is the key we had previously located. However, there are a number of different AES modes that can be used by Java, and in order to decode the data in Python we need to know exactly which one. As nothing else is specified, we can assume a default is being used.

This StackOverflow post states that default method used in Java is AES/ECB/PKCS5Padding. A description of the ECB method of AES can be seen here

PKCS5Padding is described here. Basically, it pads to an 8 byte boundary, so the size of the data is exactly divisible by 8, using padding bytes whose value is equal to the number of padding bytes.

If we had a block of 5 bytes, then 3 bytes of padding would be required, so the padding would be 0x03 0x03 0x03. Similarly, if only a single byte of padding was required, the value 0x01 would be used.
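To make the scheme concrete, here is a minimal sketch of this padding in Python (the function names are mine, not from the original script):

# PKCS#5-style padding: each pad byte holds the number of pad bytes added
def pad(data: bytes, block: int = 8) -> bytes:
    n = block - (len(data) % block)
    return data + bytes([n] * n)

def unpad(data: bytes) -> bytes:
    # the last byte tells us how many bytes to strip
    return data[:-data[-1]]

assert pad(b"12345") == b"12345\x03\x03\x03"
assert unpad(pad(b"12345")) == b"12345"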

The commonly found function for unpadding (I only need to unpad, as I am decrypting rather than encrypting) looks like this:

unpad = lambda s : s[0:-ord(s[-1])]

Note that in Python 3, which my script is written in (as all new code should be), the “ord” function needs to be removed for it to work: indexing a bytes object already returns an integer. So now we have a function to unpad the data, but we still need to decrypt it.

The code for decrypting the data using AES is surprisingly simple using the pycryptodomex library:

import base64
from Cryptodome.Cipher import AES  # pycryptodomex installs under the Cryptodome namespace

def unpad(ct):
    # strip the padding: the last byte holds the number of padding bytes
    return ct[:-ct[-1]]

# hardcoded key recovered from getAesKey, truncated to the first 32 characters
secretKey = "android_idfkvn8 w4y*(NC$G*(G($*GR*(#)*huio4h389$G"[0:32]
cipher = "IdAI0AprbXDgp5RH4ByJz/hxDJjUGPi8fLOI2lVMJgs="

private_key = secretKey
enc = base64.b64decode(cipher)
cipher = AES.new(bytes(private_key, 'utf-8'), AES.MODE_ECB)

print(unpad(cipher.decrypt(enc)).decode('utf-8'))

This code prints out the correct data for that string, so now we need to make it decode all the fields in our database. The first POC script created a new table within the original SQLite database, with “dec” appended to the table name, and this is less than ideal.

So in the more polished script, I take 2 arguments: the input database and an output name. This then gives a completely decrypted database without altering the original. The actual decryption functions are shown below.

def unpad(ct):
    return ct[:-ct[-1]]

def decrypt(cipher):
    secretKey = "android_idfkvn8 w4y*(NC$G*(G($*GR*(#)*huio4h389$G"[0:32]
    private_key = secretKey
    enc = base64.b64decode(cipher)
    cipher = AES.new(bytes(private_key, 'utf-8'), AES.MODE_ECB)
    return(unpad(cipher.decrypt(enc)).decode('utf-8'))

So that takes care of the decryption section, but I need to read the database, extract all the tables, decrypt them and write them to a new database. To do this, I created a number of functions. The function below creates an empty copy of each table in the output database.

def generate_output_database(inputDB, outputDB):
    input_cursor = inputDB.cursor()
    output_cursor = outputDB.cursor()
    for row in input_cursor.execute("Select sql from SQLITE_MASTER where type = \"table\";"):
        output_cursor.execute(row[0])
    outputDB.commit()

This function is reading all the records from the SQLITE_MASTER table, which stores all the details of tables in the database, and executing the SQL command to create identical tables in the new database. Now all that is required is some code to decrypt the fields and write them to the new database.

def decrypt_tuple(field_data):
    if isinstance(field_data, str):
        if field_data == "":
            return "";
        else:
            return decrypt(field_data.replace("\n",""))
    elif isinstance(field_data, type(None)):
        return "NULL"
    elif isinstance(field_data, int):
        return str(field_data)
    else:
        return "Unknown"

def decrypt_table(inputDB, outputDB, table_name):
    input_cursor = inputDB.cursor()
    output_cursor = outputDB.cursor()
    for row in input_cursor.execute("Select * from %s;" % table_name):
        if table_name in ignore_tables:
            break
        record_data = []
        for column in row:
            record_data.append(decrypt_tuple(column))
        write_record(outputDB, table_name, record_data)
        

def decrypt_database(inputDB, outputDB):
    input_cursor = inputDB.cursor()
    output_cursor = outputDB.cursor()
    for row in input_cursor.execute("Select tbl_name from SQLITE_MASTER where type = \"table\";"):
        print("Extracting records from " + row[0])
        decrypt_table(inputDB, outputDB,row[0])

These can be summarised as follows: the decrypt_database function gets the name of every table in the database and runs decrypt_table for each of them.

The decrypt_table function gets every row in the table and then passes every field to the decrypt_tuple function and writes the results into the output table.
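The write_record helper that decrypt_table calls isn’t shown above; a minimal sketch of what it needs to do, assuming record_data is a list of already-decrypted string values, would be:

def write_record(outputDB, table_name, record_data):
    output_cursor = outputDB.cursor()
    placeholders = ",".join("?" * len(record_data))  # one ? per column
    output_cursor.execute(
        "INSERT INTO %s VALUES (%s)" % (table_name, placeholders),
        record_data,
    )
    outputDB.commit()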

The decrypt_tuple function checks the data type that it has received and deals with it appropriately. Where it is an encrypted Base64 encoded string, it decrypts it. If it is an empty value (NULL) then it returns the string “NULL”, and if it’s an integer it returns a string of that integer.

That’s the basic overview of the script; the rest of the code is error checking and argument parsing. These can be seen in the full version.

Lastly, the pycryptodomex library is required to run the script. This is very simple to install on a machine with an Internet connection using the pip installer:

$ pip install pycryptodomex
Collecting pycryptodomex
  Downloading pycryptodomex-3.14.1-cp35-abi3-manylinux2010_x86_64.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 20.6 MB/s 
Installing collected packages: pycryptodomex
Successfully installed pycryptodomex-3.14.1

This is trickier to do without an Internet connection. The easiest way is to install the same version of Python on an Internet connected machine as on your offline machine, and then use the following command to download the wheel (.whl) file for the library:

pip download --only-binary=:all: pycryptodomex

Once the Wheel file has been downloaded, it can be transferred to the offline machine and installed using the following command

pip install pycryptodomex-3.14.1-cp35-abi3-win_amd64.whl

That’s it, hopefully this has been useful and the script works for you.

Decrypting Mega’s megapreferences SQLite Database

It seems like almost an age since I last published anything on here. I’d like to say it’s due to a global pandemic causing issues, but mostly it’s just down to the amount of work and a lack of interesting problems to solve for a little while. Luckily this all changed yesterday……

Background

One of my colleagues approached me with a database that he believed contained deleted fields. It related to the “Mega.nz” Android application; many forensic investigators will be familiar with this as an online storage application, and I’ve had a number of Indecent Images of Children cases that revolve around this storage facility.

My colleague said the fields looked like they were Base64 encoded, however when he decoded them he didn’t get any intelligible data. The database is named “megapreferences” and a copy was provided to me.

Preliminary work

To start off with, I opened the database file with an SQLite viewer. I quite like SQLiteSpy, but some of this work was done on a Linux machine and I used DB Browser for SQLite on there. It shouldn’t make any real difference which one you use.

A quick look at this copy of the database showed that a number of tables were empty, but the “completedtransfers” table had some data and looked like this:

Figure 1. View of “completedtransfers” table

A quick look at the data suggests that it is Base64 encoded, due to the prevalence of “=” characters at the end of the data. Although my colleague had already told me he had unsuccessfully tried to decode the data, I needed to verify this myself.

So I use CyberChef for my data decoding needs (https://gchq.github.io/cyberchef) and I took a sample string out of the database for testing purposes (“IdAI0AprbXDgp5RH4ByJz/hxDJjUGPi8fLOI2lVMJgs=”).

Figure 2 Decoding Base64 data using CyberChef
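The same check is easy to reproduce outside CyberChef; a quick Python sanity check on the sample string looks something like this:

import base64

# The sample is valid Base64, but it decodes to 32 bytes of high-entropy
# binary rather than readable text, suggesting a further encryption layer
sample = "IdAI0AprbXDgp5RH4ByJz/hxDJjUGPi8fLOI2lVMJgs="
decoded = base64.b64decode(sample)
print(len(decoded), decoded.hex())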

Running Base64 decoding over this produced data that appeared to have no meaning, so this clearly wasn’t the end of the process. The next step was to check it wasn’t something simple, like being XOR’d with a single character. Luckily, CyberChef has an option for this under XOR Brute Force.

Figure 3 Brute force XOR on CyberChef

However reviewing these quickly showed no promising results and I wasn’t going to start running bigger key lengths due to the number of permutations involved.

There are so many possible encryption methods that could have been used, of varying complexities, that bruteforcing the data would not be feasible in any sensible time frame. Therefore I needed to figure out how the data had been encrypted: time for some reverse engineering!

Firstly, I needed the APK of the application to start the reverse engineering process. I grabbed a copy of the APK for the Mega app from https://mega.en.uptodown.com/android/download.

Next I needed a decompiler, preferably one that would work on Linux, as I was still working on this machine.

The Jadx decompiler, and more specifically the jadx-gui (github.com/skylot/jadx), proved an excellent choice.

Jadx is a decompiler for Android APKs that converts them back to their original Java source code, as opposed to tools that produce bytecode listings (such as Dalvik bytecode), which are very close to assembly language.

Once the jadx-gui had been opened, it was a simple case of opening the previously downloaded APK file. This starts the decompilation process, which takes a little while.

Once the source code has been loaded, we can run text searches over the data in order to find possible encryption routines. I started with “encrypt” and quickly noticed a lot of hits, so I added a “(“ to make “encrypt(“. This gives us a number of results, and we can see a number of SQL statements that appear to be encrypting the data as it is added to the database. This is consistent with what we have seen so far, as the database itself is not encrypted but the individual fields are.

Figure 4 Search for Encrypt text

Further down the results, we can see a declaration for a function named “encrypt” which takes a string as an argument. This is likely to be the function we are interested in.

Figure 5 Search showing Encrypt Function

In Java, functions are either public or private, for reasons I won’t explore in this post, and can be defined as static. The “String” text just before the function name tells us (and the compiler) what type of data the function will return. The text within the brackets lists the arguments for the function, including their types. This function has the text “String str”, which tells us that the function takes a single parameter of type String that is referred to as “str” within the function.

In the search view of jadx-gui, and indeed in many other sections, double clicking on a result (or on a function name in other parts of the program) will take us to the search hit or function in the source code. So we double click on the hit “public static String encrypt”:

Figure 6 encrypt function

Running through this function shows us a number of things. The majority of the code is error checking. The first “if” statement checks whether the parameter “str” is null and, if it is, returns a null value. This stops the program crashing if a null string is passed.

The next section is a “try catch” block. This again is mostly error handling: the program will try to execute any code between the “{” and “}” brackets after the “try” keyword. If an error occurs during this, the code within the “{” and “}” brackets after the “catch” keyword is executed. This prevents the program crashing on an error.

The only “useful” code within this function is the line starting “return”. The return keyword is used to pass a value back to the code that called the function. So in this case, it will return a String (we know this from the function declaration) that contains the Base64 encoded result of the Util.aes_encrypt function.

The aes_encrypt function is passed three parameters:

  • The result of the “getAesKey” function
  • str.getBytes – this is a method of converting a String (text) into a byte array
  • Literal value of 0

Now let’s run another search for this and go to the function, and we can see the aes_encrypt and aes_decrypt functions are next to each other.

Figure 7 aes_encrypt and aes_decrypt functions

So we can see both the aes_encrypt and aes_decrypt functions, but they don’t give much further information on the key or other details. Earlier we saw that the result of the “getAesKey” function is passed to the aes_encrypt (and aes_decrypt) functions. Let’s examine this function:

Figure 8 getAesKey Function

This gives a string of text and a function that appears to return the first 32 bytes of the string (for some reason, as opposed to using the whole string).

So now we have a key, and we can jump back into CyberChef and use the AES Decrypt operation.

Unfortunately we also need an “IV” value and a mode to be able to decrypt it via this method and further examination of the source code didn’t seem to show that these values were passed to any encrypt function.

Figure 9 AES decryption in CyberChef

However, we have the encryption and key generation functions as well as aes_decrypt, so now let’s find the calling decryption function. Given the naming convention, a keyword search for “decrypt(“ seems like a good starting point. Sure enough, this gives us the function we are looking for.

Again it shows us that the string is decoded from Base64 and then decrypted to give a result. We are still no further with the IV and mode values, but we do have the functions required to decrypt the data. We can copy these functions out of the source code and create a small program to see if we can decode some text. Let’s create a small Java program to test this:

As we can see, putting a string literal into this and decoding it using the functions provided (slightly modified to make them work) gives us a string of text that looks a lot like a file that would be downloaded using MEGA.

Details of a couple of the changes made:

  • The decrypt function was part of the class “Utils” in the original program, as this is now in our only class, it is called without a preceding Utils.
  • The base64 function comes from “android.util.Base64”. The java.util.Base64 version is slightly different and the code has been amended to reflect this.

Now we have working code, we need to try and decrypt all the values within the database. The “simplest” way is to add a function into the Java program, then decrypt the values and output them in some way. However, Java is a pain of a language, and if I could have ported this code to anything else (short of assembly) I would have.

My (slightly hacky) workaround was to make the program take a variable number of string arguments and decode them all and print them to the console, separated by “,” characters.

I then created a Python script (Download Here) that connected to the database, got the values, sent these to the Java program (Download Here) and scooped up the results. In order to make the Python script work, please put the Java program (with .jar extension) in the same location as the script, and note that a number of command window boxes will open during this process (one per record in the table).

The usage of the script is megaDBDecrypt.py {path to database, including name}

So in order to make my life easier, I put the database file in the same location as the python script and the java file and ran “python megaDBDecrypt.py megapreferences”

Once the python script has finished, it creates a “completedtransferDec” table and then writes the decrypted results back into the database. This gives us the following results (some details hidden):

Et voila! We now have a decrypted table from the database. The Python script, the compiled Java code and the source code can be found in the downloads section or here.

Note: If the database table is full of “null” values, please ensure your Java Runtime Environment is up to date https://java.com/en/download/manual.jsp

SQLite Databases at hex level

My recent post on “Timelining using SQLite Write Ahead Logs” highlighted how much background information is required to deal with SQLite databases. This post is going to give a more in-depth overview of the structure of the SQLite 3 file format. It should take you from knowing very little about SQLite databases to being able to manually locate records using a hex viewer.

I should start by saying that the documentation for the SQLite 3 file format is excellent and can be found here. The documentation is pretty much good enough that you can teach yourself the format using just this information and a test database to check against. Where possible, I’ll reference the section of the documentation that the information has come from.

SQLite databases are now a massive part of the forensics workload. They are used by thousands of phone apps, on both Android and iOS, as well as being used heavily by the Windows/Linux/OS X operating systems for both system data (e.g. Windows Notification database) and user data (e.g. Chrome/Firefox history).

Prelims

Tools used:

X-Ways Forensics/Winhex – but any hex editor will do. HxD is also good if you need a free tool

X-tensions – If using X-Ways then you can use the SQLite Record Decoder (v0.3) to make your life easier – available here

Templates:

SQLite3 Page Header – Available here

Database – The database used throughout this post is called chinook.db and can be downloaded from here, though I’ve created a VHD with the file in which can be accessed here

Examining the database

Examining the database header

The first 100 bytes of the SQLite database is the database header and this stores key information about the database.

All multi-byte values in SQLite are in big-endian format, so you should ensure any hex editor you are using is displaying values as big-endian, or none of your values are going to make any sense at all!

The following structure can be seen on the SQLite file format website under section 1.3.

Offset  Size  Description
0       16    The header string: “SQLite format 3\000”
16      2     The database page size in bytes. Must be a power of two between 512 and 32768 inclusive, or the value 1 representing a page size of 65536.
18      1     File format write version. 1 for legacy; 2 for WAL.
19      1     File format read version. 1 for legacy; 2 for WAL.
20      1     Bytes of unused “reserved” space at the end of each page. Usually 0.
21      1     Maximum embedded payload fraction. Must be 64.
22      1     Minimum embedded payload fraction. Must be 32.
23      1     Leaf payload fraction. Must be 32.
24      4     File change counter.
28      4     Size of the database file in pages. The “in-header database size”.
32      4     Page number of the first freelist trunk page.
36      4     Total number of freelist pages.
40      4     The schema cookie.
44      4     The schema format number. Supported schema formats are 1, 2, 3, and 4.
48      4     Default page cache size.
52      4     The page number of the largest root b-tree page when in auto-vacuum or incremental-vacuum modes, or zero otherwise.
56      4     The database text encoding. A value of 1 means UTF-8. A value of 2 means UTF-16le. A value of 3 means UTF-16be.
60      4     The “user version” as read and set by the user_version pragma.
64      4     True (non-zero) for incremental-vacuum mode. False (zero) otherwise.
68      4     The “Application ID” set by PRAGMA application_id.
72      20    Reserved for expansion. Must be zero.
92      4     The version-valid-for number.
96      4     SQLITE_VERSION_NUMBER
Taken from https://www.sqlite.org/fileformat.html

There are a lot of fields here and I’m going to concentrate on the most useful for us to interpret the rest of the database. Probably most important is the database page size in bytes as we will need this information in order to locate any other information within the database.

Pages are the building blocks of the SQLite database and the entire database is made up of pages. Even the SQLite database header takes space in the first page of the database. It is worth noting that SQLite pages are numbered from one, not zero, and this becomes important when we are trying to locate the offsets of different pages.

In all modern versions of SQLite databases, the number of pages should be recorded in the header. If this information is blank, the number of pages can be assumed to be the size of the file divided by the page size.
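As a sketch of how these header fields are read in practice, the following Python pulls out a few values from the first 100 bytes (the database filename is just an example):

import struct

with open("chinook.db", "rb") as f:
    header = f.read(100)

magic = header[0:16]                                   # b'SQLite format 3\x00'
page_size = struct.unpack(">H", header[16:18])[0]      # big-endian 2-byte value
if page_size == 1:                                     # 1 represents 65536
    page_size = 65536
page_count = struct.unpack(">I", header[28:32])[0]     # in-header database size
text_encoding = struct.unpack(">I", header[56:60])[0]  # 1=UTF-8, 2=UTF-16le, 3=UTF-16be

# Pages are numbered from 1, so page N starts at (N-1) * page_size
def page_offset(n):
    return (n - 1) * page_size

print(page_size, page_count, page_offset(2))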

Examining the root page

Immediately after the 100 byte SQLite database header comes the root page. In all other pages in the database, the page header will be at offset 0 of the page.

The root page is used to store information about the tables contained within the SQLite database. This is itself a table, albeit a hidden one, but it can be seen quite easily. Using the query “Select * from Sqlite_master” gives us the following information:

Here we can see a number of records that are either “table” or “index” records. Other record types are allowed, such as “trigger” or “view”, but we have none of these in the database.

This table has a number of names; the documentation refers to it as “sqlite_schema” but also allows for different names:

  1. sqlite_master
  2. sqlite_temp_schema
  3. sqlite_temp_master

The SQLite schema table can be thought of as being created as such:

CREATE TABLE sqlite_schema(
  type text,
  name text,
  tbl_name text,
  rootpage integer,
  sql text
);

The rootpage field in this table is the page at which the root of the table can be located. The sqlite_schema table’s rootpage is effectively page 1 of the database, the page containing the SQLite database header. However, we first need to understand how the data is stored in the SQLite database.

SQLite and b-tree

All data in an SQLite database is stored within b-trees, which provide key/data storage. Two variants of b-tree are used by SQLite. “Table b-trees” use a 64-bit signed integer key and store all data in the leaves. “Index b-trees” use arbitrary keys and store no data at all. We are going to concentrate on “Table b-trees”.

A b-tree page is either an interior page or a leaf page. A leaf page contains keys and in the case of a table b-tree each key has associated data. An interior page contains K keys together with K+1 pointers to child b-tree pages. A “pointer” in an interior b-tree page is just the 32-bit unsigned integer page number of the child page.

Taken from Wikipedia

So in this diagram, the page containing the keys 7 and 16 is the interior page and contains no data; as it has no pages above it, it is also referred to as the root page. The second row contains leaf pages, as they have no further pages below them. They also store all the data in the table.

All records with keys less than or equal to 7 are stored in the left hand page of the tree. This is because the first key stored in the interior page has a value of 7. The next key value is 16, so all records with keys greater than 7 and less than or equal to 16 are stored in the central page. All records over 16 are stored in the right page. Each key stored in the interior page has an associated pointer value, and this is the page number of the corresponding leaf page.

This is a simplistic example and there may be many leaf pages as well as another layer of interior pages after the root page.
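In code terms, the navigation logic for a table b-tree interior page boils down to something like this sketch (the cell list and right-most pointer come from parsing the page; the names are mine):

def child_page_for_rowid(cells, right_most, rowid):
    # cells is a list of (left_child_page, key) pairs in ascending key order
    for left_child, key in cells:
        if rowid <= key:
            return left_child
    # rowid is greater than every key: follow the header's right-most pointer
    return right_most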

Decoding the sqlite_schema table

Using this information, we need to decode the sqlite_schema table to find the root pages of the tables we want to find data in. In this example we are going to find the data for the table “albums”. So we start at offset 100 of the file, immediately after the SQLite database header. The page header format is well defined within the documentation.

Offset  Size  Description
0       1     The one-byte flag at offset 0 indicating the b-tree page type:
              A value of 2 (0x02) means the page is an interior index b-tree page.
              A value of 5 (0x05) means the page is an interior table b-tree page.
              A value of 10 (0x0a) means the page is a leaf index b-tree page.
              A value of 13 (0x0d) means the page is a leaf table b-tree page.
              Any other value for the b-tree page type is an error.
1       2     The two-byte integer at offset 1 gives the start of the first freeblock on the page, or is zero if there are no freeblocks.
3       2     The two-byte integer at offset 3 gives the number of cells on the page.
5       2     The two-byte integer at offset 5 designates the start of the cell content area. A zero value for this integer is interpreted as 65536.
7       1     The one-byte integer at offset 7 gives the number of fragmented free bytes within the cell content area.
8       4     The four-byte page number at offset 8 is the right-most pointer. This value appears in the header of interior b-tree pages only and is omitted from all other pages.

Using the X-ways Template, we get the following information:

Note that because this is an interior table b-tree page header (as denoted by value 0x05), the page header includes the right-most pointer that is omitted from non-interior pages.

We now have the offsets to all of the records within this page. In order to decode this information, we need to know what format the records stored in this page are in. Luckily the documentation spells this out under section 1.6:

The cell format depends on the page type:

  • Table Leaf (0x0d): varint number of bytes of payload, varint rowid, byte array payload, 4-byte page number of the first overflow page.
  • Table Interior (0x05): 4-byte page number of the left child, varint rowid.
  • Index Leaf (0x0a): varint number of bytes of payload, byte array payload, 4-byte page number of the first overflow page.
  • Index Interior (0x02): 4-byte page number of the left child, varint number of bytes of payload, byte array payload, 4-byte page number of the first overflow page.

Taken from https://www.sqlite.org/fileformat.html – Slightly modified to fit

As we can see from our earlier template, this is a Table Interior page (page type is 0x05). So at each cell offset we are expecting a 4 byte big endian integer that is the page number of the left child and a single varint immediately after.

If we go to the offset for cell pointer 1 (1019) from the beginning of the page, we will find the data for that record. Note that even though the page header on this page starts at offset 100, due to the database header, the cell offsets are measured from offset 0 of the page, which is the start of the database header. See below for the data at the 1st cell offset.

If we interpret this as a 4-byte page number followed by a varint, we get 0x00 0x00 0x00 0x06 as the 4 byte big-endian integer (value 6), and the varint is a single byte varint with a value of 3. We can repeat this step for all offsets within the cell data. There is a section later on decoding varints; for now I will just provide the values.

Once we have done this, we can put the cell data into a table we get:

Offset  Left Pointer  Key
1019    6             3
1014    7             4
1009    9             6
1004    12            7
999     14            9
994     19            12
989     21            15

Now we know which page relates to each set of key values. However, if we look at our earlier query we do not see any key values. Where a primary key is not defined as part of a table, each record is given an auto-incrementing number called the rowid. We can see this by modifying our original query on the sqlite_schema table to “select rowid,* from sqlite_master”. This gives us the following:

So now we can see that the record relating to the “albums” table is record 1. Looking back at our table, we can see that as the key value is less than or equal to 3, we need to go to page 6 of the database to find the data.

To navigate to page 6 of the database, we need to multiply the page size by the page number minus 1 (as SQLite pages are numbered from 1). So in this case, page 6 is at offset 1024 x (6-1) = 5120. At the start of page 6, we can use the template header to decode it and get the following:

From this we can see a couple of things straight away. This is a page type 0x0D which is a table leaf page. If the table was bigger, we might have another interior page between the root and the leaf pages. In that case, we would decode the page as we did the root page.

Also we can see that this page has 3 records and this makes sense as the pointer from the root page was for values less than or equal to 3. If we jump to offset 506 from the start of the page, we will hit the first record.

Looking back at our list of record formats by page type, we can see that for a 0x0D (Table Leaf) page the structure of each record is:

  • Varint: Number of bytes
  • Varint: RowID
  • Bytearray: Payload
  • 4-byte integer: Page for overflow page (not used).

In order to decode these records, we will need to understand varints

Varints

These are variable length integers. They can be between one and nine bytes in length, though they are commonly 1 to 3 bytes. A varint consists of zero or more bytes where the high bit is set, followed by a single byte where the high bit is not set.

Bit:    8    7   6   5   4  3  2  1
Value:  128  64  32  16  8  4  2  1
Note: bit 8 is the high bit.
List of values of bits in a byte

The high bit is the most significant, or highest value, bit in a byte. Where this is set, the next byte is part of the varint; if not, it is the end of the varint.

In hex, with only the high bit set, the hex value would be 0x80. Therefore any value >= 0x80 has the high bit set.

Consider the following data as a series of varint values:

0x45 0x81 0x23 0x83 0x01 0x82 0x81 0x34

If we start from the 1st value, 0x45, this value is less than 0x80 so the high bit is not set. This is a single byte varint.

We move to 0x81. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x23 and this is less than 0x80 therefore this is a 2 byte varint  (0x81 0x23)

We move to 0x83. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x01 and this is less than 0x80 therefore this is a 2 byte varint  (0x83 0x01)

We move to 0x82. This is greater than 0x80 and therefore the next byte is included. We move to 0x81. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x34 and this is less than 0x80 therefore this is a 3 byte varint  (0x82 0x81 0x34)

So at the end of this process we have:

  • Varint 1: 0x45
  • Varint 2: 0x81 0x23
  • Varint 3: 0x83 0x01
  • Varint 4: 0x82 0x81 0x34

Now we need to determine the actual values of these varints. To do this, we ignore the high bit and then calculate it as a normal integer. The easiest way to do this manually is the following (bytes are numbered right to left, so last byte of varint is byte 1):

Value = ((byte n – 0x80) * 0x80^(n-1)) + ……. + ((byte 3 – 0x80) * 0x80^2) + ((byte 2 – 0x80) * 0x80^1) + byte 1

Note that byte 1 (the last byte) has its high bit clear, so nothing is subtracted from it.

Or in decimal and simplified for max 3 byte varint:

Value = ((byte 3 -128) * 128 ^2) + (byte 2 -128) * 128 + byte 1

So if we apply this to our varints above, we get the following values:

Varint 1: 0x45 / dec 69

Varint 2: (0x81 – 0x80) * 0x80 + 0x23 = 0x80 + 0x23 = 0xA3 / dec 163

Varint 3: (0x83 – 0x80) * 0x80 + 0x01 = 0x180 + 0x01 = 0x181 / dec 385

Varint 4: (0x82 – 0x80) * 0x80^2 + (0x81 – 0x80) * 0x80 + 0x34 =

0x02 * 0x4000 + 0x01 * 0x80 + 0x34 = 0x8000 + 0x80 + 0x34 = 0x80B4 / dec 32948
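Putting this together, a small Python decoder for the common 1 to 8 byte case (with the special 9th byte handled for completeness) reproduces the values above:

def decode_varint(buf, offset=0):
    value = 0
    for i in range(9):
        byte = buf[offset + i]
        if i == 8:
            # the 9th byte contributes all 8 of its bits
            return (value << 8) | byte, 9
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:        # high bit clear: end of the varint
            return value, i + 1

data = bytes([0x45, 0x81, 0x23, 0x83, 0x01, 0x82, 0x81, 0x34])
pos = 0
while pos < len(data):
    value, length = decode_varint(data, pos)
    print(value)               # 69, 163, 385, 32948
    pos += length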

Decoding Table Leaf Records

Looking back at our data we can see the following as the start of the cell data

Figure: start of the cell data for record 1 on page 6 (page6record1.png)

So our data starts with 0x82 0x24 0x01 0x07 0x17 and the documentation tells us that we have 2 varints, the first for record size and the second for record ID.

So, using our formula above, the record size (a 2 byte varint) is:

(0x82-0x80) * 0x80 + 0x24 = 0x124 or decimal 292

The second varint has a value of 0x01. This is our record ID and this matches the record ID relating to the “albums” record. If it didn’t, we could check the other cells to look for the record we are interested in.

Immediately after our 2 varints, we have the byte array which is the record. Section 2.1 of the SQLite documentation tells us that, in a leaf page, the data is always in record format, which consists of two parts:

  • Header
  • Data

The header starts with a single varint that defines the size of the header, including the bytes defining the size. This is followed by a number of header fields.

Start data of record 1 on page 6

So immediately after our second varint (0x01), we see a 0x07 byte. This is the length of the header, including that byte, which means our header data is:

0x07 0x17 0x19 0x19 0x01 0x84 0x23

The sqlite3 documentation provides a table detailing the different values:

Serial Type     Content Size   Meaning
0               0              Value is a NULL.
1               1              Value is an 8-bit twos-complement integer.
2               2              Value is a big-endian 16-bit twos-complement integer.
3               3              Value is a big-endian 24-bit twos-complement integer.
4               4              Value is a big-endian 32-bit twos-complement integer.
5               6              Value is a big-endian 48-bit twos-complement integer.
6               8              Value is a big-endian 64-bit twos-complement integer.
7               8              Value is a big-endian IEEE 754-2008 64-bit floating point number.
8               0              Value is the integer 0. (Only available for schema format 4 and higher.)
9               0              Value is the integer 1. (Only available for schema format 4 and higher.)
10,11           variable       Reserved for internal use.
N≥12 and even   (N-12)/2       Value is a BLOB that is (N-12)/2 bytes in length.
N≥13 and odd    (N-13)/2       Value is a string in the text encoding and (N-13)/2 bytes in length. The null terminator is not stored.
Serial Type Codes Of The Record Format taken from SQLite documentation

So examining the fields in order we get (all values are varints in a header):

0x17 = 23. As this is greater than or equal to 13 and odd, it is a string (last row of the table). To find the length we take 13 off the value, giving 10, and divide by 2. So our first field is a 5 character string.

The next 2 fields are very similar, both with value 0x19. Using the same logic, we get two 6 character strings (0x19 = 25, (25-13)/2 = 6).

Our fourth value is 0x01, which translates to a 1 byte integer.

Our last value is a 2 byte varint, compared to the single byte varints previously, so we first need to decode the varint value.

0x84 0x23 = (0x04 * 0x80) + 0x23 = 0x223 = 547 decimal

This is greater than 13 and odd, so we know it’s a string. (547-13) / 2 = 267. So the last field is a 267 character string. This gives us our fields as:

  • 5 character string (0x17)
  • 6 character string (0x19)
  • 6 character string (0x19)
  • 1 byte integer
  • 267 character string.

Adding up the lengths of these fields gives us a total size of 5+6+6+1+267 = 285.

If we add the header length to this value (7) we get a combined length of 292 and that matches what we originally decoded from the 1st varint as our record size.
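The serial type rules translate directly into a small helper, which we can check against the values we just decoded by hand:

def serial_type_size(n):
    # fixed-size serial types from the table above
    sizes = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}
    if n in sizes:
        return sizes[n]
    if n >= 12 and n % 2 == 0:
        return (n - 12) // 2   # BLOB
    if n >= 13:
        return (n - 13) // 2   # string
    raise ValueError("serial types 10 and 11 are reserved")

assert serial_type_size(0x17) == 5    # first field: 5 character string
assert serial_type_size(0x19) == 6    # the 6 character strings
assert serial_type_size(0x01) == 1    # 1 byte integer
assert serial_type_size(547) == 267   # the 0x84 0x23 varint decoded above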

Immediately following that are the data values in order, each field has been highlighted in a different colour for ease of viewing

Here is where we can use the SQLite Record Decoder X-Tension to confirm our results. If you are using the VHD file in X-Ways, change the view to Disk or Partition view and select the first 2 or 3 bytes of the record, then run the X-Tension from the toolbar at the top of the screen. This will decode the data and highlight the whole record for you; the data is shown in the log window. Using the SQLite Record Decoder X-Tension we can confirm our results for this record (output of the X-Tension):

Offset 506 of page (file offset 5626).

[XT] Record ID:1                Record Length:292           Header Length:7

[XT] Record Data:”table”,”albums”,”albums”,”2″,”CREATE TABLE “albums”

(

    [AlbumId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,

    [Title] NVARCHAR(160)  NOT NULL,

    [ArtistId] INTEGER  NOT NULL,

    FOREIGN KEY ([ArtistId]) REFERENCES “artists” ([ArtistId])

                                ON DELETE NO ACTION ON UPDATE NO ACTION

)”,

Looking at the sqlite_master table that we saw previously, we can see how this matches the records (although you cannot see the data after the first newline)

The 1 byte integer field in our record is the rootpage for that table.  This rootpage is the start of the data for the table and can be interpreted using the same methods we used to decode the root page located at offset 100 of the database, immediately after the database header.

In order to decode the data for the “albums” table, we would go to page 2 (offset 1024) and decode the page header there. If it is an interior page, we would have to map out the other pages associated with it; if it is a leaf page, then we can extract the records from this page.

I’ll leave the post here and leave it as an exercise for you to decode the records for the albums table. You can use any SQLite browser to check your answers. Hopefully this has given you a grasp of the fundamentals of how SQLite databases store data.

In future posts, we will look at how we can use this information to recover deleted records.

Timelining using SQLite Write Ahead Logs

Today’s question is: can we tell when records have been deleted from an SQLite database?

TL;DR – We can provide some time and date information in very particular circumstances using the WAL log file. It can be very time consuming!

Quick warning – this article will assume some knowledge of SQLite databases. I will probably do some further posts on the fundamentals of SQLite databases, as I think they are essential for any digital investigator, and as I write them I will try to update this one with links to the other information.

Edit (20/11/2020) : I’ve now written the post on SQLite fundamentals – can be seen here

Background

This came up as part of an investigation where a number of text messages were deleted from a mobile phone and it was necessary to attribute time and date information to the deletion. The records were some of the last ones created on the device, and the device had been turned off, with very little usage, after the deletion.

The work I completed for this case obviously can’t be shared, and it was harder to create test data for an iOS SMS/iMessage database, so throughout this post I will be using test data from the Mozilla Firefox browser. The same principles apply to the original use case, though the technical details are slightly different.

Tools used throughout this post:

  • X-Ways Forensics – can use other forensic tools and work around, but template files require X-Ways.
  • SQLite Viewer – SQLiteSpy 1.9.14 (x64) – Download from here
  • XT_SQLiteRecordDecoder 0.3 – One of my creations, available here
  • Some X-Ways Templates – Available on downloads page here

Data used in examples:

  • Firefox WAL Data.vhd – Available here

Starting point

A quick examination of the SQLite database that the records in question were located in showed that the records were deleted when the SQLite database and its associated files (WAL and SHM) were extracted together from the exhibit. However, if just the SQLite database was extracted and viewed, the records still showed as live in the database.

So why is this happening? Well, the SQLite database system uses journaling to maintain the integrity of a database if something unexpected happens (a power outage to the device etc.). There are two methods in use: rollback journals and Write Ahead Logs (WAL).

Rollback journals contain the information needed to restore a database to its original state if a transaction fails, whereas write ahead logs contain updates to the database that are then committed to the main database at a later time. In this case, the database that we were examining uses a Write Ahead Log, so rollback journals will not be discussed further here.

The important information that allows us to deduce dates and times for record deletions in this case is that the records in these SQLite databases contain dates and times. In the case of the SMS/iMessage database these would be the sent/received times, and with the Firefox database we have last visited/visited times and dates. Without this information we would probably not be able to determine when the records were deleted.

Overview of WAL files

In order to understand how we are going to use these files to associate time and date information, we need to know some important information on how the WAL file works. The WAL file is always in the same folder as the database and has the same name but with “-wal” appended to it.

The SQLite database itself is made up of a series of pages that contain the records, and associated information, that are stored in the database. Notably pages in SQLite start from 1 rather than 0.

The WAL file consists of a header followed by a number of “frames”. Each frame contains a frame header and a page worth of data from the database with the updated contents. The same page can appear multiple times in the same WAL file; each time the page needs to be updated, the latest version is found (either in the WAL or the database file), then the changes are made and written to the end of the WAL file.

The header of the WAL file has 2 salt values, as does each frame. For a frame to be considered valid, the 2 salts in the frame header must match the 2 in the WAL header.

Below is a very simple sequence of update events, with a diagram.

Basic WAL operations

The diagram above shows 3 update operations to the main SQLite database. The first is an update to page 1 and prior to this, the WAL log can be considered empty. As the page being updated does not already exist within the WAL file, a copy is made to the WAL file and the updates to the page are made in the copy of the page. The data within the SQLite database remains unchanged.

Next an update is made to page 4. Again this page does not exist already within the WAL file and so a copy of the page is made in the WAL file and updates made to the copy of the page.

Lastly, page 1 is updated again. As the page already exists within the WAL file, a copy is made of the earlier version of page 1 within the WAL file and this is moved to a new frame at the end of the WAL file. The new updates are then made to the later copy of the page 1 data.

At the end of this process, we have 3 versions of page 1: one within the main database, one in the first frame of the WAL file and one within the 3rd frame of the WAL file. We also have 2 copies of page 4 in a similar manner. Each of the copies of the pages is different, and there may be records that exist in the copy of page 1 in the first WAL frame that do not exist in the copy in the 3rd WAL frame.

There is a lot more to WAL files than described here, but this should be sufficient information for us to review the data.

Test data and setup

As I’ve created test data for this particular post, I’ll detail each of the actions taken, including the time, so that we can use it to compare against our results. All actions were taken on the 24/09/2020. Prior to the actions below, a new Windows 10 VM was set up and Firefox was installed. Once installed, all history was deleted from the database. Throughout the process, copies of the databases were created and these are labelled Point 1-4. When the copies were made is also listed below.

Time   Action
       Initial copy of databases taken
0937   Accessed BBC news website –
0938   Accessed BBC news article
0938   Moved to sports section of BBC News
       Took a copy of database and WAL files (Point 1)
0939   Deleted access to BBC news article
       Took a copy of database and WAL files (Point 2)
0940   Accessed URL of misspelt Dilbert website –
0940   Visited Dilbert website –
0940   Accessed About section of Dilbert website
       Took a copy of database and WAL files (Point 3)
       Closed browser
       Took a copy of database (Point 4)
Table 1: Actions taken in Firefox browser/VM

As you can see, this is a very simple timeline of events; accessed a few webpages, deleted one and then accessed a couple more.

So in this instance, we know that the deletion was done at 0939. Here we have multiple copies of the database as we go through the activity; in reality we would only have one at most. This is mostly to illustrate how the logs are used, but the technique is still possible with just the one set of files at Point 3.

Viewing the test data

The database files have been included in a virtual hard disk, so it can be mounted on any Windows machine and set to read only. The virtual hard disk can be attached using the Disk Management service, available from the Quick Link menu (Win key+X). It can also be accessed natively by a lot of forensic tools, including X-Ways Forensics.

On the disk there are 4 folders: the initial SQLite database with no WAL file, Point 1-3 folders with a WAL/SHM file each, and a Point 4 folder with no WAL file.

The templates that are shown at the top of this post can be used to quickly decode the hex data if using X-Ways/Winhex (with licence). Otherwise the data can be examined manually in other tools. All the offsets are shown throughout this post.

Before we start examining the files, note that all of the structures used in SQLite are big-endian, so make sure that whatever tool you are using is set up to interpret big-endian data. I will be using X-Ways Forensics throughout; other tools may display things slightly differently.

WAL Structure

So firstly we need to look at the structure of the WAL file (tables taken from SQLite 3 website):

WAL Header Format

Offset  Size  Description
0       4     Magic number. 0x377f0682 or 0x377f0683
4       4     File format version. Currently 3007000.
8       4     Database page size. Example: 1024
12      4     Checkpoint sequence number
16      4     Salt-1: random integer incremented with each checkpoint
20      4     Salt-2: a different random number for each checkpoint
24      4     Checksum-1: First part of a checksum on the first 24 bytes of header
28      4     Checksum-2: Second part of the checksum on the first 24 bytes of header

Important information that we require from here:

  • Database page size – so we know how much data is stored in each frame
  • Salt-1 & 2 – we can use these to check for valid frames.

To be thorough, we should also use the Checksums to check the integrity of the frames, but as that requires computation, rather than simple matching, we will just use the salts.
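To make the structure concrete, here is a minimal C sketch (an illustration only, not part of any released tool) that reads those first header fields from a WAL file, interpreting everything as big-endian:

#include <stdio.h>
#include <stdint.h>

/* Read a big-endian 32-bit unsigned integer from a buffer */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

int main(int argc, char *argv[])
{
    unsigned char hdr[32];
    FILE *f;

    if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL)
        return 1;
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) { fclose(f); return 1; }

    printf("Magic number: 0x%08x\n", (unsigned)be32(hdr));      /* 0x377f0682/83 */
    printf("Format:       %u\n",     (unsigned)be32(hdr + 4));  /* 3007000 */
    printf("Page size:    %u\n",     (unsigned)be32(hdr + 8));
    printf("Salt-1:       %u\n",     (unsigned)be32(hdr + 16));
    printf("Salt-2:       %u\n",     (unsigned)be32(hdr + 20));
    fclose(f);
    return 0;
}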

Immediately after the WAL header (offset 32/0x20) we have the first frame which starts with a frame header:

WAL Frame Header Format

Offset  Size  Description
0       4     Page number
4       4     For commit records, the size of the database file in pages after the commit. For all other records, zero.
8       4     Salt-1 copied from the WAL header
12      4     Salt-2 copied from the WAL header
16      4     Checksum-1: Cumulative checksum up through and including this page
20      4     Checksum-2: Second half of the cumulative checksum.

So we know that the size of the frame header is 24 bytes, and we have the page size from the WAL header; adding these together gives us the size of each frame.

If we apply the template to the first 24 bytes of any of the WAL files (they should all be the same) we get:

This provides us with some useful information, including the Salt-1 & 2 values that will be required to check that each frame is valid. It has also provided us with the page size, 32,768 bytes, which is required to figure out the correct frame size.

Firefox History

With this information in hand, we need to apply it to our Firefox SQLite database and associated files. If you are not familiar with how Firefox stores history in the places.sqlite database, there are many good articles out there that cover it in far more detail than I will. However there are 2 tables in the database that we are going to focus on; moz_places and moz_historyvisits.

The moz_places table contains the URL and other information relating to a specific webpage. The moz_historyvisits table contains information relating to when an entry in the moz_places table has been visited, including time/date and how the site was accessed. When the last reference in moz_historyvisits to a particular entry in the moz_places table is deleted, the corresponding entry in moz_places should also be deleted. As there is only one reference to the deleted record, we can examine for changes to either the moz_places table or the moz_historyvisits table.
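As a quick illustration of that relationship, a query along these lines (a sketch to be run against a copy of places.sqlite, using the standard Firefox column names id, place_id and visit_date) lists each visit against its URL:

SELECT p.url,
       datetime(v.visit_date / 1000000, 'unixepoch') AS visit_time_utc
FROM moz_historyvisits v
JOIN moz_places p ON p.id = v.place_id
ORDER BY v.visit_date;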

Examination of the WAL file

If we start on the first frame of the “Point 1” WAL file (offset 32/0x20) and apply the template, we get the following information:

This tells us that this frame relates to page 2 of the database. This doesn’t immediately help us, as we don’t know if that has any relation to the two tables we are interested in! The Salt-1 and 2 values match those shown in the header, so we are happy that the frame itself is valid.

So how do we check for pages relating to a particular table? For this we need to view the hidden table in the places.sqlite database, the SQLITE_MASTER table. This table exists in all SQLite databases and can be accessed using a standard SELECT query. Running “SELECT * FROM SQLITE_MASTER” on the database gives us the following results:

So we can see one entry for moz_places and one for moz_historyvisits. The most useful information is the ‘rootpage’ field (4th column), which gives us the starting page for each table. In this case the root page for moz_places is 4 and moz_historyvisits is 12.
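If you only want the root pages for the two tables we care about, the same hidden table can be queried more selectively:

SELECT name, rootpage
FROM SQLITE_MASTER
WHERE type = 'table'
  AND name IN ('moz_places', 'moz_historyvisits');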

If the tables contained more records, then the data would be split into multiple pages and the data in the root page would show us where the other data was stored. In this case, with a relatively small amount of data, the entire data for that table is stored in the root page. If this were not the case, we would have to identify all the pages associated with the table and examine for all of them.

So one way to find the data for each page would be to look at the first frame header (offset 32/0x20) and check if it’s for a page we are interested in. If so, review the data; if not, jump to the next frame header by jumping (frame header size + page size) bytes. This is quite time consuming, as there can be a large number of pages per WAL, even in our small example (a sketch of how this walk could be scripted follows).
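If you would rather script the walk than do it by hand, here is a minimal C sketch (illustrative only; it reuses the be32() helper from the header example above). It reads the page size and salts from the WAL header, then hops from frame header to frame header, reporting any frame for a chosen page and whether its salts match the header:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

int main(int argc, char *argv[])
{
    /* usage: walwalk <wal file> <page number> */
    unsigned char hdr[32], frm[24];
    FILE *f;
    if (argc < 3 || (f = fopen(argv[1], "rb")) == NULL)
        return 1;
    uint32_t target = (uint32_t)strtoul(argv[2], NULL, 10);

    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) { fclose(f); return 1; }
    uint32_t page_size = be32(hdr + 8);
    uint32_t salt1 = be32(hdr + 16), salt2 = be32(hdr + 20);

    long off = 32;                              /* first frame header */
    while (fread(frm, 1, sizeof frm, f) == sizeof frm) {
        if (be32(frm) == target) {
            int valid = be32(frm + 8) == salt1 && be32(frm + 12) == salt2;
            printf("Frame header at offset %ld (%s)\n",
                   off, valid ? "salts match" : "stale salts");
        }
        off += 24 + page_size;                  /* jump to the next frame header */
        if (fseek(f, off, SEEK_SET) != 0)
            break;
    }
    fclose(f);
    return 0;
}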

Rather than scripting it, an easier solution is to take advantage of some of the nice keyword search functionality in X-Ways Forensics. It has the option to run searches at set offsets rather than at every byte, using “Cond: Offset mod”. So if we search for the hex “00 00 00 04” (big-endian value 4) at offset 32, every 32792 (frame header size + page size) bytes, we will find any frame header that relates to page 4. See below for the options as seen in X-Ways. We know the first frame header is at 32 bytes due to the WAL header taking up those first 32 bytes; then we want to jump a full frame, check offset 32 of the next frame header for the value, and so on and so forth. Please note that this setting will stay on the next time you run a search, and I accept no blame if you don’t find search hits in your next job because you left this conditional offset on!

If we run this on the “Point 1” database, we would see the following offsets for hits:

If we run this on the “Point 2” database, we would see the following offsets for hits:

As we can see, there is an additional entry at 1082168 that exists in the “Point 2” database but not the “Point 1” database. Also notice that there are references to this page after this offset; we will look at these later.
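As a quick sanity check on these offsets: frame n (counting from 0) has its header at 32 + n × 32792 bytes, so the hit at file offset 1016584 is frame 31 (32 + 31 × 32792) and the new hit at 1082168 is frame 33 (32 + 33 × 32792).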

If we go to file offset 1016584 and look at the page data there, using the X-Ways template, we see:

So we have 12 live records in this page. Now if we do the same for the page data stored in the frame starting at 1082168 we get:

So here we can see that the cell pointer at page offset 31,052 has been deleted and the pointer to 30,631 has been moved into the slot that contained it. Notice the first deleted cell pointer is also 30,631, as the address is not cleared when it moves.

So going back to the frame starting at file offset 1016584, if we jump 31,052 bytes from the start of the page header, we get the following hex data:

I’ve highlighted the 1st byte of the data. From here we could decode the SQLite record and view the data. I’m not going to cover how to do this manually in this blog post (it’s almost a post in its own right), but I am going to introduce an X-Tension that I have created to make this process easier.

The XT_SQLiteRecordDecoder X-Tension (link at the top of this post) requires you to highlight the start of the SQLite record; it will then decode the rest and highlight the entire record. It needs to be used from the disk/partition view in X-Ways rather than the file view. So if you place your cursor on the first byte of the record and press the partition button on the top left of the hex window, you should be taken to partition offset 342,087,875. Highlight the “82 18” hex values by dragging across the data window. It should look like this:

Hex data with first 2 bytes of record highlighted

Now press the “Run X-Tensions” button on the menu at the top of the X-Ways interface:

This will give you the menu to run the X-Tension:

On your first time running the X-Tension, you will have to provide X-Ways with the location and this can be done by pressing the “+” button on the bottom middle of the dialog box. Once the X-Tension is highlighted, press OK and the following should be displayed in the output window:

[XT] Record ID:12	Record Length:280	Header Length:17
[XT] Record Data:"12","https://www.bbc.co.uk/news/av/business-54268400","Brexit: Why is it so hard to reach a deal? - BBC News","ku.oc.cbb.www.",1,0,0,"100","1600936685597000","KLmlJJvjjSPQ",0,"47356514151165","Time is running out to reach a Brexit deal - what are the obstacles?","https://ichef.bbci.co.uk/images/ic/400xn/p08sbh6q.jpg","5"

Given that we know the structure of the database, we can match each of these values back to a field. The whole record should also be highlighted in the hex window:

Hex data with entire SQLite record highlighted

The value “1600936685597000” is the last visit time and we use this as the time the record was entered into the database. This value is a Firefox timestamp, which is a Unix timestamp in microseconds rather than seconds. Dividing it by 1,000,000 gives us a Unix timestamp, and this one translates to 24/09/2020 at 08:38:05 UTC.
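As a quick sanity check of that conversion, a minimal C sketch (using the value from the record above):

#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
    int64_t firefox_ts = 1600936685597000LL;     /* microseconds since 1970 */
    time_t unix_ts = (time_t)(firefox_ts / 1000000);
    char buf[32];
    /* Format as UTC; prints 24/09/2020 08:38:05 */
    strftime(buf, sizeof buf, "%d/%m/%Y %H:%M:%S", gmtime(&unix_ts));
    printf("%s UTC\n", buf);
    return 0;
}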

So we now have our time for the record being created in the database.

In order to locate our next timestamp that we can use, we need to look at the point 3 database. If you run the same search for WAL headers relating to Page 4, you should get the following list:

The frame at file offset 1082168 was the last frame we examined in the Point 2 WAL file, so the next change to the page is located at file offset 1574048. You may have noticed that some of the offsets after this one are different to those in the previous databases. This is due to previous WAL frames, which have been committed to the database, being overwritten with new WAL frames. I’ll cover this a little more at the end of this post.

Now if we examine the frame at offset 1574048 we can see that the number of cells is back up to 12 and that the last offset is different. The new data has taken some of the space previously occupied by our deleted record.

If we go to page offset 31,251, we can see the record for the (misspelt) www.dilbrt.com.

Decoding this record using the XT_SQLiteRecordDecoder X-Tension gives us the following information:

[XT] Record ID:14	Record Length:82	Header Length:16
[XT] Record Data:"14","http://www.dilbrt.com/","<NULL>","moc.trblid.www.",1,1,1,"2000","1600936808068000","8dLkpvxIP9tC",0,"125510922083568","<NULL>","<NULL>","6",

1600936808068000 is again a Firefox timestamp, and it translates to 24/09/2020 at 08:40:08 UTC.

So now we know when the deleted record was added to the database and when the next record was added. Logically, we can deduce that the deletion has to have taken place between 08:38:05 and 08:40:08 on 24/09/2020 (UTC).

We could have proved this in a similar manner using the moz_historyvisits table, matching the visits record to the moz_places URL. Potentially we could have narrowed this down further using the record relating to the BBC sports section, which is recorded as being last accessed at 08:38:30 UTC on the same day, using the exact same techniques.

I hope this has been helpful in showing a practical example of how timelining SQLite records using the WAL file is possible (if quite time consuming to do manually!).

P.S.: Hits from later in the WAL file

So earlier we discussed that there were hits for the page number after the latest record we were interested in. To explain that, we shall have a look at the WAL frame header located at offset 1934760 in the “Point 2” WAL file and compare it to the one from offset 1016584 of the same file.

See how the Salt-1 and 2 values don’t match; this is because these WAL frames are from before the last checkpoint operation on the database, where the WAL data is integrated back into the database. The SQLite 3 file format specifies that at each checkpoint the salt-1 value is incremented by 1 and the salt-2 value is randomised, which is why the salt-1 value in the frame header at offset 1016584 is 1 larger than the value in the frame header at offset 1934760.

When VM’s go wrong

Welcome to the first (hopefully of many) posts on the new blog. This post is based around an issue raised to me by an investigator who couldn’t virtualise an exhibit for a case they were working on.

I will mention a number of tools in this post and I want to add I don’t have any vested interest in any of them! These tools just so happen to be the ones available to me and/or my personal preferences.

There are a number of reasons that you might want to create a VM of a device, from wanting to check the settings of a particular program to extracting passwords from browsers such as Google Chrome. Although many of the things that you might want to display from a VM can be done by interpreting a settings file or sifting through registry keys, sometimes it is just so much nicer to have a visual capture of the settings as the user would have seen them.

There have been many posts on how to create a VM from a forensic image file and some very good software has been created to help in this task. This post, however, is dedicated to creating a VM of an exhibit that is NOT in a bootable state at the time of seizure.

So, the investigator had tried creating a VM from scratch using Virtualbox and had also tried Virtual Forensic Computing (VFC), which is usually pretty good at making VMs work when they otherwise won’t. The machine would start up but no operating system would boot, even though a Windows 7 (x64) system was installed. Time for a little bit of work!

The starting place for our investigation was to check the Master Boot Record. At a very quick glance, it appears to be intact: the boot code looks normal and there are partition entries. However, a closer inspection reveals that the partition is set to inactive.

MBR for disk with start of first partition entry highlighted

Normally, we would expect to see a value of 0x80 for the byte that is currently set to 0x00. The rest of the entry appears to be intact, and if you use the X-Ways template manager to view the data, it matches the location of the start of the NTFS partition.

This behaviour is especially odd as the device was showing as being used very close to the time it was seized, definitely within 12 hours of the last activity on the device. So our first job is to fix the MBR for the VM.

To do this, I use Arsenal Image Mounter from Arsenal Recon. I prefer this to FTK Imager as it provides a true mounted disk (similar to Encase PDE, but without having to dirty my system with Encase), which gives some additional capabilities. When mounting with this tool, we set it to keep a cache of changes made to the disk. Also note which physical drive the image file is mounted as; we will need that later.

Interestingly, as soon as the disk is mounted, it is showing as an uninitialized disk within the Disk Management window. So now we need to change the data on the disk using a hex editor. I’m a big X-Ways Forensics fan and, although X-Ways won’t allow you to change data, Winhex that ships with it allows you to easily change the values.

Run Winhex in administrator mode and open the physical disk. Once this is done, simply navigate to the first byte of the partition entry (offset 446) and type 0x80 over the byte. It should show up as red; then press Ctrl+S to save the change. We are not actually altering evidence here, due to the write cache on the image file, so don’t worry.
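For reference, the same one-byte fix could be scripted. The following minimal C sketch (illustrative only, to be run against a write-cached mount or a working copy, never original evidence) sets the boot indicator of the first partition entry to active:

#include <stdio.h>

int main(int argc, char *argv[])
{
    /* usage: mbrfix <raw disk or image>
       Sets the boot indicator of the first MBR partition
       entry (offset 446 / 0x1BE) to 0x80 (active). */
    FILE *f;
    if (argc < 2 || (f = fopen(argv[1], "r+b")) == NULL)
        return 1;
    if (fseek(f, 446, SEEK_SET) != 0) { fclose(f); return 1; }
    fputc(0x80, f);
    fclose(f);
    return 0;
}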

Now we are going to try and make a VM out of this newly changed disk. However, given the errors that you can easily get when creating VMs off virtual disks with different software, we are going to create a VMDK of the disk using Virtualbox. For those who don’t know about these frequent VM errors, the PhysicalDrive can show as being in use and therefore stop the virtualisation software from using it. This usually requires a restart of the machine, which is a pain as you usually have something processing.

So to create the VMDK file from the mounted image file, we can use the following command:

cd "C:\Program Files\Oracle\VirtualBox"

C:\Program Files\Oracle\VirtualBox>vboxmanage convertfromraw \\.\PhysicalDriveX "C:\outputpath\outputfilename.vmdk" --format VMDK

This will usually take between half an hour and a couple of hours depending on your machine and the size of the disk. But once this has been done, we have a VMDK that no longer requires the disk to be mounted and has the primary entry in the MBR marked as active. You can also create a copy of the VMDK file in case we break the original!

Now to create a new VM to try and boot this system. I was using Virtualbox (VMware Player should work equally well), so the screenshots will be for that software. These should provide a rough guide for creating a VM in Virtualbox; remember to select the VMDK we created earlier for the hard drive.

Now, simply starting the VM still gives us an error to say “Operating system cannot be found”. This is clearly a larger issue than the MBR entry being broken. So, in order to try and fix this, we can attach a Windows Installation ISO to the VM. This will give us some repair functionality to try and get this system working. I used a Windows 8.1 ISO for this machine, however any version equal to or newer than the version you are trying to fix should be fine.

Once the VM started, it gave us the option to boot from the installation disc by pressing any key. Once the ISO booted, we selected Repair your computer > Troubleshoot > Advanced Options > Command Prompt. This opened the command prompt within the repair tools.

Now there are several things that can cause the system not to boot correctly. The first is a damaged MBR, but we have already fixed that; the next items are all specific to the Windows OS. Windows has a Boot Configuration Data (BCD) store that contains the information required to boot the system. This is used in Windows Vista and newer and replaces the boot.ini that was used by previous versions. Now that we are in a command prompt, we can try and fix this:

X:\Sources>bootrec /fixmbr 
The operation completed successfully
X:\Sources>bootrec /fixboot 
The operation completed successfully
X:\Sources>bootrec /rebuildbcd 
Successfully scanned Windows installations.
Total identified Windows installations: 1
[1] C:\Windows
Add installation to boot list? Yes (Y)/No (N)/All(A): Y (select ‘Yes’ by responding with ‘Y’)
The operation completed successfully

The first command fixes the MBR, just in case there was an error we couldn’t see, such as corrupt assembly code that isn’t easily visible. The second and third commands relate to the BCD and, as we can see from the output, it identified the Windows OS on the disk. After this, we restarted the virtual machine and got the same OS not found error.

So restart the machine and back into the Windows Installation disk. This time we navigate through Repair your computer > Troubleshoot > Advanced Options > Startup Repair. It displays the detected operating system(s) that can be repaired.

We select our Windows 7 operating system as the one that needs repairing. It then runs through its process and restarts the VM. This time, the VM gives us the option to start Windows normally or to start in safe mode. Selecting “Start Windows normally” gives a blue screen.

Still, this is a further step forward, and those who deal with VMs regularly will know there are a variety of methods for fixing this kind of error. However, by this point I just wanted the thing to boot, so we went back to VFC. By mounting the VMDK that was attached to our Virtualbox machine, we could point VFC at the (now fixed) disk and generate the virtual machine. This booted us straight through to the Windows login screen. Hooray!

If this hadn’t worked, there were still a couple of options left for us, the most notable was using the Windows boot disc to restore the operating system to an earlier point in time. This would hopefully give us a point at which the system was still in working order.

As you can see, this is a very long-winded way of creating a VM, but it does allow us to virtualise a machine that at first appears to be unbootable. So thanks for reading my first post and hopefully it has been useful, or at the very least, entertaining!

New Blog!

Hi and welcome to the first post of my new blog! I’m hoping this will be the first of many posts, although this will be the only one where nothing forensic/investigative will be posted.

I’ve worked in Digital Forensics/Investigations for a number of years and over this time I’ve created new tools and encountered some weird and unusual problems. With the tools, I’ve mostly distributed them to others that I know within the field, but feel like I should make a larger number of them available to the community as a whole.

This blog came around because in my office it’s become quite common to “ask clees” when there’s an issue that nobody else seems to have any idea of how to fix. Usually when this happens, I get the person involved to do a little write-up of how we fixed it and send it round the office, so that everyone can deal with the issue should they get it (or a similar one).

After solving one problem, a colleague suggested that I create a blog and post write-ups of forensic problems, as it would make it even easier to point people to any previous work that had been done, simply by pointing them to this blog!

So hopefully it won’t be long before the first actual forensic post is up and running, until then enjoy the rest of the site!