SQLite Databases at hex level

My recent post on “Timelining using SQLite Write Ahead Logs” highlighted how much background information is required to deal with SQLite databases. This post is going to give a more in-depth overview of the structure of the SQLite 3 file format. It should take you from knowing very little about SQLite databases to being able to manually locate records using a hex viewer.

I should start by saying that the documentation for the SQLite 3 file format is excellent and can be found here. The documentation is pretty much good enough that you can teach yourself it just using this information and a test database to check against. Where possible, I’ll reference the section of the documentation that the information has come from.

SQLite databases are now a massive part of the forensics workload. They are used by thousands of phone apps, on both Android and iOS, as well as being used heavily by the Windows/Linux/OS X operating systems for both system data (e.g. Windows Notification database) and user data (e.g. Chrome/Firefox history).

Prelims

Tools used:

X-Ways Forensics/Winhex – but any hex editor will do. HxD is also good if you need a free tool

X-tensions – If using X-Ways then you can use the SQLite Record Decoder (v0.3) to make your life easier – available here

Templates:

SQLite3 Page Header – Available here

Database – The database used throughout this post is called chinook.db and can be downloaded from here, though I’ve created a VHD with the file in which can be accessed here

Examining the database

Examining the database header

The first 100 bytes of the SQLite database is the database header and this stores key information about the database.

All values in SQLite are in Big Endian format and therefore you should ensure any hex editor you are using is displaying values in Big Endian or none of your values are going to make any sense at all!

The following structure can be seen on the SQLite file format website under section 1.3.

OffsetSizeDescription
016The header string: “SQLite format 3\000”
162The database page size in bytes. Must be a power of two between 512 and 32768 inclusive, or the value 1 representing a page size of 65536.
181File format write version. 1 for legacy; 2 for WAL.
191File format read version. 1 for legacy; 2 for WAL.
201Bytes of unused “reserved” space at the end of each page. Usually 0.
211Maximum embedded payload fraction. Must be 64.
221Minimum embedded payload fraction. Must be 32.
231Leaf payload fraction. Must be 32.
244File change counter.
284Size of the database file in pages. The “in-header database size”.
324Page number of the first freelist trunk page.
364Total number of freelist pages.
404The schema cookie.
444The schema format number. Supported schema formats are 1, 2, 3, and 4.
484Default page cache size.
524The page number of the largest root b-tree page when in auto-vacuum or incremental-vacuum modes, or zero otherwise.
564The database text encoding. A value of 1 means UTF-8. A value of 2 means UTF-16le. A value of 3 means UTF-16be.
604The “user version” as read and set by the user_version pragma.
644True (non-zero) for incremental-vacuum mode. False (zero) otherwise.
684The “Application ID” set by PRAGMA application_id.
7220Reserved for expansion. Must be zero.
924The version-valid-for number.
964SQLITE_VERSION_NUMBER
Taken from https://www.sqlite.org/fileformat.html

There are a lot of fields here and I’m going to concentrate on the most useful for us to interpret the rest of the database. Probably most important is the database page size in bytes as we will need this information in order to locate any other information within the database.

Pages are the building blocks of the SQLite database and the entire database is made up of pages. Even the SQLite database header takes space in the first page of the database. It is worth noting that SQLite pages start numbering from one and not zero and this becomes important when we are trying to locate the offsets for different pages.

In all modern versions of SQLite databases, the number of pages should be recorded in the header. If this information is blank, the number of pages can be assumed to be the size of the file divided by the page size.

Examining the root page

Immediately after the 100 byte SQLite database header, comes the root page. In all other pages in the database, the page header will be at offset 0 of the page.

The root page is used to store information about the tables contained within the SQLite database. This is in itself a table, albeit a hidden one, but can be seen quite easily. Using the Query “Select * from Sqlite_master” gives us the following information

Here we can see a number of records that are either “Table” or “Index” Records. There are other record types that are allowed, such as “Trigger” or “Views” but we have none of these in the database.

This table has a number of names, but the documentation refers to it as the “sqlite_schema” but also allows for different names:

  1. sqlite_master
  2. sqlite_temp_schema
  3. sqlite_temp_master

The SQLite schema table can be thought of as being created as such:

CREATE TABLE sqlite_schema(
  type text,
  name text,
  tbl_name text,
  rootpage integer,
  sql text
);

The rootpage field in this table is the page at which the root of the table can be located. The sqlite_schema table rootpage is effectively page 1 of the database, the page containing the SQLite database header. However we need to understand how the data is stored in the SQLite database

SQLite and b-tree

All data is stored within a b-tree in an SQLite database and this provides key/data storage. Two variants of b-trees are used by SQLite. “Table b-trees” use a 64-bit signed integer key and store all data in the leaves. “Index b-trees” use arbitrary keys and stores no data at all. We are going to concentrate on “Table b-trees”

A b-tree page is either an interior page or a leaf page. A leaf page contains keys and in the case of a table b-tree each key has associated data. An interior page contains K keys together with K+1 pointers to child b-tree pages. A “pointer” in an interior b-tree page is just the 32-bit unsigned integer page number of the child page.

Taken from Wikipedia

So in this table in SQLite, the page containing the keys 7 and 16 is the Interior Page and contains no data as this page has no pages above it, it is also referred to as a root page. The second row contains leaf pages, as they have no further pages below them. They also store all the data in the table.

All records that are less than or equal to 7 are stored in the left hand page of the tree. This is because the key stored in the interior page has a value of 7. The next key value is 16, so all records greater than 7 and less than or equal to 16 are stored in the central page. All records over 16 are stored in the right page. The key stored in the interior page will have a value and this is the page number of the corresponding leaf page.

This is a simplistic example and there may be many leaf pages as well as another layer of interior pages after the root page.

Decoding the sqlite_schema table

Using this information, we need to decode the sqlite_schema table, to find the root pages of the tables we want to find data in. In this example we are going to find the data for the table “albums”. So we start at offset 100 of the file, which is after the SQLite database header. The header format is well defined within the documentation.

OffsetSizeDescription
01The one-byte flag at offset 0 indicating the b-tree page type.
A value of 2 (0x02) means the page is an interior index b-tree page.
A value of 5 (0x05) means the page is an interior table b-tree page.
A value of 10 (0x0a) means the page is a leaf index b-tree page.
A value of 13 (0x0d) means the page is a leaf table b-tree page.
Any other value for the b-tree page type is an error.
12The two-byte integer at offset 1 gives the start of the first freeblock on the page, or is zero if there are no freeblocks.
32The two-byte integer at offset 3 gives the number of cells on the page.
52The two-byte integer at offset 5 designates the start of the cell content area. A zero value for this integer is interpreted as 65536.
71The one-byte integer at offset 7 gives the number of fragmented free bytes within the cell content area.
84The four-byte page number at offset 8 is the right-most pointer. This value appears in the header of interior b-tree pages only and is omitted from all other pages.

Using the X-ways Template, we get the following information:

Note that because this is an interior table b-tree page header (as denoted by value 0x05), the page header includes the right-most pointer that is omitted from non-interior pages.

We now have the offsets to all of the records within this page. In order to decode this information, we need to know what format the records stored in this page are in. Luckily the documentation spells this out under section 1.6:

DatatypeTable Leaf (0x0d)Table Interior (0x05)Index Leaf (0x0a)Index Interior (0x02)Description
4-byte integer  Page number of left child
varint Number of bytes of payload
varint  Rowid
byte array Payload
4-byte integer Page number of first overflow page
Taken from https://www.sqlite.org/fileformat.html – Slightly modified to fit

As we can see from our earlier template, this is a Table Interior page (page type is 0x05). So at each cell offset we are expecting a 4 byte big endian integer that is the page number of the left child and a single varint immediately after.

If we got to the offset for cell pointer 1 (1019) from the beginning of the page, we will find the data for that record. Note that even though the page header on this page starts at offset 100, due to the database header, the cell offsets start from offset 0 of the page which is the start of the database header. See below for the data in the 1st cell offset.

If we interpret this as a 4 page byte header followed by a varint, we get 0x00 0x00 0x00 0x06 as the 4 byte big endian integer (value 6) and the varint is a single byte varint with a value of 3. We can repeat this step for all offsets within the cell data. There is a section later on decoding varints, for now I will just provide the values.

Once we have done this, we can put the cell data into a table we get:

OffsetLeft PointerKey
101963
101474
100996
1004127
999149
9941912
9892115

Now we know which page relates to each set of key values. However if we look at our earlier query we do not see any key values. So where a primary key is not defined as part of a table, each record is given an autoincrementing number called rowid. We can see this by modifying our original query to view the sqlite_schema page to “select rowid,* from sqlite_master”. This gives us the following:

So now we can see that the record relating to the “albums” table is record 1. Looking back at our table, we can see that as the key value is less than or equal to 3, we need to go to page 6 of the database to find the data.

To navigate to page 6 of the database, we need to multiply the page size by the page number -1 (as Sqlite pages start numbering at 1). So in this case, page 6 is at offset 1024 X (6-1) = 5120. At the start of page 6, we can use the template header to decode it and get the following:

From this we can see a couple of things straight away. This is a page type 0x0D which is a table leaf page. If the table was bigger, we might have another interior page between the root and the leaf pages. In that case, we would decode the page as we did the root page.

Also we can see that this page has 3 records and this makes sense as the pointer from the root page was for values less than or equal to 3. If we jump to offset 506 from the start of the page, we will hit the first record.

Looking back at our table of record formats based on page type, we can see that for a 0x0D (Table Leaf) page the structure of each record is:

  • Varint: Number of bytes
  • Varint: RowID
  • Bytearray: Payload
  • 4-byte integer: Page for overflow page (not used).

In order to decode these records, we will need to understand varints

Varints

These are a variable length integer. They can be between one and 9 bytes in length, though commonly 1 – 3 bytes. It consists of zero or more bytes where the high bit is set and a single byte where the high bit is not set.

Bit87654321
Value1286432168421
NotesHigh bit       
List of values of bits in a byte

The high bit is the most significant, or highest value, bit in a byte. Where this is set, the next byte is part of the varint, if not it is the end of the varint

In hex, with only the high bit set, the hex value would be 0x80. Therefore any value >= 0x80 has the high bit set.

Consider the following data as a series of varint values:

0x45 0x81 0x23 0x83 0x01 0x82 0x81 0x34

If we start from the 1st value, 0x45, this value is less than 0x80 so the high bit is not set. This is a single byte varint.

We move to 0x81. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x23 and this is less than 0x80 therefore this is a 2 byte varint  (0x81 0x23)

We move to 0x83. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x01 and this is less than 0x80 therefore this is a 2 byte varint  (0x83 0x01)

We move to 0x82. This is greater than 0x80 and therefore the next byte is included. We move to 0x81. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x34 and this is less than 0x80 therefore this is a 3 byte varint  (0x82 0x81 0x34)

So at the end of this process we have:

  • Varint 1: 0x45
  • Varint 2: 0x81 0x23
  • Varint 3: 0x83 0x01
  • Varint 4: 0x82 0x81 0x34

Now we need to determine the actual values of these varints. To do this, we ignore the high bit and then calculate it as a normal integer. The easiest way to do this manually is the following (bytes are numbered right to left, so last byte of varint is byte 1):

Value = ((byte n – 0x80) * 0x80^n-1) + ……. ((byte 3 – 0x80) * 0x80 ^2) + ((byte 2 – 0x80) * 0x80 ^1) + ((byte 1 – 0x80) * 0x80 ^0)

Or in decimal and simplified for max 3 byte varint:

Value = ((byte 3 -128) * 128 ^2) + (byte 2 -128) * 128 + byte 1

So if we apply this to our varints above, we get the following values:

Varint 1: 0x45 / dec 69

Varint 2: (0x81 – 0x80) * 0x80 + 0x23 = 0x80 + 0x23 = 0xA3 / dec 163

Varint 3: (0x83 – 0x80) * 0x80 + 0x01 = 0x180 + 0x01 = 0x181 / dec 384

Varint 4: (0x82 – 0x80) * 0x80^2 + (0x81 – 0x80) * 0x80 + 0x34 =

0x02 * 4000 + 0x01 * 0x80 + 0x34 = 0x8000 + 0x80 + 0x34 = 0x80B4 / dec 32948

Decoding Table Leaf Records

Looking back at our data we can see the following as the start of the cell data

This image has an empty alt attribute; its file name is page6record1.png

So our data starts with 0x82 0x24 0x01 0x07 0x17 and the documentation tells us that we have 2 varints, the first for record size and the second for record ID.

So using our above formula the record size is (2 byte varint)

(0x82-0x80) * 0x80 + 0x24 = 0x124 or decimal 292

The second varint has a value of 0x01. This is our record ID and this matches the record ID relating to the “albums” record. If it didn’t, we could check the other cells to look for the record we are interested in.

Immediately after our 2 varints, we have the byte array which is the record. Section 2.1 of the SQLite documentation tells us that, in a leaf page, the data is always in record format which consists of two parts.

  • Header
  • Data

The header starts with a single varint that defines the size of the header, including the bytes defining the size. This is followed by a number of header fields.

Start data of record 1 on page 6

So immediately after our 0x01 second varint, we see a 0x07 byte. This is the length of the header, including that byte, which means our header data is:

0x07 0x17 0x19 0x19 0x01 0x84 0x23

The sqlite3 documentation provides a table detailing the different values:

Serial TypeContent SizeMeaning
00Value is a NULL.
11Value is an 8-bit twos-complement integer.
22Value is a big-endian 16-bit twos-complement integer.
33Value is a big-endian 24-bit twos-complement integer.
44Value is a big-endian 32-bit twos-complement integer.
56Value is a big-endian 48-bit twos-complement integer.
68Value is a big-endian 64-bit twos-complement integer.
78Value is a big-endian IEEE 754-2008 64-bit floating point number.
80Value is the integer 0. (Only available for schema format 4 and higher.)
90Value is the integer 1. (Only available for schema format 4 and higher.)
10,11variableReserved for internal use.
N≥12 and even(N-12)/2Value is a BLOB that is (N-12)/2 bytes in length.
N≥13 and odd(N-13)/2Value is a string in the text encoding and (N-13)/2 bytes in length. The null terminator is not stored.
Serial Type Codes Of The Record Format taken from SQLite documentation

So examining the fields in order we get (all values are varints in a header):

0x17 = 23. As this is greater than 13 and odd, it is a string (last row on table). To find the length we take 13 off the value, giving 10, and divide by 2. So our first field is a 5 character string.

The next 2 fields are very similar, both values 0x19. Using the same logic, we get 2 6 character strings (0x19 = 25, (25 -13) /2 = 6)

Our third value is a 0x01, which translates to a 1 byte integer.

Our last value is a 2 byte var int, compared to the single byte varints previously, so we first need to decode the varint value.

0x84 0x23 = (0x04 * 0x80) + 0x23 = 0x223 = 547 decimal

This is greater than 13 and odd so we know it’s a string. (547-13) / 2 = 267. So the last field is a 267 character string. This gives us our fields as :

  • 5 character string (0x17)
  • 6 character string (0x19)
  • 6 character string (0x19)
  • 1 byte integer
  • 267 character string.

Adding up the lengths of these fields gives us a total size of 5+6+6+1+267 = 285.

If we add the header length to this value (7) we get a combined length of 292 and that matches what we originally decoded from the 1st varint as our record size.

Immediately following that are the data values in order, each field has been highlighted in a different colour for ease of viewing

Here is where we can use the SQLite Record Decoder X-Tension to confirm our results. If you are using the VHD file in X-Ways, change the view to Disk or Partition view and select the first 2 or 3 bytes of the record. The run the X-Tension from the toolbar at the top of the screen. This will decode the data and highlight the whole record for you. The data is shown in the log window. Using the Sqlite record X-Tension we can confirm our results for this record (output of x-tension):

Offset 506 of page (file offset 5626).

[XT] Record ID:1                Record Length:292           Header Length:7

[XT] Record Data:”table”,”albums”,”albums”,”2″,”CREATE TABLE “albums”

(

    [AlbumId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,

    [Title] NVARCHAR(160)  NOT NULL,

    [ArtistId] INTEGER  NOT NULL,

    FOREIGN KEY ([ArtistId]) REFERENCES “artists” ([ArtistId])

                                ON DELETE NO ACTION ON UPDATE NO ACTION

)”,

Looking at the sqlite_master table that we saw previously, we can see how this matches the records (although you cannot see the data after the first newline)

The 1 byte integer field in our record is the rootpage for that table.  This rootpage is the start of the data for this table and can be interpreted using the same methods we have used to decode the data in the root page that was located at offset 100 of the database, immediately after database header.

In order to decode the data for the “albums” table, we would got to page 2 (offset 1024) and decode the page header there. If it was an interior page, we would have to map out the other pages associated with it, if it was a leaf page then we could extract the records from this page.

I’ll leave the post here and leave it as an exercise for you to decode the records for the album table. You can use and SQlite browser to check your answers. Hopefully this has given you a grasp of the fundamentals of how SQLite databases stored data.

In future posts, we will look at how we can use this information to recover deleted records.

Timelining using SQLite Write Ahead Logs

Todays question is: Can we tell when records have been deleted from an SQLite database?

TL;DR – We can provide some time and date information in very particular circumstances using the WAL log file. It can be very time consuming!

Quick warning – This article will assume some knowledge of SQLite databases. I will probably do some further posts on the fundamentals of SQLite databases as I think they are essential for any Digital Investigator. I will be doing posts on SQLite work in general and as I write them I will try and update this one with links to the other information.

Edit (20/11/2020) : I’ve now written the post on SQLite fundamentals – can be seen here

Background

This came up as part of an investigation where a number of text messages were deleted from a mobile phone device and it was necessary to attribute time and date information to the deletion. The records were some of the last ones on the device and it had been turned off/very little usage after the deletion.

The work I completed for this case obviously can’t be shared, and it was harder work to create test data for an IOS device for SMS/iMessage database, so throughout this I will be using test data from the Mozilla Firefox browser. The same principles apply to the original use case, though the technical details are slightly different.

Tools used throughout this post:

  • X-Ways Forensics – can use other forensic tools and work around, but template files require X-Ways.
  • SQLite Viewer – SQLiteSpy 1.9.14 (x64) – Download from here
  • XT_SQLiteRecordDecoder 0.3 – One of my creations, available here
  • Some X-Ways Templates – Available on downloads page here

Data used in examples:

  • Firefox WAL Data.vhd – Available here

Starting point

A quick examination of the SQLite database, that the records in question were located in, shows that the records are currently deleted in the database when the SQLite database and associated files (WAL and SHM files) were extracted from the exhibit. However, if just the SQLite database was extracted and viewed, the records were still showing as live in the database.

So why is this happening? Well the SQLite database system uses journaling to keep the integrity of a database if something unexpected happens (power outage to device etc.). There are two methods in use; Rollback Journal and Write Ahead Logs (WAL).

Rollback journals include information to restore a database back to its original state if a transaction fails, write ahead logs contain updates to the database that are then committed to the main database at a later time. In this case, the database that we were examining uses a Write Ahead Log, so Rollback journals will not be discussed further here.

The important information, that allows use to deduce dates and times for record deletions in this case, is that records in the SQLite databases have dates and times as part of the records. In the case of the SMS/iMessage database this would be the sent/received times and with the Firefox database we have last visited/visited times and dates. Without this information we would probably not be able to determine when the records were deleted.

Overview of WAL files

In order to understand how we are going to use these files to associate time and date information, we need to know some important information on how the WAL file works. The WAL file is always in the same folder as the database and has the same name but with “-wal” appended to it.

The SQLite database itself is made up of a series of pages that contain the records, and associated information, that are stored in the database. Notably pages in SQLite start from 1 rather than 0.

The WAL file consists of a header followed by a number of “frames”. Each frame contains a frame header and a page worth of data from the database with the updated contents. The same page can appear multiple times in the same WAL file; each time the page needs to be updated the latest version is found (either in WAL or database file) and then changes made and written to end of the WAL file.

The header of the WAL file has 2 Salt values as does each frame. For a frame to be considered valid, the 2 salts in the frame header must match the 2 in the WAL header.

Below is a very simple of sequence of update events with a diagram.

Basic WAL operations

The diagram above shows 3 update operations to the main SQLite database. The first is an update to page 1 and prior to this, the WAL log can be considered empty. As the page being updated does not already exist within the WAL file, a copy is made to the WAL file and the updates to the page are made in the copy of the page. The data within the SQLite database remains unchanged.

Next an update is made to page 4. Again this page does not exist already within the WAL file and so a copy of the page is made in the WAL file and updates made to the copy of the page.

Lastly, page 1 is updated again. As the page already exists within the WAL file, a copy is made of the earlier version of page 1 within the WAL file and this is moved to a new frame at the end of the WAL file. The new updates are then made to the later copy of the page 1 data.

At the end of this process, we have 3 versions of Page 1; One within the main database, one in the first frame of the WAL file and one within the 3rd frame of the WAL file. We also have 2 copies of page 4 in a similar manner. Each of the copies of the pages are different and the may be records that exist in the Page 1 in the first WAL frame that do not exist in the copy in the 3rd WAL frame.

There is a lot more to WAL files than described here, but this should be sufficient information for us to review the data.

Test data and setup

As I’ve created test data for this particular post, I’ll detail each of the actions taken, including the time, so that we can use it to compare against our results. All actions were taken on the 24/09/2020. Prior to the actions below, a new Windows 10 was set up and Firefox was installed. Once installed, all history was deleted from the database. Throughout the process, copies of the databases we created and these are labelled Point 1-4. When the copies were made is also listed below.

TimeAction
Initial copy of databases taken
0937Accessed BBC news website –
0938Accessed BBC news article
0938Moved to sports section of BBC News
Took a copy of Database and WAL files (Point 1)
0939Deleted access to BBC news article
Took a copy of Database and WAL files (Point 2)
0940Accessed URL of misspelt Dilbert website –
0940Visited Dilbert Website –
0940Accessed About section of Dilbert Website
Took a copy of Database and WAL files (Point 3)
Closed browser
Took a copy of Database (Point 4)
Table 1: Actions taken in Firefox browser/VM

As you can see, this is a very simple timeline of events; accessed a few webpages, deleted one and then accessed a couple more.

So in this instance, we know that the deletion was done at 0939. In this case we have multiple copies of the database as we go through the activity, in reality we would only have the one at most. This is mostly to illustrate how the logs are used but the technique is still possible with just the 1 set of files at Point 3.

Viewing the test data

The database files have been included in Virtual hard disk, so it can be mounted on any Windows Machine and set to read only. The Virtual hard disk can be attached using the Disk Management service, available from the Quick Link menu (Win key+X). It can also be accessed natively by a lot of forensic tools, including X-Ways Forensics.

On the disk, there are 4 folders, the initial SQLite database with no WAL file, Point 1-3 folders with a WAL/SHM file each and a Point 4 folder with no WAL file.

The templates that are shown at the top of this post can be used to quickly decode the hex data if using X-Ways/Winhex (with licence). Otherwise the data can be examined manually in other tools. All the offsets are shown throughout this post.

Before we start examining the files, all of the structures used in SQLite are Big-Endian, make sure that whatever tool you are using is set up to interpret Big-Endian data. I will be using X-Ways Forensics throughout and other tools may display slightly differently.

WAL Structure

So firstly we need to look at the structure of the WAL file (tables taken from SQLite 3 website):

WAL Header Format

OffsetSizeDescription
04Magic number. 0x377f0682 or 0x377f0683
44File format version. Currently 3007000.
84Database page size. Example: 1024
124Checkpoint sequence number
164Salt-1: random integer incremented with each checkpoint
204Salt-2: a different random number for each checkpoint
244Checksum-1: First part of a checksum on the first 24 bytes of header
284Checksum-2: Second part of the checksum on the first 24 bytes of header

Important information that we require from here:

  • Database page size – Then we know how much data is stored in frame
  • Salt 1 & 2 – We can use these to check for valid frames.

To be thorough, we should also use the Checksums to check the integrity of the frames, but as that requires computation, rather than simple matching, we will just use the salts.

Immediately after the WAL header (offset 32/0x20) we have the first frame which starts with a frame header:

WAL Frame Header Format

OffsetSizeDescription
04Page number
44For commit records, the size of the database file in pages after the commit. For all other records, zero.
84Salt-1 copied from the WAL header
124Salt-2 copied from the WAL header
164Checksum-1: Cumulative checksum up through and including this page
204Checksum-2: Second half of the cumulative checksum.

So we know that the size of the frame header to 24 bytes and we have the page size from the WAL header format, adding these together gives us the size of each Frame.

If we apply the template to the first 24 bytes of any of the WAL files (they should all be the same) we get:

This provides us with some useful information, including the SALT 1 & 2 values that will be required to check each frame is valid. It has also provided us with that page size, which is required to figure out the correct frame size, as 32,768 bytes.

Firefox History

With this information in hand, we need to apply it to our Firefox SQLite database and associated files. If you are not familiar with how Firefox stores history in the places.sqlite database, there are many good articles out there that cover it in far more detail than I will. However there are 2 tables in the database that we are going to focus on; moz_places and moz_historyvisits.

The moz_places table contains the URL and other information relating to a specific webpage. The moz_historyvisits contains information relating to when an entry in the moz_places table has been visited, including time/date and how the site was accessed. When the last reference in the moz_historyvisits is deleted for a particular entry in the moz_places table, the corresponding entry in moz_places should also be deleted. As there is only on reference to the deleted record, so we can either examine for changes to the moz_places table or the moz_historyvisits table

Examination of the WAL file

If we start on the first frame of the “Point 1 “WAL file (offset 32/0x20) and apply the template we get the following information:

This tells us that this frame relates to page 2 of the database. This doesn’t immediately help us, as we don’t know if that has any relation to the two tables we are interested in!. The SALT 1 and 2 values match those shown in the header, so we are happy that the frame itself is valid.

So how do we check for pages relating to a particular table? Well for this we need to view the hidden table in the places.sqlite database, the SQLITE_MASTER table. This table exists in all SQLite databases and can be accessed using the standard select query. So running the command “SELECT * FROM SQLITE_MASTER” on the database gives us the following results:

So we can see 1 entry for moz_places and one for moz_historyvisits. The most useful information is the ‘rootpage’ field (4th column) and this gives us the starting page for this database. In this case the root page for moz_places is 4 and moz_historyvisits is 12.

If the tables contained more records, then the data would be split into multiple pages and the data in the root page would show us where the other data was stored. In this case, with a relatively small amount of data, the entire data for that table is stored in the root page. If this were not the case, we would have to identify all the pages associated with the table and examine for all of them.

So one way to find the data for each page would be to look at the header of the first frame header (offset 32/0x20) check if it’s for a page we are interested in. If so review the data, if not jump to the next frame header by jumping (Frame Header Size + Page Size) bytes. This is quite time consuming and there can be a large number of pages per WAL, even in our small example.

The solution is to take advantage of some of the nice keyword search functionality in X-Ways Forensics. It has the option to run searches at set offsets rather than every byte using the “Cond: Offset mod”. So if we search for the hex “00 00 00 04”, Big Endian value 4, at Offset 32 every 32792 (header size + page size) bytes, we will find any frame header that relates to page 4. See below for the options as seen in X-Ways. We know the first header is at 32 bytes due to the WAL header taking up those first 32 bytes. Then we want to jump a full frame and check offset 32 (next frame header) for the value and so on and so forth. Please note that this setting will stay on next time you run a search and I accept no blame for you not finding search hits in your next job because you left this conditional offset on!

If we run this on the “Point 1” database, we would see the following offsets for hits:

If we run this on the “Point 2” database, we would see the following offsets for hits:

As we can see, there is an additional entry at 1082168 that exists in the “Point 2” database but not the “Point 1” database. Also notice that there are references to this page after this offset; will look at these later.

If we go to file offset 1016584 and look at the page data there, using the X-Ways template, we see:

So we have 12 live records in this page. Now if we do the same for the page data stored in the frame starting at 1082168 we get:

So here we can see that the cell pointer at page offset 31,052 has been deleted and the pointer to 30,631 has been moved to the slot that contained the offset. Notice the first deleted cell pointer is also 30,631 as the address is not cleared when it moves.

So going to back to the frame starting at file offset 1016584, if we jump 31,052 bytes from the start of the page header, we get the following hex data:

I’ve highlighted the 1st byte of the data. From here we could decode the SQLite record and view the data. I’m not going to cover how to do this manually in this blog post, it’s almost a post in its own right, but I am going to introduce an X-Tension that I have created to make this process easier.

The XT_SQLiteRecordDecoder X-Tension (link at top of this post) requires you to highlight the start of the SQLite record and it will then decode the rest and highlight the entire record. In needs to be used from the disk/partition view in X-Ways rather than the file view. So if you place your cursor on the first byte of the record and press the partition button on the top left of the hex window you should be taken to partition offset 34,2087,875. Highlight the “82 18” hex values by dragging across the data window. Should look like this:

Hex data with first 2 bytes of record highlighted

Now press the “Run X-Tensions” button on the menu at the top of the X-Ways interface:

This will give you the menu to run the X-Tension:

On your first time running the X-Tension, you will have to provide X-Ways with the location and this can be done by pressing the “+” button on the bottom middle of the dialog box. Once the X-Tension is highlighted, press OK and the following should be displayed in the output window:

[XT] Record ID:12	Record Length:280	Header Length:17
[XT] Record Data:"12","https://www.bbc.co.uk/news/av/business-54268400","Brexit: Why is it so hard to reach a deal? - BBC News","ku.oc.cbb.www.",1,0,0,"100","1600936685597000","KLmlJJvjjSPQ",0,"47356514151165","Time is running out to reach a Brexit deal - what are the obstacles?","https://ichef.bbci.co.uk/images/ic/400xn/p08sbh6q.jpg","5"

Given that we know the structure of the database, we can match each of these values back to a field. The whole record should also be highlighted in the hex window:

Hex data with entire SQLite record highlighted

The value “1600936685597000” is the last visit time and we use this as the time the record was entered into the database. This value is a Firefox timestamp, which is a Unix timestamp in milliseconds rather than seconds. Dividing it by 1,000,000 gives us a Unix timestamp and this one translates to 24/09/2020 at 08:38:05 UTC.

So we now have our time for the record being created in the database.

In order to locate our next timestamp that we can use, we need to look at the point 3 database. If you run the same search for WAL headers relating to Page 4, you should get the following list:

So the frame at file offset 1082168 was the last frame we examined in the Point 2 WAL file, so the next change to the page would be located at file offset 1574048.  You may have noticed that some of the offsets after this offset are different to the previous databases. This is due to previous WAL records, which have been committed to the database, being overwritten with new WAL records. I’ll cover this a little more at the end of this post.

Now if we examine the frame at offset 1574048 we can see that the number of cells is back up to 12 and that the last offset is different. The new data has taken some of the space previously occupied by our deleted record.

If we go to page offset 31,251 we can see the record for the (misspelt) www.dilbrt.com

Decoding this record using the XT_SQLiteRecordDecoder X-Tension gives us the following information:

[XT] Record ID:14	Record Length:82	Header Length:16
[XT] Record Data:"14","http://www.dilbrt.com/","<NULL>","moc.trblid.www.",1,1,1,"2000","1600936808068000","8dLkpvxIP9tC",0,"125510922083568","<NULL>","<NULL>","6",

1600936808068000 again is a Firefox timestamp that translates to 24/09/2020 at 08:40:08.

So now we know when the record was added to the database and when the next record was added. So logically we can deduce that the deletion has to have taken place between 08:38:05 and 08:40:08 on 24/09/2020 (UTC).

We could have proved this in a similar manner using the moz_historyvisits table and matching the visits record to the moz_places url. Potentially we could have narrowed this down further using the record relating to the BBC sports section, which is recorded as being last accessed at 08:30:30 UTC on the same day, using the exact same techniques.

I hope this has been helpful in showing a practical example of how time lining SQLite records using the WAL file is possible (if quite time consuming to do manually!).

P.s: Hits from later in the WAL file

So earlier we discussed that there were hits for the page number after the latest record that we were interested in. So to explain that, we shall have a look at the WAL frame header located at offset 1934760 on the “Point 2” WAL database and compare it the one from offset 1016584 of the same database.

See how the SALT 1 and 2 values don’t match, this is because the WAL frames are from before the last checkpoint operation on the database, where the WAL data is integrated back into the database. The SQLite3 File format specifies that the WAL salt-1 value is incremented by 1 and the salt-2 value is randomized, which is why the salt-1 value in the header at offset 1934760 is 1 larger that value in the header at offset 1016584.

When VM’s go wrong

Welcome to the first (hopefully of many) posts to the new blog. This post is based around an issue that was raised to me that they couldn’t virtualise an exhibit for a case they were working on.

I will mention a number of tools in this post and I want to add I don’t have any vested interest in any of them! These tools just so happen to be the ones available to me and/or my personal preferences.

There are a number of reasons that you might want to create a VM of a device, from wanting to check the settings of a particular program to extracting passwords from browsers such as Google Chrome. Although many of the things that you might want to display from a VM can be done by interpreting a settings file or sifting through registry keys, sometimes it is just so much nicer to have a visual capture of the settings as the user would have seen them.

There have been many posts on how to create a VM from a forensic image file and some very good software has been created to help in this task. This post, however, is dedicated to creating a VM of an exhibit that is NOT in a bootable state at the time of seizure.

So, the investigator has tried creating a VM from scratch using Virtualbox and also tried using Virtual Forensic Computing (VFC), which is usually pretty good at making VM’s when they don’t usually work. The machine would start up but no operating system would boot and there was a Windows 7 (x64) system installed. Time for a little bit of work!

The starting place for our investigation was to check the Master Boot Record. At a very quick glance, it appears to be intact, the boot code looks normal and there are partition entries. However, a closer inspection reveals that the partition is set to inactive.

MBR for disk with start of first partition entry highlighted

Normally, we would expect to see a value of 0x80 for the byte that is set to 0x00. It appears the rest of the entry is intact as well and if you use the X-Ways Template manager to view the data, it matches the location of the start of the NTFS partition.

This behavior is especially odd as the device was showing as being used very close to the time the device was seized, definitely within 12 hours of the last activity on the device. So our first job is to fix the MBR for the VM.

To do this, I use Arsenal Image Mounter from Arsenal Recon. I prefer this to FTK imager as it provides a true mounted disk (similar to Encase PDE but without having to dirty my system with Encase) and that gives some additional capabilities. But when mounting with this tool, we set it to keep a cache of changes made to the disk. Also note what physical drive the image file is mounted as, we will need that later.

Interestingly, as soon as the disk is mounted, it is showing as an uninitialized disk within the Disk Management window. So now we need to change the data on the disk using a hex editor. I’m a big X-Ways Forensics fan and, although X-Ways won’t allow you to change data, Winhex that ships with it allows you to easily change the values.

Run Winhex in administrator mode and open the physical disk. Once this is done, simply navigate to the first byte of the record (offset 446) and type 0x80 into the byte. It should show up as red and then press Ctrl+S to save the change. We are not actually altering evidence here, due to the write cache on the image file, so don’t worry.

Now we are going to try and make a VM out of this newly changed disk. However, given the errors that you can easily get with creating VM’s off virtual disks and using different software, we are going to create a VMDK of the disk using Virtualbox. For those who don’t know about frequent VM errors, the Physicaldrive can be shown as being used and therefore stops the virtualisation software using it. This usually requires a restart of the machine, which is pain as you usually always have something processing.

So to create the VMDK file from the mounted image file, we can use the following command:

cd C:\Program Files\OracleVirtualBox

C:Program FilesOracleVirtualBox>vboxmanage convertfromraw \\.PhysicalDriveX "C:\outputpath\outputfilename.vmdk" --format VMDK

This will usually take between half an hour and a couple of hours depending on your machine and the size of the disk. But once this has been done, we have a VMDK that no longer requires the disk to be mounted and has the primary entry in the MBR marked as active. You can also create a copy of the VMDK file in case we break the original!

Now to create a new VM to try and boot this system. I was using Virtualbox, but VMware player should work equally well, so the screenshots will be for this software. These should provide a rough guide for creating a VM in virtualbox and remeber to select the VMDK we created earlier for the hard drive.

Now, simply starting the VM still gives us an error to say “Operating system cannot be found”. This is clearly a larger issue than the MBR entry being broken. So, in order to try and fix this, we can attach a Windows Installation ISO to the VM. This will give us some repair functionality to try and get this system working. I used a Windows 8.1 ISO for this machine, however any version equal to or newer than the version you are trying to fix should be fine.

Once the VM started it gave us the option to boot from the installation disk by pressing any key. Once the ISO booted, we selected “Repair your computer > Troubleshoot > Advanced Options > Command Prompt”.  This opened the command prompt within the repair tools. 

Now there are several things that can cause the system not to boot correctly. The first is damaged MBR, but we had already fixed that, the next items are all specific to the Windows OS. We have a Boot Configuration Data (BCD) that contains information required to boot the system. This is used in Windows Vista and newer and replaces the boot.ini that was used by previous version. Now we are in a command prompt, we can try and fix this

X:\Sources>bootrec /fixmbr 
The operation completed successfully
X:\Sources>bootrec /fixboot 
The operation completed successfully
X:\Sources>bootrec /rebuildbcd 
Successfully scanned Windows installations.
Total identified Windows installations: 1
[1] C:\Windows
Add installation to boot list? Yes (Y)/No (N)/All(A): Y (select ‘Yes’ by responding with ‘Y’)
The operation completed successfully

The first command fixes the MBR, just in case there was an error we couldn’t see. That might include corrupt assembly code that isn’t easily visible. The second and third commands relate to the BCD and, as we can see from the output, it identified the Windows OS on the disk. After this, we restarted the virtual machine and came up with the same OS no found error.

So restart the machine and back into the Windows Installation disk. This time we navigate through Repair your computer > Troubleshoot > Advanced Options > Startup Repair. It displays the detected operating system(s) that can be repaired.

This then shows us the OS’s that can be seen and we select our Windows 7 operating system as the one that needs repairing. It then runs through its process and restarts the VM. This time, the VM gives us the option to start Windows normally or to start in safe mode. Selecting starting windows normally gives a blue screen.

Still this is a further step forward, and those who deal with VM’s regularly know there are a variety of methods for fixing this kind of error. However by this point I just wanted the thing to boot, so we went back to VFC. By mounting the VMDK that was attached to our Virtualbox machine, we could point VFC at the (now fixed) disk and generate the Virtual Machine. This then booted us straight through to the Windows Login screen. Hooray!

If this hadn’t worked, there were still a couple of options left for us, the most notable was using the Windows boot disc to restore the operating system to an earlier point in time. This would hopefully give us a point at which the system was still in working order.

As you can see, this is a very long winded way of creating a VM, but it does allow us to virtualise a machine that appears. So thanks for reading my first post and hopefully it has been useful, or at the very least, entertaining!

New Blog!

Hi and welcome to the first post of my new blog! I’m hoping this will be the first of many posts, although this will be the only one where nothing forensic/investigative will be posted.

I’ve worked in Digital Forensics/Investigations for a number of years and over this time I’ve created new tools and encountered some weird and unusual problems. With the tools, I’ve mostly distributed them to others that I know within the field, but feel like I should make a larger number of them available to the community as a whole.

This blog came around as in my office its become quite common to “ask clees” when there’s an issue that nobody else seems to have any idea of how to fix. Usually when this happens, I get the person involved to do a little write up of how we fixed it and send it round the office so that everyone can deal with the issue if they should get it (or a similar one).

After solving one problem, a colleague suggested that I should create a blog and post write-ups to forensic problems and that it would allow me to be even easier to point people to any previous work that had been done by pointing to this blog!

So hopefully it won’t be long before the first actual forensic post is up and running, until then enjoy the rest of the site!