My recent post on “Timelining using SQLite Write Ahead Logs” highlighted how much background information is required to deal with SQLite databases. This post is going to give a more in-depth overview of the structure of the SQLite 3 file format. It should take you from knowing very little about SQLite databases to being able to manually locate records using a hex viewer.
I should start by saying that the documentation for the SQLite 3 file format is excellent and can be found here. The documentation is pretty much good enough that you can teach yourself it just using this information and a test database to check against. Where possible, I’ll reference the section of the documentation that the information has come from.
SQLite databases are now a massive part of the forensics workload. They are used by thousands of phone apps, on both Android and iOS, as well as being used heavily by the Windows/Linux/OS X operating systems for both system data (e.g. Windows Notification database) and user data (e.g. Chrome/Firefox history).
X-Ways Forensics/Winhex – but any hex editor will do. HxD is also good if you need a free tool
X-tensions – If using X-Ways then you can use the SQLite Record Decoder (v0.3) to make your life easier – available here
SQLite3 Page Header – Available here
Examining the database
Examining the database header
The first 100 bytes of the SQLite database is the database header and this stores key information about the database.
All values in SQLite are in Big Endian format and therefore you should ensure any hex editor you are using is displaying values in Big Endian or none of your values are going to make any sense at all!
The following structure can be seen on the SQLite file format website under section 1.3.
|0||16||The header string: “SQLite format 3\000”|
|16||2||The database page size in bytes. Must be a power of two between 512 and 32768 inclusive, or the value 1 representing a page size of 65536.|
|18||1||File format write version. 1 for legacy; 2 for WAL.|
|19||1||File format read version. 1 for legacy; 2 for WAL.|
|20||1||Bytes of unused “reserved” space at the end of each page. Usually 0.|
|21||1||Maximum embedded payload fraction. Must be 64.|
|22||1||Minimum embedded payload fraction. Must be 32.|
|23||1||Leaf payload fraction. Must be 32.|
|24||4||File change counter.|
|28||4||Size of the database file in pages. The “in-header database size”.|
|32||4||Page number of the first freelist trunk page.|
|36||4||Total number of freelist pages.|
|40||4||The schema cookie.|
|44||4||The schema format number. Supported schema formats are 1, 2, 3, and 4.|
|48||4||Default page cache size.|
|52||4||The page number of the largest root b-tree page when in auto-vacuum or incremental-vacuum modes, or zero otherwise.|
|56||4||The database text encoding. A value of 1 means UTF-8. A value of 2 means UTF-16le. A value of 3 means UTF-16be.|
|60||4||The “user version” as read and set by the user_version pragma.|
|64||4||True (non-zero) for incremental-vacuum mode. False (zero) otherwise.|
|68||4||The “Application ID” set by PRAGMA application_id.|
|72||20||Reserved for expansion. Must be zero.|
|92||4||The version-valid-for number.|
There are a lot of fields here and I’m going to concentrate on the most useful for us to interpret the rest of the database. Probably most important is the database page size in bytes as we will need this information in order to locate any other information within the database.
Pages are the building blocks of the SQLite database and the entire database is made up of pages. Even the SQLite database header takes space in the first page of the database. It is worth noting that SQLite pages start numbering from one and not zero and this becomes important when we are trying to locate the offsets for different pages.
In all modern versions of SQLite databases, the number of pages should be recorded in the header. If this information is blank, the number of pages can be assumed to be the size of the file divided by the page size.
Examining the root page
Immediately after the 100 byte SQLite database header, comes the root page. In all other pages in the database, the page header will be at offset 0 of the page.
The root page is used to store information about the tables contained within the SQLite database. This is in itself a table, albeit a hidden one, but can be seen quite easily. Using the Query “Select * from Sqlite_master” gives us the following information
Here we can see a number of records that are either “Table” or “Index” Records. There are other record types that are allowed, such as “Trigger” or “Views” but we have none of these in the database.
This table has a number of names, but the documentation refers to it as the “sqlite_schema” but also allows for different names:
The SQLite schema table can be thought of as being created as such:
CREATE TABLE sqlite_schema( type text, name text, tbl_name text, rootpage integer, sql text );
The rootpage field in this table is the page at which the root of the table can be located. The sqlite_schema table rootpage is effectively page 1 of the database, the page containing the SQLite database header. However we need to understand how the data is stored in the SQLite database
SQLite and b-tree
All data is stored within a b-tree in an SQLite database and this provides key/data storage. Two variants of b-trees are used by SQLite. “Table b-trees” use a 64-bit signed integer key and store all data in the leaves. “Index b-trees” use arbitrary keys and stores no data at all. We are going to concentrate on “Table b-trees”
A b-tree page is either an interior page or a leaf page. A leaf page contains keys and in the case of a table b-tree each key has associated data. An interior page contains K keys together with K+1 pointers to child b-tree pages. A “pointer” in an interior b-tree page is just the 32-bit unsigned integer page number of the child page.
So in this table in SQLite, the page containing the keys 7 and 16 is the Interior Page and contains no data as this page has no pages above it, it is also referred to as a root page. The second row contains leaf pages, as they have no further pages below them. They also store all the data in the table.
All records that are less than or equal to 7 are stored in the left hand page of the tree. This is because the key stored in the interior page has a value of 7. The next key value is 16, so all records greater than 7 and less than or equal to 16 are stored in the central page. All records over 16 are stored in the right page. The key stored in the interior page will have a value and this is the page number of the corresponding leaf page.
This is a simplistic example and there may be many leaf pages as well as another layer of interior pages after the root page.
Decoding the sqlite_schema table
Using this information, we need to decode the sqlite_schema table, to find the root pages of the tables we want to find data in. In this example we are going to find the data for the table “albums”. So we start at offset 100 of the file, which is after the SQLite database header. The header format is well defined within the documentation.
|0||1||The one-byte flag at offset 0 indicating the b-tree page type.|
A value of 2 (0x02) means the page is an interior index b-tree page.
A value of 5 (0x05) means the page is an interior table b-tree page.
A value of 10 (0x0a) means the page is a leaf index b-tree page.
A value of 13 (0x0d) means the page is a leaf table b-tree page.
Any other value for the b-tree page type is an error.
|1||2||The two-byte integer at offset 1 gives the start of the first freeblock on the page, or is zero if there are no freeblocks.|
|3||2||The two-byte integer at offset 3 gives the number of cells on the page.|
|5||2||The two-byte integer at offset 5 designates the start of the cell content area. A zero value for this integer is interpreted as 65536.|
|7||1||The one-byte integer at offset 7 gives the number of fragmented free bytes within the cell content area.|
|8||4||The four-byte page number at offset 8 is the right-most pointer. This value appears in the header of interior b-tree pages only and is omitted from all other pages.|
Using the X-ways Template, we get the following information:
Note that because this is an interior table b-tree page header (as denoted by value 0x05), the page header includes the right-most pointer that is omitted from non-interior pages.
We now have the offsets to all of the records within this page. In order to decode this information, we need to know what format the records stored in this page are in. Luckily the documentation spells this out under section 1.6:
|Datatype||Table Leaf (0x0d)||Table Interior (0x05)||Index Leaf (0x0a)||Index Interior (0x02)||Description|
|4-byte integer||✔||✔||Page number of left child|
|varint||✔||✔||✔||Number of bytes of payload|
|4-byte integer||✔||✔||✔||Page number of first overflow page|
As we can see from our earlier template, this is a Table Interior page (page type is 0x05). So at each cell offset we are expecting a 4 byte big endian integer that is the page number of the left child and a single varint immediately after.
If we got to the offset for cell pointer 1 (1019) from the beginning of the page, we will find the data for that record. Note that even though the page header on this page starts at offset 100, due to the database header, the cell offsets start from offset 0 of the page which is the start of the database header. See below for the data in the 1st cell offset.
If we interpret this as a 4 page byte header followed by a varint, we get 0x00 0x00 0x00 0x06 as the 4 byte big endian integer (value 6) and the varint is a single byte varint with a value of 3. We can repeat this step for all offsets within the cell data. There is a section later on decoding varints, for now I will just provide the values.
Once we have done this, we can put the cell data into a table we get:
Now we know which page relates to each set of key values. However if we look at our earlier query we do not see any key values. So where a primary key is not defined as part of a table, each record is given an autoincrementing number called rowid. We can see this by modifying our original query to view the sqlite_schema page to “select rowid,* from sqlite_master”. This gives us the following:
So now we can see that the record relating to the “albums” table is record 1. Looking back at our table, we can see that as the key value is less than or equal to 3, we need to go to page 6 of the database to find the data.
To navigate to page 6 of the database, we need to multiply the page size by the page number -1 (as Sqlite pages start numbering at 1). So in this case, page 6 is at offset 1024 X (6-1) = 5120. At the start of page 6, we can use the template header to decode it and get the following:
From this we can see a couple of things straight away. This is a page type 0x0D which is a table leaf page. If the table was bigger, we might have another interior page between the root and the leaf pages. In that case, we would decode the page as we did the root page.
Also we can see that this page has 3 records and this makes sense as the pointer from the root page was for values less than or equal to 3. If we jump to offset 506 from the start of the page, we will hit the first record.
Looking back at our table of record formats based on page type, we can see that for a 0x0D (Table Leaf) page the structure of each record is:
- Varint: Number of bytes
- Varint: RowID
- Bytearray: Payload
- 4-byte integer: Page for overflow page (not used).
In order to decode these records, we will need to understand varints
These are a variable length integer. They can be between one and 9 bytes in length, though commonly 1 – 3 bytes. It consists of zero or more bytes where the high bit is set and a single byte where the high bit is not set.
The high bit is the most significant, or highest value, bit in a byte. Where this is set, the next byte is part of the varint, if not it is the end of the varint
In hex, with only the high bit set, the hex value would be 0x80. Therefore any value >= 0x80 has the high bit set.
Consider the following data as a series of varint values:
0x45 0x81 0x23 0x83 0x01 0x82 0x81 0x34
If we start from the 1st value, 0x45, this value is less than 0x80 so the high bit is not set. This is a single byte varint.
We move to 0x81. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x23 and this is less than 0x80 therefore this is a 2 byte varint (0x81 0x23)
We move to 0x83. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x01 and this is less than 0x80 therefore this is a 2 byte varint (0x83 0x01)
We move to 0x82. This is greater than 0x80 and therefore the next byte is included. We move to 0x81. This is greater than 0x80 and therefore the next byte is included. The next byte is 0x34 and this is less than 0x80 therefore this is a 3 byte varint (0x82 0x81 0x34)
So at the end of this process we have:
- Varint 1: 0x45
- Varint 2: 0x81 0x23
- Varint 3: 0x83 0x01
- Varint 4: 0x82 0x81 0x34
Now we need to determine the actual values of these varints. To do this, we ignore the high bit and then calculate it as a normal integer. The easiest way to do this manually is the following (bytes are numbered right to left, so last byte of varint is byte 1):
Value = ((byte n – 0x80) * 0x80^n-1) + ……. ((byte 3 – 0x80) * 0x80 ^2) + ((byte 2 – 0x80) * 0x80 ^1) + ((byte 1 – 0x80) * 0x80 ^0)
Or in decimal and simplified for max 3 byte varint:
Value = ((byte 3 -128) * 128 ^2) + (byte 2 -128) * 128 + byte 1
So if we apply this to our varints above, we get the following values:
Varint 1: 0x45 / dec 69
Varint 2: (0x81 – 0x80) * 0x80 + 0x23 = 0x80 + 0x23 = 0xA3 / dec 163
Varint 3: (0x83 – 0x80) * 0x80 + 0x01 = 0x180 + 0x01 = 0x181 / dec 384
Varint 4: (0x82 – 0x80) * 0x80^2 + (0x81 – 0x80) * 0x80 + 0x34 =
0x02 * 4000 + 0x01 * 0x80^2 + 0x34 = 0x8000 + 0x80 + 0x34 = 0x80B4 / dec 32948
Decoding Table Leaf Records
Looking back at our data we can see the following as the start of the cell data
So our data starts with 0x82 0x24 0x01 0x07 0x17 and the documentation tells us that we have 2 varints, the first for record size and the second for record ID.
So using our above formula the record size is (2 byte varint)
(0x82-0x80) * 0x80 + 0x24 = 0x124 or decimal 292
The second varint has a value of 0x01. This is our record ID and this matches the record ID relating to the “albums” record. If it didn’t, we could check the other cells to look for the record we are interested in.
Immediately after our 2 varints, we have the byte array which is the record. Section 2.1 of the SQLite documentation tells us that, in a leaf page, the data is always in record format which consists of two parts.
The header starts with a single varint that defines the size of the header, including the bytes defining the size. This is followed by a number of header fields.
So immediately after our 0x01 second varint, we see a 0x07 byte. This is the length of the header, including that byte, which means our header data is:
0x07 0x17 0x19 0x19 0x01 0x84 0x23
The sqlite3 documentation provides a table detailing the different values:
|Serial Type||Content Size||Meaning|
|0||0||Value is a NULL.|
|1||1||Value is an 8-bit twos-complement integer.|
|2||2||Value is a big-endian 16-bit twos-complement integer.|
|3||3||Value is a big-endian 24-bit twos-complement integer.|
|4||4||Value is a big-endian 32-bit twos-complement integer.|
|5||6||Value is a big-endian 48-bit twos-complement integer.|
|6||8||Value is a big-endian 64-bit twos-complement integer.|
|7||8||Value is a big-endian IEEE 754-2008 64-bit floating point number.|
|8||0||Value is the integer 0. (Only available for schema format 4 and higher.)|
|9||0||Value is the integer 1. (Only available for schema format 4 and higher.)|
|10,11||variable||Reserved for internal use.|
|N≥12 and even||(N-12)/2||Value is a BLOB that is (N-12)/2 bytes in length.|
|N≥13 and odd||(N-13)/2||Value is a string in the text encoding and (N-13)/2 bytes in length. The null terminator is not stored.|
So examining the fields in order we get (all values are varints in a header):
0x17 = 23. As this is greater than 13 and odd, it is a string (last row on table). To find the length we take 13 off the value, giving 10, and divide by 2. So our first field is a 5 character string.
The next 2 fields are very similar, both values 0x19. Using the same logic, we get 2 6 character strings (0x19 = 25, (25 -13) /2 = 6)
Our third value is a 0x01, which translates to a 1 byte integer.
Our last value is a 2 byte var int, compared to the single byte varints previously, so we first need to decode the varint value.
0x84 0x23 = (0x04 * 0x80) + 0x23 = 0x223 = 547 decimal
This is greater than 13 and odd so we know it’s a string. (547-13) / 2 = 267. So the last field is a 267 character string. This gives us our fields as :
- 5 character string (0x17)
- 6 character string (0x19)
- 6 character string (0x19)
- 1 byte integer
- 267 character string.
Adding up the lengths of these fields gives us a total size of 5+6+6+1+267 = 285.
If we add the header length to this value (7) we get a combined length of 292 and that matches what we originally decoded from the 1st varint as our record size.
Immediately following that are the data values in order, each field has been highlighted in a different colour for ease of viewing
Here is where we can use the SQLite Record Decoder X-Tension to confirm our results. If you are using the VHD file in X-Ways, change the view to Disk or Partition view and select the first 2 or 3 bytes of the record. The run the X-Tension from the toolbar at the top of the screen. This will decode the data and highlight the whole record for you. The data is shown in the log window. Using the Sqlite record X-Tension we can confirm our results for this record (output of x-tension):
Offset 506 of page (file offset 5626).
[XT] Record ID:1 Record Length:292 Header Length:7
[XT] Record Data:”table”,”albums”,”albums”,”2″,”CREATE TABLE “albums”
[AlbumId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[Title] NVARCHAR(160) NOT NULL,
[ArtistId] INTEGER NOT NULL,
FOREIGN KEY ([ArtistId]) REFERENCES “artists” ([ArtistId])
ON DELETE NO ACTION ON UPDATE NO ACTION
Looking at the sqlite_master table that we saw previously, we can see how this matches the records (although you cannot see the data after the first newline)
The 1 byte integer field in our record is the rootpage for that table. This rootpage is the start of the data for this table and can be interpreted using the same methods we have used to decode the data in the root page that was located at offset 100 of the database, immediately after database header.
In order to decode the data for the “albums” table, we would got to page 2 (offset 1024) and decode the page header there. If it was an interior page, we would have to map out the other pages associated with it, if it was a leaf page then we could extract the records from this page.
I’ll leave the post here and leave it as an exercise for you to decode the records for the album table. You can use and SQlite browser to check your answers. Hopefully this has given you a grasp of the fundamentals of how SQLite databases stored data.
In future posts, we will look at how we can use this information to recover deleted records.