Vacuuming the Digital Trail – Part 2

Vacuuming the Digital Trail – Part 2

In part 1 of this series on SQLite Vacuuming, I discussed how vacuuming works in SQLite database and how it is triggered.

In this article, I will discuss the technical details of the Auto_Vacuum feature and how to determine if it is enabled.

I will also breakdown the structure of pointer map pages and how they can be used in certain scenarios.

How can we determine if Auto-Vacuum Enabled

 

As I discussed previously, when auto_vacuum is enabled, at the conclusion of a transaction Freelist Pages will be truncated. To determine if this feature is enabled, we must look at the database file header.

Offset 52 in the file header is a 4-byte Big Endian integer that denotes whether auto-vacuum is enabled. If the value is 0 then auto-vacuuming is disabled (what we prerfer to see), if the value is non-zero then this will tell us the page number of the largest root B-tree page.

Using the KnowledgeC database as an example, it has 00 00 00 2C in the 4 bytes from Offset 52. As it is a non-zero value it tells us that page 44 is the largest root b-tree page, but more importantly it tells us that auto-vacuuming is enabled.

    Damien Attoe

    Damien Attoe

    Before joining Spyder Forensics, Damien was a Managing Consultant at AccessData where he managed eDiscovery and digital forensics projects and provided services to companies in various industries including the Health Care, Energy, and Financial industries. Prior to that, Damien was a Computer Crime Specialist at the National White Collar Crime Center where he conducted digital forensic research and performed software validation studies on digital forensic software

    Bonus Knowledge: The largest root b-tree page refers to the root page in the database that has the largest cell content area. In my example, page 44 is the root page for the Z_MODELCACHE table and is a leaf page so it has more data written to it than the interior pages that we would see for the tables that have a large number of records like ZOBJECT or ZSOURCE.

    How about the auto_vacuum mode?

    We have determined that auto_vacuum is enabled, but how can we tell what vacuum mode is being used? This is where we must look at the 4-bytes from offset 64 in the file header.

    A value of non-zero in these 4 bytes indicates that incremental vacuuming is enabled. A value of 0 indicates that incremental vacuuming is disabled, meaning the auto_vacuum mode is FULL.

    In this example, the SQLite database has 00 00 00 01 in the 4-bytes from Offset 64 which indicates that incremental vacuuming is enabled.

    Bonus Knowledge: From what I have observed, this vacuum setting is utilized in some of the iOS SQLite databases that we typically look at during an examination, such as NoteStore.sqlite, CallHistory.storedata, healthdb_secure.sqlite etc.

    As incremental vacuum is enabled, there is still the possibility that Freelist pages exist. In this example, the 4-bytes from offset 32 tell us the first freelist trunk page is on page 6914 and the 4-bytes from offset 36 tell us there are a total of 720 freelist pages (includes both Trunk and Leaf pages).

    KnowledgeC is a great example of a database where even though incremental vacuum is enabled, there could be some freelist pages still present. If we consider that the blob data in the Z_DKINTENTMETADATAKEY_SERIALIZEDINTERACTION field in the ZSTRUCTUREDMETADATA table can span multiple pages, when it “expires” it can create a large number of freelist pages. If new entries are not inserted into the database fast enough, you end up with freelist pages either waiting to be reused or truncated as new transactions occur.

    How many pages are truncated when incremental vacuum is enabled?

    Unfortunately, when incremental vacuum mode is enabled, the number of pages to be truncated is not stored physically. The number of pages is set using a PRAGMA statement which is executed when a database connection is established. The only way to determine this information is by examining the source code for the application which for the most part is not publicly available.

    Pointer Map Pages

    When auto_vacuum is enabled, Pointer Map pages will be present in the database. These are a special type of database page that tracks the usage of pages within the database making the vacuum process more efficient when it comes to moving pages around.

    The first pointer map page is always page 2 of the database and then depending on the number of pages in the database there may be additional pointer map pages. Pointer map pages contain an array of 5-byte entries that provide information about the subsequent pages up to the next pointer map page.

    To calculate the positioning of the subsequent pages we must calculate the number of entries that can be stored on the page based on the page size. As each entry on the page is 5 bytes in length, we can do some math to figure out where the next page is.

    Using the calculation J=U/5 where U is the page size we calculate the number of entries on a page.

    (documented on sqlite.org: https://www.sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages)

    If we have 4096-byte pages: 4096/5 = 819.2 so if we round down, there are 819 entries that can be stored on a page which provide references to the next 819 pages. Which are database pages 3 through 821.

    Now that we know the number of entries that can be stored in 1 page, we can run another equation to calculate the page number for the subsequent pointer map pages.

    The calculation would be: (J * C) + 2 + C where C is a pointer counter that starts at 1 and the 2 is static and refers to the first pointer map page.

    Using the example of 4096-byte pages, the calculation for the second pointer map page would be (819*1) + 2 + 1 which give us page number 822.

    The third pointer map page would be (819*2) + 2, giving us page number 1642 and so on and so forth till the end of the database file.

    Using the KnowledgeC database we have been using as an example of a database that utilizes auto_vacuum, if I programmatically calculate the Pointer Map page numbers I get:

    Page Content

    As I mentioned, pointer map pages contain an array of 5-byte entries that provide information for the pages that come after it until the next pointer map page.

    The general structure is below:

    The page flag is a 1-byte big-endian integer that tells us what type of page it is. This can be 1 of 5 different values: 

    Page Flag

    Page Type

    1

    B-Tree Root Page

    2

    Freelist Page

    3

    First Overflow Page

    4

    Overflow Page

    5

    Non-root B-tree Page

     

    The remaining 4-bytes is a big-endian integer that references a page number. This meaning of this page number varies depending on what type of page it is.

    Page Type

    Description

    B-Tree Root Page

    Always Zero

    Freelist Page

    Always Zero

    First Overflow Page

    Page Number for the B-Tree leaf page that contains the first part of the record that has overflowed

    Overflow Page

    Page Number for the previous page in the overflow chain

    Non-root B-tree Page

    Page Number for the B-Tree Parent Page (Interior)

     

    Reverse Lookup Table

    Based on the information stored on these pages, this could be considered a reverse lookup table. When we think about SQLite B-trees, Overflow chains and Freeblock chains, you can navigate forward, but not backwards. For example, to navigate a B-tree you decode the interior page to get to the next level, but once you reach that next level there are no pointers to get you back. With the information that is found on a pointer map page you can navigate backwards and see what table a specific B-tree leaf page belongs to.

    If we take a look at Page 4102 in my KnowledgeC database (6th Pointer Map page), the first entry is 05 00 00 02 6D. Decoding this entry, 05 is a non-root B-Tree page, and the referenced page number is 621 which is the pointer to the interior page in the previous layer of the B-tree.

    As each entry in the array refers to a page that comes after that specific pointer map page; to calculate the page number, add the entry number to page the number for the pointer map page.

    Our pointer map page is page 4102, and the entry is number 1, therefore this entry is the reference for page 4103.

    Looking at page 4103, the page flag 0A means it’s a B-tree Leaf Index page, the first freeblock is at page offset 3414, there are 113 cells written to the page and so on and so forth.

    If I wanted to figure out what index this page is a part of, I can do reverse lookups using the pointer map pages. The entry for this page told us the parent B-tree page was page 621, so we need to find which pointer map page contains the entry for this page. As there are 819 entries per pointer map page, the entry for the page we are looking for would be on the first pointer map page which is page 2 in the database.

    Decoding page 2, page 621 would be entry number 619. To calculate the exact position on the page for the entry, take the entry number – 1 and multiply by 5. In our example it puts us at page offset 3090. Decoding the entry, page 621 is a Non-Root B-tree page, and the Parent Page is Page 37.

    As I haven’t found the Root page for the index yet, I have to go to the pointer map entry for page 37, which would be the entry number 35 of page 2. 35 multiplied by 5 gives us page offset 175.

    Decoding the entry, the page flag tells us that page 37 is a B-tree root page.

    Now it’s just a case of querying the sqlite_master table, to figure out the index name. In this example, we can run SELECT * FROM sqlite_master WHERE rootpage = 37

    This returns the index with the name  Z_Event_compoundIndex1.

    Naturally you wouldn’t want to do this manually, but by understanding the structure you can programmatically do a reverse lookup to determine what table or index a specific page belongs too.

    Mapping an Overflow Page back to Original Record and Table

    Another use-case for pointer map pages would be mapping an overflow page to a specific record and/or table. Often, in my examinations I do text-based searches at the hex level to find data that my forensic tools potentially missed or to validate findings. This often results in getting hits on overflow pages. As overflow pages form a chain, I can use the pointers to find any subsequent overflow pages, but I can’t go backwards and determine the overflow pages that came before it in the chain and which record and table the overflow page belongs to.

    Let’s say I encountered a keyword hit on SQLite page 1011 and I have determined it is an overflow page based off the 4-byte pointer at the start of the page. In this example, the 4-bytes are 00 00 00 00 which tells me there are no additional overflow pages for this record.

    By doing a reverse lookup through the pointer map pages I can determine the B-tree page that contains the record and figure out which table it belongs to.

    Page 1011 is identified as an Overflow Page on the associated pointer map page,  with the previous overflow page for the record being on Page 994. I can go the physical offset for the start of Page 994 and confirm the first 4-bytes point to page 1011. Hex 00 00 03 F3 is big endian 1011.

    Based on the pointer map page, page 994 is the First Overflow page, with the parent B-tree page that contains the first part of the record being page 2794. I went to the page, and looked at the cells and only 1 cell had overflow. Decoding the last 4 bytes of the cell it pointed to page 994.

    Whilst I am not going to go into manually decoding B-tree table leaf cells (topic for another day), I was able to determine the RowID for the cell is 346807 as shown below. There is a total of 238 Record Keys in the Payload Header (Yellow) which means that it is a pretty substantial table as far as columns. Then when looking at the Record Keys, Record Key 224 is a 3-byte varint, and when decoded is a blob field that has 8461 bytes of data which was the cause of the overflow.

    I have found the id for the record, now I need to know which table it belongs to. The pointer map entry for page 2794 tells me the parent B-tree page is page 1438, which is a non-Root B-tree page. If you understand how B-trees work, leaf pages cannot point to other leaf pages, so the parent would be an interior page in the above layer in the B-tree.

    We can look at the page flag at the start of the page, and Hex 05 confirms that page 1438 is a B-tree table interior page.

    We can go a little bit further and decode the cell content area to confirm that page 2794 is referenced. Once the cell is identified, the first 4-bytes are 00 00 0A EA which is page number 2794, and the highest key stored on that page is stored as 3-byte varint which when decoded is 346813.

    Back to the pointer map pages, the parent for page 1438 is page 15, which is a B-tree Root page. Now we have the root page number we can determine the table the record is associated with, which happens to be the ZSTRUCTUREDMETADATA table.

    We can query the ZSTRUCTUREDMETADATA table for id 346807 and look at column 224 to review the BLOB data that was on the original overflow page. This happens to be the Z_DKINTENTMETADATAKEY__SERIALIZEDINTERACTION column which contains a binary plist encoded using NSKeyedArchiver.

    Final Thoughts

    As I have discussed, auto_vacuum does provide some challenges in the recovery of data from freelist pages and depending on the mode used, you will either encounter no freelist pages, or a small amount.

    In regard to Pointer Map pages, although they don’t contain any user or application data, they can be useful to do reverse lookups if you are performing searching at the physical level. You can figure out which table a page belongs to, and in the case of an overflow page can help you to trace back to the record.

    As you can probably imagine, if you are able to identify a pointer map page in a corrupted or partial carved SQLite database, you could potentially rebuild the database even if you don’t have the full database schema or all the b-tree interior pages.

    In the next and final article in this series about SQLite vacuuming, I will discuss the Vacuum command in more detail, including some examples of how some applications implement regular vacuuming of their SQLite databases.