Intro
Part 2 posted here
A few weeks ago I had the opportunity to present at CEIC 2015, which ended up being a great experience. All software opinions aside, any situation where there are multiple forensicators in a room — advancing ideas, asking questions, and (most importantly) learning is a plus as far as I’m concerned. During my talk there was a lot of great back and forth with the audience members and a few thought-provoking questions. While the following content has been on my mind for a bit, I’m glad to hear others are interested in seeing the results.
That being said, I’m still researching this topic. I’m guessing this series will end up being multi-part, but for now, I wanted to get some initial thoughts written down. If this is of interest, please keep an eye out as I continue research into this small area of SQL database forensics. Here are a few upcoming topics:
- Recovering in-row data from memory (current post)
- Recovering additional artifacts from memory
- On-disk SQL artifacts
- SQL Server in Windows Event Logs
Ok, on to the content!
SQL Data Pages
During my talk, I spent some time discussing SQL data caches. In short, data caches are SQL data and index pages stored in memory for faster data retrieval. As databases have progressed, they rely much more heavily on in-memory operations, which gives us forensicators another fun artifact to carve out of memory! Or, for the adventurous malware author… yet another fun artifact to carve out of memory :(
Now, being memory-based artifacts, these of course have some degree of volatility. How long they persist depends on how often the data is being accessed/updated, as well as how many tables you’re traversing. If you have a solid DB with only a few massive, widely-used tables, there’s going to be a good deal in memory. If you have multiple DBs with dozens of tables all being accessed at random points, the data caching may look a bit different.
In any event, knowing what we know about SQL operations, cached data pages can tell us a few things:
- Data caches stored in memory reflect recently or frequently accessed SQL data. If we can get close enough to unauthorized access to a DB, we may be able to determine what was accessed.
- We can access SQL data without interacting with our DB itself. By nature, interacting with the DB may change data cache optimization, potentially overwriting other memory artifacts. With the order of volatility in mind, capturing memory may come before capturing other volatile SQL artifacts.
- In the worst-case scenario, where our SQL server is corrupted and no other artifacts exist, data caches may give us some view into what data was available.
Disclaimer
First, let me define the test environment. I installed MS SQL Server 2008 R2 Express on a test Windows 8.1 box with only 2GB of RAM. Let me state it: This is NOT an ideal, nor “real world” SQL setup. I’m not using a server OS. There is typically much more RAM allocated as some DBs can be absolutely massive. Moreover, the tables I am accessing don’t have too much data in them. So I don’t need to traverse multiple pages in memory, or link row overflows.
Again, I’m looking to perform artifact research.
Data in the DB
Let’s start by examining our data in the database, so we can see how our data sits in the data page, and thus in the data cache. I’m using Microsoft’s sample test DB for this, AdventureWorks2008R2. Examining the Person.Password table (this data might be of interest, right?):
USE AdventureWorks2008R2;
SELECT * FROM Person.Password;
Again, this is a test DB.
We can see some standard information in here: password hash, a business entity ID (most likely the primary key as well), GUID, date, and the password salt. This screenshot may show only 9 rows; however, this table has 19,972 records total. Lots of data to be cached if we’re hitting this DB continuously! Now, this is only one way to look at our data. Let’s use some internal SQL commands to take a look at the data page, on disk, related to this particular table. The DBCC command, which stands for DataBase Console Command(s), can give us some introspection.
The IND command will help lay out page numbers used by tables. Remember, we’re still looking at the disk right now:
DBCC IND('AdventureWorks2008R2','Person.Password',-1)
Nice! Notice we have 237 rows returned, i.e. 237 in-row data pages. That works out to an average of roughly 84 rows of data per page (19,972 records / 237 pages). This number is important, and will line up in future posts. We want to begin with the PagePID entries with a PageType of 1. This indicates actual in-row data. Let’s pick the first page of the four-digit sequences, and examine the data within. The PAGE command will examine the raw data within the selected page (3872 in this case) and will let us see how the data is represented in the page. (Note: you may need to run DBCC TRACEON(3604) first so that DBCC PAGE prints its output to your session.)
DBCC PAGE('AdventureWorks2008R2',1,3872,3)
The screenshot above is the page header — but look at all that lovely hex! I’ll break a lot of this information down in the next post, but let’s get to what we really want to see. Scrolling down:
What we’re looking at here is the hex (and partial ASCII) representation of our first record. Scroll up if you want to compare, or look at our data. See the Slot 0 at the top? This signifies the first record in the table. Luckily, records are presented to us in sequential order. What we’re interested in is the bottom portion of the last screenshot: the “meat” of our data, if you will. Don’t get me wrong, the other stuff is very important. However, for this post, we’re going to go deep and then come back up for air later.
SQL Row Structure
Before we get into memory recovery, let’s understand how our SQL data is structured. Again, focusing on the blob from above. Here’s a refresher:
3000200001000000BEAC9E3283C8484FB8B617AA4627EFFF000000001A930000050000020055005D00706246775857453939766F625436672B765057467939334E7455552F6F727249576166463031686363664D3D624533586957773D
Lol… we can make that look better. Let’s break down this structure (don’t forget the endianness, where needed!):
- Status Bits A (1-byte length): 0x30
- In short, this tells us that our table has a NULL bitmap (default in SQL Server 2008), and that variable-length columns exist in the data. There’s a more detailed explanation of this, but I won’t get into it right now.
- Status Bits B (1-byte length): 0x00
- Indicates whether the record is a ghost forwarded record. Not the case in this instance; Status Bits A would be different too if this value was 1.
- Fixed Length Size (2-byte length): 0x2000
- Length of the fixed-length portion (little-endian: 0x0020 = 32 bytes, measured from the start of the record, so it includes the 4-byte record header). Note that fixed-length data is stored/addressed prior to variable-length data. Rows are not stored in “data order”.
- This value also tells us where to find the number of columns, which is stored at the end of the fixed-length data portion.
- Fixed Length Data:
- Business Entity ID: 0x01000000
- rowguid: 0xBEAC9E3283C8484FB8B617AA4627EFFF
- ModifiedDate: 0x000000001A930000
- Number of columns (2-byte length): 0x0500
- 5 columns, which lines up with our expectations
- NULL Bitmap (NColumns/8 Byte Ceiling): 0x00
- ceil(5/8) = 1, therefore we only have a 1-byte bitmap here
- No columns are NULL, which also lines up with expectations
- Number of variable length columns (2-byte length): 0x0200
- This makes sense, as we have 3 fixed-length columns, and 5 columns total: 5 - 3 = 2.
- Variable length column end offsets (2 bytes * # of variable columns):
- 4 bytes in this case (2 variable-length columns, 2 bytes each); each value is the end offset of that column’s data within the record
- Variable Column 1 Offset: 0x5500 (little-endian: 0x0055 = 85)
- Variable Column 2 Offset: 0x5D00 (little-endian: 0x005D = 93)
- Variable Length Data:
- PasswordHash: 0x706246775857453939766F625436672B765057467939334E7455552F6F727249576166463031686363664D3D
- PasswordSalt: 0x624533586957773D
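To make that walkthrough concrete, here’s a minimal Python sketch (my own illustration, not a SQL Server tool) that parses the raw record above. The column names are assumptions from our knowledge of the table; the page itself doesn’t store them:

```python
import struct

# The raw Slot 0 record from the DBCC PAGE output above
record = bytes.fromhex(
    "3000200001000000BEAC9E3283C8484FB8B617AA4627EFFF000000001A930000"
    "050000020055005D00"
    "706246775857453939766F625436672B765057467939334E7455552F6F727249"
    "576166463031686363664D3D624533586957773D"
)

status_a, status_b = record[0], record[1]          # 0x30, 0x00
# 2-byte little-endian value: end of the fixed-length portion,
# which is also where the column count lives
fixed_end = struct.unpack_from("<H", record, 2)[0]        # 32
fixed_data = record[4:fixed_end]                          # 28 bytes
ncols = struct.unpack_from("<H", record, fixed_end)[0]    # 5
null_bitmap_len = (ncols + 7) // 8                        # ceil(5/8) = 1
pos = fixed_end + 2 + null_bitmap_len
nvar = struct.unpack_from("<H", record, pos)[0]           # 2
pos += 2
# Little-endian end offsets of each variable-length column
var_ends = struct.unpack_from("<%dH" % nvar, record, pos)  # (85, 93)
pos += 2 * nvar

var_cols, start = [], pos
for end in var_ends:
    var_cols.append(record[start:end])
    start = end

business_entity_id = struct.unpack_from("<i", fixed_data, 0)[0]
print(business_entity_id)        # 1
print(var_cols[0].decode())      # PasswordHash (base64 text)
print(var_cols[1].decode())      # PasswordSalt
```

Running it recovers BusinessEntityID 1 plus the two variable-length columns, matching the values we read off the page by hand.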
Phew, what a mouthful! Ok, let’s reflect on a few points from above:
- I used my knowledge of the row mapping to map data to columns. In actuality, the data page does not contain column names. We may have to reconstruct these from somewhere else.
- An array is provided that gives the end offset of each variable-length column’s data; we are not so lucky with the fixed-length data. We can use the number of columns field to narrow this down; however, without knowledge of the structure, we may “split” the data incorrectly.
- You can browse between the screenshots above to compare the hex to the actual data content, or simply render in a hex editor/conversion tool.
And perhaps, the most important takeaway:
- Albeit dependent on row structure, there is some form of SQL data signature here! We have a row “header” that seems to correspond to primary in-row data: 0x3000. Furthermore, if we know a bit of structure about our rows, we can add the next 2 bytes, and include the fixed-length size: 0x30002000. While this is certainly not a unique-to-SQL-only set of bytes, it may pose a start to carving records from SQL.
But let’s say I’ve got access to a memory dump, and I want to backtrack how my data looks in memory. Or, let’s say I’m a nefarious actor, and I want to look for a particular string of interest, and then pull back other DB records.
SQL Data Cache in Memory
To begin analyzing how the data shows up in memory, I first took a memory dump of my test box immediately after running the commands above. This kept the pages fresh in memory, before any default flushing policies could purge them.
Let’s stick with the first record. I’m going to start with a small subset of the PasswordHash entry, which should be unique enough not to throw too many false positives: 70624677585745393976. Searching through my memory dump in hex, I get quite a few hits; three to be exact. Let’s use our value of interest to back out what we’re seeing stored in memory:
Hey now, check out this goodness! Looks like we were able to go from one to multiple rows of data cached in memory, all in sequence. And knowing a bit more about the structure, we can see that the headers go in order of the rows as well:
- 3000200001
- 3000200002
- 3000200003
- ..and so on. This particular chunk of memory went up to record 0x2A.
Not only are we able to see the rows in sequence, but the base64 encoded data, as well as the salt, are stored in plaintext for us to scrape out if we want. And even if the base64 wasn’t so readable, we have the values we need to find out the variable column lengths too.
Knowing what I know about the structure, I can kick off a carver for 300020000?, and expect to pull back records from this database. Or, I could do some manual spot searches to see how many records I can pull back, incrementally upping the value of the BusinessEntityID field to see how big our table is.
Note that the 0x30 in the front of our row can change; we’ll discuss that format in later posts.
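As a rough sketch of that carving idea, here’s a minimal Python example (my own illustration; the image path and the 93-byte record length are assumptions based on this particular table’s observed rows):

```python
import re

# 0x30 0x00 followed by the 2-byte fixed-length size 0x2000 observed
# for Person.Password rows; a real carver would validate more of the
# record structure to weed out false positives.
HEADER = re.compile(rb"\x30\x00\x20\x00")

def carve_records(image_path, record_len=93):
    """Return candidate Person.Password records from a raw memory image.

    record_len=93 matches this table's observed row size; tables with
    different variable-length data will need different handling, so
    treat every hit as a candidate, not a confirmed record.
    """
    with open(image_path, "rb") as f:
        data = f.read()
    return [data[m.start():m.start() + record_len]
            for m in HEADER.finditer(data)]
```

Each candidate can then be split into columns using the row structure we walked through earlier.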
Takeaways
So, what happened here? Did I simply look up some data in SQL, and then perform a search through a memory image? Could we not have achieved the exact same thing by running strings? Sure, if we knew what data we were looking for. Here’s our output, where I grepped for a substring within our PasswordHash:
$ strings memdump.mem | grep -n orrIWafF0
8918871:NtUU/orrIWafF01h
8918877:PasswordHash = pbFwXWE99vobT6g+vPWFy93NtUU/orrIWafF01hccfM= 9411031:pbFwXWE99vobT6g+vPWFy93NtUU/orrIWafF01hccfM=bE3XiWw=0 10340362:pbFwXWE99vobT6g+vPWFy93NtUU/orrIWafF01hccfM=bE3XiWw=0 12420618:pbFwXWE99vobT6g+vPWFy93NtUU/orrIWafF01hccfM=bE3XiWw=0
In this case, we happened to get what we wanted, which was a PasswordHash. However, memory allows us to go even further. Knowing the SQL data cache structure lets us build out a more robust memory carver.
This exercise instead focused on using what we know to unravel what we don’t know. As mentioned earlier, SQL databases are relying on ever more in-memory operations. And as memory has undoubtedly become one of the most important artifacts to be collected, it’s important to understand what else may be held within. In future posts, I’m going to take a memory-only approach, seeing what we can build knowing the data structure. Who knows, there might even be a plugin or extraction tool in the future :)
Thanks for reading, more coming soon!
Originally published at www.505forensics.com on June 9, 2015.