In our last episode, we successfully created ARM64 Windows shellcode that pops calc.exe. However, since it was just a proof of concept, we hard-coded the address of WinExec() for the current boot of our test VM. But we can do better. Modern shellcode determines the addresses of functions at run time rather than hard-coding them. This allows for compatibility with ASLR.
Our starting point
Here's what we left off with last time:
The relevant part is this:
This is what we're going to look at.
Dynamically finding WinExec()
We can see some existing writeups about finding kernel32.dll's address, albeit for x86-based architectures:
We'll use this as a starting point, but we may need to change things for ARM64.
From the latter, we can see that the usual chain of memory structures to get to the kernel32.dll address is:
Getting the TEB on ARM64
On x86, we get to the TEB through the FS segment register (the PEB pointer, for instance, is at FS:[0x30]). But we're on ARM! Segment registers aren't a thing on this platform. Looking at the Windows ARM64 ABI documentation, we can see that to get the TEB, we can just look at the x18 register.
We can confirm this in WinDbg (attached to msedge.exe for example, but any process should work):
Here we can see that offset 0x60 into the TEB gets us the PEB. So let's start out our shellcode with this:
Getting the image base address
We can look into the PEB in Windbg:
Here we can see that the pointer to PEB_LDR_DATA is at the PEB + 0x18. We can keep reusing the x27 register for each intermediate pointer until it holds something we'll need later. So append to our shellcode:
Back to Windbg:
Here we can see that 0x10 into the PEB_LDR_DATA structure, we have InLoadOrderModuleList. Just to confirm what we're looking at, let's look at the bytes that x27 points to. Note that we cannot just dereference the x27 register in our debugger session, as the dmp file contains the state of the machine when the crash was captured, as opposed to when the crash occurred. So we have to scroll up the !analyze -v output and copy/paste the register value we are interested in.
Here we can cross-reference the output from the dt command with the actual bytes in memory:
Similarly, we can give the dt command an address, so that it parses the live data there instead of only printing the structure layout:
As expected, we can see that the values match up with what we derived from the memory byte values. Whee! We can get even more data by clicking on e.g. "InLoadOrderModuleList":
Navigating the InLoadOrderModule list in windbg
We currently have addresses of doubly-linked lists that contain information about loaded modules. Let's look at the InLoadOrderModuleList.
To do this, we first read the value at 0x10 into the PEB_LDR_DATA structure, whose address we get from 0x18 into the PEB:
Now that we have that address, we can have WinDbg parse the data (as LDR_DATA_TABLE_ENTRY), but using the specific data at 000002b4`696046d0:
Here we have clear evidence that the first entry in the InLoadOrderModule list is the process itself (msedge.exe).
Because we are dealing with a linked list, we just need to do one more dereference to get the information of the next item in the list. In this case, one more level of
poi() in windbg:
OK, now we see that we've got ntdll.dll. We're close!
Let's try one more:
Bingo! At least in our WinDbg session, we've got the address of kernel32.dll, and we know how we got there: look at the 3rd entry in the InLoadOrderModuleList linked list, and go to offset 0x30 within that entry.
Navigating the InLoadOrderModule list with our shellcode
For any given LDR_DATA_TABLE_ENTRY, the DllBase (where the DLL is loaded into memory) is 0x30 into the structure. I'm reconstructing this part from memory, and I used multiple guides along the way to get what I wanted, so I'll have to be a little bit hand-wavy here. But here's the sequence of instructions that we can append to our shellcode to get the base address of kernel32.dll:
At this point, we have set register
x28 to be the load address of kernel32.dll.
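As a rough way to sanity-check the pointer chain outside of a debugger, here is a minimal Python sketch that walks the same offsets (0x60, 0x18, 0x10, 0x30) over a fake address space. This is a model, not shellcode: the `build_fake_process` helper, the addresses, and the DLL bases are all made up for illustration, and the "third entry is kernel32.dll" ordering is simply what we observed in WinDbg above.

```python
import struct

# Offsets for 64-bit Windows, as seen in the WinDbg output discussed above.
TEB_PEB = 0x60            # TEB.ProcessEnvironmentBlock
PEB_LDR = 0x18            # PEB.Ldr
LDR_IN_LOAD_ORDER = 0x10  # PEB_LDR_DATA.InLoadOrderModuleList
ENTRY_DLL_BASE = 0x30     # LDR_DATA_TABLE_ENTRY.DllBase


def read_ptr(mem, addr):
    """Read a little-endian 64-bit pointer from the fake address space."""
    return struct.unpack_from("<Q", mem, addr)[0]


def write_ptr(mem, addr, value):
    """Write a little-endian 64-bit pointer into the fake address space."""
    struct.pack_into("<Q", mem, addr, value)


def find_third_module_base(mem, teb):
    """Follow TEB -> PEB -> Ldr -> list entries 1, 2, 3 -> DllBase."""
    peb = read_ptr(mem, teb + TEB_PEB)
    ldr = read_ptr(mem, peb + PEB_LDR)
    entry = read_ptr(mem, ldr + LDR_IN_LOAD_ORDER)  # 1st entry: the process image
    entry = read_ptr(mem, entry)                    # 2nd entry (ntdll.dll in our session)
    entry = read_ptr(mem, entry)                    # 3rd entry (kernel32.dll in our session)
    return read_ptr(mem, entry + ENTRY_DLL_BASE)


def build_fake_process():
    """Hypothetical helper: lay out a tiny fake TEB/PEB/loader list."""
    mem = bytearray(0x1000)
    teb, peb, ldr = 0x100, 0x200, 0x300
    entries = [0x400, 0x500, 0x600]
    bases = [0x7FF600000000, 0x7FFC70000000, 0x7FFC6FDD0000]  # made-up values
    write_ptr(mem, teb + TEB_PEB, peb)
    write_ptr(mem, peb + PEB_LDR, ldr)
    write_ptr(mem, ldr + LDR_IN_LOAD_ORDER, entries[0])
    for i, (entry, base) in enumerate(zip(entries, bases)):
        # The link at offset 0 points at the next entry; the last one
        # points back at the list head, as in a circular list.
        is_last = i + 1 == len(entries)
        nxt = ldr + LDR_IN_LOAD_ORDER if is_last else entries[i + 1]
        write_ptr(mem, entry, nxt)
        write_ptr(mem, entry + ENTRY_DLL_BASE, base)
    return mem, teb, bases


mem, teb, bases = build_fake_process()
print(hex(find_third_module_base(mem, teb)))
```

Each `read_ptr` call corresponds to one dereference in the chain, which is the same thing each extra `poi()` did for us in WinDbg.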
To visualize how we got there, this diagram may help:
Getting the kernel32 PE header
First we need to look at the DOS header to find the offset of the PE header. We can see this visually in 010 editor, and probably other tools:
Here we can see that 0x3C into the DLL file, there is a value: 0xE8. This is the offset into the binary file where the PE header begins.
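To make the lookup concrete, here is a small Python sketch that reads the 32-bit value at 0x3C and checks that it really points at a PE signature. The `make_fake_image` helper is hypothetical and only builds the few bytes we care about; the 0xE8 default matches the PE header offset used in the export table arithmetic in this post.

```python
import struct

E_LFANEW_OFFSET = 0x3C  # where the DOS header stores the PE header offset


def pe_header_offset(image):
    """Return the file offset of the PE header, validating both signatures."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    return e_lfanew


def make_fake_image(pe_offset=0xE8):
    """Hypothetical helper: a buffer with only MZ, e_lfanew and the PE signature."""
    image = bytearray(pe_offset + 4)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, E_LFANEW_OFFSET, pe_offset)
    image[pe_offset:pe_offset + 4] = b"PE\0\0"
    return bytes(image)


print(hex(pe_header_offset(make_fake_image())))
```

The shellcode has no reason to validate the signatures, since it is looking at a DLL the loader already accepted. The checks are here only to show what the value at 0x3C is expected to point to.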
So in our shellcode:
Getting the export table
We need to figure out how much further beyond the PE header we need to go to get to the export table.
We can see that it's 0x170 from the beginning of the file (0xe8 + 0x88). The tutorial I was looking at said that it should be 0x78 from the PE header, though. Why the difference? Look in the screenshot above... there are 4 size fields. Each of them is 8 bytes wide on a 64-bit platform instead of 4, so together these sizes will eat up 0x10 more bytes than on a 32-bit platform!
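If you want to double-check the 0x78 versus 0x88 arithmetic, the field sizes from the PE format specification can simply be added up. The sketch below does that in Python; the table is transcribed from the documented PE32 and PE32+ optional header layouts, and the function name is just something we made up for this example.

```python
PE_SIGNATURE_SIZE = 4   # "PE\0\0"
COFF_HEADER_SIZE = 20   # IMAGE_FILE_HEADER

# (field name, size in PE32, size in PE32+)
OPTIONAL_HEADER_FIELDS = [
    ("Magic", 2, 2), ("MajorLinkerVersion", 1, 1), ("MinorLinkerVersion", 1, 1),
    ("SizeOfCode", 4, 4), ("SizeOfInitializedData", 4, 4),
    ("SizeOfUninitializedData", 4, 4), ("AddressOfEntryPoint", 4, 4),
    ("BaseOfCode", 4, 4),
    ("BaseOfData", 4, 0),   # only exists in 32-bit images
    ("ImageBase", 4, 8),
    ("SectionAlignment", 4, 4), ("FileAlignment", 4, 4),
    ("MajorOperatingSystemVersion", 2, 2), ("MinorOperatingSystemVersion", 2, 2),
    ("MajorImageVersion", 2, 2), ("MinorImageVersion", 2, 2),
    ("MajorSubsystemVersion", 2, 2), ("MinorSubsystemVersion", 2, 2),
    ("Win32VersionValue", 4, 4), ("SizeOfImage", 4, 4), ("SizeOfHeaders", 4, 4),
    ("CheckSum", 4, 4), ("Subsystem", 2, 2), ("DllCharacteristics", 2, 2),
    ("SizeOfStackReserve", 4, 8), ("SizeOfStackCommit", 4, 8),
    ("SizeOfHeapReserve", 4, 8), ("SizeOfHeapCommit", 4, 8),
    ("LoaderFlags", 4, 4), ("NumberOfRvaAndSizes", 4, 4),
]


def data_directory_offset(pe32_plus):
    """Offset from the PE signature to the first data directory entry."""
    column = 2 if pe32_plus else 1
    optional_header = sum(field[column] for field in OPTIONAL_HEADER_FIELDS)
    return PE_SIGNATURE_SIZE + COFF_HEADER_SIZE + optional_header


print(hex(data_directory_offset(pe32_plus=False)))  # 32-bit layout
print(hex(data_directory_offset(pe32_plus=True)))   # 64-bit layout
```

One detail the table makes visible: ImageBase also grows by 4 bytes in a 64-bit image, but the 32-bit-only BaseOfData field disappears, so those two cancel out and the net difference is exactly the 0x10 bytes from the four size fields.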
At this point, we know the offset of the data directory, whose first entry describes the export table. We do a little math to get the actual address of the export table:
Getting useful pointers within the export table
To do some math to get our function address, we'll need several relative virtual addresses (RVAs) saved into registers. I chose them based on how they're laid out in windbg. ARM gives us lots of registers to work with, so we don't need to be very conservative here. What we care about are:
- Name pointer table
- Address pointer table
- Ordinal table
More useful than the RVAs would be the actual addresses of the tables, so we do some maths:
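For illustration, here is how that math looks in Python against a fake mapped image. The three field offsets (0x1C, 0x20, 0x24) come from the documented IMAGE_EXPORT_DIRECTORY layout; the `build_fake_export_dir` helper, the RVAs it writes, and the base address used below are all invented for this sketch.

```python
import struct

# Field offsets inside IMAGE_EXPORT_DIRECTORY.
ADDRESS_OF_FUNCTIONS = 0x1C      # RVA of the address pointer table
ADDRESS_OF_NAMES = 0x20          # RVA of the name pointer table
ADDRESS_OF_NAME_ORDINALS = 0x24  # RVA of the ordinal table


def export_tables(image, export_dir_rva, base):
    """Read the three table RVAs and turn them into absolute addresses."""
    functions_rva, names_rva, ordinals_rva = struct.unpack_from(
        "<III", image, export_dir_rva + ADDRESS_OF_FUNCTIONS
    )
    return {
        "address_table": base + functions_rva,
        "name_table": base + names_rva,
        "ordinal_table": base + ordinals_rva,
    }


def build_fake_export_dir(export_dir_rva=0x100):
    """Hypothetical helper: an export directory with made-up table RVAs."""
    image = bytearray(0x200)
    struct.pack_into(
        "<III", image, export_dir_rva + ADDRESS_OF_FUNCTIONS,
        0x1000, 0x2000, 0x3000,
    )
    return bytes(image)


tables = export_tables(build_fake_export_dir(), 0x100, base=0x180000000)
for name, address in tables.items():
    print(name, hex(address))
```

The only real "math" is adding the module base to each RVA, which is the same addition we'll need again when we turn the function's RVA into a callable address.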
Looking for our function
Now that we have programmatically determined the locations of our various function tables, we can start looking for WinExec(). We will simply look for a match for a function that begins with (case-sensitive) "WinE". So first we will set up x20 to have the needle we're looking for.
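If you're wondering what the needle looks like as a number, this small Python sketch shows the idea. It assumes the comparison is done on a 32-bit little-endian load of the first four bytes of each name, which is how Windows on ARM64 lays out memory; the `make_needle` function is just a name chosen for this example.

```python
def make_needle(prefix):
    """Pack a 4-byte ASCII prefix the way a 32-bit little-endian load sees it."""
    if len(prefix) != 4:
        raise ValueError("the needle must be exactly 4 bytes")
    return int.from_bytes(prefix, "little")


needle = make_needle(b"WinE")
first_four = int.from_bytes(b"WinExec\0"[:4], "little")
print(hex(needle), needle == first_four)
```

Because the load is little-endian, the 'W' (0x57) ends up in the lowest byte and the 'E' (0x45) in the highest, so the constant reads "backwards" when written as hex.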
And because we'll be looping, I'm going to pre-subtract some values so that the loop incrementing works as expected. This seems really dumb, but it works.
Now we prepare the main loop structure. The idea is:
- Increment the function number counter
- Increment to the next name in the list
- Load the first 4 bytes of the function name into a register
- Compare vs. the needle we're looking for ("WinE")
- If no match, loop again
Or in code:
At this point, we have the function number of WinExec(). In my testing, it's
0x622 in the kernel32.dll in my ARM64 Windows VM.
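The loop above can be modeled in a few lines of Python, including the "pre-subtract" trick so that the increments at the top of the loop land on the first entry. This is only a sketch over a fake name table: `build_fake_names` and its three export names are made up, and the `count` bound is our addition so the model cannot run off the end of the table.

```python
import struct


def find_name_index(image, name_table_rva, count, needle=b"WinE"):
    """Return the index of the first export name starting with `needle`."""
    want = int.from_bytes(needle, "little")
    index = -1                   # pre-subtracted: first increment gives 0
    cursor = name_table_rva - 4  # pre-subtracted: first increment gives entry 0
    while index + 1 < count:
        index += 1               # increment the function number counter
        cursor += 4              # move to the next name RVA in the list
        (name_rva,) = struct.unpack_from("<I", image, cursor)
        (first_four,) = struct.unpack_from("<I", image, name_rva)
        if first_four == want:   # compare against the needle
            return index
    return None


def build_fake_names(names, name_table_rva=0x100, strings_rva=0x200):
    """Hypothetical helper: a name pointer table plus NUL-terminated strings."""
    image = bytearray(0x400)
    cursor = strings_rva
    for i, name in enumerate(names):
        struct.pack_into("<I", image, name_table_rva + 4 * i, cursor)
        raw = name.encode("ascii") + b"\0"
        image[cursor:cursor + len(raw)] = raw
        cursor += len(raw)
    return bytes(image)


image = build_fake_names(["VirtualAlloc", "WinExec", "WriteFile"])
print(find_name_index(image, 0x100, count=3))
```

Note that this returns the first name that begins with the needle, so the approach relies on no earlier export in the table starting with the same four bytes.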
Getting the WinExec() ordinal number
In many cases, the function number and ordinal number are the same. But it's not guaranteed. So let's do the right thing:
We need to multiply the function number by 2, and then look at that offset within the Ordinal table to get the actual ordinal. It should be the same, but by doing this step our shellcode will be more universal.
Getting the WinExec() function address
In a similar manner to getting the ordinal number from the exported function number, we can get the function's relative virtual address from the ordinal number. In particular:
Multiply the ordinal number by 4, and then go that offset into the address table.
At this point we have the RVA of WinExec(), but we want the absolute address. So we do some math.
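Putting the last two steps together, here is a Python sketch of the lookups: a 2-byte read from the ordinal table, a 4-byte read from the address table, and then adding the module base. The table contents and the base address are invented, and the fake ordinal table deliberately does not match the name index, to show why the extra step matters.

```python
import struct


def resolve_export(image, base, name_index, ordinal_table_rva, address_table_rva):
    """Turn a name index into an absolute function address."""
    # Ordinal table entries are 16 bits wide, hence the multiply by 2.
    (ordinal,) = struct.unpack_from("<H", image, ordinal_table_rva + name_index * 2)
    # Address table entries are 32-bit RVAs, hence the multiply by 4.
    (function_rva,) = struct.unpack_from("<I", image, address_table_rva + ordinal * 4)
    return base + function_rva


def build_fake_tables(ordinals, function_rvas,
                      ordinal_table_rva=0x100, address_table_rva=0x200):
    """Hypothetical helper: an ordinal table and an address table."""
    image = bytearray(0x300)
    for i, ordinal in enumerate(ordinals):
        struct.pack_into("<H", image, ordinal_table_rva + 2 * i, ordinal)
    for i, rva in enumerate(function_rvas):
        struct.pack_into("<I", image, address_table_rva + 4 * i, rva)
    return bytes(image)


# Name index 1 maps to ordinal 2 here, so skipping the ordinal lookup
# would pick the wrong slot in the address table.
image = build_fake_tables([0, 2, 1], [0x1000, 0x3000, 0x2000])
print(hex(resolve_export(image, 0x180000000, 1, 0x100, 0x200)))
```

In the fake data, using the name index directly would have returned the RVA 0x3000 instead of 0x2000, which is exactly the kind of mismatch the ordinal step protects against.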
Confirming the address of our found WinExec() function
We can compare our old code:
This is a hard-coded
0x00007ffc6fde9ff0 for our currently-booted VM.
We can put our "crash" widget in at this place and then look at the dump file. When we run !analyze, we see:
Success! The address we found is the same (in this boot) as what we were looking for. What this means is that our PoC should still work across reboots, and across ARM64 Windows instances that have the same vulnerability.
This diagram should help to visualize the process of getting from the kernel32.dll base address to the address of WinExec():
Specifying our calc.exe payload
This is similar to our prior PoC:
The complete ASLR-compatible PoC