This blog post should give you a rough insight into how the Mario Kart 8 exploit was turned into a primary entrypoint for homebrew. It covers both the technical details and the problems that came up during development. This time I also want to tell you about the ideas that didn’t work, instead of just the one that works fine.
At the beginning of this year Rambo6Glaz made yet another implementation of the GX2 kernel exploit, which uses a different PM4 packet to manipulate the kernel heap. This new implementation brought back the idea of implementing a kernel exploit inside a rop chain.
- ROBChain, an exploit in the main character scripting of Super Smash Brothers Wii U
- an exploit in the network protocol of Mario Kart 8
- a savegame exploit in Donkey Kong Tropical Freeze.
But there is a problem with all of these exploits: none of them has access to the JIT-area. This means no access to an area in memory which is writeable and executable at the same time, which makes arbitrary code execution without a kernel exploit impossible.
Out of these exploits the Mario Kart 8 one is special. It can be run on a previously unmodified console and could be a potential primary entrypoint into the system. Because of this the focus went to the Mario Kart 8 exploit.
Exploiting the network protocol of Mario Kart 8
Back in 2018 Kinnay found a bug in the P2P protocol of Mario Kart 8. He released a PoC which could crash the console of someone who hosted a friend room, displaying the message “rop chains are fun :)”. This initial implementation allows a (remote!) rop chain execution with a maximum length of ~1000 bytes, more than enough to play around with different payloads.
The original repository has detailed information about the exact bug and exploitation. In summary it’s possible to achieve a 4 byte arbitrary write due to a bug in parsing the “identification token”. That’s enough to turn a call of Md5Context::GetHashSize into a memcpy to the stack, effectively copying the content of another packet onto the stack, leading to a rop chain execution.
The kernel exploit in a rop chain - theory
In theory, implementing the kernel exploit in a rop chain doesn’t sound that hard. From the wiiuhaxx-common repository we already have rop gadgets we can re-use. This includes for example gadgets to call a function or write a value to an arbitrary address in memory. Detailed information about the kernel exploit can be found in part 4 of my “homebrew environment” blog series, but here is a quick overview:
- Place a fake heap entry into a specific address in memory
- Create a PM4 packet and send it to the GPU to override the “next id” on the kernel heap
- Register an OSDriver and hope it’s allocating memory using our fake heap entry placed in step 1
- Manipulate the “SaveArea” pointer in the OSDriver struct (which is now in userland memory) to point into the kernel data.
- Use the OSDriver_CopyFromSaveArea functions to get arbitrary read/write with kernel privileges.
This doesn’t really seem that complicated. It’s just a few function calls and it fits relatively easily into a 1000 byte rop chain. We also have the advantage that the address of the stack is consistent. This allows us to place data (like the fake heap entry or the PM4 packet) at the end of the rop chain and simply calculate their positions in memory beforehand. Rambo6Glaz talked about starting to implement the kernel exploit in the Mario Kart 8 exploit, and I thought I would give it a shot too. Using the existing gadgets and already knowing the kernel exploit in detail made me think this would be rather trivial and would be done in maybe a few hours.
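Calculating those positions beforehand could look roughly like the following sketch. The base address, the blob names and the helper functions are placeholders for illustration; none of them are taken from the actual exploit code.

```python
import struct

# Hypothetical values for illustration only: the real stack address of the
# rop chain is not given in this post.
CHAIN_BASE = 0x20000000
MAX_CHAIN_SIZE = 1000  # approximate limit of the initial rop chain


def blob_addresses(num_gadget_words, blobs, base=CHAIN_BASE):
    """Return the absolute address of every data blob placed after the
    gadget words. Each blob is padded to a multiple of 4 bytes."""
    addresses = {}
    offset = num_gadget_words * 4
    for name, data in blobs.items():
        addresses[name] = base + offset
        offset += (len(data) + 3) & ~3
    return addresses


def assemble(gadget_words, blobs):
    """Pack the gadget words (big endian) followed by the padded blobs."""
    chain = b"".join(struct.pack(">I", word) for word in gadget_words)
    for data in blobs.values():
        chain += data + b"\x00" * (-len(data) % 4)
    if len(chain) > MAX_CHAIN_SIZE:
        raise ValueError(f"chain is {len(chain)} bytes, limit is {MAX_CHAIN_SIZE}")
    return chain
```

Because the number of gadget words is known before the chain is packed, the addresses can be computed first and then used as arguments inside the gadget words themselves.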
I started to play around with the exploit and tried to implement the kernel exploit step by step. Sometimes I had random crashes and testing was quite annoying. For each try you have to restart the console, go online, open a friend room and send the payload. Then maybe also read the crash log by restarting again and firing up a CFW to access it. In total each attempt took at least 2-3 minutes.
Fast forward a few days. After many hours of testing and trying I still had nothing. But somehow this whole exploit was quite addicting, it started with one simple idea and ended (or didn’t end) every day with “just one more try”. At the same time Rambo6Glaz was doing the same thing. Slowly but steadily we got a better understanding of what was going on. Eventually we got a working memory write using the kernel exploit, but something was still wrong. It turned out that the exploit was indeed sometimes working (or at least partially working), but only in about 20% of the tries. This made testing even more annoying. Each idea required at least 5 failed attempts to make sure the idea was wrong and it wasn’t just the exploit randomly failing.
At this point we had collected some facts that helped us understand what was going on:
- the kernel exploit did sometimes work, but only in rare cases
- the rop chain needs to have a specific length to be stable (otherwise you get really strange behaviour and crashes)
- the rop chain is running on core 2, but the main GX2 core is 1 (the kernel exploit expects to be run on the GX2 core…)
For me personally an unstable exploit was enough, I just wanted to finish this. Even if performing the exploit would require several attempts, I just wanted to see it working once, so I could finally spend my time on other projects.
Because of this I tried to split up the exploit into multiple rop chains which need to be executed one after another. Fitting the kernel exploit in 1000 bytes is doable, but also bundling a real payload and copying/executing it won’t fit anymore. One challenge was to actually restart the game. But from reverse engineering for HID to VPAD I knew there was a function to force opening the home menu (OSSendAppSwitchRequest), and it indeed worked.
I also tried to improve the success rate of the kernel exploit by adding some waiting. But every time I waited via OSSleepTicks or added a GX2DrawDone the console crashed. Knowing the kernel exploit would only work in rare cases, I tried to think of a way to give the user feedback on whether the exploit had succeeded or failed. In a rop chain “code execution” is really limited, it’s only possible to run existing chunks of code. Branches and loops are really hard (at least I haven’t found a way to pull them off yet, I am not a rop chain expert though), so the only option I saw was to manipulate the rop chain itself. I placed an OSFatal at the end of the rop chain to make the console crash, and tried to override it with an OSExitThread using the (hopefully) newly gained kernel write. This way exiting would mean success and crashing would mean failure. Again I spent too much time on this but never really got anything working.
At this point more than a week and literally dozens of hours had already been wasted on this, without much progress. It was time to change the strategy. Rambo6Glaz suggested finding rop gadgets to perform a stack pivot, which would make it possible to execute a bigger rop chain.
Rop chain basics
Before working on this I had been working with rop chains, but I hadn’t found/written
any rop gadgets myself. I wasn’t really understanding rop chains, I
was just using the “high level” functions from the
so it was time to dig deeper and learn something new.
(If you already are familiar with rop chains you can skip this part.)
Why do we need to use a rop chain? On the Wii U no region in memory is executable and writeable at the same time (except for the JIT area, but we have no access to it in Mario Kart 8), so the idea is to use existing code. If you can control the stack, you can control the code flow. When calling a function, the position in the code of the “calling function” is saved on the stack. Whenever the called function returns, it jumps back to the address which was saved on the stack. By manipulating this return address it’s possible to jump anywhere in the code. Using clever places in the code it’s possible to chain multiple of these jumps to execute the needed instructions.
Functions which use the stack to store local variables have a common pattern. At the end of the function they load the saved return address from the stack and increase the stack pointer. By carefully crafting a stack we can jump to parts of code that are written directly before this pattern. Each address which you jump to is called a “gadget”.
Let’s imagine a stack where currently the stack pointer (r1) is pointing to address 0x20000000:
    # Stack before running the gadget:
    0x20000000: 0                    <-- Stackpointer (r1)
    0x20000004: 0x10000000           <-- current gadget address
    0x20000008: 0                    <-- Stackpointer (r1) + 0x08
    0x2000000C: [NEW GADGETADDRESS]  <-- Stackpointer (r1) + 0x0C
Now we assume that some function was just returning, setting the stack pointer
to 0x20000000 and reading the address to jump to from 0x20000004.
This means that in this state the code flow continues at 0x10000000 with
r1 = 0x20000000.
    # Instructions of the gadget at 0x10000000
    0x10000000: [SOME USEFUL INSTRUCTION 1]
    0x10000004: [SOME USEFUL INSTRUCTION 2]
    0x10000008: [SOME USEFUL INSTRUCTION 3]
    0x1000000C: lwz r0, 0xc(r1);   # load return address from stack pointer + 0x0c
    0x10000010: mtlr r0;           # move it to the link register (lr)
    0x10000014: addi r1, r1, 8;    # increase the stack pointer by 0x08
    0x10000018: blr;               # branch to link register
The first three instructions are the ones we are really interested
in. Using these we want to achieve our planned behaviour. This could be for
example loading values into registers (from the stack, which we can control!),
moving values between registers, calling functions, writing values to memory
and much more.
The instructions from 0x1000000C to 0x10000010 read the new return
address from stack pointer + 0xC, which is the value we’ve previously
put on the stack ([NEW GADGETADDRESS]) and move it to the link register.
The instruction at 0x10000014 will increase the stack pointer by 0x08, and
the one at 0x10000018 will branch to the link register which
was set in the previous instructions.
After executing the gadget this stack will look like this.
    # Stack after running the gadget:
    0x20000000: 0                    <--
    0x20000004: 0x10000000           <--
    0x20000008: 0                    <-- Stackpointer (r1)
    0x2000000C: [NEW GADGETADDRESS]  <-- current gadget address
    [...]                            <-- stack data for the gadget in 0x2000000C
And a new gadget will be executed. This way it is possible to chain multiple gadgets to achieve an intended behaviour.
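The walkthrough above can be replayed with a few lines of Python. This is only a toy model of the four epilogue instructions, with the stack as a dictionary; the value used for the new gadget address is a made-up placeholder.

```python
def run_epilogue(stack, r1):
    """Simulate the end of the example gadget:
    lwz r0, 0xc(r1); mtlr r0; addi r1, r1, 8; blr
    Returns the address execution continues at and the new stack pointer."""
    lr = stack[r1 + 0xC]  # lwz r0, 0xc(r1) followed by mtlr r0
    r1 += 8               # addi r1, r1, 8
    return lr, r1         # blr jumps to lr


NEW_GADGET = 0x10001234  # hypothetical address of the next gadget

stack = {
    0x20000000: 0,
    0x20000004: 0x10000000,  # the gadget that is currently running
    0x20000008: 0,
    0x2000000C: NEW_GADGET,
}

pc, r1 = run_epilogue(stack, 0x20000000)
```

After the call, `pc` holds the address of the next gadget and `r1` points to 0x20000008, matching the "stack after" listing.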
How to find rop gadgets
There are several tools that help you find rop gadgets. I had the best luck
with the tool Ropper. Before you can use
Ropper with Wii U binaries, you need to
convert them to ELF files.
allows you to display and filter all rop gadgets in a binary up to an
Besides the actual binary of the exploited application you can also use rop gadgets from the system libraries (.rpl files). The “core” system libraries are always at the same location in memory, which makes them easily usable for rop gadgets. In fact it’s preferred to use gadgets from these executables to be independent of the application to be exploited.
Here is a list of all system libraries that are at a fixed position in memory and their location (.text section, FW 5.5.x+):
    coreinit  101C400 - 1090F00
    tve       1090F40 - 10B9BC0
    nsysccr   10B9C00 - 10BFD40
    nsysnet   10BFD80 - 10CFE60
    uvc       10CFEC0 - 10D2120
    tcl       10D2180 - 10ED6E0
    dc        110D600 - 111FEC0
    vpadbase  111FF00 - 1128840
    vpad      1128880 - 113D5E0
    avm       113D640 - 114EBE0
    gx2       114EC40 - 11C3020
    snd_core  11C3080 - 11E3820
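When reading crash logs or picking gadgets it can be handy to check which library an address belongs to. A small lookup over the table above might look like this; treating the end address as exclusive is an assumption on my side.

```python
# .text ranges of the fixed system libraries (FW 5.5.x+), taken from the
# table above. The end address is treated as exclusive (an assumption).
SYSTEM_LIBRARIES = [
    ("coreinit", 0x0101C400, 0x01090F00),
    ("tve",      0x01090F40, 0x010B9BC0),
    ("nsysccr",  0x010B9C00, 0x010BFD40),
    ("nsysnet",  0x010BFD80, 0x010CFE60),
    ("uvc",      0x010CFEC0, 0x010D2120),
    ("tcl",      0x010D2180, 0x010ED6E0),
    ("dc",       0x0110D600, 0x0111FEC0),
    ("vpadbase", 0x0111FF00, 0x01128840),
    ("vpad",     0x01128880, 0x0113D5E0),
    ("avm",      0x0113D640, 0x0114EBE0),
    ("gx2",      0x0114EC40, 0x011C3020),
    ("snd_core", 0x011C3080, 0x011E3820),
]


def library_for(address):
    """Return the name of the system library whose .text section contains
    the address, or None if it is outside all listed ranges."""
    for name, start, end in SYSTEM_LIBRARIES:
        if start <= address < end:
            return name
    return None
```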
It’s a good idea not to hardcode any of the addresses for rop gadgets, but instead get them from the binaries either via the ELF symbols or a hash. For improving the browser exploit I built a small Java tool that will return a list of gadgets for a config file. This way the rop gadgets for different versions of the binary can easily be found.
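To illustrate the idea of looking gadgets up instead of hardcoding them, here is a minimal sketch that searches a dumped .text section for a byte pattern. It is not the Java tool mentioned above; the pattern is simply the epilogue from the example gadget in this post, and the function name is made up.

```python
import struct

# Big-endian encodings of the epilogue from the example gadget:
#   lwz r0, 0xc(r1); mtlr r0; addi r1, r1, 8; blr
EPILOGUE = struct.pack(">4I", 0x8001000C, 0x7C0803A6, 0x38210008, 0x4E800020)


def find_pattern(text, pattern, text_base):
    """Return the addresses of all 4-byte aligned occurrences of `pattern`
    inside `text`, where `text` is loaded at address `text_base`."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        if pos % 4 == 0:  # PowerPC instructions are always 4-byte aligned
            hits.append(text_base + pos)
        pos = text.find(pattern, pos + 1)
    return hits
```

Running the same search against the .text section of each firmware version gives the matching address for that version, so the rop chain generator only needs the pattern.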
Finding actual useful gadgets
After some research I finally knew enough to find a rop gadget on my own for
the first time. The goal was to perform a stack pivot to be able to switch to a
different (bigger!) stack. As we have learned in previous sections, the stack
pointer is stored in register r1. To modify the stack pointer, we need to
find a gadget that modifies r1.
To achieve this, I searched for gadgets that write a value into r1. Setting
it directly from the stack didn’t give any results, but I found a gadget that
moves the content of r12 into r1. I started searching for gadgets to control
r12, without any success. But I found one that moves the content of yet
another register into r12… and so on. You see
how this is going to end. The ultimate goal was to find a “chain” that
starts by reading a value from the stack and moves it over several gadgets into
r1. In the end I really managed to find a working set of gadgets to perform
a stack pivot. It wasn’t the most gorgeous solution, but it worked. As the
project moved on I was able to improve and shorten the chain multiple times.
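This kind of search is basically a shortest-path problem over "move register A into register B" gadgets. The sketch below shows the idea with a breadth-first search; the gadget names and registers in the example are invented and do not describe the gadgets that were actually used.

```python
from collections import deque


def find_register_chain(gadgets, start, target):
    """Find the shortest list of gadgets that moves a value from `start`
    to `target`. Each gadget is a tuple (name, source, destination)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        location, path = queue.popleft()
        if location == target:
            return path
        for name, source, destination in gadgets:
            if source == location and destination not in seen:
                seen.add(destination)
                queue.append((destination, path + [name]))
    return None  # no combination of gadgets reaches the target


# Hypothetical gadget list, "stack" stands for a value we control.
EXAMPLE_GADGETS = [
    ("load_r30_from_stack", "stack", "r30"),
    ("move_r30_to_r5", "r30", "r5"),
    ("move_r30_to_r12", "r30", "r12"),
    ("move_r12_to_r1", "r12", "r1"),
]

pivot_chain = find_register_chain(EXAMPLE_GADGETS, "stack", "r1")
```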
Besides the rop size limitation, there was still the problem of being on
the wrong CPU core. To switch the affinity of a thread, it needs to be
suspended. This means it’s not possible for a thread to move itself to another
CPU core. The obvious solution is to create another thread with the affinity to run
on the target core. But there is one problem: the OSCreateThread function
takes 9 arguments, but with the existing rop gadgets it’s only possible to call a
function with up to 6 arguments.
With motivation from the success of finding a stack pivot gadget, I was trying
to find a rop gadget to create a thread. For quite some time I tried to find a
gadget to call an arbitrary function with 9 arguments, but without success.
Then I realized that OSCreateThread is just a wrapper for an internal
“create thread” function, where the function call is using the registers r25 to r31
as arguments instead of r3-r9. In the PowerPC architecture the arguments of a
function are stored in the registers r3 to r9 before the call, and finding a
gadget that sets these at the end of a function is much less likely than finding
one for the upper registers. The “upper” registers (e.g. r24 - r31) are often
saved on the stack at the beginning of a function, and restored (loaded from the
stack) at the end of a function. The combination of having an OSCreateThread
gadget which loads its arguments from r25 to r31 and having an easy gadget to set
these registers makes this function call with a huge amount of arguments feasible.
How to execute long rop chains
At this point it was possible to do a stack pivot and create another thread on the right core. But there was still the problem of the size limited rop chain. Rambo6Glaz and I tried to figure out a way to allow bigger rop chains and came up with two different ideas:
- Create a rop chain to load a bigger chain via the network
- Split up the “final” rop chain into multiple chunks, run the exploit multiple times and save one chunk inside an OSDriver each time
While Rambo6Glaz focused on the network solution, I gave the OSDriver idea a try.
Running the exploit multiple times!
The Wii U OS has a feature that allows libraries to install OSDrivers.
Besides registering callbacks for certain events like acquiring or losing the
foreground, OSDrivers can also store data inside the kernel. This is useful
to store permanent data that can be used even after restarting or switching
the application. Using the kernel syscalls
directly lets us bypass some checks and simplifies the usage.
Here is a general workflow of this idea:
- Run the exploit in Mario Kart 8 to get rop chain execution.
- Build a rop chain that registers a new OSDriver and stores embedded data (in this case a part of a big rop chain) inside the kernel.
- Open the Home Menu via rop chain and exit the game.
- Go back to step 1 until the whole rop chain is placed in different OSDrivers.
- Build another rop chain that takes the data saved in the OSDrivers and executes it on a new thread on core 1 (GX2 main core in Mario Kart 8).
Using this approach I was able to store 816 bytes inside an OSDriver with each
restart. I improved the rop chain generation to automatically take care of
generating all the different rop chains that are needed.
It worked quite well. Finally I could build a rop chain without thinking about the size limit. In fact the size of the final rop chain is limited by the amount of “read data from OSDriver X” gadgets, but I never reached it (~8000 bytes were possible). The downside: each try took quite long. I had to run the exploit at least three times to get the “final” rop chain running to check if it’s working. This leads to a test cycle of more than 5 minutes. For testing just some ideas it was enough, but in the long term it was really annoying.
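The splitting itself is simple. A rough sketch of how a generator could cut the final rop chain into OSDriver-sized pieces, assuming one chunk per run and one extra run to execute the result (the function names are placeholders):

```python
CHUNK_SIZE = 816  # bytes that fit into one OSDriver per run of the exploit


def split_chain(chain, chunk_size=CHUNK_SIZE):
    """Split the final rop chain into chunks of at most `chunk_size` bytes."""
    return [chain[i:i + chunk_size] for i in range(0, len(chain), chunk_size)]


def runs_needed(chain, chunk_size=CHUNK_SIZE):
    """One run per stored chunk plus one final run that reads the chunks
    back and executes them."""
    return len(split_chain(chain, chunk_size)) + 1
```

With this model a chain of two chunks needs three runs of the exploit, which is where the slow test cycle comes from.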
Using this I was able to test some ideas that were previously not possible due to size constraints. One of the first things I tried was to shut down the GX2 engine and restart it again to have it in a clean state for the kernel exploit. This was now possible because we were on the right CPU core. But this resulted in a crash because the actual game was still running and using the GX2 engine. A simple solution was to suspend the main thread (which luckily is at a fixed address which can be easily obtained from the crash logs), and resume it at the end of the rop chain. Without resuming the main thread exiting the game wouldn’t be possible. But even with stopping the main thread and reinitializing the GX2 engine the exploit was still not working. Adding some waiting in various forms didn’t help either.
The best theory at the time was that it didn’t work because something in the background was still running and using the GX2 engine, interfering with the exploit. At this point I was really desperate and tried to implement every single idea in the rop chain, hoping one of them would actually work. But nothing was working.
From working on the plugin system I knew that threads on CPU core 2 will
actually keep running when opening the Home Menu. My idea was to perform
the exploit while the game was suspended in the background, but this also
didn’t work.
We need more gadgets!
Each application implements a ProcUI loop. ProcUI is a wrapper library
which allows an easier usage of the system message queue from Cafe OS. The
ProcUI loop is the place in the application where it’s decided if the
application is requested to move to the background, just gained the
foreground or should be closed. I thought by sending a “close application” message to
the game and keeping our own thread running we would have a chance of running the
rop chain in a pretty clean environment without the actual game running and
interfering with it.
The easiest way to tell a game that it should be closed is by calling
SYSRelaunchTitle from the sysapp library, but actually using it was
way harder than I thought. In this blog post we’ve already talked about the
system libraries that are always at a fixed address in memory, but sysapp
is not one of them. The function address can be easily obtained using
OSDynLoad_FindExport. The real problem is using
any of the return values and calling a function not by its address but by
a function address pointer.
To accomplish this, once again more rop gadgets needed to be found. The function
OSDynLoad_FindExport takes the module handle acquired via
OSDynLoad_Acquire as first argument, which changes dynamically after each
restart. So the first needed gadget was a function call where the first
argument is dereferenced from an address. In addition a gadget is needed to
call the function pointer that is returned by OSDynLoad_FindExport.
After finding these gadgets it was finally possible to call SYSRelaunchTitle
to trigger a game shutdown, but it turns out it also kills any other existing
threads. The idea of keeping rop chain execution after shutting down the game
didn’t work either.
But these new gadgets really helped to test new things. For example we were
able to test the “magic” IM_SetDeviceState call which is used in the
browser exploit to shut down the browser. It turns out that it is just emulating
a press of the home button, which doesn’t help.
Loading bigger rop chains via the network!
The whole time I was using my slow “run the exploit multiple times to get a bigger rop chain”-approach, while Rambo6Glaz was working on loading a second rop chain over the network.
At some point Rambo6Glaz finally managed to get a stable execution of a rop chain sent via TCP to the console. The workflow was something like this:
- Create a new thread on CPU core 1
- Inside the thread connect to a TCP server and receive a bigger rop chain
- Do a stack pivot to execute the received rop chain
This was really stable and massively sped up the testing of new rop chains.
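On the PC side this only needs a tiny TCP server that hands out the bigger rop chain. The following sketch shows one way to do it; the 4-byte big-endian length prefix is an assumption for illustration and not the framing that was actually used.

```python
import socket
import struct


def send_chain(listener, chain):
    """Accept one connection on an already listening socket and send the
    rop chain, prefixed with its length as a 4-byte big-endian integer."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(struct.pack(">I", len(chain)) + chain)


def recv_exact(sock, size):
    """Read exactly `size` bytes from the socket."""
    data = b""
    while len(data) < size:
        part = sock.recv(size - len(data))
        if not part:
            raise ConnectionError("connection closed early")
        data += part
    return data


def recv_chain(sock):
    """Counterpart of send_chain, useful for testing the server on a PC."""
    (size,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, size)
```

On the console the receiving part is of course done by the small first rop chain, which then pivots the stack into the received data.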
Just keep GX2 running
Due to the faster testing I tried several new things. One of them was to stop trying to shut down and restart GX2 but still suspend the main thread of Mario Kart 8. This led to an exception in the kernel, so something was happening. To perform the kernel exploit we place a fake heap entry and modify the kernel heap to use it. The crash log suggested the kernel was indeed trying to read from the right address, but the data it read was not what we had placed there. I wasn’t sure (and I am still not sure) if this was because of some weird caching issue, but I went the safe route and modified the exploit to read the fake heap entry from 0x2F200014 instead of 0x1F200014 and it worked first try.
I gave it a few more shots and it was indeed stable. Finally.
From now on we had a stable kernel exploit which granted us read/write access with kernel privileges. The JIT-area isn’t just helpful for providing easy userland code execution, but also provides easy kernel execution. It’s also the only region in memory which allows write and execute for the kernel, but we still had no access to this region.
Without kernel execution and with the default memory mapping there isn’t really anything special you can do with kernel privileged writes, only modifying the kernel .data section and registering a new syscall. Without being able to run custom code a new syscall isn’t that helpful. But kernel write is enough to change the tables inside the kernel which are used for the memory mapping and give us a mapping of an “execute only” region with write privileges. The downside of this is that we need to restart the application before the changes take effect. So we still need at least one restart.
Before restarting it’s important to revert the changes we did to the kernel heap. We also register a new syscall 0x25 which points to a memcpy function (0xfff09e44 on 5.5.x) to keep an easy way to perform copy operations with kernel privileges.
Userland code execution!
After performing the kernel exploit, setting up the memcpy syscall, mapping
the memory and restarting Mario Kart 8 we perform the exploit once again. Now we
can finally achieve code execution. Using the new memory mapping we can copy
any executable into the free 0x011DD000...0x011E0000 region. Afterwards we
override the “main()” function call with a jump to our code and switch to the
Mii Maker. This will execute our payload in Mii Maker context!
But we still have no real control of the kernel without kernel execution.
Unfortunately the free 0x011DD000...0x011E0000 region which we are using
for userland code execution has no kernel execution rights. I spent some time
thinking of a solution when I remembered the RPX version of the homebrew
launcher. The RPX version of the homebrew launcher was intended to run as a
channel in an environment without kernel access, so it ships with its own kernel
exploit. It also has no access to the JIT area, but somehow achieves kernel
execution. Looking at the code reveals that there is a region in memory
(0x017FF000, just before the JIT area) that is writable using the memory
mapping and also has kernel execution rights. This is enough to have
arbitrary kernel execution by placing a payload in this area and registering it
as a syscall. By changing an IBAT (controls the memory mapping) kernel
execution rights can be provided for any other region in memory.
In previous blog posts I talked about a homebrew environment where all
exploits should be able to load a payload.elf from the sd card and execute it.
To achieve this we need to fulfill the requirements of the payload loader,
and run the payload loader afterwards. One of the requirements is having a
syscall which allows the modification of IBAT0 to gain kernel code execution.
The other ones are just the “default”
After installing these syscalls we just need to load the payload loader
into memory and run it.
Based on the JsTypeHax_payload
I created a payload for the Mario Kart 8 exploit which sets up the needed
syscalls for the payload loader and copies the loader into memory. The
0x011DD000...0x011E0000 region is barely big enough to fit this “payload
loader installer” and the actual payload.elf loader, but it somehow fits.
After copying the payload.elf loader into memory it can finally be executed.
The payload.elf will be loaded from the sd card and executed. We
are finally done.
In the end I spent way more time on this than I ever would have thought. So many times I was so close to just giving up, but somehow the exploit was really addicting. Once again a big shoutout to Rambo6Glaz (aka NexoCube) who worked on this at the same time. We shared our ideas and tried to motivate each other. In the end we both came up with a working solution which is quite nice.
This blog post may not be the most technical one, and maybe not the most exciting one, but this is how developing such an exploit really is, at least in my experience. 95% of the time you’re just failing and trying different ideas. Several times you will be stuck, but somehow there is always a solution. On one side it feels like I’ve wasted way too much time on this, but on the other side I also learned so much. And it feels nice to actually finish such a demotivating project. Even if no one will ever actually use it.
Where can I find the code
I put all of the code on Github: