Rating: 5.0
[Original writeup](https://ohaithe.re/post/624142953693249536/uiuctf-2020-cricket32)
This writeup will be an example in using [angr](https://angr.io/) to “automatically” solve this problem. But along the way, we’ll find two deficiencies in angr, and show how to fix them! Beyond being a writeup of this problem -- there is very little information specific to this challenge -- this can serve as an angr use and development tutorial.
## Challenge:
We are given a file `cricket32.S`, which contains 32-bit x86 assembly and instructions for assembling:
```language-nasm
// gcc -m32 cricket32.S -o cricket32
.text
str_usage: .string "Usage: ./cricket32 flag\nFlag is ascii with form uiuctf{...}\n"
str_yes: .string "Flag is correct! Good job!\n"
str_nope: .string "Flag is not correct.\n"
.global main
main:
mov $str_usage, %ebx
xor %esi, %esi
SNIP
jmp printf
```
Note that the `-m32` flag will force `gcc` to compile a 32-bit executable. The assembly is pretty short, but has a few hard-to-interpret instructions, such as `aaa` (ASCII Adjust After Addition), `sahf` (Store AH into Flags), and so on. It's possible these are filler meant to distract; it's also possible that they're being used as part of some bizarre check, or as _data_ (if the program is reading its own code).
## Approach
Instead of trying to reverse engineer this ourselves, we'll be using [angr](https://angr.io/) to solve for the input. After compiling it, we open it up in IDA to see what addresses we might care about:
We see that if the code reaches `0x12BD`, it will load `str_nope`, and then proceed straight to `loc_12BC` and `printf`. This is our failure condition. We'll load the binary in angr and tell to find an input that allows us to avoid that address, `0x12BD`.
```language-python
import angr
import claripy
project = angr.Project("./cricket32", auto_load_libs=True)
flag_len = 32
arg1 = claripy.BVS('arg1', flag_len*8)
initial_state = project.factory.entry_state(args=["./cricket32", arg1])
```
Here `arg1` is our symbolic representation of the argument to the binary. We've allocated 32 bytes to (`flag_len`), hoping this is enough. Since arg1 can have zero-bytes, effectively being shorter than 32, this will work as long as the flag is *at most* 32 bytes long.
Running the above code produces the output:
```
WARNING | 2020-07-19 20:08:15,647 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.
```
so the address that we wanted to avoid, `0x12bd`, will actually be `0x4012bd`. Let's now `explore` for that address:
```language-python
sm = project.factory.simulation_manager(initial_state)
sm.explore(avoid=[0x40128D])
```
Unfortunately, we are left with the result `<SimulationManager with 26 avoid (7 errored)>`. When angr explores paths, it has a few different "stashes" that paths can be in. There are:
* `active`: paths that have not been fully explored yet
* `deadended`: which is paths that terminated the program
* `avoid`: paths that reached somewhere we didn't want it to
* `found`: paths that found an explicit destination request (we gave no such destination here)
* `errored`: indicating that angr was unable to continue with a path.
Ideally we would have had a `deadended` path here, meaning that a program successfully terminated without reaching the `avoid`ed point of `0x4012bd`. Since we didn't, let's take a look at the errors.
`sm.errored[0]` will get the first state in the `errored` stash, and on the console we get:
```
<State errored with "IR decoding error at 0x4012a2. You can hook this instruction with a python replacement using project.hook(0x4012a2, your_function, length=length_of_instruction).">
```
What's `0x4012a2`? Looking in IDA again, we see the `crc32 edx, dword ptr [esi]` instruction. The `crc32` instruction is used to compute a [cyclic redundancy check](https://en.wikipedia.org/wiki/Cyclic_redundancy_check). This is a somewhat complicated operation, and generally rare, so it's not terribly surprising that `angr` doesn't know how to translate it into constraints. As suggested, we can implement a hook to emulate this.
## Angr Hooks
A hook in angr is used to modify or replace a piece of code. A `syscall` instruction may, for instance, invoke some complicated kernel behavior, and hooking it could allow us to substitute our own behavior; or a simple `call` might be hooked to modify the execution. When writing a hook, it is important to remember we are working with *symbolic* values, not actual integers, so that we ultimately return a mathematical expression representing the modified state.
Our hook will intercept the `crc32` instruction, emulate it, and then return control flow at the end of the instruction. The `crc32` instruction is 5 bytes long, so our hook starts:
```language-python
def crc32_hook(state):
pass
project.hook(0x4012a2, crc32_hook, length=5)
```
Next we need to implement the correct behavior. Since all computations must be symbolic, we can't just use a library implementation of CRC32. Additionally, most CRC32 implementation use the "CRC-32" specification, with the 0x04C11DB7 polynomial (essentially a magic number). The x86 crc32 instruction instead uses the "CRC-32C", or "Castagnoli", specification, with the 0x1EDC6F41 polynomial. Some googling turns up [this implementation of CRC-32C](https://chromium.googlesource.com/external/googleappengine/python/+/refs/heads/master/google/appengine/api/files/crc32c.py) we can adapt.
```language-python
CRC_TABLE = (
0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4,
# ... SNIP ...
0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351,
)
table_dict = {i: claripy.ast.bv.BVV(CRC_TABLE[i],32) for i in range(256)}
def get_crc32_table_BVV(i):
i = i.to_claripy()
return claripy.ast.bool.ite_dict(i, table_dict, 0xFFFFFFFF)
def crc32c(dst,src):
b32 = src
crc = dst
for i in [3,2,1,0]:
b = b32.get_byte(i)
shift = (crc >> 8) & 0x00FFFFFF
onebyte = crc.get_byte(3)
crc = get_crc32_table_BVV(onebyte ^ b) ^ shift
return crc
```
Here we need to do a symbolic table lookup, which we can implement with [`ite_dict`](https://github.com/angr/claripy/blob/f2c1998731efca4838a4edb9dec77e0424c5f691/claripy/ast/bool.py#L164). `ite_dict(i, table_dict, 0xFFFFFFFF)` means that we use the key-value pairs in `table_dict` to look up `i`; the default value will be 0xFFFFFFFF is not in the dict (but in our case it always will be). Then our `crc32c` method computes the update to running CRC32 `dst` using the four bytes in `src`. Our hook is then:
```language-python
def crc32_hook(state):
crc = state.regs.edx
addr = state.regs.esi
b32 = state.memory.load(addr).reversed
print('CRC32 accessing ',b32)
state.regs.edx = crc32c(crc, b32)
```
This clearly not a drop-in replacement for any `crc32` instruction: we've hard-coded that the source operand is `dword ptr [esi]` and our destination is `edx`. But that will work for our purposes here. After adding our hook, we reload the initial state and `explore` as before. The search continues for a few seconds, until we get a *very* long stack trace that eventually ends:
```
RecursionError: maximum recursion depth exceeded while calling a Python object
```
Urk.
## Removing recursion
This turns to come from a shortcoming of how [`ite_dict`](https://github.com/angr/claripy/blob/f2c1998731efca4838a4edb9dec77e0424c5f691/claripy/ast/bool.py#L164) is implemented. It passes the dictionary to `ite_cases`, which loops a `claripy.If` expression -- essentially a ternary operator. In our case, this produces an expression like
```language-python
claripy.If(i == 0,
0x00000000,
claripy.If(i == 1,
0xf26b8303,
claripy.If(i == 2,
0xe13b70f7,
claripy.If(i == 3,
0x1350f3f4,
... <255 levels like this>
))))
```
and when Z3 (the backend constraint solver) tries to analyze this, it balks at the depth of the expression. There are two options now: (1) implement a better version of `ite_dict`, or (2) use a table-free imeplemtation of CRC-32C. In the competition, we used option (2). Googling some more for how to generate the table leads us to [this page](https://medium.com/@chenfelix/crc32c-algorithm-79e0a7e33f61) on generating the table. Instead of generating a table and storing it, we can just "recompute" (symbolically) the value each our `crc32` instruction is hit. This leads to the code,
```language-python
def get_crc32_calc_BVV(i):
crc = i.zero_extend(32 - i.size());
if isinstance(crc, angr.state_plugins.sim_action_object.SimActionObject):
crc = crc.to_claripy()
for j in range(8):
shift = ((crc >> 1) & 0x7FFFFFFF)
cond = crc & 1 > 0
crc = claripy.If(cond, shift ^ 0x82f63b78, shift);
return crc
```
The `crc.to_claripy()` here is necessary in case angr passes us a `SimActionObject` instead of an actual symbolic value. Then the rest of the operations work just like the C code in the link above, with the `claripy.If` replacing C's ternary operator. Then we replace the appropriate line in our `crc32c` function to use `get_crc32_calc_BVV` instead of `get_crc32_table_BVV`. Looking at our code so far:
```language-python
import angr
import claripy
#copy implementation from https://medium.com/@chenfelix/crc32c-algorithm-79e0a7e33f61
def get_crc32_calc_BVV(i):
crc = i.zero_extend(32 - i.size());
if isinstance(crc, angr.state_plugins.sim_action_object.SimActionObject):
crc = crc.to_claripy()
for j in range(8):
shift = ((crc >> 1) & 0x7FFFFFFF)
cond = crc & 1 > 0
#print(crc & 1 > 0)
crc = claripy.If(cond, shift ^ 0x82f63b78, shift);
return crc
def crc32c(dst,src):
b32 = src
crc = dst
for i in [3,2,1,0]:
b = b32.get_byte(i)
shift = (crc >> 8) & 0x00FFFFFF
onebyte = crc.get_byte(3)
crc = get_crc32_calc_BVV(onebyte ^ b) ^ shift
return crc
def crc32_hook(state):
crc = state.regs.edx
addr = state.regs.esi
b32 = state.memory.load(addr).reversed
print('CRC32 accessing ',b32)
state.regs.edx = crc32c(crc, b32)
project = angr.Project("./cricket32", auto_load_libs=True)
flag_len = 32
arg1 = claripy.BVS('arg1', flag_len*8)
initial_state = project.factory.entry_state(args=["./cricket32", arg1])
sm = project.factory.simulation_manager(initial_state)
project.hook(0x4012a2, crc32_hook, length=5)
sm.explore(avoid=[0x40128D])
```
At the end we are left with: `<SimulationManager with 6 deadended, 33 avoid>`. Fantastic! That means 6 paths successfully terminated without hitting `0x4012bd`. Each of these `SimState`s are listed in `sm.deadended`. To get the input, we can call `state.make_concrete_int(arg1)`, which will return `arg1` as a big integer; then `.to_bytes(32,"big")` to turn it into a string:
```language-python
>>> for state in sm.deadended:
... state.make_concrete_int(arg1).to_bytes(32,"big")
...
b'uiuc\xdbK\xdf\x9d\xf0N\xd6\x95cket_a_c\xddL\xc7\x97it}\x00\x00\x00\x00\x00'
b'\xdaD\xd1\x9ftf{a_cricket_a_c\xddL\xc7\x97\xc6Y\xd9\xfc\x00\x00\x00\x00'
b'uiuctf{a\xf0N\xd6\x95\xccF\xc1\x88_a_crack\xc6Y\xd9\xfc\x02\x00\x00\x00'
b'\xdaD\xd1\x9f\xdbK\xdf\x9d\xf0N\xd6\x95cket_a_c\xddL\xc7\x97\xc6Y\xd9\xfc\x10\x10\x00\x00'
b'\xdaD\xd1\x9ftf{a_cri\xccF\xc1\x88\xf0L\xfb\x9frack\xc6Y\xd9\xfc \x01\x00'
b'uiuctf{a\xf0N\xd6\x95\xccF\xc1\x88_a_c\xddL\xc7\x97\xc6Y\xd9\xfc\x01\x04\x01\x08'
```
We see that angr has found 6 solutions that *technically* would work as input to the program, and indeed several start with `uiuctf{` -- a sign we're on the right track! But they're filled with non-printable characters. The astute eye might piece together the flag from the above malformed fragments, but the "right" thing to do is to do tell angr that each byte of input will be printable ASCII (or zero).
## Cleanup
```language-python
for b in arg1.chop(8):
initial_state.add_constraints((b == 0) | ((b > 31) & (b < 127)))
for i in range(len("uiuctf{")):
b = arg1.chop(8)[i]
initial_state.add_constraints(b == ord("uiuctf{"[i]))
```
It's important to note that -- perhaps a bit surprisingly -- compound constraints are formed using `|` and `&`, not Python's `or` and `and` keywords. `arg1.chop(8)` breaks the 256-bit vector `arg1` into a list of 8-bit bytes, and we add a constraint for each byte. The second loop actually forces that the flag starts with `uiuctf{`. Probably not strictly necessary, but will definitely accelerate solving. This code gets inserted right before `initial_state = project.factory...`. The evaluation now takes less than a minute, and ends with
```language-python
<SimulationManager with 1 deadended, 26 avoid>
>>>
>>> for sol in sm.deadended:
... print(sol.posix.dumps(1))
... sol.make_concrete_int(arg1).to_bytes(32,"big")
...
b'Flag is correct! Good job!\n'
b'uiuctf{a_cricket_a_crackit}\x00 @@\x00'
```
Flag get!
## A reality check
In practice, progress wasn't quite this linear. Some gotchas I encountered were:
* Needing to use `.to_claripy()` in the hook -- if you don't, the hook runs fine, and then you get weird and inscrutable errors later.
* Figuring out what the recursion error was coming from. Was determined by shrinking the dictionary size and watching the problem go away (although of course no solution was found).
* Lots of byte-ordering issues. angr will often store the bytes the opposite order you expect, so e.g. forcing the string to start with `uiuctf{` was something initally done backwards.
* LOTS more byte-ordering issues in implementing the CRC. And sign-bit madness. If you think translating bit-fiddly C code to Python is bad, try translating bit-fiddly C code to symbolic Python with unusual endianness.
## Postscript: A better `ite_dict`
During the competition, I use a loop implementation of CRC-32 that didn't use a table. In practice, there's little reason why a table couldn't be used, if the if-then-else statement went in a binary tree instead. So I wanted to try that too! Code:
```language-python
#Improved version of ite_dict that uses a binary search tree instead of a "linear" search tree.
#This improves Z3 search capability (eliminating branches) and decreases recursion depth:
#linear search trees make Z3 error out on tables larger than a couple hundred elements.)
# Compare with https://github.com/angr/claripy/blob/f2c1998731efca4838a4edb9dec77e0424c5f691/claripy/ast/bool.py#L164
def ite_dict(i, d, default):
i = i.ast if type(i) is claripy.ast.base.ASTCacheKey else i
#for small dicts fall back to the old implementation
if len(d) < 4:
return claripy.ast.bool.ite_cases([ (i == c, v) for c,v in d.items() ], default)
#otherwise, binary search.
#Find the median:
keys = list(d.keys())
keys.sort()
split_val = keys[len(keys)//2]
#split the dictionary
dictLow = {c:v for c,v in d.items() if c <= split_val}
dictHigh = {c:v for c,v in d.items() if c > split_val}
valLow = ite_dict(i, dictLow, default)
valHigh = ite_dict(i, dictHigh, default)
return claripy.If(i <= split_val, valLow, valHigh)
def get_crc32_table_BVV(i):
i = i.to_claripy()
return ite_dict(i, table_dict, 0xFFFFFFFF)
```
With this modified version of `get_crc32_table_BVV`, the table-based code ran fine. Interestingly, it took several minutes to solve -- quite a bit slower than the "slow" and "complicated" loop-based implementation. In general though, I expect the binary tree would be faster for Z3 to analyze than a linear expression. I've [created a pull request](https://github.com/angr/claripy/pull/181) to get this improvement into angr, hopefully. :)