Tags: x87 floating-point vm 

Rating:

# fvm (rev)
Writeup by: [xlr8or](https://ctftime.org/team/235001)

As part of this challenge we get a single x86 ELF binary.
This is a VM challenge, with the VM using x87 to perform its operations. Also, the VM seems to be implemented in c++.

Inspecting the main function, it is pretty normal:
```c
// ...
std::basic_ostream<>::write((char *)(local_1c8 + 2),(long)&code);
/* try { // try from 0010150d to 00101511 has its CatchHandler @ 00101546 */
fvm::emulate((fvm *)local_1c8);
std::__cxx11::basic_stringstream<>::~basic_stringstream((basic_stringstream<> *)local_1c8);
// ...
```

This is the most important part of `main` where the VM is started. Before this code is just some usual c++ stream setup code, not too nice, but not too bad either.
We see that `code` contains the bytecode that the virtual machine is going to execute.
Further it is worth mentioning, that this bytecode is loaded into a `basic_stringstream`, so accessing reading and writing to it is done a bit differently, than for example directly interacting with a byte array.

At the start of the `emulate` function we see the following:
```c
std::basic_istream<>::read((char *)this,(long)&local_5c);
if (((byte)this[0xa0] & 2) != 0) {
pcVar1 = (code *)swi(3);
(*pcVar1)();
return;
}
lVar3 = in_ST0;
lVar2 = in_ST1;
lVar4 = in_ST2;
lVar5 = in_ST3;
lVar6 = in_ST4;
lVar7 = in_ST5;
lVar8 = in_ST6;
```

This is where we read the current instruction into `local_5c`.
Further you can see that based on `this[0xa0]` debugging mode can be enabled, which would cause an attached debugger to break (as in pause execution) there.
Next we see how ghidra shows us operations in the x87, where we have the `ST(i)` registers that form a stack.
`lVarX` is not an actual variable here, but rather ghidra will use these to denote operations on the x87 stack, like pushing and popping, or exchanging items.
You can even see that the disassembly doesn't show any moving instruction from the `ST(i)` registers.

A bit further down below lies the switch statement on the opcode.
I won't copy the whole thing, but I'll explain some operations.

Operation `!`:
```c
in_ST0 = (longdouble)0;
in_ST1 = lVar3;
in_ST2 = lVar2;
in_ST3 = lVar4;
in_ST4 = lVar5;
in_ST7 = lVar8;
```
This is what we see in the decompiled view, however remember that `lVarX` is fake, and used by ghidra to denote what happens on the x87 stack.
In reality all this is generated by a single instruction: `FLDZ`.
This instruction pushes `0` on the x87 stack.
This is why you see `ST(0)` becomes 0, and below the stack is shifted (i.e. `lVar3` is the original value of `ST0`, now we put it in `ST1`).
It is important to notice here, that the x87 stack is not *"infinite"*, there are only the 8 x87 registers that constitute this stack.

There are a couple more operations that put constants on the stack, or do certain arithmetic operations, I won't go through all of them here.
I do encourage you to look at the disassembly here as it is sometimes much simpler than the decompile view.
To understand what an instruction does you can use [this amazing list](https://www.felixcloutier.com/x86/) of x86 instructions (it also contains the x87 instructions).

Operation `a`:
```c
local_58 = (long)ROUND(lVar3);
lVar3 = in_ST7;
std::basic_istream<>::seekg((long)this,(_Ios_Seekdir)local_58);
```

These operations will cause the `stringstream` to seek to the value stored on top of the stack.
This essentially performs a `jump`, since the current location in the stream is the instruction we will decode and execute.
Also worth noting, that there are different types of seek, controlled by the 3rd argument not visible in the decompile view, however checking the disassembly we see `EDX` is zeroed out.

Operation `b`:
There's also the possibility to not only set, but to save the current *instruction pointer*
```c
local_58 = std::basic_istream<>::tellg();
in_ST0 = (longdouble)local_58;
```

Here `tellg` is used to get the current location in the stream, and it is placed on the stack.

Finally let's look at operation `g`, an example of a conditional jump:
```c
in_ST0 = lVar2;
in_ST1 = lVar4;
in_ST2 = lVar5;
in_ST3 = lVar6;
in_ST4 = lVar7;
in_ST5 = lVar8;
lVar7 = lVar8;
in_ST6 = in_ST7;
lVar8 = in_ST7;
if (lVar3 <= lVar2) goto LAB_00101700;
break;
```

I need to give some credit to the decompiler here, for recognizing that floating point comparison is done here. (in the disassembly these take multiple instructions, not just `cmp; j<cond>`)
Here if `st0 <= st1`, then we take the `goto`, meaning we decode the next instruction, the jump **is not** taken.
In the opposite case, when `st0 > st1`, we break out of the switch statement:
```c
std::basic_istream<>::seekg((long)this,(int)local_5a);
in_ST5 = lVar7;
in_ST6 = lVar8;
```
In this case we further advance the stream by an immediate, that is a short spanning the next 2 bytes after the current opcode.

Perhaps one last operation, before we move on? Let's see about operation `9`:
```c
std::basic_istream<>::read((char *)this,(long)&local_4a);
in_ST0 = local_4a;
in_ST1 = lVar3;
in_ST2 = lVar2;
in_ST3 = lVar4;
in_ST4 = lVar5;
in_ST5 = lVar6;
in_ST6 = lVar7;
in_ST7 = lVar8;
```
This operation will read the next 10 (3rd argument, visible only in disassembly) bytes after the instruction and put it on the stack.
The reason I mention this is that x87 is special in the way that the registers can hold 80-bit values (10 bytes).

Now that we know more about x87 and the VM, we have 2 options:
1. Dynamic analysis - we will execute the binary, put breakpoints, see what operations are done
2. Static analysis - we will disassemble (decompile?) the custom bytecode and analyse it further.

Now I went with option 2, since I prefer having an overview over the whole system, rather before going in with dynamic analysis.
Therefore I have written a simple disassembler for the bytecode.

Most of it is just writing nice text prompts for some instructions, the only special operations are jumps.
Whenever a jump of any sort is made the disassembler will calculate the jump target instead of printing just the offset.

You can see the full disassembler in `solve.py` (also contains the bytecode exported from the binary).

Now that I had the disassembler I went in together with GDB to analyze what is going to happen.
The full disassembly (with some comments I manually added through dynamic analysis) can be seen in `disas.txt`

The general operation is as follows:
1. `FLAG: ` is printed to the user
2. we read 2 bytes of the user input, perform `CALC1`, and leave the result on the stack
3. we read the next 2 bytes of the user input, perform `CALC2`, and leave the result on the stack
4. Put the sum, and the product of the previous 2 results on the stack
5. Compare the sum and the product against hardcoded values (loaded with opcode `9`) - keep some information on the stack about success or failure
6. Repeat from #2 seven more times
7. Read and check `}` separately, then check the overall success information updated in #5
8. Print a message indicating overall success.

Therefore our goal is starting to clear up again:
Generate 4-byte block of the flag, such that the sum and product of `CALC1` and `CALC2` matches the hardcoded values.

`CALC1` can be seen from 475-499 (`disas.txt`). It operates on the 2 bytes that were read from the user (and validated to be in the printable ASCII range - offset 537 is where the validation is done)
I went through the disassembly and deduced, that the following result is left on the stack:
```
# x = (2 * pi * (user byte2)) / 256
RESULT = (sin(x) - x) * (user byte1)
```

`CALC2` can be seen from 508-536 (`disas.txt`). In spirit it is similar to the other calculation, the only difference is in the result it leaves on the stack:
```
# y = (2 * pi * (user byte2)) / 256
RESULT = (1 + cos(y)) * sin(y) * (user byte1)
```

The rest of the code just invokes these functions, loads the hardcoded values and does the comparison to the calculated results.

We can implement a 4-byte at a time bruteforce, however we still need the hardcoded values.
To acquire them I set a breakpoint on opcode `9`, after `ST(0)` has the loaded value, and saved them in a separate file.

`sat.py` contains the flag bruteforce code and the hardcoded check values. It starts recovery after the first 8 bytes, which are known, because of the flag prefix.

Original writeup (https://github.com/sbencoding/zer0pts_ctf_2023_writeups/tree/main/rev/fvm).