Tags: risc-v math
Rating: 5.0
We are greeted by a Linux binary... except it's RISC-V, and not our beloved x86! But that's OK, we can still run it with QEMU:
```
~/sec/pwn2win/tooslow $ file too\ slow
too slow: ELF 64-bit LSB executable, UCB RISC-V, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, for GNU/Linux 3.0.0, with debug_info, not stripped
~/sec/pwn2win/tooslow $ sudo pacman -S qemu-arch-extra
~/sec/pwn2win/tooslow $ qemu-riscv64 ./too\ slow
/lib/ld-linux-riscv64-lp64d.so.1: No such file or directory
```
Of course, dynamic linking strikes again. In the end, I downloaded libc, libgcc and libstdc++ from [Debian's ftp servers](http://ftp.ports.debian.org/debian-ports/), and unpacked the packages into a directory, with the goal of using QEMU's `-L`:
```
~/sec/pwn2win/tooslow $ mkdir sysroot
~/sec/pwn2win/tooslow $ ar x libc6_2.27-5_riscv64.deb
~/sec/pwn2win/tooslow $ cd sysroot
~/sec/pwn2win/tooslow/sysroot $ tar xf ../data.tar.xz
~/sec/pwn2win/tooslow/sysroot $ cd ..
~/sec/pwn2win/tooslow $ ar x libgcc1_8.2.0-10_riscv64.deb
~/sec/pwn2win/tooslow $ cd sysroot
~/sec/pwn2win/tooslow/sysroot $ tar xf ../data.tar.xz
~/sec/pwn2win/tooslow/sysroot $ cd ..
~/sec/pwn2win/tooslow $ ar x libstdc++6_8.2.0-10_riscv64.deb
~/sec/pwn2win/tooslow $ cd sysroot
~/sec/pwn2win/tooslow/sysroot $ tar xf ../data.tar.xz
~/sec/pwn2win/tooslow/sysroot $ cd ..
~/sec/pwn2win/tooslow $ qemu-riscv64 -L sysroot ./too\ slow
./too slow: /lib/riscv64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by ./too slow)
```
Of course, symbol versioning amplifies the dependency hell. Nothing a hex editor can't fix, though. Let's look what symbol versions we actually have...
```
~/sec/pwn2win/tooslow $ objdump -p sysroot/lib/riscv64-linux-gnu/libc.so.6
[ . . . ]
Version definitions:
1 0x01 0x0865f4e6 libc.so.6
2 0x00 0x06969187 GLIBC_2.27
3 0x00 0x0963cf85 GLIBC_PRIVATE
GLIBC_2.27
Version References:
required from ld-linux-riscv64-lp64d.so.1:
0x06969187 0x00 05 GLIBC_2.27
0x0963cf85 0x00 04 GLIBC_PRIVATE
~/sec/pwn2win/tooslow $ sed 's@GLIBC_2.25@GLIBC_2.27@g' -i.bak too\ slow
~/sec/pwn2win/tooslow $ qemu-riscv64 -L sysroot ./too\ slow
./too slow: /lib/riscv64-linux-gnu/libc.so.6: version `GLIBC_2.27' not found (required by ./too slow)
```
Well, that didn't work. However, if you look closely, you can see that objdump also prints what I presume to be the hashes used for quick lookup. Fortunately, a good guess lets you locate the value quickly:
```
~/sec/pwn2win/tooslow $ objdump -p too\ slow
[ . . . ]
Version References:
required from libc.so.6:
0x06969185 0x00 04 GLIBC_2.27
required from libstdc++.so.6:
0x08922974 0x00 03 GLIBCXX_3.4
0x0297f871 0x00 02 GLIBCXX_3.4.21
~/sec/pwn2win/tooslow $ xxd too\ slow | grep '8591 9606'
00000590: 8591 9606 0000 0400 9101 0000 0000 0000 ................
```
Patching the byte at 0x590 to say 0x87 instead of 0x85 lets you run the binary, only to find out that while it runs, it doesn't seem to be close to finishing. Normally at this point I'd use `ltrace`, but the indirection of QEMU makes that difficult. Fortunately, we can use the integrated system call tracer:
```
~/sec/pwn2win/tooslow $ qemu-riscv64 -strace -L sysroot ./too\ slow
21458 munmap(0x000000400081b000,238347) = 0
21458 brk(NULL) = 0x0000000000013000
21458 brk(0x0000000000034000) = 0x0000000000034000
21458 nanosleep(274886295048,274886295048,274886295496,-4208914206758899968,274889918272,0) = 0
21458 nanosleep(274886294984,274886294984,274886295496,-4208914206758899968,274889918272,0)^C = -1 errno=4 (Interrupted system call)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_pid=0, si_uid=0} ---
```
It seems that someone used a generous amount of race condition mitigations. Let's stub-out the PLT for sleep:
```
~/sec/pwn2win/tooslow $ r2 ./too\ slow
Unknown DW_FORM 0x06
-- WASTED
[0x00010850]> aaaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze value pointers (aav)
[x] Value from 0x00010000 to 0x00010b28 (aav)
[x] 0x00010000-0x00010b28 in 0x10000-0x10b28 (aav)
[x] 0x00010000-0x00010b28 in 0x11de0-0x120a0 (aav)
[x] Value from 0x00011de0 to 0x000120a0 (aav)
[x] 0x00011de0-0x000120a0 in 0x10000-0x10b28 (aav)
[x] 0x00011de0-0x000120a0 in 0x11de0-0x120a0 (aav)
[x] Emulate code to find computed references (aae)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Constructing a function name for fcn.* and sym.func.* functions (aan)
[x] Enable constraint types analysis for variables
[0x00010850]> s sym.imp.sleep
[0x00010740]> pdf
/ (fcn) sym.imp.sleep 160
| sym.imp.sleep (int s);
| ; CALL XREFS from entry4.fini (0x10948, 0x10950)
| 0x00010740 172e0000 auipc t3, 0x2
| 0x00010744 033e8e90 ld t3, -1784(t3)
| 0x00010748 67030e00 jalr t1, t3
| 0x0001074c 13000000 nop
| ;-- imp.__libc_start_main:
[ . . . ]
| 0x000107c0 0145 li a0, 0
| 0x000107c2 0561 addi sp, sp, 32
| 0x000107c4 8280 ret
[ . . . ]
~/sec/pwn2win/tooslow $ checksec too\ slow
[ . . . ]
PIE: No PIE (0x10000)
RWX: Has RWX segments
~/sec/pwn2win/tooslow $ xxd -s 0x740 -l 32 too\ slow
00000740: 172e 0000 033e 8e90 6703 0e00 1300 0000 .....>..g.......
00000750: 172e 0000 033e 0e90 6703 0e00 1300 0000 .....>..g.......
```
We can see that the binary is loaded at 0x10000, and that `sleep()` is located 0x740 bytes after that. A quick sanity check confirms the endianness of the instructions, making it easy to patch in a `ret` while looking at the encoding from another function.
Unfortunately, running the binary now does not help with the execution time. I guessed that if I was ever going to get this to finish, I'd have to use a faster algorithm to compute whatever this executable tries to. That's why I reproduced its behavior in Python by studying the compiled code. I'll spare you the details, since no obfuscation techniques were used.
```
def N(prev, new):
for _ in range(prev ** 7 - 2):
prev, new = new, (prev + new) % 1000000007
return (new % 89 + 37) % 256
# extracted from the binary
flag = [
0x20, 0x35, 0x53, 0x61, 0x33, 0x36, 0x31, 0x54,
0x74, 0x6b, 0x56, 0x75, 0x70, 0x2d, 0x60, 0x23,
0x7a, 0x6c, 0x57, 0x4a, 0x51, 0x7a, 0x5b, 0x34,
0x5e, 0x71, 0x7d, 0x5b, 0x63, 0x3a, 0x27, 0x35,
0x70, 0x2a, 0x29, 0x6b, 0x60, 0x26, 0x59, 0x42,
0x2c, 0x74, 0x6b, 0x56, 0x75, 0x70, 0x5e, 0x71,
0x6c, 0x37, 0x57, 0x25, 0x43, 0x2b, 0x4b, 0x66,
0x77, 0x23, 0x21, 0x5b, 0x59, 0x49, 0x62, 0x45,
0x2c, 0x2a, 0x20, 0x24,
]
for i in range(1, len(flag)):
prev = flag[i - 1]
new = flag[i]
print(chr(N(prev, new)), end='')
```
As you can see, a Fibonacci-esque sequence with custom starting terms is computed modulo 1000000007. This number was probably chosen because it's a prime near a round number, and it being composite could make it easier to solve.
After consulting stack overflow, I knew I had to use matrix exponentiation: let's represent the two terms in memory as a vector. Notice that we can express an iteration of the loop as multiplying a matrix by the vector:
$$
\begin{bmatrix}
0 & 1\\\\
1 & 1
\end{bmatrix}
\begin{bmatrix}
a\\\\b
\end{bmatrix}
=
\begin{bmatrix}
b\\\\a + b
\end{bmatrix}
$$
Of course, to compute further terms, we can multiply by the matrix multiple times, or, since matrix multiplication is commutative, multiply by the matrix raised to some power. The same property makes it possible to calculate the $n$-th power of a matrix, and thus the $n$-th term of a Fibonacci-esque sequence in $O(\log n)$ multiplications, thanks to the [exponentiation by squaring algorithm](https://en.wikipedia.org/wiki/Exponentiation_by_squaring). Additionally, because all matrices I were dealing with were of the form
$$
\begin{bmatrix}
x & y \\\\
y & x + y
\end{bmatrix}
$$
I stored just the top row, since I've found it easier to program for:
```
# Fibonacci matrices only
# [ a0 a1 ][ b0 b1 ] = [ a0b0+a1b1, a0b1 + a1(b0 + b1) ]
# [ a1 a0+a1 ][ b1 b0+b1 ] = [ implied ]
def matmult(a0, a1, b0, b1):
return (a0 * b0 + a1 * b1) % 1000000007, (a0 * b1 + a1 * (b0 + b1)) % 1000000007
# [ a0 a1 ] [ A ] = [ a0*A + a1*B ]
# [ a1 a0+a1 ] [ B ] = [ a1*A + (a0+a1)*B ]
def N(prev, new):
target = prev ** 7 - 2
a0, a1 = 1, 0
for c in bin(target)[2:]:
a0, a1 = matmult(a0, a1, a0, a1)
if c == '1':
a0, a1 = matmult(a0, a1, 0, 1)
return ((a1*prev + (a0+a1)*new) % 1000000007 % 89 + 37) % 256
```
Replacing the original routine with this optimized version yields the flag after milliseconds of computation:
```
CTF-BR{L1n34R_a163BrA_iS_s0_useful1_f0R_L1n34r_r3cuRs1v3_sequ3ncEs}
```