CTFtime.org / Google Capture The Flag 2019 (Quals) / Code Golf / Writeup

# Code Golf
## Google CTF 2019

## Index
* [Acknowledgements](#Acknowledgements)
* [The problem](#The-problem)
* [Problem statement](#Problem-statement)
* [Some background](#Some-background)
* [Lessons](#Lessons)
* [The solution](#The-solution)
* [A first pass](#A-first-pass)
* [The first correct code](#The-first-correct-code)
* [More compact code](#More-compact-code)
* [The first accepted code](#The-first-accepted-code)
* [Golfing transformations](#Golf-you-a-Haskell)
* [NP-Completeness](#NP-Completeness)

## Acknowledgements

This writeup is done by Cole Kurashige.

I'd like to thank Kye Shi for his help designing a faster algorithm,
and Giselle Serate for algorithm verification and actually running the code. Also for
showing me the CTF.

# The problem

As a foreward/warning, this is a lengthy writeup. If you want the TL;DR highlights ,
first read the [problem statement](#Problem-statement) if you aren't familiar with the problem.
I think the most interesting highlights are the
[golfing tips](#Golf-you-a-Haskell), specifically [this one](#Finding-the-possible-shifts), and
the [NP-Completeness proof](#NP-Completeness).

Or just skim the titles and skim/read what is interesting. A lot of the length comes from headers,
newlines, and code blocks - as far as text goes I've tried to edit things so they're to-the-point.

## Problem statement
The problem could be found [here](https://capturetheflag.withgoogle.com/#challenges/)
as of 6/27/19. In case it gets moved or taken down, I've described it below.

Given a list of strings with gaps in them (denoted by a space), you're asked to combine them into
a single string. Imagine all of the strings stacked on top of each other. You want to shift the strings
to the so that only one or zero characters are in each position. You also want to minimize the
length of the resulting string (ignoring trailing and leading gaps). The tie-breaker for multiple
solutions is lexicographic length. Your solution is a function `g :: [String] -> String`.

An example given is the strings `"ha m"` and `"ck e"`:

If you overlay them:

"ha m"
"ck e"

Shift them by an appropriate offset:

"ha m"
"ck e"

You get the resulting string `"hackme"`.

The catch to all of this is that it must be done in Haskell in 181 bytes or fewer.

Oh, also, it is [NP-Complete](#NP-Completeness).

## Some background

I spent a _long_ time on this problem. Its theme for me was
"comfort's a killer." I really like Haskell, and have been using it recently.
I really like code golf, and have golfed code in the past.
I probably should've spent more time thinking through an algorithm, but I dove in
headfirst. I knew there was a tips for golfing in Haskell on the code golf stack exchange, but I neglected to use it.

And so, I spent an entire weekend on one problem. Welcome to my hell.

## Lessons

I learned three important lessons from this endeavor.

### Lesson #1

Code golf skills don't just magically manifest themselves in another language.

I primarily code golf in [J](https://www.jsoftware.com/indexno.html) and
[><>](https://esolangs.org/wiki/Fish). Haskell is neither of these. I used pretty
much none of my existing code golf knowledge in golfing this challenge.

### Lesson #2

Imports are useful.

I should've used more imported functions, especially those from `Data.List` more.
Our final solution used only two imports: `join` from `Control.Monad` and
`(\\)` from `Data.List`.
It can probably be reduced to just using `Prelude`, while still mantaining an acceptable
bytecount.

### Lesson #3

Think before you write code.

My first algorithm was far from perfect, and even had a flaw in its logic. When
this was fixed and it was golfed, we learned much to my chagrin that it used too
much memory and was silently failing. Soooooo ... we had to come up with a new one
from scratch. And then golf it again. Did I mention that I spent a lot of time
on this problem?

# The solution

## A first pass

The first thing I did was write a half-golfed program.

The algorithm was essentially to find every possible way to shift each of the strings
and overlay them. Some of these shifts would be invalid, so I discarded those.

I ignored the possibility of strings having trailing spaces, since that would
just be cruel. Later solutions also ignored the possibility of strings having leading
spaces. This turend out to be an OK assumption to make (though I wish it were told
to us in the prompt).

There is a mistake in this code: I am only taking the lexicographically minimal
solution, not the minimal length solution. This is fixed in the golfed version,
at the cost of many bytes.

```haskell
import Data.List
import Data.Maybe
import Control.Monad

-- | the solution function
g :: [String] -> String
g xs = minimum . catMaybes $ [sequence . dropWhile (==Nothing)
. map collapseW . transpose $ s | s <- shifts l xs]
where
l = sum $ map length xs

-- | find all possible ways to shift a string 's' with at most 'l'
-- characters of padding.
wordShifts :: Int -> String -> [String]
wordShifts l s = [space <> s | space <- spaces]
where
spaces = inits $ replicate l ' '

-- | find every possible way to shift a list of strings, where
-- each string can be shifted at most 'l' characters to the left
shifts :: Int -> [String] -> [[String]]
shifts l [] = [[]]
shifts l (w:ws) = do
shifted <- wordShifts l w
rest <- shifts l ws
return $ shifted : rest

-- | try to collapse a string into a single character (think of this as reducing
-- a column to 1 character, this fails if there are more than 2 non-space characters
-- or no non-space characters)
collapseW :: String -> Maybe Char
collapseW s = do
[y] <- foldMap convert s
pure y
where
convert ' ' = Nothing
convert c = Just [c]
```

I golfed this down to 178 bytes.

```haskell
f=foldMap
h ' '=[]
h c=[c]
k l s=[f<>s|f<-inits l]
z l[]=[[]]
z l(x:y)=[a:b|a<-k l x,b<-z l y]
g x=minimum[concat y|s<-transpose<$>z(f(>>" ")x)x,let y=f h<$>s,all((<=1).length)y]
```

## The first correct code

At precisely 181 bytes, this was the first code that picked the correct solution,
sorting the possible ones by length, then lexicographically.

```haskell
h=length
f[]=" "
f l=l
z l[]=[[]]
z l(x:y)=[a:b|a<-[i<>x|i<-inits l],b<-z l y]
g x=snd$minimum[(h a,a)|s<-z((>>" ")=<<x)x,let y=concat.words<$>transpose s,all((<=1).h)y,let a=f=<<y]

```

The problem now is that the code was failing silently! Usually the server would tell
you if you had an error (compilation, runtime, byte length), but it didn't respond and
instead failed after about 25-30 seconds.

We tested with code that ran infinitely, which would get cut off at exactly 1 minute,
so with some more testing we concluded that the code was using too much memory and
getting killed.

(we tested the memory usage with `foldl (+) 0 [1..]`, which timed out after 20 or so
seconds, and confirmed it was a problem with memory that caused silent failure
with `foldl' (+) 0 [1..]`, which only timed out after a minute. Ahhh Haskell, where one
character makes a huge difference)

## More compact code

I further golfed the first correct solution to 151 bytes.
I don't remember why I did this.
It was golfed during the phase where we were trying to figure out why the server
wouldn't accept our solution. Maybe I was trying to fit space stripping into the
code, but didn't get around to it.

I added some spaces to the last function `g` so it doesn't wrap so hard, but they
aren't part of the bytecount.

```haskell
h=length
f""=" "
f l=l
g x=snd$minimum[(h a,a)|s<-forM[(>>" ")=<<x|_<-x]inits,
let y=concat.words<$>transpose(zipWith(<>)s x),all((<=1).h)y,let a=f=<<y]
```

## The first accepted code

The first correct code took somewhere between 2 to 4 hours of effort to reach.
By midday Saturday, I had something that was morally right, but didn't pass the tests.
That evening, Giselle and I tracked down the root of the problem to space issues.

I complained to Kye about my solution not being good enough, despite being
short enough, and he devised an algorithm that was better (at least memory-wise) than
my like `O(n^n)` space algorithm. This was maybe around midnight on Saturday (which
was 3 AM his time...).

He, however, did not golf it, so I still had to reduce his ~500 bytes to the below.

I spent an hour or two golfing his solution after I woke up on Sunday
and brought it down to exactly 181 bytes.

```haskell
m(a:b)(x:y)=max a x:m b y
m x y=x<>y
f s[]=[s]
f s r=join[f(m s x)$r\\[x]|x<-r,and$zipWith(\x y->x==' '||y==' ')s x]<>[h:y|(h:t)<-[s],y<-f t r]
g y=snd$minimum[(length x,x)|x<-f""y]
```

Yup, this code is _a lot_ different. It ended up being a lot more inefficient than
his original code, too, since I cut out all of his optimizations when I golfed it.

Kye ended up writing an even _more_ efficient version, which thankfully I didn't need
to golf.

I just wish that we knew earlier that the "make it snappy" flavortext didn't
mean to make the code short, but instead meant "efficient enough." I'm used to seeing
time/space restrictions given upfront on the Code Golf Stack Exchange, where there
can be answers that are right by observation but too inefficient to run anything but
the simplest of test cases.

# Golf you a Haskell
Here were some of the more inventive or useful golfing transformations I used. Since
I foolishly neglected to use any resources other than the documentation,
these were all found by myself.

## Infix your code!
Haskell has a lot of infix operators (operators like `+` or `-` that go between
their arguments). It's made fun of in this
[article](http://uncyclopedia.wikia.com/wiki/Haskell)
(warning: somewhat NSFW language), which includes the following code that produces
[an infinite list of powers of 2](https://stackoverflow.com/questions/12659951/how-does-this-piece-of-obfuscated-haskell-code-work/12660526#12660526).

```haskell
import Data.Function (fix)
-- [1, 2, 4, 8, 16, ..]
fix$(<$>)<$>(:)<*>((<$>((:[])<$>))(=<<)<$>(*)<$>(*2))$1
```

The next few points are on how you can use these operators.

### `$`
A simple transformation that I often use in code read by people other than myself
is `$`. `$` is a function that just applies its left argument to its right argument.

```haskell
f $ x = f x
```

It has really low priority, though, so it can save you parentheses like in

```haskell
-- these are the same
gcd (1+2) (3*4)
gcd (1+2) $ 3*4
```

### `map`
`<$>` and `map` are the same for lists, but the former has lower priority, which lets
you reap some of the benefits of `$`, while also not needing a space between its
arguments (unlike `map`).

### `concatMap`
I found myself often using `concatMap`. I first reduced this to `foldMap`,
and then to the infix `=<<`. All of these have the same definition for lists.

### `return`
I used `return` to convert an element `x` to a singleton list.
This becaume `pure` and then finally `[x]` once I realized I was being a moron.

## Filling holes
A tricky part of this problem was combining two strings with holes (spaces) in them
to produce one where the holes were filled. As it turns out, the only printable ASCII
less than space (ASCII 32) is whitespace, so I figured these wouldn't show up in the
strings. Thus, given two chars in the same column, we can take their maximum to find
the non-space char (if it exists).

## Cheeky pattern matching
I had code that I wanted to give a list if `x` pattern matched one thing and the empty
list if it did not. Something that looked like

```haskell
case x of
(h:t) -> foo h t
[] -> []
```

I converted this to

```haskell
[foo h t|(h:t)<-[x]]
```

This abuses the fact that when the pattern match `(h:t)` fails in the context of a
list comprehension, an empty list is returned instead of an error. This is a special
case of how pattern matching is desugared inside of a `do` block.

N.B. `foo` has a different type between these examples.

## Finding the possible shifts
Given `strings :: [String]`, I wanted to find all of the ways these strings could be
shifted. I eventually boiled this down to finding `paddings ::[[String]]`, where each
`padding :: [String]` in the list was the same length as `strings` and had a varied
number of spaces in each element. So each `padding`, when combined element-wise with
`strings` would give a different shift.

A way of doing this would be

```haskell
import Data.List (inits)

cartesianProduct :: [[a]] -> [[a]]
cartesianProduct [] = [[]]
cartesianProduct (xs:xss) = [x : ys | x <- xs, ys <- cartesianProduct xss]

paddings :: [String] -> [[String]]
paddings strings = cartesianProduct $ replicate (inits maxPadding) (length strings)
where
maxPadding = replicate (sum $ map length strings) ' '
```

Let's take a look at

```haskell
maxPadding strings = replicate (sum $ map length strings) ' '
```

In order to make sure that I was shifting enough, I found a maximum padding that
was equal to the sum of the lengths of the strings.

We can reframe this as converting each element in `strings` to a string that is the same
length but only consisting of spaces, then concatenating all of these elements together.

```haskell
maxPadding strings = concatMap (\str -> replicate (length str) ' ') strings
```

`replicate . length` is way too long, so let's replace it with `(>> " ")`.

```haskell
maxPadding strings = concatMap (\str -> str >> " ") strings
```

Eta reduce and obfuscate `concatMap` to give

```haskell
maxPadding strings = (>> " ") =<< strings
```

Much better.

This is only the max padding for a single element though. I wanted to find `paddings`.
`inits :: [a] -> [[a]]` will get us part of the way there, since it will give all
possible prepended spaces, from 0 to `maxPadding`.

```haskell
inits [1,2,3] = [[],[1],[1,2],[1,2,3]]
inits (maxPadding ["a","bc","d"]) = ["", " ", " ", " ", " "]
```

We then want the cartesian product of `inits maxPadding` repeated `length strings`
times. `cartesianProduct` is a long definition, so why don't we
use the list Monad some more?

```haskell
paddings strings = sequence (replicate (inits maxPadding) (length strings))
```

`sequence` is the same as `\x -> forM x id` or `\x -> mapM id x`, so we can convert to

```haskell
paddings strings = forM (replicate (inits maxPadding) (length strings)) id
```

We want to be applying `inits` to every element anyway, so we can pull it out.

```haskell
paddings strings = forM (replicate maxPadding (length strings)) inits
```

Then get rid of this `replicate` nonsense by using a list comprehension that ignores
all of the values of `strings`.

```haskell
paddings strings = forM [maxPadding | _ <- strings ] inits
```

Substitute the definition of `maxPadding` and we're done.

```haskell
paddings strings = forM[ (>> " ") =<< strings | _ <- strings] inits
```

354 bytes of (reasonably) readable code down to 67 bytes of nonalphanumeric soup.

Don't you love code golfing?

# NP Completeness

On Saturday evening, I was banging my head against a wall trying to optimize the
space and time complexity of my algorithm. But every time I tried to think through
a faster algorithm, something felt wrong. I had a feeling that the problem was
[NP Complete](https://en.wikipedia.org/wiki/NP-complete), and so the only thing I could
get optimal would be space. I eventually sat down and proved it NP-Complete.

## Proof
The proof is by reduction from the
[Bin Packing Problem](https://en.wikipedia.org/wiki/Bin_packing_problem) (BPP). This is
a pretty rigorous proof, but I tried to make it understandable. I think the
[Reduction, visualized](#Reduction-visualized)
section does a pretty good job of building intuition for how the proof works, but if you
haven't seen a reduction before this all might seem kind of obtuse.

### Bin Packing Problem (decision version)
The BPP asks, given `m` bins of size `V` and a set `S` of elements with an associated
cost function `f : S -> N` (where `N` is the natural numbers), can `S` be partitioned
into at most `m` subsets, each of whose total cost is less than or equal to `V`? In
plain English, given `m` bins, each with capacity `V`, can we fill each bin to at
most capacity with all of the elements in `S`?

### Crypto Problem (decision version)
First, I will state the decision variant of the Crypto Problem (CP). Given a "set" `T`
of strings that have holes in them represented by spaces and a target length `l`, the
Crypto Problem asks whether all elements of `T` can be overlaid such that there is at
most one non-hole per column and the length of the overall result (after removing
trailing and leading spaces) is `l` or less.

(this "set" may have duplicates - I don't want to be as rigorous as in the BPP)

### Reduction
We can construct a CP from an arbitrary BPP as follows.

We first will construct the
set `T`. Create `m` strings, each of length `3V + 2`. The first and last `V+1`
characters of these strings are `#`, and the rest (the center) are spaces (holes).
Character choice doesn't matter here, so I pick an arbitrary one.
These strings will represent our bins, and I will refer to them as "bin strings".

For each element `s` in the set `S` from the BPP, we want to make an analagous element
for the CP. This element will be `f(s)` long, and consist only of the character `*`.
Again, character choice doesn't matter here. I'll refer to these as "`*` strings".

Now, we pick a length `l`. This CP will have a maximum length of `(3V + 2) * m`.

This reduction clearly takes polynomial time as it involves iterating as many times
as there elements of `S` and bins.

We claim that the given BPP is solvable if and only if this constructed CP is solvable.

### Reduction, visualized

Let's consider a BPP with `V = 3`, `m = 3`, and elements of size `1`, `1`, `2`, and `2`.

The bins:
```
# # # # # #
# # # # # #
# # # # # #
### ### ###
```

The elements:
```
* * * *
* *
```

A valid placement of these is putting a `1` in its own bin, a `2` in its own bin, and
a `1` and `2` in the last bin. Another valid placement would be to
fill two bins entirely and leave one empty.

```
# # # # #*#
# # #*# #*#
#*# #*# #*#
### ### ###
```

The equivalent problem in CP is the following strings:

```
#### #### (x3)
* (x2)
** (x2)
```

And an equivalent solution is the following:

```
#### ####
*
#### ####
**
#### ####
***
```

which, when flattened looks like

```
####* ########** ########***####
```

Note how the length is `(3V + 2) * m` = 33.

### Reduction proof (forward direction)

Given a solution to the BPP, we can make a solution to the CP. Suppose in the BPP
solution that some arbitrary bin `B` was filled to a volume `V' <= V` using the
elements in the set `S'`. This implies that

```
sum(f(s') for s' in S') = V' <= V
```

Since we created a bijection from elements of `S` in the BPP to elements of `T` in the
CP, find the corresponding elements from `S'` and put them in the set `T'`. We claim
that the elements of the set `T'` can be made to fit into a single bin string.

Why is this? Well, the sum of the lengths of these strings is less than `V`, which is
the amount of space in the middle of each "bin" string. Therefore, we can just
concatenate all of these strings together and place them in the center of the bin
string.

Since we can do this for an arbitrary bin, and there is a bijection from `S` to `T`,
i.e. all bins in the solution of the BPP span all of `S`, we can fit all of the elements
in `T` corresponding to elements in `S` inside of the "bin" strings. This means that
there exists a solution where we place all of the `*` strings inside of the bin
strings. Therefore, the overall length of this solution is just the length of all the
bin strings combined, which is `(3V + 2) * m` as desired.

### Reduction proof (backwards direction)

We'll prove the contrapositve of the backwards direction. If there doesn't exist
a solution to the BPP, we cannot make a solution to the CP.

First, it should be reasonably obvious from the previous direction that the strategy
of placing the `*` strings in the bin strings won't work, since the BPP is unsolvable.
If we could place them in the bins, then that would imply that the BPP was solvable,
which contradicts the premise.

So we would have to find a solution to the CP that doesn't involve _just_ putting the
`*` strings inside of the bin strings. Since our target length is `(3V + 2) * m`, we
need to fill all of the holes in the bins. If we aren't _just_ putting `*` strings
inside of bin strings, this means we would have to have bin strings intersect. This is
impossible, as on either end of the `V` holes in the center there are `V + 1` holes
on the side.

Visually, for `V` = 3, we get alignments that look like this.

```
top bin | #### #### | #### #### | #### #### | #### #### | ...
bottom bin | #### #### | #### #### | #### #### | #### #### | ...
```

The bins all have to intersect in _some_ column, which is not allowed in an answer.

Therefore, it is impossible to have a solution to the CP.

### Conclusion

Since we showed a reduction existed from the BPP to the CP that took polynomial time,
and that the constructed CP was solvable if and only if the corresponding BPP was,
we conclude that the Crypto Problem is NP-Complete.

Original writeup (https://github.com/doublestandardctf/GoogleCTF2019/blob/master/Code-Golf.md).

Code Golf

Comments

Sign in with