Rhope Burn's 2013 ICFP Contest Writeup

Overview

This year, Rhope Burn consisted of William Morgan, Joshua Garnett, and me (Mike Pavone). Unfortunately, due to a number of issues, Josh was not available during most of the contest. Apart from a small tool that provided some basic statistics on the available problems (written in Scala), our solution was written in a language I'm working on that is tentatively named TP. We only scored 405 points, which puts us somewhere between 150th and 175th place. This is a modest improvement over last year, in which we placed 177th, but still far from our high-water mark of 32nd place in 2011. You can see more information about which problems we solved on our page on the unofficial scoreboard put together by a member of team "hack the loop".

The Code

The code for our solver is located here. There was also a fair amount of library work done during the contest in the main TP repo.

The Language

TP, short for tablet programming, is a language I'm developing alongside a structured editor for use on mobile touchscreen devices. The hope is to end up with a language that is easy to write and edit on a touchscreen-only device while also being a reasonable choice for targeting the dominant mobile platforms. The language is very immature at the moment. I started work on it a bit less than a year and a half ago, and I've been busy with a different side project for a good portion of the time in between.

Hardware

We ran the solver on my somewhat old desktop with an AMD Phenom II X4 @ 3.2GHz and 4GB of RAM.

Timeline

Thursday Night

Josh quickly wrote a bit of Scala code to parse the myproblems JSON and produce some basic statistics. I worked on adding support for 64-bit integers to TP on my way home. Both of those were done by about 8PM PDT. From there, I started work on an evaluator for \BV while Bill started ramping back up on TP. By midnight, I had a working evaluator and code to generate all of the size 3 programs.
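For reference, \BV is a small S-expression language over unsigned 64-bit words. A minimal Python sketch of an evaluator for a subset of the operators (our actual evaluator was written in TP; the nested-tuple AST encoding here is just an assumption for illustration, and if0/fold are omitted for brevity):

```python
MASK = (1 << 64) - 1  # \BV values are unsigned 64-bit words

def bv_eval(e, env):
    """Evaluate a subset of \BV expressions encoded as nested tuples,
    e.g. ('plus', ('var', 'x'), ('1',)) for (plus x 1)."""
    op = e[0]
    if op == '0':     return 0
    if op == '1':     return 1
    if op == 'var':   return env[e[1]]
    if op == 'not':   return ~bv_eval(e[1], env) & MASK
    if op == 'shl1':  return (bv_eval(e[1], env) << 1) & MASK
    if op == 'shr1':  return bv_eval(e[1], env) >> 1
    if op == 'shr4':  return bv_eval(e[1], env) >> 4
    if op == 'shr16': return bv_eval(e[1], env) >> 16
    if op == 'and':   return bv_eval(e[1], env) & bv_eval(e[2], env)
    if op == 'or':    return bv_eval(e[1], env) | bv_eval(e[2], env)
    if op == 'xor':   return bv_eval(e[1], env) ^ bv_eval(e[2], env)
    if op == 'plus':  return (bv_eval(e[1], env) + bv_eval(e[2], env)) & MASK
    raise ValueError(op)

# (lambda (x) (plus x 1)) applied to 41
result = bv_eval(('plus', ('var', 'x'), ('1',)), {'x': 41})
```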

Friday

Bill started working on creating some objects for representing requests to the contest API and turning those objects into JSON strings. I continued to work on the code for generating programs of a given size. By around 1AM PDT, the program generation code was done. The rest of my early morning involved finishing the socket bindings, fixing some bugs that Bill ran into, and doing a tiny bit of work on an HTTP client. At 5:30AM, I went to sleep. I had to work during the day, and Bill was not able to make much headway without me due to ramp-up and implementation issues. After work, I was able to make a small amount of progress on the HTTP client, but I was pretty tired from the small amount of sleep the night before. I went to bed around 9PM PDT.

Saturday

I started on a basic JSON parser Saturday morning and finished by lunch time. With some help, Bill was able to finish serializing request objects and parsing responses into response objects using the JSON parser. By 4:30 PM PDT, we had the JSON and HTTP code working well enough to send some basic test messages. I turned my attention to the solver, and by 11:30 PM our solver was able to communicate with the guess and eval APIs to solve individual programs. Bill started work on parsing the response from the myproblems API.

Sunday

At some point early in the morning, we had a driver program that used the myproblems JSON data to invoke the solver for every problem of a given size. By 6AM PDT, we had solved all of the problems of size 9 or smaller and some of the size 10 problems. Unfortunately, we burned 6 size 10 problems due to a bug. At this point I went to sleep. The rest of Sunday was spent fixing the above bug, improving the code that filtered programs as they were generated to better tackle size 11 and 12 problems, and getting a simple solver for large problems working.

Strategy

For problems with size <= 9, we used a simple brute-force approach that generated all programs of the desired size. We then filtered these based on operator use and ran them against a set of randomly selected test cases. We used the results of these tests to build a tree that, for each input, maps output values to the programs that produce them. If we could determine that a program produced a constant value, we only ran it against the first test input. If a program produced a unique output for a given input, we skipped running the rest of the test cases on it. This strategy used about 1.2GB of RAM on my machine for the size 9 problems. Unfortunately, size 10 was too much for the meager 4GB in my desktop.
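The core of that filtering amounts to grouping candidate programs by the outputs they produce on the chosen test inputs, so that a guess only has to distinguish programs within one group. A simplified Python sketch (our solver was written in TP; `build_output_index` and the lambda candidates are hypothetical names for illustration):

```python
import random

MASK = (1 << 64) - 1  # \BV values are unsigned 64-bit words

def build_output_index(candidates, inputs):
    """Group candidate programs by their output vector on `inputs`.
    Programs in different groups are already distinguished by the
    test cases; only programs sharing a group need further guesses."""
    index = {}
    for prog in candidates:
        outputs = tuple(prog(x) & MASK for x in inputs)
        index.setdefault(outputs, []).append(prog)
    return index

random.seed(1)
# include one fixed nonzero input so the example below is deterministic
inputs = [1] + [random.getrandbits(64) for _ in range(15)]
candidates = [
    lambda x: x,                # (lambda (x) x)
    lambda x: (x << 1) & MASK,  # (lambda (x) (shl1 x))
    lambda x: 0,                # (lambda (x) 0)
]
index = build_output_index(candidates, inputs)
# all three candidates behave differently on input 1,
# so each lands in its own group
```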

For size 10 problems and some size 11 and 12 problems, we used a slightly modified version of the above strategy. Rather than generating all the programs of a certain size upfront, we filtered based on operator usage as we generated programs. This worked quite well for size 10, but unfortunately the initial version had a bug that caused us to burn 6 problems of this size: I had forgotten to account for the initial lambda, so the generator was producing size 11 programs instead of size 10 programs, and not all size 10 programs have equivalent size 11 programs. For size 11, we were able to solve all problems that did not use if0, and for size 12 we were able to solve all problems that did not use if0 or fold. With more time (this version of the solver was not completely done until well into Sunday) and/or computing resources, we probably could have solved the remaining size 11 and 12 problems.
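The off-by-one came from the \BV size rules: the top-level lambda itself costs 1, so generating bodies of size n yields programs of size n+1. A Python sketch of size counting under those rules (leaves cost 1, each operator adds 1, fold adds 2; the nested-tuple AST is a hypothetical encoding):

```python
def expr_size(e):
    """Size of a \BV expression encoded as a nested tuple."""
    op = e[0]
    if op in ('0', '1', 'var'):
        return 1                                      # leaves cost 1
    if op == 'fold':                                  # fold costs 2 plus its children
        return 2 + sum(expr_size(c) for c in e[1:])
    return 1 + sum(expr_size(c) for c in e[1:])       # op1/op2/if0 cost 1 plus children

def program_size(body):
    # the bug: forgetting this +1 for the top-level (lambda (x) ...)
    return 1 + expr_size(body)
```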

For larger problems, we generated all programs with size <= 8, ran them against a certain number of test cases, and hoped for the best. Unfortunately, this solver was not finished until ~3:30 PDT on Sunday, so we only ran it on a subset of the remaining problems. We probably could have run it with all programs with size <= 9, but since we were short on time, that would have added an unacceptable amount of startup delay. The program crashed with a method-not-implemented error a couple of times, which further limited the number of problems we were able to solve before the contest ended.

Unused Ideas

At one point, I had hoped to use abstract interpretation to better identify constant programs, pick better test cases, and identify equivalent programs. Another idea was to build programs after an eval request by picking a subtree for one argument of a binary operator, calculating the value produced by that subtree, and then looking for trees that produce the value the other argument needs for the combined expression to yield the desired output. We did not have time to even start working on either of these ideas.
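The second idea is essentially a meet-in-the-middle search. A hypothetical Python sketch for a top-level plus (we never implemented this; `find_plus_pair` and the subtree table below are made up for illustration):

```python
MASK = (1 << 64) - 1  # \BV values are unsigned 64-bit words

def find_plus_pair(subtrees, x, target):
    """Given candidate subtrees (name -> evaluator), find pairs (e0, e1)
    such that (plus e0 e1) maps input x to target. Evaluating each
    subtree once and indexing by value turns the quadratic pair search
    into a dictionary lookup."""
    by_value = {}
    for ast, ev in subtrees.items():
        by_value.setdefault(ev(x) & MASK, []).append(ast)
    pairs = []
    for a, lefts in by_value.items():
        need = (target - a) & MASK  # plus is addition mod 2^64
        for right in by_value.get(need, []):
            for left in lefts:
                pairs.append((left, right))
    return pairs

subtrees = {
    'x':      lambda v: v,
    '1':      lambda v: 1,
    'shl1 x': lambda v: (v << 1) & MASK,
}
pairs = find_plus_pair(subtrees, x=5, target=6)
# (plus x 1) and (plus 1 x) both map 5 to 6
```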

What Went Well

We ran into a lot fewer compiler bugs this year than last and the bugs we did run into were easier to fix or work around. Adding support for unsigned 64-bit integers ended up taking less time than expected.

What Went Poorly

I originally wasn't planning on competing this year, as I hadn't been working on TP much. This left us somewhat poorly prepared compared to previous years. There were fairly clear hints that a JSON parser and socket library would be useful before the contest started, but I didn't get a chance to act on them. Further, since I wasn't planning on competing, I didn't take Friday off from work. Bill did, but due to a combination of needing to ramp up on the language again and the difficulty of working with the very rough implementation, he was not able to make much headway until we were working in the same room on Saturday. The wimpiness of my desktop compared to what some other teams were using also hurt a bit.