These are the answers, along with some commentary, for part 1 (Algorithms and Data Structures) of my software development final exam.

Each question was graded out of 5, with points awarded for different parts of each question as described below; in some cases I awarded partial marks (e.g., 2 of 3 possible marks for part of a question) when someone had the right general idea but missed some details.

Question: Is O(2^n) equal to O(3^n)? Why? Answer: No. O(f(n)) = O(g(n)) if and only if f(n)/g(n) is bounded between two positive real numbers for all sufficiently large values of n; but as n increases, the ratio 2^n / 3^n tends to zero. Grading: I awarded 2 marks for recognizing that these two sets of functions are not equal, and 3 marks for explaining why; for full marks, I wanted to see some mention of the limiting behaviour as n increases. Commentary: The average score on this question was 2.8/5, which I found disappointingly low. Roughly a third of people thought that the two classes were equal, with the most common justification being that "we can ignore constants". While it is true that these both fall into the "exponential with linear exponent" class E, big-O notation is more precise and requires that functions are asymptotically within a constant multiplicative factor of each other. A few people were confused by the notation, having seen "f(n) = O(g(n))" too many times; that is a misleading abuse of the "=" symbol, since it does not exhibit equivalence-relation behaviour. A better notation is "f(n) ∈ O(g(n))", since that makes it clear that O(g(n)) represents a set of functions. I included this question as a test of whether people knew what big-O notation really means — in practice, it's very unlikely that anyone will need to compare O(2^n) and O(3^n) algorithms, but this made the two classes of functions obscure enough that people needed to rely on their understanding of big-O notation rather than memorized hierarchies.
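The limiting behaviour is easy to check numerically. A short sketch of my own (not part of any submitted answer): the ratio 2^n / 3^n equals (2/3)^n, which shrinks geometrically, so no constant c can make 3^n ≤ c · 2^n hold for all sufficiently large n.

```python
# The ratio 2^n / 3^n = (2/3)^n tends to zero as n grows, so 3^n
# cannot be bounded by any constant multiple of 2^n -- i.e., the
# function 3^n is not in O(2^n), and the two sets differ.
for n in (1, 10, 20, 40):
    print(n, 2**n / 3**n)
```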

Question: What is the expected run-time of quicksort? What is the worst-case run-time? Answer: The expected run-time of quicksort (for random inputs) is O(n log n). The worst-case run-time of quicksort is O(n^2). Grading: I awarded 2 marks for the expected run-time and 3 marks for the worst-case run-time. Commentary: The average score on this question was 4.8/5, which I found surprisingly high. I have seen many people use quicksort in situations where the worst case either occurs quite often or can easily be triggered by an attacker, so my hypothesis was that many software developers had forgotten about the poor worst-case behaviour; in fact, more people got the worst-case behaviour correct than the expected behaviour. Some people pointed out that using a linear-time median-finding algorithm to select pivots gives quicksort an O(n log n) worst case; the observation was welcome but not necessary. A few people stated that the expected running time of quicksort was O(log n); I have no idea what sort of confusion would lead to this, unless perhaps it was simply a common typo.
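To make the gap between the two bounds concrete, here is a minimal quicksort sketch of my own (first-element pivot, chosen deliberately because it is the classic way to hit the worst case on sorted input) that counts one comparison per element partitioned:

```python
import random

def quicksort(a, counter):
    """Sort list a, adding roughly one comparison per partitioned element
    to counter[0]. First-element pivot: degenerates on sorted input."""
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    counter[0] += len(rest)               # each element is compared to the pivot
    left  = [x for x in rest if x <  pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

n = 300
c_sorted, c_random = [0], [0]
quicksort(list(range(n)), c_sorted)       # ~ n^2 / 2 comparisons: O(n^2)
quicksort(random.sample(range(n), n), c_random)  # ~ n log n comparisons
print(c_sorted[0], c_random[0])
```

On sorted input every partition is maximally lopsided, so the count is exactly n(n-1)/2; on shuffled input it lands near 2n ln n.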

Question: What is the difference between a binary tree [EDIT: that should be binary search tree] and a B-tree? When and why does this difference matter? Answer: Internal nodes in a binary tree have at most two children, while internal nodes in a B-tree can have many children (depending on the block size). This is useful when the tree data is stored in a manner which exhibits large read latencies (e.g., on disk) since the higher fan-out means that the height of a B-tree is less than the height of a corresponding binary tree. Grading: I awarded 2 marks for identifying that a B-tree can have more than 2 children per internal node; 2 marks for identifying that this makes B-trees faster on disk (or generally when access times are determined primarily by latency rather than bandwidth); and 1 mark for drawing the connection from having more children per internal node to having a smaller tree height. I awarded 1 mark to people who didn't mention that B-trees have an advantage on disk but instead mentioned that they are balanced (which not all binary search trees are); while this is true, the issue of balance alone should not cause someone to reach for a B-tree, since there are balanced binary search trees (e.g., red-black trees) which are much simpler to implement than B-trees. Commentary: The average score on this question was 3.2/5; most people knew that a B-tree had more children per internal node, but only half of respondents knew that this made them useful for data stored on disk. A sizeable number of people (including several who didn't know what exactly a B-tree was) remembered that B-trees were used by databases and filesystems but weren't sure why. I included this question because I wanted to test not only whether people knew standard data structures, but also whether people would be able to pick the right data structure for a task — and knowing when and why one data structure is better than another is absolutely critical for that.
Marking this question brought to mind a common remark my mother — a former high school mathematics teacher, and IB examination marker — used to make: Answer the question asked. There were many people who explained what a B-tree is and when it's useful, but didn't explain why — and while in many cases I thought they probably knew the answer to that third part, I couldn't give marks for knowledge they didn't express.
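The height argument is worth a back-of-the-envelope check. A sketch of my own (the fan-out of 256 is an assumed value; real B-tree fan-outs depend on the block size and key size):

```python
import math

n = 10**9                                    # a billion keys
binary_height = math.ceil(math.log(n, 2))    # ~30 levels -> up to ~30 disk seeks
btree_height = math.ceil(math.log(n, 256))   # ~4 levels at fan-out 256
print(binary_height, btree_height)
```

Since each level of the tree can cost one disk seek, reducing the height from ~30 to ~4 is the whole point: the higher fan-out divides the height by log2 of the fan-out.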

Question: Name the heap operations used by heapsort and the asymptotic running time of each. Answer: There are two operations: Heapify, which takes n elements and builds a heap out of them in O(n) time; and pop-max, which deletes the largest element from the heap and returns it in O(log n) time. Grading: I accepted a number of different formulations for heapsort and names for the functions. For the initial heap-building process, I awarded 3 marks for stating that the process took O(n) time, or 1 mark for saying that an element could be added in O(log n) time. I awarded 2 marks for stating that it took O(log n) time to remove the maximum (or minimum, in some formulations) element and make the necessary adjustments to retain the heap property. Commentary: The average score on this question was 2.2/5; a large number of people couldn't remember what a heap was or how heapsort worked. I found this quite disappointing; while in most cases developers will want to use either quicksort or samplesort (if not concerned about worst-case behaviour) or mergesort, heaps are nonetheless a good data structure to know about for their relevance to priority queues. Of those people who remembered about heapsort, a large majority did not know that a heap can be constructed in linear time. This is an important thing to know if you ever intend to use a heap: While it doesn't affect the asymptotic behaviour of heapsort (since extracting the elements still takes O(n log n) time), it reduces the implicit constant; it can also be useful if you only want to extract a limited number of the largest elements. A few people explained how linear-time heap construction works — by "building up" with progressively larger sub-heaps — which was welcome but not necessary.
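The two operations map directly onto Python's heapq module (a min-heap, so this formulation produces ascending order). A minimal heapsort sketch:

```python
import heapq

def heapsort(items):
    heap = list(items)
    heapq.heapify(heap)   # builds a valid heap in O(n), not n inserts at O(log n)
    # n pops at O(log n) each dominate the total: O(n log n) overall
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([5, 3, 8, 1, 9, 2]))  # -> [1, 2, 3, 5, 8, 9]
```

The linear-time heapify is also what makes partial extraction cheap: to get only the k smallest elements you pay O(n) to build the heap plus O(log n) per pop, rather than sorting everything.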