[switch] Further unification on switch

We've been reviewing the work to date on switch expressions. Here's where we are, and here's a possible place we might move to, which I like a lot better than where we are now.

## Goals

As a reminder, the primary goal here is _not_ switch expressions; switch expressions are supposed to just be an uncontroversial waypoint on the way to the real goal, which is a more expressive and flexible switch construct that works in a wider variety of situations, including supporting patterns, being less hostile to null, use as either an expression or a statement, etc.

We think that improving switch is the right primary goal because a "do one of these based on ..." construct is _better_ than the corresponding chain of if-else-if, for multiple reasons:

- Possibility for the compiler to do exhaustiveness analysis, potentially finding more bugs;
- Possibility for more efficient dispatch -- a switch could be O(1), whereas an if-else chain is almost certainly O(n);
- More semantically transparent -- it's obvious the user is saying "do one of these, based on ...";
- Eliminates the need to repeat (and possibly get wrong) the switch target.

Switch does come with a lot of baggage (fallthrough by default, questionable scoping, need to explicitly break), and this baggage has produced the predictable distractions in the discussion -- a desire that we subordinate the primary goal (making switch more expressive) to the more contingent goal of "fixing" the legacy problems of switch. These legacy problems may be unfortunate, but to whatever degree we end up ameliorating them, this has to be purely a side-benefit -- it's not the primary goal, no matter how annoying people find them. (The desire to "fix" the mistakes of the past is frequently a siren song, which is why we don't allow ourselves to take these as first-class requirements.)
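
To make the reasons listed under Goals concrete, here is a minimal sketch comparing an if-else-if chain with a switch over the same target. The `Shape` enum and both method names are hypothetical, invented only for illustration. The sketch assumes the switch-expression syntax as it later shipped in Java 14 and newer, not necessarily the exact form under discussion here.

```java
public class GoalsSketch {
    // Hypothetical enum, used only for illustration.
    enum Shape { CIRCLE, SQUARE, TRIANGLE }

    // if-else-if chain: the target `s` is repeated in every test, and nothing
    // forces the chain to cover every constant, so a fallback throw is needed.
    static int sidesWithIfChain(Shape s) {
        if (s == Shape.CIRCLE) {
            return 0;
        } else if (s == Shape.SQUARE) {
            return 4;
        } else if (s == Shape.TRIANGLE) {
            return 3;
        }
        throw new IllegalArgumentException("unhandled: " + s);
    }

    // switch: the target is named once, and (in released Java) the compiler
    // rejects this expression if an enum constant is left without a case.
    static int sidesWithSwitch(Shape s) {
        return switch (s) {
            case CIRCLE -> 0;
            case SQUARE -> 4;
            case TRIANGLE -> 3;
        };
    }

    public static void main(String[] args) {
        for (Shape s : Shape.values()) {
            System.out.println(s + ": " + sidesWithIfChain(s) + " " + sidesWithSwitch(s));
        }
    }
}
```

Deleting one of the three cases from `sidesWithSwitch` turns a latent runtime bug into a compile-time error, which is the exhaustiveness benefit named in the first bullet.
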
#### What we're not going to do

The worst possible outcome (which is also the most commonly suggested "solution" in forums like reddit) would be to invent a new construct that is similar to, but not quite the same as switch (`snitch`), without being a 100% replacement for today's quirky switch. Today's switch is surely suboptimal, but it's not so fatally flawed that it needs to be euthanized, and we don't want to create an "undead" language construct forever, which everyone will still have to learn, and keep track of the differences between `switch` and `snitch`. No thank you.

That means we extend the existing switch statement, and increase flexibility by supporting an expression form, and to the degree needed, embrace its quirks. ("No statement left behind.")

#### Where we started

In the first five minutes of working on this project, we sketched out the following (call it the "napkin sketch"), where an expression switch has case arms of the form:

    case L -> e;

or

    case L -> {
        statement*;
        break e;
    }

This was enough to get started, but of course the devil is in the details.

#### Where we are right now

We moved away from the napkin sketch for a few reasons, in part because it seemed to be drawing us down the road towards switch and snitch -- which was further worrying as we still had yet to deal with the potential that pattern switch and constant switch might have differences as well. We want a unified model of switch that deals well enough with all the cases -- expressions and statements, patterns and constants.

Our current model (call this Unification Attempt #1, or UA1 for short) is a step towards a unified model of switch, and this is a huge step forward. In this model, there's one switch construct, and there's one set of control flow rules, including for break (like return, break takes a value in a value context and is void in a void context).
For convenience and safety, we then layered a shorthand atop value-bearing switches, which is to interpret

    case L -> e;

as

    case L: break e;

expecting the shorter form would be used almost all the time. (This has a pleasing symmetry with the expression form of lambdas, and (at least for expression switches) alleviates two of the legacy pain points. Switch expressions have other things in common with lambdas too; they are the only expressions that can have statements; they are the only expressions that interact with nonlocal control flow.)

This approach offers a lot of flexibility (some would say too much). You can write "remi-style" expression switches:

    int x = switch (y) {
        case 1: break 2;
        case 2: break 4;
        default: break 8;
    };

or you can write "new-style" expression switches:

    int x = switch (y) {
        case 1 -> 2;
        case 2 -> 4;
        default -> 8;
    };

Some people like the transparency of the first; others like the compactness and fallthrough-safety of the second. And in cases where you mostly want the benefits of the second, but the real world conspires to make one or two cases difficult, you can mix them, and take full advantage of what "old switch" does -- with no new rules for control flow.

#### Complaints

There were the usual array of complaints over syntax -- many of which can be put down to "bleah, new is different, different is bad", but the most prominent one seems to be a generalized concern that other users (never us, of course, but we always fear for what others might do) won't be able to "handle" the power of mixed switches and will write terrible code, and then the world will burn. (And, because the mixing comes with fallthrough, it further engenders the "you idiots, you fixed the wrong thing" reactions.)
Personally, I think the fear of mixing is deeply overblown -- I think in most cases people will gravitate towards one of the two clean styles, and only mix where the complexity of the real world forces them to, but there's value in understanding the underpinnings of such reactions, even if in the end they'd turn out to be much hot air about nothing.

#### A real issue with mixing!

But, there is a real problem with our approach, which is: while a unified switch is the right goal, UA1 is not unified _enough_. Specifically, we haven't fully aligned the statement forms, and this conspires to reduce expressiveness and safety. That is, in an expression switch you can say:

    case L -> e;

but in a statement switch you can't say

    case L -> s;

The reason for this is a purely accidental one: if we allowed this, then we _would_ likely find ourselves in the mixing hell that people are afraid of, which in turn would make the risk of accidental fallthrough _even worse_ than it is today.

So the failing of mixing is not that it will be abused, but that it constrains us from actually getting to a unified construct.

## Closing the gap

So, let's take one more step towards unifying the two forms (call this UA2), rather than a step away from it.
Let's say that _all_ switches can support either old-style (colon) or new-style (arrow) case labels -- but must stick to one kind of case label in a given switch:

    // statement switch
    switch (x) {
        case 1: println("one"); break;
        case 2: println("two"); break;
    }

or

    // also statement switch
    switch (x) {
        case 1 -> println("one");
        case 2 -> println("two");
    }

If a switch is a statement, the RHS is a statement, which can be a block statement:

    case L -> { a; b; }

We get there by first taking a step backwards, at least in terms of superficial syntax, to the syntax suggested by the napkin sketch, where if a switch is an expression, the RHS of an -> case is an expression or a block statement (in the latter case, it must complete abruptly by reason of either break-value or throw). Just as we expected "break value" to be rare in expression switches under UA1 since developers will generally prefer the shorthand form where applicable, we expect it to be equally rare under UA2.

Then, as in UA1, we render unto expressions the things that belong to expressions; they must be total (an expression must yield a value or complete abruptly by reason of throwing.)

#### Look, accidental benefits!

Many of switch's failings (fallthrough, scoping) are not directly specified features, as much as emergent properties of the structure and control flow of switches. Since by definition you can't fall out of an arrow case, an all-arrow switch gives the fallthrough-haters what they want "for free", with no need to treat it specially. In fact, it's even better; in the all-arrow form, all of the things people hate about switch -- the need to say break, the risk of fallthrough, and the questionable scoping -- go away.

#### Scorecard

There is one switch construct, which can be used as either an expression or a statement; when used as an expression, it acquires the characteristics of expressions (must be total, no nonlocal control flow out.)
Each can be expressed in one of two syntactic forms (arrow and colon.) All forms will support patterns, null handling, and multiple labels per case. The control flow and scoping rules are driven by structural properties of the chosen form.

- The (statement, colon) case is the switch we have had since Java 1.0, enhanced as above (patterns, nulls, etc.)
- The (statement, arrow) case can be considered a nice syntactic shorthand for the previous, which obviates the annoyance of "break", implicitly prevents fallthrough of all forms, and avoids the confusion of current switch scoping. Many existing statement switches that are not expressions in disguise can be refactored to this.
- The (expression, colon) form is a subset of UA1, where you just never say "arrow".
- The (expression, arrow) case can again be considered a nice shorthand for the previous, again a subset of UA1, where you just never say "colon", and as a result, again don't have to think about fallthrough.

Totality is a property of expression switches, regardless of form, because they are expressions, and expressions must be total.

Fallthrough is a property of the colon-structured switches; there are no changes here.

Nonlocal control flow _out_ of a switch (continue to an enclosing loop, break with label, return) is a property of statement switches.

So essentially, rather than dividing the semantics along expression/statement lines, and then attempting to opportunistically heap a bunch of irrelevant features like "no fallthrough" onto the expression side "because they're cool" even though they have nothing to do with expression-ness, we instead divide the world structurally: the colon form gives you the old control flow, and the arrow form gives you the new. And either can be used as a statement, or an expression. And no one will be confused by mixing.

Orthogonality FTW. No statement gets left behind.
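
The four quadrants of the scorecard can be sketched side by side. This is an illustration under stated assumptions, not the proposal's exact syntax: it uses the form that later shipped in Java 14 and newer, where the break-with-value described in this note is spelled `yield`. The class name, method names, and the `StringBuilder` standing in for `println` are all hypothetical.

```java
public class QuadrantsSketch {
    // (statement, colon): legacy control flow, explicit break required.
    static void statementColon(int x, StringBuilder out) {
        switch (x) {
            case 1: out.append("one"); break;
            case 2: out.append("two"); break;
        }
    }

    // (statement, colon) without a break: case 1 falls through into case 2.
    static void colonFallthrough(int x, StringBuilder out) {
        switch (x) {
            case 1: out.append("one"); // no break, so control falls through
            case 2: out.append("two"); break;
        }
    }

    // (statement, arrow): no break needed, and no fallthrough is possible.
    static void statementArrow(int x, StringBuilder out) {
        switch (x) {
            case 1 -> out.append("one");
            case 2 -> out.append("two");
        }
    }

    // (expression, colon): every path must produce a value (totality).
    // `yield` is the released spelling of this note's "break e".
    static int expressionColon(int y) {
        return switch (y) {
            case 1: yield 2;
            case 2: yield 4;
            default: yield 8;
        };
    }

    // (expression, arrow): shorthand for the previous form.
    static int expressionArrow(int y) {
        return switch (y) {
            case 1 -> 2;
            case 2 -> 4;
            default -> 8;
        };
    }

    public static void main(String[] args) {
        StringBuilder colon = new StringBuilder();
        StringBuilder arrow = new StringBuilder();
        StringBuilder fall = new StringBuilder();
        statementColon(1, colon);
        statementArrow(1, arrow);
        colonFallthrough(1, fall);
        System.out.println(colon + " " + arrow + " " + fall); // prints: one one onetwo
        System.out.println(expressionColon(2) + " " + expressionArrow(2)); // prints: 4 4
    }
}
```

Note that the statement forms above have no `default` and simply do nothing for unmatched values, while both expression forms need one to be total.
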
## Explaining it

Relative to UA1, we could describe this as adding back the blocks (it's not really a block expression) from the napkin model, supporting an arrow form of statement switches with blocks too, and then restricting switches to all-arrow or all-colon. Then each quadrant is a restriction of this model. But that's not how we'd teach it.

Relative to Java 10, we'd probably say:

- Switch statements now come in a simpler (arrow) flavor, where there is no fallthrough, no weird scoping, and no need to say break most of the time. Many switches can be rewritten this way, and this form can even be taught first.
- Switches can be used as either expressions or statements, with essentially identical syntax (some grammar differences, but this is mostly interesting only to spec writers). If a switch is an expression, it should contain expressions; if a switch is a statement, it should contain statements.
- Expression switches have additional restrictions that are derived exclusively from their expression-ness: totality, can only complete abruptly if by reason of throw.
- We allow a break-with-value statement in an expression switch as a means of explicitly providing the switch result; this can be combined with a statement block to allow for statements+break-expression.

The result is one switch construct, with modern and legacy flavors, which supports either expressions or statements. You can immediately look at the middle of a switch and tell (by arrow vs colon) whether it has the legacy control flow or not.
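
As a rough illustration of the teaching points above, the following sketch shows an arrow-form expression switch that combines plain expression arms, multiple labels per case, and a statement block that ends by providing a value or throwing. It assumes the syntax that later shipped in Java 14 and newer, where break-with-value is written `yield`. The class and method names are hypothetical.

```java
public class ArrowBlockSketch {
    static String describe(int n) {
        return switch (n) {
            // Plain expression arm: the value of the arm is the switch result.
            case 0 -> "zero";
            // Multiple labels per case, with a block that ends in a value.
            case 1, 2 -> {
                String label = "small";
                yield label;
            }
            // Each arrow block is its own scope, so `label` can be declared
            // again here. A block must end by yielding a value or throwing.
            default -> {
                if (n < 0) {
                    throw new IllegalArgumentException("negative: " + n);
                }
                String label = "big(" + n + ")";
                yield label;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // prints: zero
        System.out.println(describe(2)); // prints: small
        System.out.println(describe(7)); // prints: big(7)
    }
}
```

Because the switch is an expression, a `return` or a `continue` inside one of these blocks would be rejected; the only ways out are a value or an exception.
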