Date Sun 28 September 2014 Tags clojure / core.typed / functional / lenses

In a previous post, I built up a framework for lens-like constructs in Clojure: essentially some fancified versions of assoc-in and get-in to allow for bidirectional transformations along the nesting path and some utilities to generate special-purpose getter/setter functions. The name, "pinhole," is supposed to suggest a more primitive, utilitarian mechanism for achieving focus.

While still ruing (sort of) other mistakes, I found myself worrying that a triumphal sentence near the end of the piece

What's more, thanks to the expressive power of dynamic Clojure, and higher order functions, these lenses are not just simple to use but simple to create.

was somewhat off the mark. Thanks to shoddy writing, one can't be sure, but if "dynamic Clojure" was referring to "dynamically typed Clojure," the sentence is not just vague, but precisely wrong. Evidence that this is indeed what I meant is provided by comparative references elsewhere in the piece to "type goodness" in the Scalaz implementation.

The fact is that dynamic typing is not at all necessary for lens operations. Moreover, it probably isn't even necessary for most uses of the core -in functions.

Want, want, want, want, want, want

Last week, what I wanted was:

1. Paths that allow arbitrary transformations along the lookup/retrieval path.
2. A convenient way to specify a dictionary of aliases to such paths.
3. The usual lensy guff of special-purpose getters, setters and updaters.

Let's add a fourth and fifth:

4. Compile time type-checking with core.typed.
5. Better performance than corresponding pinhole and core functions.

Those sound hard, so let's add an easy one.

6. A really stupid name.

So, "tinholes." The "t" is for type safety, and people make pinhole cameras with tin foil, so it kind of makes sense. I was a little worried that it might be an obscenity in some corner of the internet, but this seems not to be the case. It's merely stupid.

Statically typed Clojure

I'm a huge fan of core.typed, with which, for the purposes of this post, I will assume you're familiar. If you're not, there are links to resources at typedclojure.org, and I once wrote a tour/introduction in two posts that some claim to have found helpful.

If you're not familiar with core.typed and you don't have the time to make yourself so right now, the main things to know are

That it's an optional type system that works outside the language, using annotations that have absolutely no impact on the compiled code.

You can check out the legality of a namespace with (t/check-ns) or of an individual form with t/cf.

Also, if it's not obvious, I've :required [clojure.core.typed :as t].

Type-checking is somewhat difficult with the current implementations of ph-whatever, as, intuitively, it would be with anything implemented as a recursive function that might consume and return values of different types at different levels of the stack. For example, core/assoc-in has a classically recursive definition

    (defn assoc-in [m [k & ks] v]
      (if ks
        (assoc m k (assoc-in (get m k) ks v))
        (assoc m k v)))

and a conventional type annotation of

    (t/IFn [(t/U (clojure.lang.Associative t/Any t/Any) nil) (clojure.lang.Seqable t/Any) t/Any -> t/Any])

which is sub-microns away from total uselessness. It might be of some help if you were at risk of providing utterly random arguments (say, scalars or functions), but truly it's a fiesta of type Any, and there isn't anything to be done about it.

Let's recall the Turtle example from last time, but be pious little children and add some type annotations:

    (t/ann-record Point [x :- Number, y :- Number])
    (t/ann-record Color [r :- Short, g :- Short, b :- Short])
    (t/ann-record Turtle [position :- Point, color :- Color, heading :- Number])
    (defrecord Point [x y])
    (defrecord Color [r g b])
    (defrecord Turtle [position color heading])
    (def bruce (->Turtle (->Point 1.0 2.0) (->Color 255 0 0) (/ Math/PI 4)))

All of this fastidious typing is unfortunately of limited use once we peek beneath the shell:

    user> (t/cf (get-in bruce [:position :x]))
    t/Any

That's disappointing; core.typed can't figure out that we're going to get back a Number. But it gets even worse. The type checker will let us get away with horrors like this:

    user> (t/cf (get-in bruce [:hey :ho]))
    t/Any

The more complicated pinhole lenses are just as bad. (If the following example makes no sense, you really might want to go back and read the pinhole post.) Following standard advice, we could annotate a generated function

    (def turtle-forward (mk-ph-mod movexy [:position :x] [:position :y] [:heading]))

using the ^:no-check provision

(t/ann ^:no-check turtle-forward (t/IFn [Turtle Number -> Turtle]))

meaning that misuse of turtle-forward in subsequent code will be preventable, but there's no assurance that we got it right in the first place.

Macros to the rescue

If the problem is that core.typed has no visibility into types that are determined at runtime, let's try to determine them at compile time. Were we to rewrite (get-in bruce [:position :x]) explicitly as (get (get bruce :position) :x), then the type inference engine would have no trouble at all:

    user> (t/cf (get (get bruce :position) :x))
    java.lang.Number

It would be irritating to lose the convenience provided by get-in, but fortunately we don't have to. A macro can do the rewriting for us,

    (defmacro th-get-in [m path]
      (reduce (fn [acc k]
                (concat acc (list (if (vector? k) `(~(second k)) `(get ~k)))))
              `(-> ~m)
              path))

trivially throwing in bidirectional transforms as well:

    user> (macroexpand-1 '(th-get-in bruce [:position :x [inc dec]]))
    (clojure.core/-> bruce (clojure.core/get :position) (clojure.core/get :x) (dec))
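As a quick check that does not involve the type checker, the macro can be exercised on a plain map. The snippet below repeats the macro so that it stands alone; the map `m` and its keys are made up for illustration and are not part of the Turtle example.

```clojure
;; The th-get-in macro from above, repeated so this snippet is self-contained.
(defmacro th-get-in [m path]
  (reduce (fn [acc k]
            (concat acc (list (if (vector? k)
                                `(~(second k))  ; [f-in f-out]: apply f-out on the way out
                                `(get ~k)))))   ; plain key: one more threaded get
          `(-> ~m)
          path))

;; Hypothetical data, chosen only for this sketch.
(def m {:a {:b 41}})

(th-get-in m [:a :b])            ; => 41, via (-> m (get :a) (get :b))
(th-get-in m [:a :b [dec inc]])  ; => 42, the outbound function inc is applied last
```

Only the second element of a `[f-in f-out]` pair is used when reading; the first matters only for the association macro.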

Now, we know what we're dealing with:

    user> (t/cf (th-get-in bruce [:position :x [inc dec]]))
    java.lang.Number

and if we try any funny stuff

    user> (t/cf (th-get-in bruce [:posn :x [inc dec]]))
    Type Error (acyclic/utils/tinhole.clj:1:7) Static method clojure.lang.Numbers/dec could not be applied to arguments: ...

we get totally smacked. What's more, there are not insignificant performance gains from expanding nested gets at compile time:

    user> (time (dotimes [n 10000000] (th-get-in bruce [:position :x [inc dec]])))
    "Elapsed time: 1237.301 msecs"
    nil
    user> (time (dotimes [n 10000000] (get-in bruce [:position :x])))
    "Elapsed time: 2076.322 msecs"
    nil

The better performance is related to a trade-off in flexibility, but it's a trade-off that you probably don't mind. You could in principle want to pass in a different path every time you call get-in, but with th-get-in, the path is burned in as constants at compile time. This also means you'll run into trouble if you try

    user> (def p [:position :x])
    user> (th-get-in bruce p)
    IllegalArgumentException Don't know how to create ISeq from: clojure.lang.Symbol  clojure.lang.RT.seqFrom (RT.java:505)

because the macro is receiving the symbol p instead of the expected vector of stuff and has no idea what to do with it.
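To see the difference between what a macro receives and what a function receives, here is a minimal sketch. The macro `arg-kind` is a hypothetical helper invented for this illustration; it simply reports the kind of form it was handed at expansion time.

```clojure
;; Hypothetical helper: the cond runs during macro expansion,
;; on the unevaluated argument form.
(defmacro arg-kind [x]
  (cond (vector? x) :vector
        (symbol? x) :symbol
        :else       :other))

(def p [:position :x])

(arg-kind [:position :x]) ; => :vector, a literal path is visible to the macro
(arg-kind p)              ; => :symbol, the macro never sees the value of p
```

A macro like th-get-in that immediately reduces over its path argument therefore fails on `p`, since a symbol is not a sequence.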

th-assoc-in is more complicated

As in pinhole-land, the related bidirectional, transforming association is more complicated, because we need to apply the outbound transformation functions while unwrapping a structure, before calling the inbound transformations when putting it all back together.

In this case, it was a bit more pleasant to implement the code-emitting within an actual recursive function, which is then invoked by a macro. (As opposed, critically, to being invoked by code generated by a macro; all recursion here takes place at compile time and runs only once.) Among other things, I could avail myself of at least a little type checking while working: the Anys are unavoidable, but at least I know I won't try to recur with the wrong number of arguments.

The code generation code,

    (t/ann th-assoc-in-gen (t/IFn [t/Any (t/NonEmptySeq t/Any) t/Any -> t/Any]))
    (defn- th-assoc-in-gen [m ks v]
      (let [k  (first ks)
            ks (next ks)]
        (cond
          (vector? k) (let [[f-in f-out] k]
                        (list f-in (if-not ks v (th-assoc-in-gen (list f-out m) ks v))))
          ks          (list 'assoc m k (th-assoc-in-gen (list 'get m k) ks v))
          :else       (list 'assoc m k v))))
    (defmacro th-assoc-in [m ks v]
      (th-assoc-in-gen m ks v))

almost parallels ph-assoc-in,

    (defn ph-assoc-in [m [k & ks] v]
      (cond
        (vector? k) (let [[f-in f-out] k]
                      (f-in (if-not ks v (ph-assoc-in (f-out m) ks v))))
        ks          (assoc m k (ph-assoc-in (get m k) ks v))
        :else       (assoc m k v)))

except that s-expressions of the form (something ...) are now (list 'something ...), so the output is unexecuted code, e.g.

    user> (macroexpand '(th-assoc-in bruce [:position :x [inc dec]] 5))
    (assoc bruce :position (assoc (get bruce :position) :x (inc 5)))
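For a runnable view of the same thing, the sketch below repeats the generator without its core.typed annotation and applies it to a plain map. The map `m` and its keys are hypothetical, chosen only to keep the example small.

```clojure
;; Code generator from above, minus the core.typed annotation.
;; It runs at macro-expansion time and returns unevaluated code.
(defn- th-assoc-in-gen [m ks v]
  (let [k  (first ks)
        ks (next ks)]
    (cond
      (vector? k) (let [[f-in f-out] k]
                    (list f-in (if-not ks v (th-assoc-in-gen (list f-out m) ks v))))
      ks          (list 'assoc m k (th-assoc-in-gen (list 'get m k) ks v))
      :else       (list 'assoc m k v))))

(defmacro th-assoc-in [m ks v]
  (th-assoc-in-gen m ks v))

;; Hypothetical data for illustration.
(def m {:pos {:x 1 :y 2}})

(th-assoc-in m [:pos :x] 5)            ; => {:pos {:x 5, :y 2}}
(th-assoc-in m [:pos :x [inc dec]] 5)  ; => {:pos {:x 6, :y 2}}, inc is applied on the way in
```

Note that the form passed as `m` appears more than once in the generated code, so passing a symbol rather than an expensive expression avoids evaluating it repeatedly.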

Complications with the dictionary of aliases

Naively coding the macros that take a dictionary argument, we will run into the same problem we saw when passing path as a variable rather than as a literal vector. We can get around the problem by forcibly eval-ing the path-dict within the macro, thus, during pre-compilation, expanding the symbol into (presumably) a map of aliases:

    (defmacro th-get [path-dict m k]
      `(th-get-in ~m ~(ph/condition-key (eval path-dict) k)))

In the function-based implementation, eval is unnecessary, because arguments are not passed as symbolic literals.

    (defn ph-get [path-dict m k]
      (ph-get-in m (condition-key path-dict k)))

This is a trick you want to use conservatively, since many times the arguments that get passed to macros can't possibly be evaluated at compile time. So, for example, while this

    (def p {:bar [:position :x]})
    (defn foo [m] (th-get p m :bar))

works, this

    (defn foo [m]
      (let [p (assoc {} :bar [:position :x])]
        (th-get p m :bar)))

will bomb, with the message that you "Can't eval locals."
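The working case can be reproduced with a stripped-down stand-in. In this sketch, `alias-get` and `aliases` are hypothetical names, plain `get` takes the place of ph/condition-key, and core `get-in` takes the place of th-get-in, so it shows only the eval-at-expansion trick and nothing else from the library.

```clojure
;; Hypothetical alias dictionary, defined at the top level so that it
;; already exists as a var when the macro below is expanded.
(def aliases {:x [:position :x]})

;; Simplified stand-in for th-get: the dictionary symbol is eval-ed during
;; expansion, and the looked-up path is spliced into the code as a literal.
(defmacro alias-get [path-dict m k]
  `(get-in ~m ~(get (eval path-dict) k)))

(macroexpand-1 '(alias-get aliases turtle :x))
;; => (clojure.core/get-in turtle [:position :x])

(alias-get aliases {:position {:x 1.0}} :x)
;; => 1.0
```

The expansion contains the literal path and no trace of `aliases`, which is why the dictionary has to be resolvable at expansion time and a let-bound local is not.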

Getters, setters and modifiers

The macros for creating getters and setters are very straightforward,

    (defmacro mk-th-set ([ks] `(fn [o# v#] (th-assoc-in o# ~ks v#))))
    (defmacro mk-th-get [ks] `(fn [o#] (th-get-in o# ~ks)))

but it's interesting to think about how type checking works with very complex nesting. For example,

    (th-assoc-in {:a {:b "{:c 3}"}} [:a :b [pr-str read-string] :c] 5)

will correctly return {:a {:b "{:c 5}"}}, but it doesn't type check, because read-string returns t/Any, to which there's no guarantee that one can assoc anything.

In a case like this, the idea is to confine the unprovable type assertions to as small a domain as possible, which in this case means swearing up and down that the string transformations will behave properly:

    (t/defalias Silly "silly map" (t/HMap :mandatory {:a (t/HMap :mandatory {:b t/Str})}))
    (t/defalias Billy "stuff in b" (t/HMap :mandatory {:c t/Num}))
    (t/ann ^:no-check s->billy (t/IFn [t/Str -> Billy]))
    (t/ann ^:no-check billy->s (t/IFn [Billy -> t/Str]))
    (defn s->billy [s] (read-string s))
    (defn billy->s [b] (pr-str b))
    (t/ann x Silly)
    (def x {:a {:b "{:c 3}"}})

We can create a getter that passes (t/check-ns)

    (t/ann g (t/IFn [Silly -> t/Num]))
    (def g (mk-th-get [:a :b [billy->s s->billy] :c]))

More importantly,

    (def h (mk-th-get [:a :b [billy->s s->billy] :goat]))

does not.

Turtles on the march

Now let's build a type-checkable turtle-forward .

As before, we need a function to take some x and y position, heading and distance, and to return new values of x and y, but this time, it should be properly annotated. Trigonometry is an annoyance, since the static methods in Math aren't proper IFns, so we have to wrap them with ^:no-check-ed functions. Again, this is a compromise, but it's a tightly contained compromise, where visual inspection is, if not guaranteed to succeed, at least likely to do so:

    (t/ann ^:no-check Cos (t/IFn [Number -> Number]))
    (t/ann ^:no-check Sin (t/IFn [Number -> Number]))
    (defn Cos [a] (Math/cos a))
    (defn Sin [a] (Math/sin a))
    (t/defn movexy [x :- Number
                    y :- Number
                    dir :- Number
                    dist :- Number] :- (t/HVec [Number Number])
      [(+ x (* dist (Cos dir)))
       (+ y (* dist (Sin dir)))])

Now, instead of

    (def turtle-forward (mk-ph-mod movexy [:position :x] [:position :y] [:heading]))

we'll call a new macro version

    (def turtle-forward (mk-th-mod movexy 2 1 [:position :x] [:position :y] [:heading]))

which, a little awkwardly, requires specifying the number of return values from movexy as well as the number of arguments, in addition to the turtle, that it will expect. The macro needs these numbers at expansion time, before it has a chance to call movexy. I suspect it's possible to extract the information from the type declaration, but this doesn't seem to be very straightforward.

Now, mk-th-mod is not the world's most complicated macro, but it is hefty, and I wanted to take pains that its output be legible by humans, especially as those humans may be called upon to interpret type errors referring to it. The pretty-printed macro expansion for turtle-forward is:

    (fn* ([obj-26666 more-arg-26668-0]
      (clojure.core/let [arg-26667-0 (th-get-in obj-26666 [:position :x])
                         arg-26667-1 (th-get-in obj-26666 [:position :y])
                         arg-26667-2 (th-get-in obj-26666 [:heading])
                         [fv-26669-0 fv-26669-1] (movexy arg-26667-0 arg-26667-1 arg-26667-2 more-arg-26668-0)]
        (clojure.core/-> obj-26666
                         (th-assoc-in [:position :x] fv-26669-0)
                         (th-assoc-in [:position :y] fv-26669-1)))))

The 5-digit numbers are courtesy of gensym. Removing them and reformatting only slightly, we see an incredibly straightforward function that barely requires explanation:

    (fn* ([obj more-arg-0]
      (clojure.core/let [arg-0 (th-get-in obj [:position :x])
                         arg-1 (th-get-in obj [:position :y])
                         arg-2 (th-get-in obj [:heading])
                         [fv-0 fv-1] (movexy arg-0 arg-1 arg-2 more-arg-0)]
        (clojure.core/-> obj
                         (th-assoc-in [:position :x] fv-0)
                         (th-assoc-in [:position :y] fv-1)))))

To achieve this legibility, it's necessary to go a little beyond the name# sugar and call gensym directly. To generate a series of unique symbols that look like a series, we have

    (t/defn gensyms [s :- t/Str n :- t/Int] :- (t/Seq t/Sym)
      (let [s (gensym (str s "-"))]
        (map #(symbol (str s "-" %)) (range n))))
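A quick way to see what this produces is to run an unannotated copy at the REPL. The sketch below uses plain `defn` in place of `t/defn` (an assumption made only so that it runs without core.typed on the classpath); the body is unchanged.

```clojure
;; Same body as above, with the core.typed annotations dropped.
(defn gensyms [s n]
  (let [s (gensym (str s "-"))]              ; one fresh prefix per call, e.g. arg-1234
    (map #(symbol (str s "-" %)) (range n)))) ; numbered suffixes -0, -1, ...

(def syms (gensyms "arg" 3))
;; e.g. (arg-1234-0 arg-1234-1 arg-1234-2); the middle number varies per call
```

Because gensym is called once per series, all symbols from one call share a prefix, which is what makes the expanded code readable.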

Our macro

generates some symbols to hold the extracted arguments, user-provided arguments and function results:

    (defmacro mk-th-mod [f n-out n-more & arg-paths]
      (let [o      (symbol (name (gensym "obj-")))
            n-args (count arg-paths)
            args   (gensyms "arg" n-args)
            margs  (gensyms "more-arg" n-more)
            fnvals (gensyms "fv" n-out)

generates code to extract arguments from the structure,

            argvals (map #(list 'th-get-in o %) arg-paths)]

builds up a function that takes a structure and the more-arg- arguments,

        `(fn [~o ~@margs]

evaluates the extraction code and let-binds its results to the arg- variables,

           (let [~@(interleave args argvals)

evaluates the user function (movexy) and destructures the results into the fv- variables,

                 [~@fnvals] (~f ~@args ~@margs)]

threads the original structure through calls to th-assoc-in to place the fv- values where they belong:

             (-> ~o
                 ~@(map #(list 'th-assoc-in (nth arg-paths %1) (nth fnvals %1))
                        (range n-out)))))))

The function built by this macro behaves as expected

    (turtle-forward bruce 3)
    #user.Turtle{:position #user.Point{:x 3.121320343559643, :y 4.121320343559642}, :color #user.Color{:r 255, :g 0, :b 0}, :heading 0.7853981633974483}

and type-checks, but innocent-looking perturbations like

    (def turtle-forward2 (mk-th-mod movexy 2 1 [:position :x] [:position :y] [:angle]))

or

    (t/defn movexy [x :- Long y :- Long ...

explode spectacularly during check-ns.

To be fair, the explosions that core.typed enjoys are not all that user friendly, but that's excusable under the cruel-to-be-kind doctrine.

We can have it all

We wanted (in so many words) type-safe, efficient, concise, idiomatic and flexible handling of deeply nested data structures, and that's what we got.

Prima facie, these were unreasonable requests - it's not as though other languages were lining up to answer them - but Clojure and its ecosystem keep living up to their advertised aptitude for making difficult sounding things simple, and sometimes even easy. It's worth enumerating some of the machinery from which we benefited:

Immutable, persistent data structures. Without these, the whole conversation might have ended very early in frustration, because a mutable nested hash table may be the most dangerous programmatic construct ever conceived of.

Homoiconicity, i.e. macros that can be written in some semblance of the language they emit.

Optional typing via core.typed. It was both necessary and natural in this work to rely alternately on dynamic and static typing.

Regarding the last point, I should admit that I was guilted into this iteration of the project by Ambrose's Strangeloop talk. You take motivation where you find it.